summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2020-04-22Fix memory leak in libpq when using sslmode=verify-fullMichael Paquier
Checking if Subject Alternative Names (SANs) from a certificate match with the hostname connected to leaked memory after each lookup done. This is broken since acd08d7 that added support for SANs in SSL certificates, so backpatch down to 9.5. Author: Roman Peshkurov Reviewed-by: Hamid Akhtar, Michael Paquier, David Steele Discussion: https://postgr.es/m/CALLDf-pZ-E3mjxd5=bnHsDu9zHEOnpgPgdnO84E2RuwMCjjyPw@mail.gmail.com Backpatch-through: 9.5
2020-04-21Fix possible crash during FATAL exit from reindexing.Tom Lane
index.c supposed that it could just use a PG_TRY block to clean up the state associated with an active REINDEX operation. However, that code doesn't run if we do a FATAL exit --- for example, due to a SIGTERM shutdown signal --- while the REINDEX is happening. And that state does get consulted during catalog accesses, which makes it problematic if we do any catalog accesses during shutdown --- for example, to clean up any temp tables created in the session. If this combination of circumstances occurred, we could find ourselves trying to access already-freed memory. In debug builds that'd fairly reliably cause an assertion failure. In production we might often get away with it, but with some bad luck it could cause a core dump. Another possible bad outcome is an erroneous conclusion that an index-to-be-accessed is being reindexed; but it looks like that would be unlikely to have any consequences worse than failing to drop temp tables right away. (They'd still get dropped by the next session that uses that temp schema.) To fix, get rid of the use of PG_TRY here, and instead hook into the transaction abort mechanisms to clean up reindex state. Per bug #16378 from Alexander Lakhin. This has been wrong for a very long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/16378-7a70ca41b3ec2009@postgresql.org
2020-04-21Fix minor violations of FunctionCallInvoke usage protocol.Tom Lane
Working on commit 1c455078b led me to check through FunctionCallInvoke call sites to see if every one was being honest about (a) making sure that fcinfo.isnull is initially false, and (b) checking its state after the call. Sure enough, I found some violations. The main one is that finalize_partialaggregate re-used serialfn_fcinfo without resetting isnull, even though it clearly intends to cater for serialfns that return NULL. There would only be an issue with a non-strict serialfn, since it's unlikely that a serialfn would return NULL for non-null input. We have no non-strict serialfns in core, and there may be none in the wild either, which would account for the lack of complaints. Still, it's clearly wrong, so back-patch that fix to 9.6 where finalize_partialaggregate was introduced. Also, arrayfuncs.c and rowtypes.c contained various callers that were not bothering to check for result nulls. While what's being called is a comparison or hash function that probably *shouldn't* return null, that's a lousy excuse for not having any check at all. There are existing places that just Assert(!fcinfo->isnull) in comparable situations, so I added that to the places that were calling btree comparison or hash support functions. In the places calling boolean-returning equality functions, it's quite cheap to have them treat isnull as FALSE, so make those places do that. Also remove some "locfcinfo->isnull = false" assignments that are unnecessary given the assumption that no previous call returned null. These changes seem like mostly neatnik-ism or debugging support, so I didn't back-patch.
2020-04-18Fix race conditions in synchronous standby management.Tom Lane
We have repeatedly seen the buildfarm reach the Assert(false) in SyncRepGetSyncStandbysPriority. This apparently is due to failing to consider the possibility that the sync_standby_priority values in shared memory might be inconsistent; but they will be whenever only some of the walsenders have updated their values after a change in the synchronous_standby_names setting. That function is vastly too complex for what it does, anyway, so rewriting it seems better than trying to apply a band-aid fix. Furthermore, the API of SyncRepGetSyncStandbys is broken by design: it returns a list of WalSnd array indexes, but there is nothing guaranteeing that the contents of the WalSnd array remain stable. Thus, if some walsender exits and then a new walsender process takes over that WalSnd array slot, a caller might make use of WAL position data that it should not, potentially leading to incorrect decisions about whether to release transactions that are waiting for synchronous commit. To fix, replace SyncRepGetSyncStandbys with a new function SyncRepGetCandidateStandbys that copies all the required data from shared memory while holding the relevant mutexes. If the associated walsender process then exits, this data is still safe to make release decisions with, since we know that that much WAL *was* sent to a valid standby server. This incidentally means that we no longer need to treat sync_standby_priority as protected by the SyncRepLock rather than the per-walsender mutex. SyncRepGetSyncStandbys is no longer used by the core code, so remove it entirely in HEAD. However, it seems possible that external code is relying on that function, so do not remove it from the back branches. Instead, just remove the known-incorrect Assert. When the bug occurs, the function will return a too-short list, which callers should treat as meaning there are not enough sync standbys, which seems like a reasonably safe fallback until the inconsistent state is resolved. Moreover it's bug-compatible with what has been happening in non-assert builds. We cannot do anything about the walsender-replacement race condition without an API/ABI break. The bogus assertion exists back to 9.6, but 9.6 is sufficiently different from the later branches that the patch doesn't apply at all. I chose to just remove the bogus assertion in 9.6, feeling that the probability of a bad outcome from the walsender-replacement race condition is too low to justify rewriting the whole patch for 9.6. Discussion: https://postgr.es/m/21519.1585272409@sss.pgh.pa.us
2020-04-17Use a slightly more liberal regex to detect Visual Studio versionAndrew Dunstan
Apparently in some language versions of Visual Studio nmake outputs some material after the version number and before the end of the line. This has been seen in Chinese versions. Therefore, we no longer demand that the version string comes at the end of a line. Per complaint from Cuiping Lin. Backpatch to all live branches.
2020-04-15Fix minor memory leak in pg_dumpMichael Paquier
A query used to read default ACL information from the catalogs did not free a set of PQExpBuffer. Oversight in commit e2090d9, so backpatch down to 9.6. Author: Jie Zhang Reviewed-by: Sawada Masahiko Discussion: https://postgr.es/m/05bcbc5857f948efa0b451b85a48ae10@G08CNEXMBPEKD06.g08.fujitsu.local Backpatch-through: 9.6
2020-04-11Clear dangling pointer to avoid bogus EXPLAIN printout in a corner case.Tom Lane
ExecReScanHashJoin will destroy the join's hash table if it expects that the inner relation will produce different rows on rescan. Up to now it's not bothered to clear the additional pointer to that hash table that exists in the child HashState node. However, it's possible for the query to terminate without building a fresh hash table (this happens if the outer relation is found to be empty during the final rescan). So we can end with a dangling pointer to a deleted hash table. That was harmless originally, but since 9.0 EXPLAIN ANALYZE has used that pointer to print hash table statistics. In debug builds this reproducibly results in garbage statistics. In non-debug builds there's frequently no ill effects, but in principle one could get wrong EXPLAIN ANALYZE output, or perhaps even a crash if free() has released the hashtable memory back to the OS. To fix, just make sure we clear the additional pointer when destroying the hash table. In problematic cases, EXPLAIN ANALYZE will then print no hashtable statistics (reverting to its pre-9.0 behavior). This isn't ideal, but since the problem manifests only in unusual corner cases, it's hard to justify taking any risks to do better in the back branches. A follow-on patch will improve matters in HEAD. Konstantin Knizhnik and Tom Lane, per diagnosis by Thomas Munro of a trouble report from Alvaro Herrera. Discussion: https://postgr.es/m/20200323165059.GA24950@alvherre.pgsql
2020-04-09Further cleanup of ts_headline code.Tom Lane
Suppress a probably-meaningless uninitialized-variable warning (induced by my previous patch, I'm sorry to say). Improve mark_hl_fragments()'s test for overlapping cover strings: it failed to consider the possibility that the current string is strictly within another one. That's unlikely given the preceding splitting into MaxWords fragments, but I don't think it's impossible. Discussion: https://postgr.es/m/16345-2e0cf5cddbdcd3b4@postgresql.org
2020-04-09Fix default text search parser's ts_headline code for phrase queries.Tom Lane
This code could produce very poor results when asked to highlight a string based on a query using phrase-match operators. The root cause is that hlCover(), which is supposed to find a minimal substring that matches the query, was written assuming that word position is not significant. I'm only 95% convinced that its algorithm was correct even for plain AND/OR queries; but it definitely fails completely for phrase matches, causing it to possibly not identify a cover string at all. Hence, rewrite hlCover() with a less-tense algorithm that just tries all the possible substrings, earlier and shorter ones first. (This is not as bad as it sounds performance-wise, because all of the string matching has been done already: the repeated tsquery match checks boil down to pointer comparisons.) Unfortunately, since that approach produces more candidate cover strings than before, it also exposes that there were bugs in the heuristics in mark_hl_words() for selecting a best cover string. Fixes there include: * Do not apply the ShortWord filter to words that appear in the query. * Remove a misguided optimization for quickly rejecting a cover. * Fix order-of-operation bug that could cause computation of a wrong figure of merit (poslen) when shortening a cover. * Change the preference rule so that candidate headlines that do not include their whole cover string (after MaxWords trimming) are lowest priority, since they may not actually satisfy the user's query. This results in some changes in existing regression test cases, but they all seem reasonable. Note in particular that the tests involving strings like "1 2 3" were previously being affected by the ShortWord filter, masking the normal matching behavior. Per bug #16345 from Augustinas Jokubauskas; the new test cases are based on that example. Back-patch to 9.6 where phrase search was added to tsquery. Discussion: https://postgr.es/m/16345-2e0cf5cddbdcd3b4@postgresql.org
2020-04-09Cosmetic improvements for default text search parser's ts_headline code.Tom Lane
This code was woefully unreadable and under-commented. Try to improve matters by adding comments, using some macros to make complicated if-tests more readable, using boolean type where appropriate, etc. There are a couple of tiny coding improvements too, but this commit includes (I hope) no behavioral change. Nonetheless, back-patch as far as 9.6, because a followup bug-fixing commit depends on this. Discussion: https://postgr.es/m/16345-2e0cf5cddbdcd3b4@postgresql.org
2020-04-08Fix pg_dump/pg_restore to restore event trigger comments later.Tom Lane
Repair an oversight in commit 8728b2c70: if we're postponing restore of event triggers to the end, we must also postpone restoring any comments on them, since of course we cannot create the comments first. (This opens yet another opportunity for an event trigger to bollix the restore, but there's no help for that.) Per bug #16346 from Alexander Lakhin. Like the previous commit, back-patch to all supported branches. Hamid Akhtar and Tom Lane Discussion: https://postgr.es/m/16346-6210ad7a0ea81be1@postgresql.org
2020-04-07Fix circle_in to accept "(x,y),r" as it's advertised to do.Tom Lane
Our documentation describes four allowed input syntaxes for circles, but the regression tests tried only three ... with predictable consequences. Remarkably, this has been wrong since the circle datatype was added in 1997, but nobody noticed till now. David Zhang, with some help from me Discussion: https://postgr.es/m/332c47fa-d951-7574-b5cc-a8f7f7201202@highgo.ca
2020-04-07Adjust bytea get_bit/set_bit to cope with bytea strings > 256MB.Tom Lane
Since the existing bit number argument can't exceed INT32_MAX, it's not possible for these functions to manipulate bits beyond the first 256MB of a bytea value. However, it'd be good if they could do at least that much, and not fall over entirely for longer bytea values. Adjust the comparisons to be done in int64 arithmetic so that works. Also tweak the error reports to show sane values in case of overflow. Also add some test cases to improve the miserable code coverage of these functions. Apply patch to back branches only; HEAD has a better solution as of commit 26a944cf2. Extracted from a much larger patch by Movead Li Discussion: https://postgr.es/m/20200312115135445367128@highgo.ca
2020-04-06Preserve clustered index after rewrites with ALTER TABLEMichael Paquier
A table rewritten by ALTER TABLE would lose tracking of an index usable for CLUSTER. This setting is tracked by pg_index.indisclustered and is controlled by ALTER TABLE, so some extra work was needed to restore it properly. Note that ALTER TABLE only marks the index that can be used for clustering, and does not do the actual operation. Author: Amit Langote, Justin Pryzby Reviewed-by: Ibrar Ahmed, Michael Paquier Discussion: https://postgr.es/m/20200202161718.GI13621@telsasoft.com Backpatch-through: 9.5
2020-04-05Use TransactionXmin instead of RecentGlobalXmin in heap_abort_speculative().Andres Freund
There's a very low risk that RecentGlobalXmin could be far enough in the past to be older than relfrozenxid, or even wrapped around. Luckily the consequences of that having happened wouldn't be too bad - the page wouldn't be pruned for a while. Avoid that risk by using TransactionXmin instead. As that's announced via MyPgXact->xmin, it is protected against wrapping around (see code comments for details around relfrozenxid). Author: Andres Freund Discussion: https://postgr.es/m/20200328213023.s4eyijhdosuc4vcj@alap3.anarazel.de Backpatch: 9.5-
2020-04-05Save errno across LWLockRelease() callsPeter Eisentraut
Fixup for "Drop slot's LWLock before returning from SaveSlotToPath()" Reported-by: Michael Paquier <michael@paquier.xyz>
2020-04-03Fix bugs in gin_fuzzy_search_limit processing.Tom Lane
entryGetItem()'s three code paths each contained bugs associated with filtering the entries for gin_fuzzy_search_limit. The posting-tree path failed to advance "advancePast" after having decided to filter an item. If we ran out of items on the current page and needed to advance to the next, what would actually happen is that entryLoadMoreItems() would re-load the same page. Eventually, the random dropItem() test would accept one of the same items it'd previously rejected, and we'd move on --- but it could take awhile with small gin_fuzzy_search_limit. To add insult to injury, this case would inevitably cause entryLoadMoreItems() to decide it needed to re-descend from the root, making things even slower. The posting-list path failed to implement gin_fuzzy_search_limit filtering at all, so that all entries in the posting list would be returned. The bitmap-result path used a "gotitem" variable that it failed to update in the one place where it'd actually make a difference, ie at the one "continue" statement. I think this was unreachable in practice, because if we'd looped around then it shouldn't be the case that the entries on the new page are before advancePast. Still, the "gotitem" variable was contributing nothing to either clarity or correctness, so get rid of it. Refactor all three loops so that the termination conditions are more alike and less unreadable. The code coverage report showed that we had no coverage at all for the re-descend-from-root code path in entryLoadMoreItems(), which seems like a very bad thing, so add a test case that exercises it. We also had exactly no coverage for gin_fuzzy_search_limit, so add a simplistic test case that at least hits those code paths a little bit. Back-patch to all supported branches. Adé Heyward and Tom Lane Discussion: https://postgr.es/m/CAEknJCdS-dE1Heddptm7ay2xTbSeADbkaQ8bU2AXRCVC2LdtKQ@mail.gmail.com
2020-03-30Be more careful about extracting encoding from locale strings on Windows.Tom Lane
GetLocaleInfoEx() can fail on strings that setlocale() was perfectly happy with. A common way for that to happen is if the locale string is actually a Unix-style string, say "et_EE.UTF-8". In that case, what's after the dot is an encoding name, not a Windows codepage number; blindly treating it as a codepage number led to failure, with a fairly silly error message. Hence, check to see if what's after the dot is all digits, and if not, treat it as a literal encoding name rather than a codepage number. This will do the right thing with many Unix-style locale strings, and produce a more sensible error message otherwise. Somewhat independently of that, treat a zero (CP_ACP) result from GetLocaleInfoEx() as meaning that we must use UTF-8 encoding. Back-patch to all supported branches. Juan José Santamaría Flecha Discussion: https://postgr.es/m/24905.1585445371@sss.pgh.pa.us
2020-03-28Ensure snapshot is registered within ScanPgRelation().Andres Freund
In 9.4 I added support to use a historical snapshot in ScanPgRelation(), while adding logical decoding. Unfortunately a conflict with the concurrent removal of SnapshotNow was incorrectly resolved, leading to an unregistered snapshot being used. It is not correct to use an unregistered (or non-active) snapshot for anything non-trivial, because catalog invalidations can cause the snapshot to be invalidated. Luckily it seems unlikely to actively cause problems in practice, as ScanPgRelation() requires that we already have a lock on the relation, we only look for a single row, and we don't appear to rely on the result's tid to be correct. It however is clearly wrong and potential negative consequences would likely be hard to find. So it seems worth backpatching the fix, even without a concrete hazard. Discussion: https://postgr.es/m/20200229052459.wzhqnbhrriezg4v2@alap3.anarazel.de Backpatch: 9.5-
2020-03-26Ensure that plpgsql cleans up cleanly during parallel-worker exit.Tom Lane
plpgsql_xact_cb ought to treat events XACT_EVENT_PARALLEL_COMMIT and XACT_EVENT_PARALLEL_ABORT like XACT_EVENT_COMMIT and XACT_EVENT_ABORT respectively, since its goal is to do process-local cleanup. This oversight caused plpgsql's end-of-transaction cleanup to not get done in parallel workers. Since a parallel worker will exit just after the transaction cleanup, the effects of this are limited. I couldn't find any case in the core code with user-visible effects, but perhaps there are some in extensions. In any case it's wrong, so let's fix it before it bites us not after. In passing, add some comments around the handling of expression evaluation resources in DO blocks. There's no live bug there, but it's quite unobvious what's happening; at least I thought so. This isn't related to the other issue, except that I found both things while poking at expression-evaluation performance. Back-patch the plpgsql_xact_cb fix to 9.5 where those event types were introduced, and the DO-block commentary to v11 where DO blocks gained the ability to issue COMMIT/ROLLBACK. Discussion: https://postgr.es/m/10353.1585247879@sss.pgh.pa.us
2020-03-26Drop slot's LWLock before returning from SaveSlotToPath()Peter Eisentraut
When SaveSlotToPath() is called with elevel=LOG, the early exits didn't release the slot's io_in_progress_lock. This could result in a walsender being stuck on the lock forever. A possible way to get into this situation is if the offending code paths are triggered in a low disk space situation. Author: Pavan Deolasee <pavan.deolasee@2ndquadrant.com> Reported-by: Craig Ringer <craig@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/56a138c5-de61-f553-7e8f-6789296de785%402ndquadrant.com
2020-03-23Fix our getopt_long's behavior for a command line argument of just "-".Tom Lane
src/port/getopt_long.c failed on such an argument, always seeing it as an unrecognized switch. This is unhelpful; better is to treat such an item as a non-switch argument. That behavior is what we find in GNU's getopt_long(); it's what src/port/getopt.c does; and it is required by POSIX for getopt(), which getopt_long() ought to be generally a superset of. Moreover, it's expected by ecpg, which intends an argument of "-" to mean "read from stdin". So fix it. Also add some documentation about ecpg's behavior in this area, since that was miserably underdocumented. I had to reverse-engineer it from the code. Per bug #16304 from James Gray. Back-patch to all supported branches, since this has been broken forever. Discussion: https://postgr.es/m/16304-c662b00a1322db7f@postgresql.org
2020-03-22Revert "Skip WAL for new relfilenodes, under wal_level=minimal."Noah Misch
This reverts commit cb2fd7eac285b1b0a24eeb2b8ed4456b66c5a09f. Per numerous buildfarm members, it was incompatible with parallel query, and a test case assumed LP64. Back-patch to 9.5 (all supported versions). Discussion: https://postgr.es/m/20200321224920.GB1763544@rfd.leadboat.com
2020-03-21Skip WAL for new relfilenodes, under wal_level=minimal.Noah Misch
Until now, only selected bulk operations (e.g. COPY) did this. If a given relfilenode received both a WAL-skipping COPY and a WAL-logged operation (e.g. INSERT), recovery could lose tuples from the COPY. See src/backend/access/transam/README section "Skipping WAL for New RelFileNode" for the new coding rules. Maintainers of table access methods should examine that section. To maintain data durability, just before commit, we choose between an fsync of the relfilenode and copying its contents to WAL. A new GUC, wal_skip_threshold, guides that choice. If this change slows a workload that creates small, permanent relfilenodes under wal_level=minimal, try adjusting wal_skip_threshold. Users setting a timeout on COMMIT may need to adjust that timeout, and log_min_duration_statement analysis will reflect time consumption moving to COMMIT from commands like COPY. Internally, this requires a reliable determination of whether RollbackAndReleaseCurrentSubTransaction() would unlink a relation's current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the specification of rd_createSubid such that the field is zero when a new rel has an old rd_node. Make relcache.c retain entries for certain dropped relations until end of transaction. Back-patch to 9.5 (all supported versions). This introduces a new WAL record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC. As always, update standby systems before master systems. This changes sizeof(RelationData) and sizeof(IndexStmt), breaking binary compatibility for affected extensions. (The most recent commit to affect the same class of extensions was 089e4d405d0f3b94c74a2c6a54357a84a681754b.) Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert Haas. Heikki Linnakangas and Michael Paquier implemented earlier designs that materially clarified the problem. Reviewed, in earlier designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane, Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout. Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
2020-03-21Back-patch log_newpage_range().Noah Misch
Back-patch a subset of commit 9155580fd5fc2a0cbb23376dfca7cd21f59c2c7b to v11, v10, 9.6, and 9.5. Include the latest repairs to this function. Use a new XLOG_FPI_MULTI value instead of reusing XLOG_FPI. That way, if an older server reads WAL from this function, that server will PANIC instead of applying just one page of the record. The next commit adds a call to this function. Discussion: https://postgr.es/m/20200304.162919.898938381201316571.horikyota.ntt@gmail.com
2020-03-21During heap rebuild, lock any TOAST index until end of transaction.Noah Misch
swap_relation_files() calls toast_get_valid_index() to find and lock this index, just before swapping with the rebuilt TOAST index. The latter function releases the lock before returning. Potential for mischief is low; a concurrent session can issue ALTER INDEX ... SET (fillfactor = ...), which is not alarming. Nonetheless, changing pg_class.relfilenode without a lock is unconventional. Back-patch to 9.5 (all supported versions), because another fix needs this. Discussion: https://postgr.es/m/20191226001521.GA1772687@rfd.leadboat.com
2020-03-21Fix cosmetic blemishes involving rd_createSubid.Noah Misch
Remove an obsolete comment from AtEOXact_cleanup(). Restore formatting of a comment in struct RelationData, mangled by the pgindent run in commit 9af4159fce6654aa0e081b00d02bca40b978745c. Back-patch to 9.5 (all supported versions), because another fix stacks on this.
2020-03-20Turn off deprecated bison warnings under MSVCAndrew Dunstan
These are disabled by the configure code, so this is just fixing an inconsistency in the MSVC code. Backpatch to all live branches.
2020-03-19pg_upgrade: make get_major_server_version() err msg consistentBruce Momjian
This patch fixes the error message in get_major_server_version() to be "could not parse version file", and uses the full file path name, rather than just the data directory path. Also, commit 4109bb5de4 added the cause of the failure to the "could not open" error message, and improved quoting. This patch backpatches the "could not open" cause to PG 12, where it was first widely used, and backpatches the quoting fix in that patch to all supported releases. Reported-by: Tom Lane Discussion: https://postgr.es/m/87pne2w98h.fsf@wibble.ilmari.org Author: Dagfinn Ilmari Mannsåker Backpatch-through: 9.5
2020-03-18Add missing errcode() in a few ereport calls.Amit Kapila
This will allow to specifying SQLSTATE error code for the errors in the missing places. Reported-by: Sawada Masahiko Author: Sawada Masahiko Backpatch-through: 9.5 Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
2020-03-16Avoid holding a directory FD open across assorted SRF calls.Tom Lane
This extends the fixes made in commit 085b6b667 to other SRFs with the same bug, namely pg_logdir_ls(), pgrowlocks(), pg_timezone_names(), pg_ls_dir(), and pg_tablespace_databases(). Also adjust various comments and documentation to warn against expecting to clean up resources during a ValuePerCall SRF's final call. Back-patch to all supported branches, since these functions were all born broken. Justin Pryzby, with cosmetic tweaks by me Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
2020-03-16Plug memory leakAlvaro Herrera
Introduced by b08dee24a557. Noted by Coverity.
2020-03-14C comment: correct commented bytes of max_cached_tuplebufsBruce Momjian
The comment said ~8MB, but it is actually ~64MB. Reported-by: Kuntal Ghosh Discussion: https://postgr.es/m/CAGz5QC+GGmHdnxp04B6wcLz2Zcd_HU+wCBrsPyOZP62-BJghig@mail.gmail.com Backpatch-through: 9.5-10
2020-03-14Restructure polymorphic-type resolution in funcapi.c.Tom Lane
resolve_polymorphic_tupdesc() and resolve_polymorphic_argtypes() failed to cover the case of having to resolve anyarray given only an anyrange input. The bug was masked if anyelement was also used (as either input or output), which probably helps account for our not having noticed. While looking at this I noticed that resolve_generic_type() would produce the wrong answer if asked to make that same resolution. ISTM that resolve_generic_type() is confusingly defined and overly complex, so rather than fix it, let's just make funcapi.c do the specific lookups it requires for itself. With this change, resolve_generic_type() is not used anywhere, so remove it in HEAD. In the back branches, leave it alone (complete with bug) just in case any external code is using it. While we're here, make some other refactoring adjustments in funcapi.c with an eye to upcoming future expansion of the set of polymorphic types: * Simplify quick-exit tests by adding an overall have_polymorphic_result flag. This is about a wash now but will be a win when there are more flags. * Reduce duplication of code between resolve_polymorphic_tupdesc() and resolve_polymorphic_argtypes(). * Don't bother to validate correct matching of anynonarray or anyenum; the parser should have done that, and even if it didn't, just doing "return false" here would lead to a very confusing, off-point error message. (Really, "return false" in these two functions should only occur if the call_expr isn't supplied or we can't obtain data type info from it.) * For the same reason, throw an elog rather than "return false" if we fail to resolve a polymorphic type. The bug's been there since we added anyrange, so back-patch to all supported branches. Discussion: https://postgr.es/m/6093.1584202130@sss.pgh.pa.us
2020-03-13Preserve replica identity index across ALTER TABLE rewritePeter Eisentraut
If an index was explicitly set as replica identity index, this setting was lost when a table was rewritten by ALTER TABLE. Because this setting is part of pg_index but actually controlled by ALTER TABLE (not part of CREATE INDEX, say), we have to do some extra work to restore it. Based-on-patch-by: Quan Zongliang <quanzongliang@gmail.com> Reviewed-by: Euler Taveira <euler.taveira@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/c70fcab2-4866-0d9f-1d01-e75e189db342@gmail.com
2020-03-12Fix nextXid tracking bug on standbys (9.5-11 only).Thomas Munro
RecordKnownAssignedTransactionIds() should never move nextXid backwards. Before this commit, that could happen if some other code path had advanced it without advancing latestObservedXid. One consequence is that a well timed XLOG_CHECKPOINT_ONLINE could cause hot standby feedback messages to get confused and report an xmin from a future epoch, potentially allowing vacuum to run too soon on the primary. Repair, by making sure RecordKnownAssignedTransactionIds() can only move nextXid forwards. In release 12 and master, this was already done by commit 2fc7af5e, which consolidated similar code and straightened out this bug. Back-patch to supported releases before that. Author: Eka Palamadai <ekanatha@amazon.com> Discussion: https://postgr.es/m/98BB4805-D0A2-48E1-96F4-15014313EADC@amazon.com
2020-03-11Add pg_dump support for ALTER obj DEPENDS ON EXTENSIONAlvaro Herrera
pg_dump is oblivious to this kind of dependency, so they're lost on dump/restores (and pg_upgrade). Have pg_dump emit ALTER lines so that they're preserved. Add some pg_dump tests for the whole thing, also. Reviewed-by: Tom Lane (offlist) Reviewed-by: Ibrar Ahmed Reviewed-by: Ahsan Hadi (who also reviewed commit 899a04f5ed61) Discussion: https://postgr.es/m/20200217225333.GA30974@alvherre.pgsql
2020-03-11Avoid duplicates in ALTER ... DEPENDS ON EXTENSIONAlvaro Herrera
If the command is attempted for an extension that the object already depends on, silently do nothing. In particular, this means that if a database containing multiple such entries is dumped, the restore will silently do the right thing and record just the first one. (At least, in a world where pg_dump does dump such entries -- which it doesn't currently, but it will.) Backpatch to 9.6, where this kind of dependency was introduced. Reviewed-by: Ibrar Ahmed, Tom Lane (offlist) Discussion: https://postgr.es/m/20200217225333.GA30974@alvherre.pgsql
2020-03-09Fix pg_dump/pg_restore to restore event triggers later.Tom Lane
Previously, event triggers were restored just after regular triggers (and FK constraints, which are basically triggers). This is risky since an event trigger, once installed, could interfere with subsequent restore commands. Worse, because event triggers don't have any particular dependencies on any post-data objects, a parallel restore would consider them eligible to be restored the moment the post-data phase starts, allowing them to also interfere with restoration of a whole bunch of objects that would have been restored before them in a serial restore. There's no way to completely remove the risk of a misguided event trigger breaking the restore, since if nothing else it could break other event triggers. But we can certainly push them to later in the process to minimize the hazard. To fix, tweak the RestorePass mechanism introduced by commit 3eb9a5e7c so that event triggers are handled as part of the post-ACL processing pass (renaming the "REFRESH" pass to "POST_ACL" to reflect its more general use). This will cause them to restore after everything except matview refreshes, which seems OK since matview refreshes really ought to run in the post-restore state of the database. In a parallel restore, event triggers and matview refreshes might be intermixed, but that seems all right as well. Also update the code and comments in pg_dump_sort.c so that its idea of how things are sorted agrees with what actually happens due to the RestorePass mechanism. This is mostly cosmetic: it'll affect the order of objects in a dump's TOC, but not the actual restore order. But not changing that would be quite confusing to somebody reading the code. Back-patch to all supported branches. Fabrízio de Royes Mello, tweaked a bit by me Discussion: https://postgr.es/m/CAFcNs+ow1hmFox8P--3GSdtwz-S3Binb6ZmoP6Vk+Xg=K6eZNA@mail.gmail.com
2020-03-10Fix bug that causes to report waiting in PS display twice, in hot standby.Fujii Masao
Previously "waiting" could appear twice via PS in case of lock conflict in hot standby mode. Specifically this issue happend when the delay in WAL application determined by max_standby_archive_delay and max_standby_streaming_delay had passed but it took more than 500 msec to cancel all the conflicting transactions. Especially we can observe this easily by setting those delay parameters to -1. The cause of this issue was that WaitOnLock() and ResolveRecoveryConflictWithVirtualXIDs() added "waiting" to the process title in that case. This commit prevents ResolveRecoveryConflictWithVirtualXIDs() from reporting waiting in case of lock conflict, to fix the bug. Back-patch to all back branches. Author: Masahiko Sawada Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/CA+fd4k4mXWTwfQLS3RPwGr4xnfAEs1ysFfgYHvmmoUgv6Zxvmg@mail.gmail.com
2020-03-09Avoid assertion failure with targeted recovery in standby mode.Fujii Masao
At the end of recovery, standby mode is turned off to re-fetch the last valid record from archive or pg_wal. Previously, if recovery target was reached and standby mode was turned off while the current WAL source was stream, recovery could try to retrieve WAL file containing the last valid record unexpectedly from stream even though not in standby mode. This caused an assertion failure. That is, the assertion test confirms that WAL file should not be retrieved from stream if standby mode is not true. This commit moves back the current WAL source to archive if it's stream even though not in standby mode, to avoid that assertion failure. This issue doesn't cause the server to crash when built with assertion disabled. In this case, the attempt to retrieve WAL file from stream not in standby mode just fails. And then recovery tries to retrieve WAL file from archive or pg_wal. Back-patch to all supported branches. Author: Kyotaro Horiguchi Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20200227.124830.2197604521555566121.horikyota.ntt@gmail.com
2020-02-27createdb: Fix quoting of --encoding, --lc-ctype and --lc-collateMichael Paquier
The original coding failed to properly quote those arguments, leading to failures when using quotes in the values used. As the quoting can be encoding-sensitive, the connection to the backend needs to be taken before applying the correct quoting. Author: Michael Paquier Reviewed-by: Daniel Gustafsson Discussion: https://postgr.es/m/20200214041004.GB1998@paquier.xyz Backpatch-through: 9.5
2020-02-19Fix confusion about event trigger vs. plain function in plpgsql.Tom Lane
The function hash table keys made by compute_function_hashkey() failed to distinguish event-trigger call context from regular call context. This meant that once we'd successfully made a hash entry for an event trigger (either by validation, or by normal use as an event trigger), an attempt to call the trigger function as a plain function would find this hash entry and thereby bypass the you-can't-do-that check in do_compile(). Thus we'd attempt to execute the function, leading to strange errors or even crashes, depending on function contents and server version. To fix, add an isEventTrigger field to PLpgSQL_func_hashkey, paralleling the longstanding infrastructure for regular triggers. This fits into what had been pad space, so there's no risk of an ABI break, even assuming that any third-party code is looking at these hash keys. (I considered replacing isTrigger with a PLpgSQL_trigtype enum field, but felt that that carried some API/ABI risk. Maybe we should change it in HEAD though.) Per bug #16266 from Alexander Lakhin. This has been broken since event triggers were invented, so back-patch to all supported branches. Discussion: https://postgr.es/m/16266-fcd7f838e97ba5d4@postgresql.org
2020-02-19Fix mesurement of elapsed time during truncating heap in VACUUM.Fujii Masao
VACUUM may truncate heap in several batches. The activity report is logged for each batch, and contains the number of pages in the table before and after the truncation, and also the elapsed time during the truncation. Previously the elapsed time reported in each batch was the total elapsed time since starting the truncation until finishing each batch. For example, if the truncation was processed dividing into three batches, the second batch reported the accumulated time elapsed during both first and second batches. This is strange and confusing because the number of pages in the table reported together is not total. Instead, each batch should report the time elapsed during only that batch. The cause of this issue was that the resource usage snapshot was initialized only at the beginning of the truncation and was never reset later. This commit fixes the issue by changing VACUUM so that the resource usage snapshot is reset at each batch. Back-patch to all supported branches. Reported-by: Tatsuhito Kasahara Author: Tatsuhito Kasahara Reviewed-by: Masahiko Sawada, Fujii Masao Discussion: https://postgr.es/m/CAP0=ZVJsf=NvQuy+QXQZ7B=ZVLoDV_JzsVC1FRsF1G18i3zMGg@mail.gmail.com
2020-02-19Stop demanding that top xact must be seen before subxact in decoding.Amit Kapila
Manifested as ERROR: subtransaction logged without previous top-level txn record this check forbids legit behaviours like - First xl_xact_assignment record is beyond reading, i.e. earlier restart_lsn. - After restart_lsn there is some change of a subxact. - After that, there is second xl_xact_assignment (for another subxact) revealing the relationship between top and first subxact. Such a transaction won't be streamed anyway because we hadn't seen it in full. Saying for sure whether xact of some record encountered after the snapshot was deserialized can be streamed or not requires to know whether it wrote something before deserialization point --if yes, it hasn't been seen in full and can't be decoded. Snapshot doesn't have such info, so there is no easy way to relax the check. Reported-by: Hsu, John Diagnosed-by: Arseny Sher Author: Arseny Sher, Amit Kapila Reviewed-by: Amit Kapila, Dilip Kumar Backpatch-through: 9.5 Discussion: https://postgr.es/m/AB5978B2-1772-4FEE-A245-74C91704ECB0@amazon.com
2020-02-17Teach pg_dump to dump comments on RLS policy objects.Tom Lane
This was unaccountably omitted in the original RLS patch. The SQL syntax is basically the same as for comments on triggers, so crib code from dumpTrigger(). Per report from Marc Munro. Back-patch to all supported branches. Discussion: https://postgr.es/m/1581889298.18009.15.camel@bloodnok.com
2020-02-11Document the pg_upgrade -j/--jobs option as taking an argumentPeter Eisentraut
2020-02-10Stamp 9.6.17.REL9_6_17Tom Lane
2020-02-10createuser: fix parsing of --connection-limit argumentAlvaro Herrera
The original coding failed to quote the argument properly. Reported-by: Daniel Gustafsson Discussion: 1B8AE66C-85AB-4728-9BB4-612E8E61C219@yesql.se
2020-02-10Fix priv checks for ALTER <object> DEPENDS ON EXTENSIONAlvaro Herrera
Marking an object as dependant on an extension did not have any privilege check whatsoever; this allowed any user to mark objects as droppable by anyone able to DROP EXTENSION, which could be used to cause system-wide havoc. Disallow by checking that the calling user owns the mentioned object. (No constraints are placed on the extension.) Security: CVE-2020-1720 Reported-by: Tom Lane Discussion: 31605.1566429043@sss.pgh.pa.us