summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2018-09-28Fix WAL recycling on standbys depending on archive_modeMichael Paquier
A restart point or a checkpoint recycling WAL segments treats segments marked with neither ".done" (archiving is done) or ".ready" (segment is ready to be archived) in archive_status the same way for archive_mode being "on" or "always". While for a primary this is fine, a standby running a restart point with archive_mode = on would try to mark such a segment as ready for archiving, which is something that will never happen except after the standby is promoted. Note that this problem applies only to WAL segments coming from the local pg_wal the first time archive recovery is run. Segments part of a self-contained base backup are the most common case where this could happen, however even in this case normally the .done markers would be most likely part of the backup. Segments recovered from an archive are marked as .ready or .done by the startup process, and segments finished streaming are marked as such by the WAL receiver, so they are handled already. Reported-by: Haruka Takatsuka Author: Michael Paquier Discussion: https://postgr.es/m/15402-a453c90ed4cf88b2@postgresql.org Backpatch-through: 9.5, where archive_mode = always has been added.
2018-09-26Recurse to sequences on ownership change for all relkindsPeter Eisentraut
When a table ownership is changed, we must apply that also to any owned sequences. (Otherwise, it would result in a situation that cannot be restored, because linked sequences must have the same owner as the table.) But this was previously only applied to regular tables and materialized views. But it should also apply to at least foreign tables. This patch removes the relkind check altogether, because it doesn't save very much and just introduces the possibility of similar omissions. Bug: #15238 Reported-by: Christoph Berg <christoph.berg@credativ.de>
2018-09-26Rework activation of commit timestamps during recoveryMichael Paquier
The activation and deactivation of commit timestamp tracking has not been handled consistently for a primary or standbys at recovery. The facility can be activated at three different moments of recovery: - The beginning, where a primary would use the GUC value for the decision-making, and where a standby relies on the contents of the control file. - When replaying a XLOG_PARAMETER_CHANGE record at redo. - The end, where both primary and standby rely on the GUC value. Using the GUC value for a primary at the beginning of recovery causes problems with commit timestamp access when doing crash recovery. Particularly, when replaying transaction commits, it could be possible that an attempt to read commit timestamps is done for a transaction which committed at a moment when track_commit_timestamp was disabled. A test case is added to reproduce the failure. The test works down to v11 as it takes advantage of transaction commits within procedures. Reported-by: Hailong Li Author: Masahiko Sawasa, Michael Paquier Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/11224478-a782-203b-1f17-e4797b39bdf0@qunar.com Backpatch-through: 9.5, where commit timestamps have been introduced.
2018-09-25Make some fixes to allow building Postgres on macOS 10.14 ("Mojave").Tom Lane
Apple's latest rearrangements of the system-supplied headers have broken building of PL/Perl and PL/Tcl. The only practical way to fix PL/Tcl is to start using the "-isysroot" compiler flag to point to SDK-supplied headers, as Apple expects. We must also start distinguishing where to find Perl's headers from where to find its shared library; but that seems like good cleanup anyway. Extensions that formerly did something like -I$(perl_archlibexp)/CORE should now do -I$(perl_includedir)/CORE instead. perl_archlibexp is still the place to look for libperl.so, though. If for some reason you don't like the default -isysroot setting, you can override that by setting PG_SYSROOT in configure's arguments. I don't currently think people would need to do so, unless maybe for cross-version build purposes. In addition, teach configure where to find tclConfig.sh. Our traditional method of searching $auto_path hasn't worked for the last couple of macOS releases, and it now seems clear that Apple's not going to change that. The workaround of manually specifying --with-tclconfig was annoying already, but Mojave's made it a lot more so because the sysroot path now has to be included as well. Let's just wire the knowledge into configure instead. To avoid breaking builds against non-default Tcl installations (e.g. MacPorts) wherein the $auto_path method probably still works, arrange to try the additional case only after all else has failed. Back-patch to all supported versions, since at least the buildfarm cares about that. The changes are set up to not do anything on macOS releases that are old enough to not have functional sysroot trees.
2018-09-24Fix over-allocation of space for array_out()'s result string.Tom Lane
array_out overestimated the space needed for its output, possibly by a very substantial amount if the array is multi-dimensional, because of wrong order of operations in the loop that counts the number of curly-brace pairs needed. While the output string is normally short-lived, this could still cause problems in extreme cases. An additional minor error was that it counted one more delimiter than is actually needed. Repair those errors, add an Assert that the space is now correctly calculated, and make some minor improvements in the comments. I also failed to resist the temptation to get rid of an integer modulus operation per array element; a simple comparison is sufficient. This bug dates clear back to Berkeley days, so back-patch to all supported versions. Keiichi Hirobe, minor additional work by me Discussion: https://postgr.es/m/CAH=EFxE9W0tRvQkixR2XJRRCToUYUEDkJZk6tnADXugPBRdcdg@mail.gmail.com
2018-09-23Initialize random() in bootstrap/stand-alone postgres and in initdb.Noah Misch
This removes a difference between the standard IsUnderPostmaster execution environment and that of --boot and --single. In a stand-alone backend, "SELECT random()" always started at the same seed. On a system capable of using posix shared memory, initdb could still conclude "selecting dynamic shared memory implementation ... sysv". Crashed --boot or --single postgres processes orphaned shared memory objects having names that collided with the not-actually-random names that initdb probed. The sysv fallback appeared after ten crashes of --boot or --single postgres. Since --boot and --single are rare in production use, systems used for PostgreSQL development are the principal candidate to notice this symptom. Back-patch to 9.3 (all supported versions). PostgreSQL 9.4 introduced dynamic shared memory, but 9.3 does share the "SELECT random()" problem. Reviewed by Tom Lane and Kyotaro HORIGUCHI. Discussion: https://postgr.es/m/20180915221546.GA3159382@rfd.leadboat.com
2018-09-23Fix failure in WHERE CURRENT OF after rewinding the referenced cursor.Tom Lane
In a case where we have multiple relation-scan nodes in a cursor plan, such as a scan of an inheritance tree, it's possible to fetch from a given scan node, then rewind the cursor and fetch some row from an earlier scan node. In such a case, execCurrent.c mistakenly thought that the later scan node was still active, because ExecReScan hadn't done anything to make it look not-active. We'd get some sort of failure in the case of a SeqScan node, because the node's scan tuple slot would be pointing at a HeapTuple whose t_self gets reset to invalid by heapam.c. But it seems possible that for other relation scan node types we'd actually return a valid tuple TID to the caller, resulting in updating or deleting a tuple that shouldn't have been considered current. To fix, forcibly clear the ScanTupleSlot in ExecScanReScan. Another issue here, which seems only latent at the moment but could easily become a live bug in future, is that rewinding a cursor does not necessarily lead to *immediately* applying ExecReScan to every scan-level node in the plan tree. Upper-level nodes will think that they can postpone that call if their child node is already marked with chgParam flags. I don't see a way for that to happen today in a plan tree that's simple enough for execCurrent.c's search_plan_tree to understand, but that's one heck of a fragile assumption. So, add some logic in search_plan_tree to detect chgParam flags being set on nodes that it descended to/through, and assume that that means we should consider lower scan nodes to be logically reset even if their ReScan call hasn't actually happened yet. Per bug #15395 from Matvey Arye. This has been broken for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/153764171023.14986.280404050547008575@wrigleys.postgresql.org
2018-09-20Defer restoration of libraries in parallel workers.Thomas Munro
Several users of extensions complained of crashes in parallel workers that turned out to be due to syscache access from their _PG_init() functions. Reorder the initialization of parallel workers so that libraries are restored after the caches are initialized, and inside a transaction. This was reported in bug #15350 and elsewhere. We don't consider it to be a bug: extensions shouldn't do that, because then they can't be used in shared_preload_libraries. However, it's a fairly obscure hazard and these extensions worked in practice before parallel query came along. So let's make it work. Later commits might add a warning message and eventually an error. Back-patch to 9.6, where parallel query landed. Author: Thomas Munro Reviewed-by: Amit Kapila Reported-by: Kieran McCusker, Jimmy Discussion: https://postgr.es/m/153512195228.1489.8545997741965926448%40wrigleys.postgresql.org
2018-09-19Don't ignore locktable-full failures in StandbyAcquireAccessExclusiveLock.Tom Lane
Commit 37c54863c removed the code in StandbyAcquireAccessExclusiveLock that checked the return value of LockAcquireExtended. That created a bug, because it's still passing reportMemoryError = false to LockAcquireExtended, meaning that LOCKACQUIRE_NOT_AVAIL will be returned if we're out of shared memory for the lock table. In such a situation, the startup process would believe it had acquired an exclusive lock even though it hadn't, with potentially dire consequences. To fix, just drop the use of reportMemoryError = false, which allows us to simplify the call into a plain LockAcquire(). It's unclear that the locktable-full situation arises often enough that it's worth having a better recovery method than crash-and-restart. (I strongly suspect that the only reason the code path existed at all was that it was relatively simple to do in the pre-37c54863c implementation. But now it's not.) LockAcquireExtended's reportMemoryError parameter is now dead code and could be removed. I refrained from doing so, however, because there was some interest in resurrecting the behavior if we do get reports of locktable-full failures in the field. Also, it seems unwise to remove the parameter concurrently with shipping commit f868a8143, which added a parameter; if there are any third-party callers of LockAcquireExtended, we want them to get a wrong-number-of-parameters compile error rather than a possibly-silent misinterpretation of its last parameter. Back-patch to 9.6 where the bug was introduced. Discussion: https://postgr.es/m/6202.1536359835@sss.pgh.pa.us
2018-09-18Allow DSM allocation to be interrupted.Thomas Munro
Chris Travers reported that the startup process can repeatedly try to cancel a backend that is in a posix_fallocate()/EINTR loop and cause it to loop forever. Teach the retry loop to give up if an interrupt is pending. Don't actually check for interrupts in that loop though, because a non-local exit would skip some clean-up code in the caller. Back-patch to 9.4 where DSM was added (and posix_fallocate() was later back-patched). Author: Chris Travers Reviewed-by: Ildar Musin, Murat Kabilov, Oleksii Kliukin Tested-by: Oleksii Kliukin Discussion: https://postgr.es/m/CAN-RpxB-oeZve_J3SM_6%3DHXPmvEG%3DHX%2B9V9pi8g2YR7YW0rBBg%40mail.gmail.com
2018-09-15Fix failure with initplans used conditionally during EvalPlanQual rechecks.Tom Lane
The EvalPlanQual machinery assumes that any initplans (that is, uncorrelated sub-selects) used during an EPQ recheck would have already been evaluated during the main query; this is implicit in the fact that execPlan pointers are not copied into the EPQ estate's es_param_exec_vals. But it's possible for that assumption to fail, if the initplan is only reached conditionally. For example, a sub-select inside a CASE expression could be reached during a recheck when it had not been previously, if the CASE test depends on a column that was just updated. This bug is old, appearing to date back to my rewrite of EvalPlanQual in commit 9f2ee8f28, but was not detected until Kyle Samson reported a case. To fix, force all not-yet-evaluated initplans used within the EPQ plan subtree to be evaluated at the start of the recheck, before entering the EPQ environment. This could be inefficient, if such an initplan is expensive and goes unused again during the recheck --- but that's piling one layer of improbability atop another. It doesn't seem worth adding more complexity to prevent that, at least not in the back branches. It was convenient to use the new-in-v11 ExecEvalParamExecParams function to implement this, but I didn't like either its name or the specifics of its API, so revise that. Back-patch all the way. Rather than rewrite the patch to avoid depending on bms_next_member() in the oldest branches, I chose to back-patch that function into 9.4 and 9.3. (This isn't the first time back-patches have needed that, and it exhausted my patience.) I also chose to back-patch some test cases added by commits 71404af2a and 342a1ffa2 into 9.4 and 9.3, so that the 9.x versions of eval-plan-qual.spec are all the same. Andrew Gierth diagnosed the problem and contributed the added test cases, though the actual code changes are by me. Discussion: https://postgr.es/m/A033A40A-B234-4324-BE37-272279F7B627@tripadvisor.com
2018-09-14Don't allow LIMIT/OFFSET clause within sub-selects to be pushed to workers.Amit Kapila
Allowing sub-select containing LIMIT/OFFSET in workers can lead to inconsistent results at the top-level as there is no guarantee that the row order will be fully deterministic. The fix is to prohibit pushing LIMIT/OFFSET within sub-selects to workers. Reported-by: Andrew Fletcher Bug: 15324 Author: Amit Kapila Reviewed-by: Dilip Kumar Backpatch-through: 9.6 Discussion: https://postgr.es/m/153417684333.10284.11356259990921828616@wrigleys.postgresql.org
2018-09-14Back-patch "Fix parallel hash join path search."Amit Kapila
Back-patch commit 655393a022bd653e2b48dbf20b69236981e35195 to 9.6. This synchronizes the relavant code and helps in generating parallel paths in some cases in 9.6. This also helps in back-patch of future patches where we can get the same plan in all branches. Discussion: https://postgr.es/m/CAA4eK1+XK_875cJA1HPVpx9C7C8Fp7i4QzLJ17T3igfU2iadxQ@mail.gmail.com
2018-09-13Attach FPI to the first record after full_page_writes is turned on.Amit Kapila
XLogInsert fails to attach a required FPI to the first record after full_page_writes is turned on by the last checkpoint. This bug got introduced in 9.5 due to code rearrangement in commits 2c03216d83 and 2076db2aea. Fix it by ensuring that XLogInsertRecord performs a recomputation when the given record is generated with FPW as off but found that the flag has been turned on while actually inserting the record. Reported-by: Kyotaro Horiguchi Author: Kyotaro Horiguchi Reviewed-by: Amit Kapila Backpatch-through: 9.5 where this problem was introduced Discussion: https://postgr.es/m/20180420.151043.74298611.horiguchi.kyotaro@lab.ntt.co.jp
2018-09-12Repair bug in regexp split performance improvements.Andrew Gierth
Commit c8ea87e4b introduced a temporary conversion buffer for substrings extracted during regexp splits. Unfortunately the code that sized it was failing to ignore the effects of ignored degenerate regexp matches, so for regexp_split_* calls it could under-size the buffer in such cases. Fix, and add some regression test cases (though those will only catch the bug if run in a multibyte encoding). Backpatch to 9.3 as the faulty code was. Thanks to the PostGIS project, Regina Obe and Paul Ramsey for the report (via IRC) and assistance in analysis. Patch by me.
2018-09-11Repair double-free in SP-GIST rescan (bug #15378)Andrew Gierth
spgrescan would first reset traversalCxt, and then traverse a potentially non-empty stack containing pointers to traversalValues which had been allocated in those contexts, freeing them a second time. This bug originates in commit ccd6eb49a where traversalValue was introduced. Repair by traversing the stack before the context reset; this isn't ideal, since it means doing retail pfree in a context that's about to be reset, but the freeing of a stack entry is also done in other places in the code during the scan so it's not worth trying to refactor it further. Regression test added. Backpatch to 9.6 where the problem was introduced. Per bug #15378; analysis and patch by me, originally from a report on IRC by user velix; see also PostGIS ticket #4174; review by Alexander Korotkov. Discussion: https://postgr.es/m/153663176628.23136.11901365223750051490@wrigleys.postgresql.org
2018-09-09Fix past pd_upper write in ginRedoRecompress()Alexander Korotkov
ginRedoRecompress() replays actions over compressed segments of posting list in-place. However, it might lead to write past pg_upper, because intermediate state during playing the changes can take more space than both original state and final state. This commit fixes that by refuse from in-place modification. Instead page tail is copied once modification is started, and then it's used as the source of original segments. Backpatch to 9.4 where posting list compression was introduced. Reported-by: Sivasubramanian Ramasubramanian Discussion: https://postgr.es/m/1536091151804.6588%40amazon.com Author: Alexander Korotkov based on patch from and ideas by Sivasubramanian Ramasubramanian Review: Sivasubramanian Ramasubramanian Backpatch-through: 9.4
2018-09-07Save/restore SPI's global variables in SPI_connect() and SPI_finish().Tom Lane
This patch removes two sources of interference between nominally independent functions when one SPI-using function calls another, perhaps without knowing that it does so. Chapman Flack pointed out that xml.c's query_to_xml_internal() expects SPI_tuptable and SPI_processed to stay valid across datatype output function calls; but it's possible that such a call could involve re-entrant use of SPI. It seems likely that there are similar hazards elsewhere, if not in the core code then in third-party SPI users. Previously SPI_finish() reset SPI's API globals to zeroes/nulls, which would typically make for a crash in such a situation. Restoring them to the values they had at SPI_connect() seems like a considerably more useful behavior, and it still meets the design goal of not leaving any dangling pointers to tuple tables of the function being exited. Also, cause SPI_connect() to reset these variables to zeroes/nulls after saving them. This prevents interference in the opposite direction: it's possible that a SPI-using function that's only ever been tested standalone contains assumptions that these variables start out as zeroes. That was the case as long as you were the outermost SPI user, but not so much for an inner user. Now it's consistent. Report and fix suggestion by Chapman Flack, actual patch by me. Back-patch to all supported branches. Discussion: https://postgr.es/m/9fa25bef-2e4f-1c32-22a4-3ad0723c4a17@anastigmatix.net
2018-09-07Limit depth of forced recursion for CLOBBER_CACHE_RECURSIVELY.Tom Lane
It's somewhat surprising that we got away with this before. (Actually, since nobody tests this routinely AFAIK, it might've been broken for awhile. But it's definitely broken in the wake of commit f868a8143.) It seems sufficient to limit the forced recursion to a small number of levels. Back-patch to all supported branches, like the preceding patch. Discussion: https://postgr.es/m/12259.1532117714@sss.pgh.pa.us
2018-09-07Fix longstanding recursion hazard in sinval message processing.Tom Lane
LockRelationOid and sibling routines supposed that, if our session already holds the lock they were asked to acquire, they could skip calling AcceptInvalidationMessages on the grounds that we must have already read any remote sinval messages issued against the relation being locked. This is normally true, but there's a critical special case where it's not: processing inside AcceptInvalidationMessages might attempt to access system relations, resulting in a recursive call to acquire a relation lock. Hence, if the outer call had acquired that same system catalog lock, we'd fall through, despite the possibility that there's an as-yet-unread sinval message for that system catalog. This could, for example, result in failure to access a system catalog or index that had just been processed by VACUUM FULL. This is the explanation for buildfarm failures we've been seeing intermittently for the past three months. The bug is far older than that, but commits a54e1f158 et al added a new recursion case within AcceptInvalidationMessages that is apparently easier to hit than any previous case. To fix this, we must not skip calling AcceptInvalidationMessages until we have *finished* a call to it since acquiring a relation lock, not merely acquired the lock. (There's already adequate logic inside AcceptInvalidationMessages to deal with being called recursively.) Fortunately, we can implement that at trivial cost, by adding a flag to LOCALLOCK hashtable entries that tracks whether we know we have completed such a call. There is an API hazard added by this patch for external callers of LockAcquire: if anything is testing for LOCKACQUIRE_ALREADY_HELD, it might be fooled by the new return code LOCKACQUIRE_ALREADY_CLEAR into thinking the lock wasn't already held. This should be a fail-soft condition, though, unless something very bizarre is being done in response to the test. Also, I added an additional output argument to LockAcquireExtended, assuming that that probably isn't called by any outside code given the very limited usefulness of its additional functionality. Back-patch to all supported branches. Discussion: https://postgr.es/m/12259.1532117714@sss.pgh.pa.us
2018-09-04Prohibit pushing subqueries containing window function calculation toAmit Kapila
workers. Allowing window function calculation in workers leads to inconsistent results because if the input row ordering is not fully deterministic, the output of window functions might vary across workers. The fix is to treat them as parallel-restricted. In the passing, improve the coding pattern in max_parallel_hazard_walker so that it has a chain of mutually-exclusive if ... else if ... else if ... else if ... IsA tests. Reported-by: Marko Tiikkaja Bug: 15324 Author: Amit Kapila Reviewed-by: Tom Lane Backpatch-through: 9.6 Discussion: https://postgr.es/m/CAL9smLAnfPJCDUUG4ckX2iznj53V7VSMsYefzZieN93YxTNOcw@mail.gmail.com
2018-09-02Fix initial sync of slot parent directory when restoring statusMichael Paquier
At the beginning of recovery, information from replication slots is recovered from disk to memory. In order to ensure the durability of the information, the status file as well as its parent directory are synced. It happens that the sync on the parent directory was done directly using the status file path, which is logically incorrect, and the current code has been doing a sync on the same object twice in a row. Reported-by: Konstantin Knizhnik Diagnosed-by: Konstantin Knizhnik Author: Michael Paquier Discussion: https://postgr.es/m/9eb1a6d5-b66f-2640-598d-c5ea46b8f68a@postgrespro.ru Backpatch-through: 9.4-
2018-09-01Avoid using potentially-under-aligned page buffers.Tom Lane
There's a project policy against using plain "char buf[BLCKSZ]" local or static variables as page buffers; preferred style is to palloc or malloc each buffer to ensure it is MAXALIGN'd. However, that policy's been ignored in an increasing number of places. We've apparently got away with it so far, probably because (a) relatively few people use platforms on which misalignment causes core dumps and/or (b) the variables chance to be sufficiently aligned anyway. But this is not something to rely on. Moreover, even if we don't get a core dump, we might be paying a lot of cycles for misaligned accesses. To fix, invent new union types PGAlignedBlock and PGAlignedXLogBlock that the compiler must allocate with sufficient alignment, and use those in place of plain char arrays. I used these types even for variables where there's no risk of a misaligned access, since ensuring proper alignment should make kernel data transfers faster. I also changed some places where we had been palloc'ing short-lived buffers, for coding style uniformity and to save palloc/pfree overhead. Since this seems to be a live portability hazard (despite the lack of field reports), back-patch to all supported versions. Patch by me; thanks to Michael Paquier for review. Discussion: https://postgr.es/m/1535618100.1286.3.camel@credativ.de
2018-08-31Ignore server-side delays when enforcing wal_sender_timeout.Noah Misch
Healthy clients of servers having poor I/O performance, such as buildfarm members hamster and tern, saw unexpected timeouts. That disagreed with documentation. This fix adds one gettimeofday() call whenever ProcessRepliesIfAny() finds no client reply messages. Back-patch to 9.4; the bug's symptom is rare and mild, and the code all moved between 9.3 and 9.4. Discussion: https://postgr.es/m/20180826034600.GA1105084@rfd.leadboat.com
2018-08-31Ensure correct minimum consistent point on standbysMichael Paquier
Startup process has improved its calculation of incorrect minimum consistent point in 8d68ee6, which ensures that all WAL available gets replayed when doing crash recovery, and has introduced an incorrect calculation of the minimum recovery point for non-startup processes, which can cause incorrect page references on a standby when for example the background writer flushed a couple of pages on-disk but was not updating the control file to let a subsequent crash recovery replay to where it should have. The only case where this has been reported to be a problem is when a standby needs to calculate the latest removed xid when replaying a btree deletion record, so one would need connections on a standby that happen just after recovery has thought it reached a consistent point. Using a background worker which is started after the consistent point is reached would be the easiest way to get into problems if it connects to a database. Having clients which attempt to connect periodically could also be a problem, but the odds of seeing this problem are much lower. The fix used is pretty simple, as the idea is to give access to the minimum recovery point written in the control file to non-startup processes so as they use a reference, while the startup process still initializes its own references of the minimum consistent point so as the original problem with incorrect page references happening post-promotion with a crash do not show up. Reported-by: Alexander Kukushkin Diagnosed-by: Alexander Kukushkin Author: Michael Paquier Reviewed-by: Kyotaro Horiguchi, Alexander Kukushkin Discussion: https://postgr.es/m/153492341830.1368.3936905691758473953@wrigleys.postgresql.org Backpatch-through: 9.3
2018-08-31Make checksum_impl.h safe to compile with -fstrict-aliasing.Tom Lane
In general, Postgres requires -fno-strict-aliasing with compilers that implement C99 strict aliasing rules. There's little hope of getting rid of that overall. But it seems like it would be a good idea if storage/checksum_impl.h in particular didn't depend on it, because that header is explicitly intended to be included by external programs. We don't have a lot of control over the compiler switches that an external program might use, as shown by Michael Banck's report of failure in a privately-modified version of pg_verify_checksums. Hence, switch to using a union in place of willy-nilly pointer casting inside this file. I think this makes the code a bit more readable anyway. checksum_impl.h hasn't changed since it was introduced in 9.3, so back-patch all the way. Discussion: https://postgr.es/m/1535618100.1286.3.camel@credativ.de
2018-08-29Stop bgworkers during fast shutdown with postmaster in startup phaseMichael Paquier
When a postmaster gets into its phase PM_STARTUP, it would start background workers using BgWorkerStart_PostmasterStart mode immediately, which would cause problems for a fast shutdown as the postmaster forgets to send SIGTERM to already-started background workers. With smart and immediate shutdowns, this correctly happened, and fast shutdown is the only mode missing the shot. Author: Alexander Kukushkin Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/CAFh8B=mvnD8+DZUfzpi50DoaDfZRDfd7S=gwj5vU9GYn8UvHkA@mail.gmail.com Backpatch-through: 9.5
2018-08-28Make pg_restore's identify_locking_dependencies() more bulletproof.Tom Lane
This function had a blacklist of dump object types that it believed needed exclusive lock ... but we hadn't maintained that, so that it was missing ROW SECURITY, POLICY, and INDEX ATTACH items, all of which need (or should be treated as needing) exclusive lock. Since the same oversight seems likely in future, let's reverse the sense of the test so that the code has a whitelist of safe object types; better to wrongly assume a command can't be run in parallel than the opposite. Currently the only POST_DATA object type that's safe is CREATE INDEX ... and that list hasn't changed in a long time. Back-patch to 9.5 where RLS came in. Discussion: https://postgr.es/m/11450.1535483506@sss.pgh.pa.us
2018-08-28Avoid quadratic slowdown in regexp match/split functions.Andrew Gierth
regexp_matches, regexp_split_to_table and regexp_split_to_array all work by compiling a list of match positions as character offsets (NOT byte positions) in the source string. Formerly, they then used text_substr to extract the matched text; but in a multi-byte encoding, that counts the characters in the string, and the characters needed to reach the starting byte position, on every call. Accordingly, the performance degraded as the product of the input string length and the number of match positions, such that splitting a string of a few hundred kbytes could take many minutes. Repair by keeping the wide-character copy of the input string available (only in the case where encoding_max_length is not 1) after performing the match operation, and extracting substrings from that instead. This reduces the complexity to being linear in the number of result bytes, discounting the actual regexp match itself (which is not affected by this patch). In passing, remove cleanup using retail pfree() which was obsoleted by commit ff428cded (Feb 2008) which made cleanup of SRF multi-call contexts automatic. Also increase (to ~134 million) the maximum number of matches and provide an error message when it is reached. Backpatch all the way because this has been wrong forever. Analysis and patch by me; review by Kaiting Chen. Discussion: https://postgr.es/m/87pnyn55qh.fsf@news-spur.riddles.org.uk see also https://postgr.es/m/87lg996g4r.fsf@news-spur.riddles.org.uk
2018-08-27Fix missing dependency for pg_dump's ENABLE ROW LEVEL SECURITY items.Tom Lane
The archive should show a dependency on the item's table, but it failed to include one. This could cause failures in parallel restore due to emitting ALTER TABLE ... ENABLE ROW LEVEL SECURITY before restoring the table's data. In practice the odds of a problem seem low, since you would typically need to have set FORCE ROW LEVEL SECURITY as well, and you'd also need a very high --jobs count to have any chance of this happening. That probably explains the lack of field reports. Still, it's a bug, so back-patch to 9.5 where RLS was introduced. Discussion: https://postgr.es/m/19784.1535390902@sss.pgh.pa.us
2018-08-26Make syslogger more robust against failures in opening CSV log files.Tom Lane
The previous coding figured it'd be good enough to postpone opening the first CSV log file until we got a message we needed to write there. This is unsafe, though, because if the open fails we end up in infinite recursion trying to report the failure. Instead make the CSV log file management code look as nearly as possible like the longstanding logic for the stderr log file. In particular, open it immediately at postmaster startup (if enabled), or when we get a SIGHUP in which we find that log_destination has been changed to enable CSV logging. It seems OK to fail if a postmaster-start-time open attempt fails, as we've long done for the stderr log file. But we can't die if we fail to open a CSV log file during SIGHUP, so we're still left with a problem. In that case, write any output meant for the CSV log file to the stderr log file. (This will also cover race-condition cases in which backends send CSV log data before or after we have the CSV log file open.) This patch also fixes an ancient oversight that, if CSV logging was turned off during a SIGHUP, we never actually closed the last CSV log file. In passing, remember to reset whereToSendOutput = DestNone during syslogger start, since (unlike all other postmaster children) it's forked before the postmaster has done that. This made for a platform-dependent difference in error reporting behavior between the syslogger and other children: except on Windows, it'd report problems to the original postmaster stderr as well as the normal error log file(s). It's barely possible that that was intentional at some point; but it doesn't seem likely to be desirable in production, and the platform dependency definitely isn't desirable. Per report from Alexander Kukushkin. It's been like this for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/CAFh8B==iLUD_gqC-dAENS0V+kVrCeGiKujtKqSQ7++S-caaChw@mail.gmail.com
2018-08-23Fix lexing of standard multi-character operators in edge cases.Andrew Gierth
Commits c6b3c939b (which fixed the precedence of >=, <=, <> operators) and 865f14a2d (which added support for the standard => notation for named arguments) created a class of lexer tokens which look like multi-character operators but which have their own token IDs distinct from Op. However, longest-match rules meant that following any of these tokens with another operator character, as in (1<>-1), would cause them to be incorrectly returned as Op. The error here isn't immediately obvious, because the parser would usually still find the correct operator via the Op token, but there were more subtle problems: 1. If immediately followed by a comment or +-, >= <= <> would be given the old precedence of Op rather than the correct new precedence; 2. If followed by a comment, != would be returned as Op rather than as NOT_EQUAL, causing it not to be found at all; 3. If followed by a comment or +-, the => token for named arguments would be lexed as Op, causing the argument to be mis-parsed as a simple expression, usually causing an error. Fix by explicitly checking for the operators in the {operator} code block in addition to all the existing special cases there. Backpatch to 9.5 where the problem was introduced. Analysis and patch by me; review by Tom Lane. Discussion: https://postgr.es/m/87va851ppl.fsf@news-spur.riddles.org.uk
2018-08-23Reduce an unnecessary O(N^3) loop in lexer.Andrew Gierth
The lexer's handling of operators contained an O(N^3) hazard when dealing with long strings of + or - characters; it seems hard to prevent this case from being O(N^2), but the additional N multiplier was not needed. Backpatch all the way since this has been there since 7.x, and it presents at least a mild hazard in that trying to do Bind, PREPARE or EXPLAIN on a hostile query could take excessive time (without honouring cancels or timeouts) even if the query was never executed.
2018-08-21Fix set of NLS translation issuesMichael Paquier
While monitoring the code, a couple of issues related to string translation has showed up: - Some routines for auto-updatable views return an error string, which sometimes missed the shot. A comment regarding string translation is added for each routine to help with future features. - GSSAPI authentication missed two translations. - vacuumdb handles non-translated strings. Reported-by: Kyotaro Horiguchi Author: Kyotaro Horiguchi Reviewed-by: Michael Paquier, Tom Lane Discussion: https://postgr.es/m/20180810.152131.31921918.horiguchi.kyotaro@lab.ntt.co.jp Backpatch-through: 9.3
2018-08-17Ensure schema qualification in pg_restore DISABLE/ENABLE TRIGGER commands.Tom Lane
Previously, this code blindly followed the common coding pattern of passing PQserverVersion(AH->connection) as the server-version parameter of fmtQualifiedId. That works as long as we have a connection; but in pg_restore with text output, we don't. Instead we got a zero from PQserverVersion, which fmtQualifiedId interpreted as "server is too old to have schemas", and so the name went unqualified. That still accidentally managed to work in many cases, which is probably why this ancient bug went undetected for so long. It only became obvious in the wake of the changes to force dump/restore to execute with restricted search_path. In HEAD/v11, let's deal with this by ripping out fmtQualifiedId's server- version behavioral dependency, and just making it schema-qualify all the time. We no longer support pg_dump from servers old enough to need the ability to omit schema name, let alone restoring to them. (Also, the few callers outside pg_dump already didn't work with pre-schema servers.) In older branches, that's not an acceptable solution, so instead just tweak the DISABLE/ENABLE TRIGGER logic to ensure it will schema-qualify its output regardless of server version. Per bug #15338 from Oleg somebody. Back-patch to all supported branches. Discussion: https://postgr.es/m/153452458706.1316.5328079417086507743@wrigleys.postgresql.org
2018-08-17Set scan direction appropriately for SubPlans (bug #15336)Andrew Gierth
When executing a SubPlan in an expression, the EState's direction field was left alone, resulting in an attempt to execute the subplan backwards if it was encountered during a backwards scan of a cursor. Also, though much less likely, it was possible to reach the execution of an InitPlan while in backwards-scan state. Repair by saving/restoring estate->es_direction and forcing forward scan mode in the relevant places. Backpatch all the way, since this has been broken since 8.3 (prior to commit c7ff7663e, SubPlans had their own EStates rather than sharing the parent plan's, so there was no confusion over scan direction). Per bug #15336 reported by Vladimir Baranoff; analysis and patch by me, review by Tom Lane. Discussion: https://postgr.es/m/153449812167.1304.1741624125628126322@wrigleys.postgresql.org
2018-08-17pg_upgrade: issue helpful error message for use on standbysBruce Momjian
Commit 777e6ddf1723306bd2bf8fe6f804863f459b0323 checked for a shut down message from a standby and allowed it to continue. This patch reports a helpful error message in these cases, suggesting to use rsync as documented. Diagnosed-by: Martín Marqués Discussion: https://postgr.es/m/CAPdiE1xYCow-reLjrhJ9DqrMu-ppNq0ChUUEvVdxhdjGRD5_eA@mail.gmail.com Backpatch-through: 9.3
2018-08-16Close the file descriptor in ApplyLogicalMappingFileTomas Vondra
The function was forgetting to close the file descriptor, resulting in failures like this: ERROR: 53000: exceeded maxAllocatedDescs (492) while trying to open file "pg_logical/mappings/map-4000-4eb-1_60DE1E08-5376b5-537c6b" LOCATION: OpenTransientFile, fd.c:2161 Simply close the file at the end, and backpatch to 9.4 (where logical decoding was introduced). While at it, fix a nearby typo. Discussion: https://www.postgresql.org/message-id/flat/738a590a-2ce5-9394-2bef-7b1caad89b37%402ndquadrant.com
2018-08-15Make snprintf.c follow the C99 standard for snprintf's result value.Tom Lane
C99 says that the result should be the number of bytes that would have been emitted given a large enough buffer, not the number we actually were able to put in the buffer. It's time to make our substitute implementation comply with that. Not doing so results in inefficiency in buffer-enlargement cases, and also poses a portability hazard for third-party code that might expect C99-compliant snprintf behavior within Postgres. In passing, remove useless tests for str == NULL; neither C99 nor predecessor standards ever allowed that except when count == 0, so I see no reason to expend cycles on making that a non-crash case for this implementation. Also, don't waste a byte in pg_vfprintf's local I/O buffer; this might have performance benefits by allowing aligned writes during flushbuffer calls. Back-patch of commit 805889d7d. There was some concern about this possibly breaking code that assumes pre-C99 behavior, but there is much more risk (and reality, in our own code) of code that assumes C99 behavior and hence fails to detect buffer overrun without this. Discussion: https://postgr.es/m/17245.1534289329@sss.pgh.pa.us
2018-08-15Update FSM on WAL replay of page all-visible/frozenAlvaro Herrera
We aren't very strict about keeping FSM up to date on WAL replay, because per-page freespace values aren't critical in replicas (can't write to heap in a replica; and if the replica is promoted, the values would be updated by VACUUM anyway). However, VACUUM since 9.6 can skip processing pages marked all-visible or all-frozen, and if such pages are recorded in FSM with wrong values, those values are blindly propagated to FSM's upper layers by VACUUM's FreeSpaceMapVacuum. (This rationale assumes that crashes are not very frequent, because those would cause outdated FSM to occur in the primary.) Even when the FSM is outdated in standby, things are not too bad normally, because, most per-page FSM values will be zero (other than those propagated with the base-backup that created the standby); only once the remaining free space is less than 0.2*BLCKSZ the per-page value is maintained by WAL replay of heap ins/upd/del. However, if wal_log_hints=on causes complete FSM pages to be propagated to a standby via full-page images, many too-optimistic per-page values can end up being registered in the standby. Incorrect per-page values aren't critical in most cases, since an inserter that is given a page that doesn't actually contain the claimed free space will update FSM with the correct value, and retry until it finds a usable page. However, if there are many such updates to do, an inserter can spend a long time doing them before a usable page is found; in a heavily trafficked insert-only table with many concurrent inserters this has been observed to cause several second stalls, causing visible application malfunction. To fix this problem, it seems sufficient to have heap_xlog_visible (replay of setting all-visible and all-frozen VM bits for a heap page) update the FSM value for the page being processed. This fixes the per-page counters together with making the page skippable to vacuum, so when vacuum does FreeSpaceMapVacuum, the values propagated to FSM upper layers are the correct ones, avoiding the problem. While at it, apply the same fix to heap_xlog_clean (replay of tuple removal by HOT pruning and vacuum). This makes any space freed by the cleaning available earlier than the next vacuum in the promoted replica. Backpatch to 9.6, where this problem was diagnosed on an insert-only table with all-frozen pages, which were introduced as a concept in that release. Theoretically it could apply with all-visible pages to older branches, but there's been no report of that and it doesn't backpatch cleanly anyway. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20180802172857.5skoexsilnjvgruk@alvherre.pgsql
2018-08-15Clean up assorted misuses of snprintf()'s result value.Tom Lane
Fix a small number of places that were testing the result of snprintf() but doing so incorrectly. The right test for buffer overrun, per C99, is "result >= bufsize" not "result > bufsize". Some places were also checking for failure with "result == -1", but the standard only says that a negative value is delivered on failure. (Note that this only makes these places correct if snprintf() delivers C99-compliant results. But at least now these places are consistent with all the other places where we assume that.) Also, make psql_start_test() and isolation_start_test() check for buffer overrun while constructing their shell commands. There seems like a higher risk of overrun, with more severe consequences, here than there is for the individual file paths that are made elsewhere in the same functions, so this seemed like a worthwhile change. Also fix guc.c's do_serialize() to initialize errno = 0 before calling vsnprintf. In principle, this should be unnecessary because vsnprintf should have set errno if it returns a failure indication ... but the other two places this coding pattern is cribbed from don't assume that, so let's be consistent. These errors are all very old, so back-patch as appropriate. I think that only the shell command overrun cases are even theoretically reachable in practice, but there's not much point in erroneous error checks. Discussion: https://postgr.es/m/17245.1534289329@sss.pgh.pa.us
2018-08-14pg_upgrade: fix shutdown check for standby serversBruce Momjian
Commit 244142d32afd02e7408a2ef1f249b00393983822 only tested for the pg_controldata output for primary servers, but standby servers have different "Database cluster state" output, so check for that too. Diagnosed-by: Michael Paquier Discussion: https://postgr.es/m/20180810164240.GM13638@paquier.xyz Backpatch-through: 9.3
2018-08-13Prohibit shutting down resources if there is a possibility of back up.Amit Kapila
Currently, we release the asynchronous resources as soon as it is evident that no more rows will be needed e.g. when a Limit is filled. This can be problematic especially for custom and foreign scans where we can scan backward. Fix that by disallowing the shutting down of resources in such cases. Reported-by: Robert Haas Analysed-by: Robert Haas and Amit Kapila Author: Amit Kapila Reviewed-by: Robert Haas Backpatch-through: 9.6 where this code was introduced Discussion: https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info
2018-08-08Don't run atexit callbacks in quickdie signal handlers.Heikki Linnakangas
exit() is not async-signal safe. Even if the libc implementation is, 3rd party libraries might have installed unsafe atexit() callbacks. After receiving SIGQUIT, we really just want to exit as quickly as possible, so we don't really want to run the atexit() callbacks anyway. The original report by Jimmy Yih was a self-deadlock in startup_die(). However, this patch doesn't address that scenario; the signal handling while waiting for the startup packet is more complicated. But at least this alleviates similar problems in the SIGQUIT handlers, like that reported by Asim R P later in the same thread. Backpatch to 9.3 (all supported versions). Discussion: https://www.postgresql.org/message-id/CAOMx_OAuRUHiAuCg2YgicZLzPVv5d9_H4KrL_OFsFP%3DVPekigA%40mail.gmail.com
2018-08-07Don't record FDW user mappings as members of extensions.Tom Lane
CreateUserMapping has a recordDependencyOnCurrentExtension call that's been there since extensions were introduced (very possibly my fault). However, there's no support anywhere else for user mappings as members of extensions, nor are they listed as a possible member object type in the documentation. Nor does it really seem like a good idea for user mappings to belong to extensions when roles don't. Hence, remove the bogus call. (As we saw in bug #15310, the lack of any pg_dump support for this case ensures that any such membership record would silently disappear during pg_upgrade. So there's probably no need for us to do anything else about cleaning up after this mistake.) Discussion: https://postgr.es/m/27952.1533667213@sss.pgh.pa.us
2018-08-07Fix incorrect initialization of BackendActivityBuffer.Tom Lane
Since commit c8e8b5a6e, this has been zeroed out using the wrong length. In practice the length would always be too small, leading to not zeroing the whole buffer rather than clobbering additional memory; and that's pretty harmless, both because shmem would likely start out as zeroes and because we'd reinitialize any given entry before use. Still, it's bogus, so fix it. Reported by Petru-Florin Mihancea (bug #15312) Discussion: https://postgr.es/m/153363913073.1303.6518849192351268091@wrigleys.postgresql.org
2018-08-07Fix pg_upgrade to handle event triggers in extensions correctly.Tom Lane
pg_dump with --binary-upgrade must emit ALTER EXTENSION ADD commands for all objects that are members of extensions. It forgot to do so for event triggers, as per bug #15310 from Nick Barnes. Back-patch to 9.3 where event triggers were introduced. Haribabu Kommi Discussion: https://postgr.es/m/153360083872.1395.4593932457718151600@wrigleys.postgresql.org
2018-08-07Ensure pg_dump_sort.c sorts null vs non-null namespace consistently.Tom Lane
The original coding here (which is, I believe, my fault) supposed that it didn't need to concern itself with the possibility that one object of a given type-priority has a namespace while another doesn't. But that's not reliably true anymore, if it ever was; and if it does happen then it's possible that DOTypeNameCompare returns self-inconsistent comparison results. That leads to unspecified behavior in qsort() and a resultant weird output order from pg_dump. This should end up being only a cosmetic problem, because any ordering constraints that actually matter should be enforced by the later dependency-based sort. Still, it's a bug, so back-patch. Report and fix by Jacob Champion, though I editorialized on his patch to the extent of making NULL sort after non-NULL, for consistency with our usual sorting definitions. Discussion: https://postgr.es/m/CABAq_6Hw+V-Kj7PNfD5tgOaWT_-qaYkc+SRmJkPLeUjYXLdxwQ@mail.gmail.com
2018-08-06Stamp 9.6.10.REL9_6_10Tom Lane
2018-08-06Translation updatesPeter Eisentraut
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: 439efc0cb9779c62ae569c397267d7bc55c6ef4f