summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2018-09-24Make EXPLAIN output for JIT compilation more dense.Andres Freund
A discussion about also reporting JIT compilation overhead on workers brought unhappiness with the verbosity of the current explain format to light. Make the text format more dense, and restructure the structured output to mirror that more closely. As we're re-jiggering the output format anyway: The denser format allows us to report all flags for JIT compilation (now also reporting PGJIT_EXPR and PGJIT_DEFORM), and report the total time in addition to the individual times. Per complaint from Tom Lane. Author: Andres Freund Discussion: https://postgr.es/m/27812.1537221015@sss.pgh.pa.us Backpatch: 11-, where JIT compilation was introduced
2018-09-24Fast default trigger and expand_tuple fixesAndrew Dunstan
Ensure that triggers get properly filled in tuples for the OLD value. Also fix the logic of detecting missing null values. The previous logic failed to detect a missing null column before the first missing column with a default. Fixing this has simplified the logic a bit. Regression tests are added to test changes. This should ensure better coverage of expand_tuple(). Original bug reports, and some code and test scripts from Tomas Vondra Backpatch to release 11.
2018-09-24Fix over-allocation of space for array_out()'s result string.Tom Lane
array_out overestimated the space needed for its output, possibly by a very substantial amount if the array is multi-dimensional, because of wrong order of operations in the loop that counts the number of curly-brace pairs needed. While the output string is normally short-lived, this could still cause problems in extreme cases. An additional minor error was that it counted one more delimiter than is actually needed. Repair those errors, add an Assert that the space is now correctly calculated, and make some minor improvements in the comments. I also failed to resist the temptation to get rid of an integer modulus operation per array element; a simple comparison is sufficient. This bug dates clear back to Berkeley days, so back-patch to all supported versions. Keiichi Hirobe, minor additional work by me Discussion: https://postgr.es/m/CAH=EFxE9W0tRvQkixR2XJRRCToUYUEDkJZk6tnADXugPBRdcdg@mail.gmail.com
2018-09-23Initialize random() in bootstrap/stand-alone postgres and in initdb.Noah Misch
This removes a difference between the standard IsUnderPostmaster execution environment and that of --boot and --single. In a stand-alone backend, "SELECT random()" always started at the same seed. On a system capable of using posix shared memory, initdb could still conclude "selecting dynamic shared memory implementation ... sysv". Crashed --boot or --single postgres processes orphaned shared memory objects having names that collided with the not-actually-random names that initdb probed. The sysv fallback appeared after ten crashes of --boot or --single postgres. Since --boot and --single are rare in production use, systems used for PostgreSQL development are the principal candidate to notice this symptom. Back-patch to 9.3 (all supported versions). PostgreSQL 9.4 introduced dynamic shared memory, but 9.3 does share the "SELECT random()" problem. Reviewed by Tom Lane and Kyotaro HORIGUCHI. Discussion: https://postgr.es/m/20180915221546.GA3159382@rfd.leadboat.com
2018-09-23Fix failure in WHERE CURRENT OF after rewinding the referenced cursor.Tom Lane
In a case where we have multiple relation-scan nodes in a cursor plan, such as a scan of an inheritance tree, it's possible to fetch from a given scan node, then rewind the cursor and fetch some row from an earlier scan node. In such a case, execCurrent.c mistakenly thought that the later scan node was still active, because ExecReScan hadn't done anything to make it look not-active. We'd get some sort of failure in the case of a SeqScan node, because the node's scan tuple slot would be pointing at a HeapTuple whose t_self gets reset to invalid by heapam.c. But it seems possible that for other relation scan node types we'd actually return a valid tuple TID to the caller, resulting in updating or deleting a tuple that shouldn't have been considered current. To fix, forcibly clear the ScanTupleSlot in ExecScanReScan. Another issue here, which seems only latent at the moment but could easily become a live bug in future, is that rewinding a cursor does not necessarily lead to *immediately* applying ExecReScan to every scan-level node in the plan tree. Upper-level nodes will think that they can postpone that call if their child node is already marked with chgParam flags. I don't see a way for that to happen today in a plan tree that's simple enough for execCurrent.c's search_plan_tree to understand, but that's one heck of a fragile assumption. So, add some logic in search_plan_tree to detect chgParam flags being set on nodes that it descended to/through, and assume that that means we should consider lower scan nodes to be logically reset even if their ReScan call hasn't actually happened yet. Per bug #15395 from Matvey Arye. This has been broken for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/153764171023.14986.280404050547008575@wrigleys.postgresql.org
2018-09-21Fix bogus tab-completion rule for CREATE PUBLICATION.Tom Lane
You can't use "FOR TABLE" as a single Matches argument, because readline will consider that input to be two words not one. It's necessary to make the pattern contain two arguments. The case accidentally worked anyway because the words_after_create test fired ... but only for the first such table name. Noted by Edmund Horner, though this isn't exactly his proposed fix. Backpatch to v10 where the faulty code came in. Discussion: https://postgr.es/m/CAMyN-kDe=gBmHgxWwUUaXuwK+p+7g1vChR7foPHRDLE592nJPQ@mail.gmail.com
2018-09-22Use size_t consistently in dsa.{ch}.Thomas Munro
Takeshi Ideriha complained that there is a mixture of Size and size_t in dsa.c and corresponding header. Let's use size_t. Back-patch to 10 where dsa.c landed, to make future back-patching easy. Discussion: https://postgr.es/m/4E72940DA2BF16479384A86D54D0988A6F19ABD9%40G01JPEXMBKW04
2018-09-20Fix segment_bins corruption in dsa.c.Thomas Munro
If a segment has been freed by dsa.c because it is entirely empty, other backends must make sure to unmap it before following links to new segments that might happen to have the same index number, or they could finish up looking at a defunct segment and then corrupt the segment_bins lists. The correct protocol requires checking freed_segment_counter after acquiring the area lock and before resolving any index number to a segment. Add the missing checks and an assertion. Back-patch to 10, where dsa.c first arrived. Author: Thomas Munro Reported-by: Tomas Vondra Discussion: https://postgr.es/m/CAEepm%3D0thg%2Bja5zGVa7jBy-uqyHrTqTm8HGhEOtMmigGrAqTbw%40mail.gmail.com
2018-09-20Defer restoration of libraries in parallel workers.Thomas Munro
Several users of extensions complained of crashes in parallel workers that turned out to be due to syscache access from their _PG_init() functions. Reorder the initialization of parallel workers so that libraries are restored after the caches are initialized, and inside a transaction. This was reported in bug #15350 and elsewhere. We don't consider it to be a bug: extensions shouldn't do that, because then they can't be used in shared_preload_libraries. However, it's a fairly obscure hazard and these extensions worked in practice before parallel query came along. So let's make it work. Later commits might add a warning message and eventually an error. Back-patch to 9.6, where parallel query landed. Author: Thomas Munro Reviewed-by: Amit Kapila Reported-by: Kieran McCusker, Jimmy Discussion: https://postgr.es/m/153512195228.1489.8545997741965926448%40wrigleys.postgresql.org
2018-09-19Fix minor error message style guide violation.Tom Lane
No periods at the ends of primary error messages, please. Daniel Gustafsson Discussion: https://postgr.es/m/43E004C0-18C6-42B4-A313-003B43EB0571@yesql.se
2018-09-19Don't ignore locktable-full failures in StandbyAcquireAccessExclusiveLock.Tom Lane
Commit 37c54863c removed the code in StandbyAcquireAccessExclusiveLock that checked the return value of LockAcquireExtended. That created a bug, because it's still passing reportMemoryError = false to LockAcquireExtended, meaning that LOCKACQUIRE_NOT_AVAIL will be returned if we're out of shared memory for the lock table. In such a situation, the startup process would believe it had acquired an exclusive lock even though it hadn't, with potentially dire consequences. To fix, just drop the use of reportMemoryError = false, which allows us to simplify the call into a plain LockAcquire(). It's unclear that the locktable-full situation arises often enough that it's worth having a better recovery method than crash-and-restart. (I strongly suspect that the only reason the code path existed at all was that it was relatively simple to do in the pre-37c54863c implementation. But now it's not.) LockAcquireExtended's reportMemoryError parameter is now dead code and could be removed. I refrained from doing so, however, because there was some interest in resurrecting the behavior if we do get reports of locktable-full failures in the field. Also, it seems unwise to remove the parameter concurrently with shipping commit f868a8143, which added a parameter; if there are any third-party callers of LockAcquireExtended, we want them to get a wrong-number-of-parameters compile error rather than a possibly-silent misinterpretation of its last parameter. Back-patch to 9.6 where the bug was introduced. Discussion: https://postgr.es/m/6202.1536359835@sss.pgh.pa.us
2018-09-18Revert "Allow concurrent-safe open() and fopen() in frontend code for Windows"REL_11_BETA4Tom Lane
This reverts commit f02259fe93e75d5443a2fabe2f2f38b81924ab36, in the v11 branch only. The hack this required in initdb.c should probably have clued us that it wasn't really ready, but we didn't get the hint. Subsequent developments have made clear that it affected text-vs-binary behavior in a lot of places, and there's no reason to think that any of those behavioral changes are desirable. There's no time to fix this before 11beta4, so just revert for the moment. We can keep working on this in HEAD, and maybe reconsider a back-patch once we're satisfied things are stable. (I take the blame for this fiasco, having encouraged Michael to back-patch a change at the last possible moment before beta wrap.)
2018-09-18Fix some probably-minor oversights in readfuncs.c.Tom Lane
The system expects TABLEFUNC RTEs to have coltypes, coltypmods, and colcollations lists, but outfuncs doesn't dump them and readfuncs doesn't restore them. This doesn't cause obvious failures, because the only things that look at those fields are expandRTE() and get_rte_attribute_type(), which are mostly used during parse analysis, before anything would've passed the parsetree through outfuncs/readfuncs. But expandRTE() is used in build_physical_tlist(), which means that that function will return a wrong answer for a TABLEFUNC RTE that came from a view. Very accidentally, this doesn't cause serious problems, because what it will return is NIL which callers will interpret as "couldn't build a physical tlist because of dropped columns". So you still get a plan that works, though it's marginally less efficient than it could be. There are also some other expandRTE() calls associated with transformation of whole-row Vars in the planner. I have been unable to exhibit misbehavior from that, and it may be unreachable in any case that anyone would care about ... but I'm not entirely convinced, so this seems like something we should back- patch a fix for. Fortunately, we can fix it without forcing a change of stored rules and a catversion bump, because we can just copy these lists from the subsidiary TableFunc object. readfuncs.c was also missing support for NamedTuplestoreScan plan nodes. This accidentally fails to break parallel query because a query using a named tuplestore would never be considered parallel-safe anyway. However, project policy since parallel query came in is that all plan node types should have outfuncs/readfuncs support, so this is clearly an oversight that should be repaired. Noted while fooling around with a patch to test outfuncs/readfuncs more thoroughly. That exposed some other issues too, but these are the only ones that seem worth back-patching. Back-patch to v10 where both of these features came in. Discussion: https://postgr.es/m/17114.1537138992@sss.pgh.pa.us
2018-09-18Allow DSM allocation to be interrupted.Thomas Munro
Chris Travers reported that the startup process can repeatedly try to cancel a backend that is in a posix_fallocate()/EINTR loop and cause it to loop forever. Teach the retry loop to give up if an interrupt is pending. Don't actually check for interrupts in that loop though, because a non-local exit would skip some clean-up code in the caller. Back-patch to 9.4 where DSM was added (and posix_fallocate() was later back-patched). Author: Chris Travers Reviewed-by: Ildar Musin, Murat Kabilov, Oleksii Kliukin Tested-by: Oleksii Kliukin Discussion: https://postgr.es/m/CAN-RpxB-oeZve_J3SM_6%3DHXPmvEG%3DHX%2B9V9pi8g2YR7YW0rBBg%40mail.gmail.com
2018-09-17Stamp 11beta4.Tom Lane
2018-09-17Fix parsetree representation of XMLTABLE(XMLNAMESPACES(DEFAULT ...)).Tom Lane
The original coding for XMLTABLE thought it could represent a default namespace by a T_String Value node with a null string pointer. That's not okay, though; in particular outfuncs.c/readfuncs.c are not on board with such a representation, meaning you'll get a null pointer crash if you try to store a view or rule containing this construct. To fix, change the parsetree representation so that we have a NULL list element, instead of a bogus Value node. This isn't really a functional limitation since default XML namespaces aren't yet implemented in the executor; you'd just get "DEFAULT namespace is not supported" anyway. But crashes are not nice, so back-patch to v10 where this syntax was added. Ordinarily we'd consider a parsetree representation change to be un-backpatchable; but since existing releases would crash on the way to storing such constructs, there can't be any existing views/rules to be incompatible with. Per report from Andrey Lepikhov. Discussion: https://postgr.es/m/3690074f-abd2-56a9-144a-aa5545d7a291@postgrespro.ru
2018-09-17Fix pgbench lexer's "continuation" rule to cope with Windows newlines.Tom Lane
Our general practice in frontend code is to accept input with either Unix-style newlines (\n) or DOS-style (\r\n). pgbench was mostly down with that, but its rule for line continuations (backslash-newline) was not. This had been masked on Windows buildfarm machines before commit 0ba06e0bf by use of Windows text mode to read files. We could have fixed it by forcing text mode again, but it's better to fix the parsing code so that Windows-style text files on Unix systems don't cause problems. Back-patch to v10 where pgbench grew line continuations. Discussion: https://postgr.es/m/17194.1537191697@sss.pgh.pa.us
2018-09-17Allow concurrent-safe open() and fopen() in frontend code for WindowsMichael Paquier
PostgreSQL uses a custom wrapper for open() and fopen() which is concurrent-safe, allowing multiple processes to open and work on the same file. This has a couple of advantages: - pg_test_fsync does not handle O_DSYNC correctly otherwise, leading to false claims that disks are unsafe. - TAP tests can run into race conditions when a postmaster and pg_ctl open postmaster.pid, fixing some random failures in the buildfam. pg_upgrade is one frontend tool using workarounds to bypass file locking issues with the log files it generates, however the interactions with pg_ctl are proving to be tedious to get rid of, so this is left for later. Author: Laurenz Albe Reviewed-by: Michael Paquier, Kuntal Ghosh Discussion: https://postgr.es/m/1527846213.2475.31.camel@cybertec.at Discussion: https://postgr.es/m/16922.1520722108@sss.pgh.pa.us
2018-09-17Translation updatesPeter Eisentraut
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: be9925199917aac824dd4b472bdce3b97dbc90ca
2018-09-16Fix out-of-tree build for transform modules.Andrew Gierth
Neither plperl nor plpython installed sufficient header files to permit transform modules to be built out-of-tree using PGXS. Fix that by installing all plperl and plpython header files (other than those with special purposes such as generated data tables), and also install plpython's special .mk file for mangling regression tests. (This commit does not fix the windows install, which does not currently install _any_ plperl or plpython headers.) Also fix the existing transform modules for hstore and ltree so that their cross-module #include directives work as anticipated by commit df163230b9 et seq. This allows them to serve as working examples of how to reference other modules when doing separate out-of-tree builds. Discussion: https://postgr.es/m/87o9ej8bgl.fsf%40news-spur.riddles.org.uk
2018-09-16Add outfuncs.c support for RawStmt nodes.Tom Lane
I noticed while poking at a report from Andrey Lepikhov that the recent addition of RawStmt nodes at the top of raw parse trees makes it impossible to print any raw parse trees whatsoever, because outfuncs.c doesn't know RawStmt and hence fails to descend into it. While we generally lack outfuncs.c support for utility statements, there is reasonably complete support for what you can find in a raw SELECT statement. It was not my intention to make that all dead code ... so let's add support for RawStmt. Back-patch to v10 where RawStmt appeared.
2018-09-15In v11, disable JIT by default (it's still enabled by default in HEAD).Tom Lane
Per discussion, JIT isn't quite mature enough to ship enabled-by-default. I failed to resist the temptation to do a bunch of copy-editing on the related documentation. Also, clean up some inconsistencies in which section of config.sgml the JIT GUCs are documented in vs. what guc.c and postgresql.config.sample had. Discussion: https://postgr.es/m/20180914222657.mw25esrzbcnu6qlu@alap3.anarazel.de
2018-09-15Fix failure with initplans used conditionally during EvalPlanQual rechecks.Tom Lane
The EvalPlanQual machinery assumes that any initplans (that is, uncorrelated sub-selects) used during an EPQ recheck would have already been evaluated during the main query; this is implicit in the fact that execPlan pointers are not copied into the EPQ estate's es_param_exec_vals. But it's possible for that assumption to fail, if the initplan is only reached conditionally. For example, a sub-select inside a CASE expression could be reached during a recheck when it had not been previously, if the CASE test depends on a column that was just updated. This bug is old, appearing to date back to my rewrite of EvalPlanQual in commit 9f2ee8f28, but was not detected until Kyle Samson reported a case. To fix, force all not-yet-evaluated initplans used within the EPQ plan subtree to be evaluated at the start of the recheck, before entering the EPQ environment. This could be inefficient, if such an initplan is expensive and goes unused again during the recheck --- but that's piling one layer of improbability atop another. It doesn't seem worth adding more complexity to prevent that, at least not in the back branches. It was convenient to use the new-in-v11 ExecEvalParamExecParams function to implement this, but I didn't like either its name or the specifics of its API, so revise that. Back-patch all the way. Rather than rewrite the patch to avoid depending on bms_next_member() in the oldest branches, I chose to back-patch that function into 9.4 and 9.3. (This isn't the first time back-patches have needed that, and it exhausted my patience.) I also chose to back-patch some test cases added by commits 71404af2a and 342a1ffa2 into 9.4 and 9.3, so that the 9.x versions of eval-plan-qual.spec are all the same. Andrew Gierth diagnosed the problem and contributed the added test cases, though the actual code changes are by me. Discussion: https://postgr.es/m/A033A40A-B234-4324-BE37-272279F7B627@tripadvisor.com
2018-09-14Move PartitionDispatchData struct definition to execPartition.cAlvaro Herrera
There's no reason to expose the struct definition, so don't. Author: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> Discussion: https://postgr.es/m/d3fa24c1-bc65-7133-81df-6474387ccc4f@lab.ntt.co.jp
2018-09-14Fix ALTER/TYPE on columns referenced by FKs in partitioned tablesAlvaro Herrera
When ALTER TABLE ... SET DATA TYPE affects a column referenced by constraints and indexes, it drop those constraints and indexes and recreates them afterwards, so that the definitions match the new data type. The original code did this by dropping one object at a time (commit 077db40fa1f3 of May 2004), which worked fine because the dependencies between the objects were pretty straightforward, and ordering the objects in a specific way was enough to make this work. However, when there are foreign key constraints in partitioned tables, the dependencies are no longer so straightforward, and we were getting errors when attempted: ERROR: cache lookup failed for constraint 16398 This can be fixed by doing all the drops in one pass instead, using performMultipleDeletions (introduced by df18c51f2955 of Aug 2006). With this change we can also remove the code to carefully order the list of objects to be deleted. Reported-by: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAKcux6nWS_m+s=1Udk_U9B+QY7pA-Ac58qR5BdUfOyrwnWHDew@mail.gmail.com
2018-09-14Don't allow LIMIT/OFFSET clause within sub-selects to be pushed to workers.Amit Kapila
Allowing sub-select containing LIMIT/OFFSET in workers can lead to inconsistent results at the top-level as there is no guarantee that the row order will be fully deterministic. The fix is to prohibit pushing LIMIT/OFFSET within sub-selects to workers. Reported-by: Andrew Fletcher Bug: 15324 Author: Amit Kapila Reviewed-by: Dilip Kumar Backpatch-through: 9.6 Discussion: https://postgr.es/m/153417684333.10284.11356259990921828616@wrigleys.postgresql.org
2018-09-13Message style improvementsPeter Eisentraut
Fix one untranslatable string concatenation in pg_rewind. Fix one message in pg_verify_checksums to use a style use elsewhere and avoid plural issues. Fix one gratuitous abbreviation in psql.
2018-09-13Attach FPI to the first record after full_page_writes is turned on.Amit Kapila
XLogInsert fails to attach a required FPI to the first record after full_page_writes is turned on by the last checkpoint. This bug got introduced in 9.5 due to code rearrangement in commits 2c03216d83 and 2076db2aea. Fix it by ensuring that XLogInsertRecord performs a recomputation when the given record is generated with FPW as off but found that the flag has been turned on while actually inserting the record. Reported-by: Kyotaro Horiguchi Author: Kyotaro Horiguchi Reviewed-by: Amit Kapila Backpatch-through: 9.5 where this problem was introduced Discussion: https://postgr.es/m/20180420.151043.74298611.horiguchi.kyotaro@lab.ntt.co.jp
2018-09-12Minor fixes for psql tab completion.Tom Lane
* Include partitioned tables in what's offered after ANALYZE. * Include toast_tuple_target in what's offered after ALTER TABLE ... SET|RESET. * Include HASH in what's offered after PARTITION BY. This is extracted from a larger patch; these bits seem like uncontroversial bug fixes for v11 features, so back-patch them into v11. Justin Pryzby Discussion: https://postgr.es/m/20180529000623.GA21896@telsasoft.com
2018-09-12Repair bug in regexp split performance improvements.Andrew Gierth
Commit c8ea87e4b introduced a temporary conversion buffer for substrings extracted during regexp splits. Unfortunately the code that sized it was failing to ignore the effects of ignored degenerate regexp matches, so for regexp_split_* calls it could under-size the buffer in such cases. Fix, and add some regression test cases (though those will only catch the bug if run in a multibyte encoding). Backpatch to 9.3 as the faulty code was. Thanks to the PostGIS project, Regina Obe and Paul Ramsey for the report (via IRC) and assistance in analysis. Patch by me.
2018-09-12ecpg: Change --version output to common stylePeter Eisentraut
When we removed the ecpg-specific versions, we also removed the "(PostgreSQL)" from the --version output, which we show in other programs. Reported-by: Ioseph Kim <pgsql-kr@postgresql.kr>
2018-09-11Remove ruleutils.c's special case for BIT [VARYING] literals.Tom Lane
Up to now, get_const_expr() insisted on prefixing BIT and VARBIT literals with 'B'. That's not really necessary, because we always append explicit-cast syntax to identify the constant's type. Moreover, it's subtly wrong for VARBIT, because the parser will interpret B'...' as '...'::"bit"; see make_const() which explicitly assigns type BITOID for a T_BitString literal. So what had been a simple VARBIT literal is reconstructed as ('...'::"bit")::varbit, which is not the same thing, at least not before constant folding. This results in odd differences after dump/restore, as complained of by the patch submitter, and it could result in actual failures in partitioning or inheritance DDL operations (see commit 542320c2b, which repaired similar misbehaviors for some other data types). Fixing it is pretty easy: just remove the special case and let the default code path handle these types. We could have kept the special case for BIT only, but there seems little point in that. Like the previous patch, I judge that back-patching this into stable branches wouldn't be a good idea. However, it seems not quite too late for v11, so let's fix it there. Paul Guo, reviewed by Davy Machado and John Naylor, minor adjustments by me Discussion: https://postgr.es/m/CABQrizdTra=2JEqA6+Ms1D1k1Kqw+aiBBhC9TreuZRX2JzxLAA@mail.gmail.com
2018-09-11Repair double-free in SP-GIST rescan (bug #15378)Andrew Gierth
spgrescan would first reset traversalCxt, and then traverse a potentially non-empty stack containing pointers to traversalValues which had been allocated in those contexts, freeing them a second time. This bug originates in commit ccd6eb49a where traversalValue was introduced. Repair by traversing the stack before the context reset; this isn't ideal, since it means doing retail pfree in a context that's about to be reset, but the freeing of a stack entry is also done in other places in the code during the scan so it's not worth trying to refactor it further. Regression test added. Backpatch to 9.6 where the problem was introduced. Per bug #15378; analysis and patch by me, originally from a report on IRC by user velix; see also PostGIS ticket #4174; review by Alexander Korotkov. Discussion: https://postgr.es/m/153663176628.23136.11901365223750051490@wrigleys.postgresql.org
2018-09-10Use -Bsymbolic for shared libraries on HP-UX and Solaris.Tom Lane
These platforms are also subject to the mis-linking problem addressed in commit e3d77ea6b. It's not clear whether we could solve it with a solution equivalent to GNU ld's version scripts, but -Bsymbolic appears to fix it, so let's use that. Like the previous commit, back-patch as far as v10. Discussion: https://postgr.es/m/153626613985.23143.4743626885618266803@wrigleys.postgresql.org
2018-09-09Prevent mis-linking of src/port and src/common functions on *BSD.Tom Lane
On ELF-based platforms (and maybe others?) it's possible for a shared library, when dynamically loaded into the backend, to call the backend versions of src/port and src/common functions rather than the frontend versions that are actually linked into the shlib. This is the cause of bug #15367 from Jeremy Evans, and is likely to lead to more problems in future; it's accidental that we've failed to notice any bad effects up to now. The recommended way to fix this on ELF-based platforms is to use a linker "version script" that makes the shlib's versions of the functions local. (Apparently, -Bsymbolic would fix it as well, but with other side effects that we don't want.) Doing so has the additional benefit that we can make sure the shlib only exposes the symbols that are meant to be part of its API, and not ones that are just for cross-file references within the shlib. So we'd already been using a version script for libpq on popular platforms, but it's now apparent that it's necessary for correctness on every ELF-based platform. Hence, add appropriate logic to the openbsd, freebsd, and netbsd stanzas of Makefile.shlib; this is just a copy-and-paste from the linux stanza. There may be additional work to do if commit ed0cdf0e0 reveals that the problem exists elsewhere, but this is all that is known to be needed right now. Back-patch to v10 where SCRAM support came in. The problem is ancient, but analysis suggests that there were no really severe consequences in older branches. Hence, I won't take the risk of such a large change in the build process for older branches. In passing, remove a rather opaque comment about -Bsymbolic; I don't think it's very on-point about why we don't use that, if indeed that's what it's talking about at all. Patch by me; thanks to Andrew Gierth for helping to diagnose the problem, and for additional testing. Discussion: https://postgr.es/m/153626613985.23143.4743626885618266803@wrigleys.postgresql.org
2018-09-09Fix past pd_upper write in ginRedoRecompress()Alexander Korotkov
ginRedoRecompress() replays actions over compressed segments of posting list in-place. However, it might lead to write past pg_upper, because intermediate state during playing the changes can take more space than both original state and final state. This commit fixes that by refuse from in-place modification. Instead page tail is copied once modification is started, and then it's used as the source of original segments. Backpatch to 9.4 where posting list compression was introduced. Reported-by: Sivasubramanian Ramasubramanian Discussion: https://postgr.es/m/1536091151804.6588%40amazon.com Author: Alexander Korotkov based on patch from and ideas by Sivasubramanian Ramasubramanian Review: Sivasubramanian Ramasubramanian Backpatch-through: 9.4
2018-09-08Allow ENOENT in check_mode_recursive().Noah Misch
Buildfarm member tern failed src/bin/pg_ctl/t/001_start_stop.pl when a check_mode_recursive() call overlapped a server's startup-time deletion of pg_stat/global.stat. Just warn. Also, include errno in the message. Back-patch to v11, where check_mode_recursive() first appeared.
2018-09-08Fix logical subscriber wait in test.Noah Misch
Buildfarm members sungazer and tern revealed this deficit. Back-patch to v10, like commit 4f10e7ea7b2231f453bb18b6e710ac333eaf121b, which introduced the test.
2018-09-08Minor cleanup/future-proofing for pg_saslprep().Tom Lane
Ensure that pg_saslprep() initializes its output argument to NULL in all failure paths, and then remove the redundant initialization that some (not all) of its callers did. This does not fix any live bug, but it reduces the odds of future bugs of omission. Also add a comment about why the existing failure-path coding is adequate. Back-patch so as to keep the function's API consistent across branches, again to forestall future bug introduction. Patch by me, reviewed by Michael Paquier Discussion: https://postgr.es/m/16558.1536407783@sss.pgh.pa.us
2018-09-07Save/restore SPI's global variables in SPI_connect() and SPI_finish().Tom Lane
This patch removes two sources of interference between nominally independent functions when one SPI-using function calls another, perhaps without knowing that it does so. Chapman Flack pointed out that xml.c's query_to_xml_internal() expects SPI_tuptable and SPI_processed to stay valid across datatype output function calls; but it's possible that such a call could involve re-entrant use of SPI. It seems likely that there are similar hazards elsewhere, if not in the core code then in third-party SPI users. Previously SPI_finish() reset SPI's API globals to zeroes/nulls, which would typically make for a crash in such a situation. Restoring them to the values they had at SPI_connect() seems like a considerably more useful behavior, and it still meets the design goal of not leaving any dangling pointers to tuple tables of the function being exited. Also, cause SPI_connect() to reset these variables to zeroes/nulls after saving them. This prevents interference in the opposite direction: it's possible that a SPI-using function that's only ever been tested standalone contains assumptions that these variables start out as zeroes. That was the case as long as you were the outermost SPI user, but not so much for an inner user. Now it's consistent. Report and fix suggestion by Chapman Flack, actual patch by me. Back-patch to all supported branches. Discussion: https://postgr.es/m/9fa25bef-2e4f-1c32-22a4-3ad0723c4a17@anastigmatix.net
2018-09-07Limit depth of forced recursion for CLOBBER_CACHE_RECURSIVELY.Tom Lane
It's somewhat surprising that we got away with this before. (Actually, since nobody tests this routinely AFAIK, it might've been broken for awhile. But it's definitely broken in the wake of commit f868a8143.) It seems sufficient to limit the forced recursion to a small number of levels. Back-patch to all supported branches, like the preceding patch. Discussion: https://postgr.es/m/12259.1532117714@sss.pgh.pa.us
2018-09-07Fix longstanding recursion hazard in sinval message processing.Tom Lane
LockRelationOid and sibling routines supposed that, if our session already holds the lock they were asked to acquire, they could skip calling AcceptInvalidationMessages on the grounds that we must have already read any remote sinval messages issued against the relation being locked. This is normally true, but there's a critical special case where it's not: processing inside AcceptInvalidationMessages might attempt to access system relations, resulting in a recursive call to acquire a relation lock. Hence, if the outer call had acquired that same system catalog lock, we'd fall through, despite the possibility that there's an as-yet-unread sinval message for that system catalog. This could, for example, result in failure to access a system catalog or index that had just been processed by VACUUM FULL. This is the explanation for buildfarm failures we've been seeing intermittently for the past three months. The bug is far older than that, but commits a54e1f158 et al added a new recursion case within AcceptInvalidationMessages that is apparently easier to hit than any previous case. To fix this, we must not skip calling AcceptInvalidationMessages until we have *finished* a call to it since acquiring a relation lock, not merely acquired the lock. (There's already adequate logic inside AcceptInvalidationMessages to deal with being called recursively.) Fortunately, we can implement that at trivial cost, by adding a flag to LOCALLOCK hashtable entries that tracks whether we know we have completed such a call. There is an API hazard added by this patch for external callers of LockAcquire: if anything is testing for LOCKACQUIRE_ALREADY_HELD, it might be fooled by the new return code LOCKACQUIRE_ALREADY_CLEAR into thinking the lock wasn't already held. This should be a fail-soft condition, though, unless something very bizarre is being done in response to the test. Also, I added an additional output argument to LockAcquireExtended, assuming that that probably isn't called by any outside code given the very limited usefulness of its additional functionality. Back-patch to all supported branches. Discussion: https://postgr.es/m/12259.1532117714@sss.pgh.pa.us
2018-09-07Refactor installation of extension headers.Andrew Gierth
Commit be54b3777 failed on gmake 3.80 due to a chained conditional, which on closer examination could be removed entirely with some refactoring elsewhere for a net simplification and more robustness against empty expansions. Along the way, add some more comments. Also make explicit in the documentation and comments that built headers are not removed by 'make clean', since we don't typically want that for headers generated by a separate ./configure step, and it's much easier to add your own 'distclean' rule or use EXTRA_CLEAN than to try and override a deletion rule in pgxs.mk. Per buildfarm member prariedog and comments by Michael Paquier, though all the actual changes are my fault.
2018-09-06Fix the overrun in hash index metapage for smaller block sizes.Amit Kapila
The commit 620b49a1 changed the value of HASH_MAX_BITMAPS with the intent to allow many non-unique values in hash indexes without worrying to reach the limit of the number of overflow pages. At that time, this didn't occur to us that it can overrun the block for smaller block sizes. Choose the value of HASH_MAX_BITMAPS based on BLCKSZ such that it gives the same answer as now for the cases where the overrun doesn't occur, and some other sufficiently-value for the cases where an overrun currently does occur. This allows us not to change the behavior in any case that currently works, so there's really no reason for a HASH_VERSION bump. Author: Dilip Kumar Reviewed-by: Amit Kapila Backpatch-through: 10 Discussion: https://postgr.es/m/CAA4eK1LtF4VmU4mx_+i72ff1MdNZ8XaJMGkt2HV8+uSWcn8t4A@mail.gmail.com
2018-09-05Allow extensions to install built as well as unbuilt headers.Andrew Gierth
Commit df163230b overlooked the case that an out-of-tree extension might need to build its header files (e.g. via ./configure). If it is also doing a VPATH build, the HEADERS_* rules in the original commit would then fail to find the files, since they would be looking only under $(srcdir) and not in the build directory. Fix by adding HEADERS_built and HEADERS_built_$(MODULE) which behave like DATA_built in that they look in the build dir rather than the source dir (and also make the files dependencies of the "all" target). No Windows support appears to be needed for this, since it is only relevant to out-of-tree builds (no support exists in Mkvcbuild.pm to build extension header files in any case).
2018-09-05Remove no-longer-used variable.Tom Lane
Oversight in 2fbdf1b38. Per buildfarm.
2018-09-05Make argument names of pg_get_object_address consistent, and fix docs.Tom Lane
pg_get_object_address and pg_identify_object_as_address are supposed to be inverses, but they disagreed as to the names of the arguments representing the textual form of an object address. Moreover, the documented argument names didn't agree with reality at all, either for these functions or pg_identify_object. In HEAD and v11, I think we can get away with renaming the input arguments of pg_get_object_address to match the outputs of pg_identify_object_as_address. In theory that might break queries using named-argument notation to call pg_get_object_address, but it seems really unlikely that anybody is doing that, or that they'd have much trouble adjusting if they were. In older branches, we'll just live with the lack of consistency. Aside from fixing the documentation of these functions to match reality, I couldn't resist the temptation to do some copy-editing. Per complaint from Jean-Pierre Pelletier. Back-patch to 9.5 where these functions were introduced. (Before v11, this is a documentation change only.) Discussion: https://postgr.es/m/CANGqjDnWH8wsTY_GzDUxbt4i=y-85SJreZin4Hm8uOqv1vzRQA@mail.gmail.com
2018-09-05Simplify partitioned table creation vs. relcacheAlvaro Herrera
In the original code, we were storing the pg_inherits row for a partitioned table too early: enough that we had a hack for relcache to avoid falling flat on its face while reading such a partial entry. If we finish the pg_class creation first and *then* store the pg_inherits entry, we don't need that hack. Also recognize that pg_class.relpartbound is not marked NOT NULL and therefore it's entirely possible to read null values, so having only Assert() protection isn't enough. Change those to if/elog tests instead. This qualifies as a robustness fix, so backpatch to pg11. In passing, remove one access that wasn't actually needed, and reword one message to be like all the others that check for the same thing. Reviewed-by: Amit Langote Discussion: https://postgr.es/m/20180903213916.hh6wasnrdg6xv2ud@alvherre.pgsql
2018-09-04Fully enforce uniqueness of constraint names.Tom Lane
It's been true for a long time that we expect names of table and domain constraints to be unique among the constraints of that table or domain. However, the enforcement of that has been pretty haphazard, and it missed some corner cases such as creating a CHECK constraint and then an index constraint of the same name (as per recent report from André Hänsel). Also, due to the lack of an actual unique index enforcing this, duplicates could be created through race conditions. Moreover, the code that searches pg_constraint has been quite inconsistent about how to handle duplicate names if one did occur: some places checked and threw errors if there was more than one match, while others just processed the first match they came to. To fix, create a unique index on (conrelid, contypid, conname). Since either conrelid or contypid is zero, this will separately enforce uniqueness of constraint names among constraints of any one table and any one domain. (If we ever implement SQL assertions, and put them into this catalog, more thought might be needed. But it'd be at least as reasonable to put them into a new catalog; having overloaded this one catalog with two kinds of constraints was a mistake already IMO.) This index can replace the existing non-unique index on conrelid, though we need to keep the one on contypid for query performance reasons. Having done that, we can simplify the logic in various places that either coped with duplicates or neglected to, as well as potentially improve lookup performance when searching for a constraint by name. Also, as per our usual practice, install a preliminary check so that you get something more friendly than a unique-index violation report in the case complained of by André. And teach ChooseIndexName to avoid choosing autogenerated names that would draw such a failure. While it's not possible to make such a change in the back branches, it doesn't seem quite too late to put this into v11, so do so. Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de
2018-09-04Prohibit pushing subqueries containing window function calculation toAmit Kapila
workers. Allowing window function calculation in workers leads to inconsistent results because if the input row ordering is not fully deterministic, the output of window functions might vary across workers. The fix is to treat them as parallel-restricted. In the passing, improve the coding pattern in max_parallel_hazard_walker so that it has a chain of mutually-exclusive if ... else if ... else if ... else if ... IsA tests. Reported-by: Marko Tiikkaja Bug: 15324 Author: Amit Kapila Reviewed-by: Tom Lane Backpatch-through: 9.6 Discussion: https://postgr.es/m/CAL9smLAnfPJCDUUG4ckX2iznj53V7VSMsYefzZieN93YxTNOcw@mail.gmail.com