path: root/src
Age | Commit message | Author
2020-01-12 | Remove incorrect assertion for INSERT in logical replication's publisher | Michael Paquier
On the publisher, it was assumed that an INSERT change cannot happen for a relation with no replica identity. However this is true only for a change that needs references to old rows, aka UPDATE or DELETE, so trying to use logical replication with a relation that has no replica identity led to an assertion failure in the publisher when issuing an INSERT. This commit removes the incorrect assertion, and adds more regression tests to provide coverage for relations without replica identity. Reported-by: Neha Sharma Author: Dilip Kumar, Michael Paquier Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CANiYTQsL1Hb8_Km08qd32svrqNumXLJeoGo014O7VZymgOhZEA@mail.gmail.com Backpatch-through: 10
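The rule described above (only changes that reference old rows need a replica identity) can be sketched as follows. This is a minimal illustration, not the publisher's actual code: `ChangeKind` and `can_publish_change` are hypothetical names introduced here.

```c
#include <stdbool.h>

/* Hypothetical change kinds; the real publisher uses its own enums. */
typedef enum
{
    CHANGE_INSERT,
    CHANGE_UPDATE,
    CHANGE_DELETE
} ChangeKind;

/*
 * An INSERT carries only the new row, so it never needs a replica identity.
 * UPDATE and DELETE must identify the old row, so they do.
 */
static bool
can_publish_change(ChangeKind kind, bool has_replica_identity)
{
    if (kind == CHANGE_INSERT)
        return true;
    return has_replica_identity;
}
```

Under this rule an assertion of the form "the relation has a replica identity" is only valid on the UPDATE and DELETE paths.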
2020-01-11 | Extensive code review for GSSAPI encryption mechanism. | Tom Lane
Fix assorted bugs in handling of non-blocking I/O when using GSSAPI encryption. The encryption layer could return the wrong status information to its caller, resulting in effectively dropping some data (or possibly in aborting a not-broken connection), or in a "livelock" situation where data remains to be sent but the upper layers think transmission is done and just go to sleep. There were multiple small thinkos contributing to that, as well as one big one (failure to think through what to do when a send fails after having already transmitted data). Note that these errors could cause failures whether the client application asked for non-blocking I/O or not, since both libpq and the backend always run things in non-block mode at this level. Also get rid of use of static variables for GSSAPI inside libpq; that's entirely not okay given that multiple connections could be open at once inside a single client process. Also adjust a bunch of random small discrepancies between the frontend and backend versions of the send/receive functions -- except for error handling, they should be identical, and now they are. Also extend the Kerberos TAP tests to exercise cases where nontrivial amounts of data need to be pushed through encryption. Before, those tests didn't provide any useful coverage at all for the cases of interest here. (They still might not, depending on timing, but at least there's a chance.) Per complaint from pmc@citylink and subsequent investigation. Back-patch to v12 where this code was introduced. Discussion: https://postgr.es/m/20200109181822.GA74698@gate.oper.dinoex.org
2020-01-10 | Maintain valid md.c state when FileClose() fails. | Noah Misch
FileClose() failure ordinarily causes a PANIC. Suppose the user disables that PANIC via data_sync_retry=on. After mdclose() issued a FileClose() that failed, calls into md.c raised SIGSEGV. This fix adds repalloc() calls during mdclose(); update a comment about ignoring repalloc() cost. The rate of relation segment count change is a minor factor; more relevant to overall performance is the rate of mdclose() and subsequent re-opening of segments. Back-patch to v10, where commit 45e191e3aa62d47a8bc1a33f784286b2051f45cb introduced the bug. Reviewed by Kyotaro Horiguchi. Discussion: https://postgr.es/m/20191222091930.GA1280238@rfd.leadboat.com
2020-01-10 | doc: Fix naming of SELinux | Michael Paquier
Reported-by: Tham Nguyen Discussion: https://postgr.es/m/157851402876.29175.12977878383183540468@wrigleys.postgresql.org Backpatch-through: 9.4
2020-01-08 | Reimplement nullification of walsender timestamp | Alvaro Herrera
Make the value null only at pg_stat_activity-output time, as suggested by Tom Lane, instead of messing with the internal state. This should appease buildfarm members with force_parallel_mode=regress, which are running parallel queries on logical replication walsenders. The fact that walsenders can run parallel queries should perhaps be studied more carefully, but for the moment let's get rid of the red blots in buildfarm. Backpatch to pg10, like the previous commit. Discussion: https://postgr.es/m/30804.1578438763@sss.pgh.pa.us
2020-01-08 | Fix handling of generated columns in ALTER TABLE. | Tom Lane
ALTER TABLE failed if a column referenced in a GENERATED expression had been added or changed in type earlier in the ALTER command. That's because the GENERATED expression needs to be evaluated against the table's updated tuples, but it was being evaluated against the original tuples. (Fortunately the executor has adequate cross-checks to notice the mismatch, so we just got an obscure error message and not anything more dangerous.) Per report from Andreas Joseph Krogh. Back-patch to v12 where GENERATED was added. Discussion: https://postgr.es/m/VisenaEmail.200.231b0a41523275d0.16ea7f800c7@tc7-visena
2020-01-08 | Revert "Forbid DROP SCHEMA on temporary namespaces" | Michael Paquier
This reverts commit a052f6c, following complaints from Robert Haas and Tom Lane. Backpatch down to 9.4, like the previous commit. Discussion: https://postgr.es/m/CA+TgmobL4npEX5=E5h=5Jm_9mZun3MT39Kq2suJFVeamc9skSQ@mail.gmail.com Backpatch-through: 9.4
2020-01-07 | pg_stat_activity: show NULL stmt start time for walsenders | Alvaro Herrera
Returning a non-NULL time is pointless, since a walsender is not a process that would be running normal transactions anyway, but the code was unintentionally exposing the process start time intermittently, which was not only bogus but also confused monitoring systems looking for idle transactions. Fix by avoiding all updates in walsenders. Backpatch to 11, where walsenders started appearing in pg_stat_activity. Reported-by: Tomas Vondra Discussion: https://postgr.es/m/20191209234409.exe7osmyalwkt5j4@development
2020-01-06 | Reduce the number of GetFlushRecPtr() calls done by walsenders. | Tom Lane
Since the WAL flush position only moves forward, it's safe to cache its previous value within each walsender process, and update from shared memory only once we've caught up to the previously-seen value. When there are many active walsenders, this makes for a very significant reduction in the amount of contention on the XLogCtl->info_lck spinlock. This patch also adjusts the logic so that we update our idea of the flush position after processing a WAL record, rather than beforehand. This may cause us to realize we're not caught up when the preceding coding would've thought that we were, but that seems all to the good; it may avoid a useless sleep-and-wakeup cycle. Back-patch to v12. The contention problem exists in prior branches, but it's much less severe (due to inefficiencies elsewhere) so there seems no need to take any risk of back-patching further. Pierre Ducroquet, reviewed by Julien Rouhaud Discussion: https://postgr.es/m/2931018.Vxl9zapr77@pierred-pdoc
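The caching idea in this commit message can be sketched as below. This is a simplified model under stated assumptions, not the walsender code: `SharedWal`, `SenderState`, `read_shared_flush` and `has_more_to_send` are hypothetical names, and a counter stands in for acquiring the spinlock.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t WalPos;    /* hypothetical stand-in for a WAL position */

/* Hypothetical shared state; the counter models lock acquisitions. */
typedef struct SharedWal
{
    WalPos      flushed;            /* only ever moves forward */
    unsigned    lock_acquisitions;  /* how often the shared value was read */
} SharedWal;

/* Per-sender private state. */
typedef struct SenderState
{
    WalPos      cached_flush;       /* last flush position this sender saw */
    WalPos      sent_up_to;         /* how far this sender has sent */
} SenderState;

/* Reading the shared value is the expensive, contended step. */
static WalPos
read_shared_flush(SharedWal *shared)
{
    shared->lock_acquisitions++;    /* real code would take a spinlock here */
    return shared->flushed;
}

/*
 * Report whether there is flushed WAL left to send.  Because the flush
 * position never moves backward, the cached copy remains a valid lower
 * bound, so shared memory is consulted only once the sender has caught up
 * to the value it saw previously.
 */
static bool
has_more_to_send(SenderState *s, SharedWal *shared)
{
    if (s->sent_up_to >= s->cached_flush)
        s->cached_flush = read_shared_flush(shared);
    return s->sent_up_to < s->cached_flush;
}
```

While a sender is still behind its cached value, each call costs no shared-memory access at all, which is where the reduction in contention would come from when many senders are active.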
2020-01-06 | Have logical replication subscriber fire column triggers | Peter Eisentraut
The logical replication apply worker did not fire per-column update triggers because the updatedCols bitmap in the RTE was not populated. This fixes that. Reviewed-by: Euler Taveira <euler@timbira.com.br> Discussion: https://www.postgresql.org/message-id/flat/21673e2d-597c-6afe-637e-e8b10425b240%402ndquadrant.com
2020-01-02 | Fix cloning of row triggers to sub-partitions | Alvaro Herrera
When row triggers exist in partitioned partitions that are not either part of FKs or deferred unique constraints, they are not correctly cloned to their partitions. That's because they are marked "internal", and those are purposefully skipped when doing the clone triggers dance. Fix by relaxing the condition on which internal triggers are skipped. Amit Langote initially diagnosed the problem and proposed a fix, but I used a different approach. Reported-by: Petr Fedorov Discussion: https://postgr.es/m/6b3f0646-ba8c-b3a9-c62d-1c6651a1920f@phystech.edu
2020-01-02 | Fix comment in test | Peter Eisentraut
The comment was apparently copy-and-pasted and did not reflect the actual test outcome.
2020-01-02 | Fix running out of file descriptors for spill files. | Amit Kapila
Currently while decoding changes, if the number of changes exceeds a certain threshold, we spill those to disk.  And this happens for each (sub)transaction.  Now, while reading all these files, we don't close them until we read all the files.  While reading these files, if the number of such files exceeds the maximum number of file descriptors, the operation errors out. Use PathNameOpenFile interface to open these files as that internally has the mechanism to release kernel FDs as needed to get us under the max_safe_fds limit. Reported-by: Amit Khandekar Author: Amit Khandekar Reviewed-by: Amit Kapila Backpatch-through: 9.4 Discussion: https://postgr.es/m/CAJ3gD9c-sECEn79zXw4yBnBdOttacoE-6gAyP0oy60nfs_sabQ@mail.gmail.com
2019-12-27 | Add pg_dump test for triggers on partitioned tables | Alvaro Herrera
This currently works, but add this test to ensure it continues to work. Lack of this test became evident after a recent bugfix submission that would have inadvertently broken it, in https://postgr.es/m/CA+HiwqFM2=i+uHB9o4OkLbE2S3sjPHoVe2wXuAD1GLJ4+Pk9eg@mail.gmail.com
2019-12-27 | Forbid DROP SCHEMA on temporary namespaces | Michael Paquier
This operation was possible for the owner of the schema or a superuser. Down to 9.4, doing this operation would cause inconsistencies in a session whose temporary schema was dropped, particularly if trying to create new temporary objects after the drop. A more annoying consequence is a crash of autovacuum on an assertion failure when logging information about an orphaned temp table dropped. Note that because of 246a6c8 (present in v11~), which has made the removal of orphaned temporary tables more aggressive, the failure could be triggered more easily, but it is possible to reproduce down to 9.4. Reported-by: Mahendra Singh, Prabhat Sahu Author: Michael Paquier Reviewed-by: Kyotaro Horiguchi, Mahendra Singh Discussion: https://postgr.es/m/CAKYtNAr9Zq=1-ww4etHo-VCC-k120YxZy5OS01VkaLPaDbv2tg@mail.gmail.com Backpatch-through: 9.4
2019-12-26 | Fix possible loss of sync between rectypeid and underlying PLpgSQL_type. | Tom Lane
When revalidate_rectypeid() acts to update a stale record type OID in plpgsql's data structures, it fixes the active PLpgSQL_rec struct as well as the PLpgSQL_type struct it references. However, the latter is shared across function executions while the former is not. In a later function execution, the PLpgSQL_rec struct would be reinitialized by copy_plpgsql_datums and would then contain a stale type OID, typically leading to "could not open relation with OID NNNN" errors. revalidate_rectypeid() can easily fix this, fortunately, just by treating typ->typoid as authoritative. Per report and diagnosis from Ashutosh Sharma, though this is not his suggested fix. Back-patch to v11 where this code came in. Discussion: https://postgr.es/m/CAE9k0Pkd4dZwt9J5pS9xhJFWpUtqs05C9xk_GEwPzYdV=GxwWg@mail.gmail.com
2019-12-26 | Fix some comments related to logical repslot advancing | Michael Paquier
confirmed_flush is part of a replication slot's information, but not confirmed_lsn. Author: Kyotaro Horiguchi Discussion: https://postgr.es/m/20191226.175919.17237335658671970.horikyota.ntt@gmail.com Backpatch-through: 11
2019-12-24 | Rotate instead of shifting hash join batch number. | Thomas Munro
Our algorithm for choosing batch numbers turned out not to work effectively for multi-billion key inner relations. We would use more hash bits than we have, and effectively concentrate all tuples into a smaller number of batches than we intended. While ideally we should switch to wider hashes, for now, change the algorithm to one that effectively gives up bits from the bucket number when we don't have enough bits. That means we'll finish up with longer bucket chains than would be ideal, but that's better than having batches that don't fit in work_mem and can't be divided. Back-patch to all supported releases. Author: Thomas Munro Reviewed-by: Tom Lane, thanks also to Tomas Vondra, Alvaro Herrera, Andres Freund for testing and discussion Reported-by: James Coleman Discussion: https://postgr.es/m/16104-dc11ed911f1ab9df%40postgresql.org
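The difference between shifting and rotating can be illustrated with a small sketch. It assumes a 32-bit hash and power-of-two bucket and batch counts; the function names are hypothetical and are not taken from the hash join code.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by n bits, for 0 <= n < 32. */
static uint32_t
rotate_right32(uint32_t word, int n)
{
    return (word >> n) | (word << ((32 - n) & 31));
}

/*
 * Shift-based batch number: the low log2_nbuckets bits are thrown away.
 * Once log2_nbuckets plus log2(nbatch) exceeds 32, the upper batch bits
 * are always zero, so some batches can never be chosen.
 */
static uint32_t
batch_by_shift(uint32_t hash, int log2_nbuckets, uint32_t nbatch)
{
    return (hash >> log2_nbuckets) & (nbatch - 1);
}

/*
 * Rotate-based batch number: the bits shifted out at the bottom wrap
 * around to the top, so every batch bit is fed by some hash bit, at the
 * price of sharing bits with the bucket number.
 */
static uint32_t
batch_by_rotate(uint32_t hash, int log2_nbuckets, uint32_t nbatch)
{
    return rotate_right32(hash, log2_nbuckets) & (nbatch - 1);
}
```

With 2^20 buckets and 2^16 batches (36 bits wanted, 32 available), the shift version can only ever produce batch numbers below 2^12, while the rotate version can reach all 2^16.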
2019-12-23 | Disallow partition key expressions that return pseudo-types. | Tom Lane
This wasn't checked originally, but it should have been, because in general pseudo-types can't be stored to and retrieved from disk. Notably, partition bound values of type "record" would not be interpretable by another session. In v12 and HEAD, add another flag to CheckAttributeType's repertoire so that it can produce a specific error message for this case. That's infeasible in older branches without an ABI break, so fall back to a slightly-less-nicely-worded error message in v10 and v11. Problem noted by Amit Langote, though this patch is not his initial solution. Back-patch to v10 where partitioning was introduced. Discussion: https://postgr.es/m/CA+HiwqFUzjfj9HEsJtYWcr1SgQ_=iCAvQ=O2Sx6aQxoDu4OiHw@mail.gmail.com
2019-12-23 | Prevent a rowtype from being included in itself via a range. | Tom Lane
We probably should have thought of this case when ranges were added, but we didn't. (It's not the fault of commit eb51af71f, because ranges didn't exist then.) It's an old bug, so back-patch to all supported branches. Discussion: https://postgr.es/m/7782.1577051475@sss.pgh.pa.us
2019-12-22 | Avoid low-probability regression test failures in timestamp[tz] tests. | Tom Lane
If the first transaction block in these tests were entered exactly at midnight (California time), they'd report a bogus failure due to 'now' and 'midnight' having the same values. Commit 8c2ac75c5 had dismissed this as being of negligible probability, but we've now seen it happen in the buildfarm, so let's prevent it. We can get pretty much the same test coverage without an it's-not-midnight assumption by moving the does-'now'-work cases into their own test step. While here, apply commit 47169c255's s/DELETE/TRUNCATE/ change to timestamptz as well as timestamp (not sure why that didn't occur to me at the time; the risk of failure is the same). Back-patch to all supported branches, since the main point is to get rid of potential buildfarm failures. Discussion: https://postgr.es/m/14821.1577031117@sss.pgh.pa.us
2019-12-21 | In pgwin32_open, loop after ERROR_ACCESS_DENIED only if we can't stat. | Tom Lane
This fixes a performance problem introduced by commit 6d7547c21. ERROR_ACCESS_DENIED is returned in some other cases besides the delete-pending case considered by that commit; notably, if the given path names a directory instead of a plain file. In that case we'll uselessly loop for 1 second before returning the failure condition. That slows down some usage scenarios enough to cause test timeout failures on our Windows buildfarm critters. To fix, try to stat() the file, and sleep/loop only if that fails. It will fail in the delete-pending case, and also in the case where the deletion completed before we could stat(), so we have the cases where we want to loop covered. In the directory case, the stat() should succeed, letting us exit without a wait. One case where we'll still wait uselessly is if the access-denied problem pertains to a directory in the given pathname. But we don't expect that to happen in any performance-critical code path. There might be room to refine this further, but I'll push it now in hopes of making the buildfarm green again. Back-patch, like the preceding commit. Alexander Lakhin and Tom Lane Discussion: https://postgr.es/m/23073.1576626626@sss.pgh.pa.us
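The retry policy described above can be modelled portably as follows. This is a simulation under stated assumptions, not the Windows code: `SimFile`, `sim_open`, `sim_stat` and `open_with_retry` are hypothetical, and a counter replaces the real sleep.

```c
#include <stdbool.h>

typedef enum
{
    OPEN_OK,
    OPEN_ACCESS_DENIED,
    OPEN_NOT_FOUND
} OpenResult;

/* Hypothetical simulated file standing in for real open()/stat() calls. */
typedef struct SimFile
{
    int         pending_delete_tries;   /* opens that still see "delete pending" */
    bool        is_directory;           /* opening it always reports access denied */
    int         sleeps;                 /* how many times the caller waited */
} SimFile;

static OpenResult
sim_open(SimFile *f)
{
    if (f->is_directory)
        return OPEN_ACCESS_DENIED;
    if (f->pending_delete_tries > 0)
    {
        f->pending_delete_tries--;
        return OPEN_ACCESS_DENIED;
    }
    return OPEN_NOT_FOUND;      /* the deletion has completed */
}

/* stat succeeds for a directory, fails for a delete-pending or deleted file. */
static bool
sim_stat(const SimFile *f)
{
    return f->is_directory;
}

/*
 * Retry on access-denied only when stat also fails, which is the
 * signature of the delete-pending case.  Assumes max_tries >= 1.
 */
static OpenResult
open_with_retry(SimFile *f, int max_tries)
{
    OpenResult  r = OPEN_ACCESS_DENIED;

    for (int tries = 0; tries < max_tries; tries++)
    {
        r = sim_open(f);
        if (r != OPEN_ACCESS_DENIED)
            return r;
        if (sim_stat(f))
            return r;           /* genuine failure: report it without waiting */
        if (tries + 1 < max_tries)
            f->sleeps++;        /* real code would sleep briefly here */
    }
    return r;
}
```

In this model a directory fails immediately with no waiting, whereas a delete-pending file is retried until the deletion completes and file-not-found is reported.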
2019-12-20 | libpq should expose GSS-related parameters even when not implemented. | Tom Lane
We realized years ago that it's better for libpq to accept all connection parameters syntactically, even if some are ignored or restricted due to lack of the feature in a particular build. However, that lesson from the SSL support was for some reason never applied to the GSSAPI support. This is causing various buildfarm members to have problems with a test case added by commit 6136e94dc, and it's just a bad idea from a user-experience standpoint anyway, so fix it. While at it, fix some places where parameter-related infrastructure was added with the aid of a dartboard, or perhaps with the aid of the anti-pattern "add new stuff at the end". It should be safe to rearrange the contents of struct pg_conn even in released branches, since that's private to libpq (and we'd have to move some fields in some builds to fix this, anyway). Back-patch to all supported branches. Discussion: https://postgr.es/m/11297.1576868677@sss.pgh.pa.us
2019-12-19 | Update neglected comment. | Robert Haas
Commit d986d4e87f61c68f52c68ebc274960dc664b7b4e renamed a variable but neglected to update the corresponding comment. Amit Langote
2019-12-18 | Fix subscriber invalid memory access on DDL. | Amit Kapila
This patch allows building the local relmap cache for a subscribed relation after processing pending invalidation messages and potential relcache updates. Without this, the attributes in the local cache don't tally with the updated relcache entry, leading to invalid memory access. Reported-by: Jehan-Guillaume de Rorthais Author: Jehan-Guillaume de Rorthais and Vignesh C Reviewed-by: Amit Kapila Backpatch-through: 10 Discussion: https://postgr.es/m/20191025175929.7e90dbf5@firost
2019-12-18 | Remove shadow variables linked to RedoRecPtr in xlog.c | Michael Paquier
This changes the routines in charge of recycling WAL segments past the last redo LSN so that they no longer use "RedoRecPtr" as a local variable, a name that is also available in the context of the session as a static declaration, replacing it with "lastredoptr". This confusion was introduced by d9fadbf, so backpatch down to v11 like the other commit. Thanks to Tom Lane, Robert Haas, Alvaro Herrera, Mark Dilger and Kyotaro Horiguchi for the input provided. Author: Ranier Vilela Discussion: https://postgr.es/m/MN2PR18MB2927F7B5F690065E1194B258E35D0@MN2PR18MB2927.namprd18.prod.outlook.com Backpatch-through: 11
2019-12-17 | Fix error reporting for index expressions of prohibited types. | Tom Lane
If CheckAttributeType() threw an error about the datatype of an index expression column, it would report an empty column name, which is pretty unhelpful and certainly not the intended behavior. I (tgl) evidently broke this in commit cfc5008a5, by not noticing that the column's attname was used above where I'd placed the assignment of it. In HEAD and v12, this is trivially fixable by moving up the assignment of attname. Before v12 the code is a bit more messy; to avoid doing substantial refactoring, I took the lazy way out and just put in two copies of the assignment code. Report and patch by Amit Langote. Back-patch to all supported branches. Discussion: https://postgr.es/m/CA+HiwqFA+BGyBFimjiYXXMa2Hc3fcL0+OJOyzUNjhU4NCa_XXw@mail.gmail.com
2019-12-17 | Change overly strict Assert in TransactionGroupUpdateXidStatus. | Amit Kapila
This Assert thought that an overflowed transaction can never get registered for the group update.  But that is not true, because even when the number of children for a transaction got reduced, the overflow flag is not changed. And, for group update, we only care about the current number of children for a transaction that is being committed. Based on comments by Andres Freund, remove a redundant Assert in TransactionIdSetPageStatus as we already had a static Assert for the same condition a few lines earlier. Reported-by: Vignesh C Author: Dilip Kumar Reviewed-by: Amit Kapila Backpatch-through: 11 Discussion: https://postgr.es/m/CAFiTN-s5=uJw-Z6JC9gcqtBSjXsrHnU63PXBrA=pnBjqnkm5UA@mail.gmail.com
2019-12-16 | On Windows, wait a little to see if ERROR_ACCESS_DENIED goes away. | Tom Lane
Attempting to open a file fails with ERROR_ACCESS_DENIED if the file is flagged for deletion but not yet actually gone (another in a long list of reasons why Windows is broken, if you ask me). This seems likely to explain a lot of irreproducible failures we see in the buildfarm. This state generally persists for only a millisecond or so, so just wait a bit and retry. If it's a real permissions problem, we'll eventually give up and report it as such. If it's the pending deletion case, we'll see file-not-found and report that after the deletion completes, and the caller will treat that in an appropriate way. In passing, rejigger the existing retry logic for some other error cases so that we don't uselessly wait an extra time when we're not going to retry anymore. Alexander Lakhin (with cosmetic tweaks by me). Back-patch to all supported branches, since this seems like a pretty safe change and the problem is definitely real. Discussion: https://postgr.es/m/16161-7a985d2f1bbe8f71@postgresql.org
2019-12-16 | Fix yet another crash in page split during GiST index creation. | Heikki Linnakangas
Commit a7ee7c8513 fixed a bug in GiST page split during index creation, where we failed to re-find the position of a downlink after the page containing it was split. However, that fix was incomplete; the other call to gistinserttuples() in the same function needs to also clear 'downlinkoffnum'. Fixes bug #16134 reported by Alexander Lakhin, for real this time. The previous fix was enough to fix the crash with the reproducer script for bug #16162, but the original script for #16134 was still crashing. Backpatch to v12, like the previous incomplete fix. Discussion: https://www.postgresql.org/message-id/d869f537-abe4-d2ea-0510-38cd053f5152%40gmail.com
2019-12-16 | Clean up some misplaced comments in partition_join.sql regression test. | Etsuro Fujita
Also, add a comment explaining a test case. Back-patch to 11 where the regression test was added. Discussion: https://postgr.es/m/CAPmGK15adZPh2B%2BmGUjSOMH%2BH39ogDRWfCfm4G6jncZCAs9V_Q%40mail.gmail.com
2019-12-14 | Prevent overly-aggressive collapsing of joins to RTE_RESULT relations. | Tom Lane
The RTE_RESULT simplification logic added by commit 4be058fe9 had a flaw: it would collapse out a RTE_RESULT that is due to compute a PlaceHolderVar, and reassign the PHV to the parent join level, even if another input relation of the join contained a lateral reference to the PHV. That can't work because the PHV would be computed too late. In practice it led to failures of internal sanity checks later in planning (either assertion failures or errors such as "failed to construct the join relation"). To fix, add code to check for the presence of such PHVs in relevant portions of the query tree. Notably, this required refactoring range_table_walker so that a caller could ask to walk individual RTEs not the whole list. (It might be a good idea to refactor range_table_mutator in the same way, if only to keep those functions looking similar; but I didn't do so here as it wasn't necessary for the bug fix.) This exercise also taught me that find_dependent_phvs(), as it stood, could only safely be used on the entire Query, not on subtrees. Adjust its API to reflect that; which in passing allows it to have a fast path for the common case of no PHVs anywhere. Per report from Will Leinweber. Back-patch to v12 where the bug was introduced. Discussion: https://postgr.es/m/CALLb-4xJMd4GZt2YCecMC95H-PafuWNKcmps4HLRx2NHNBfB4g@mail.gmail.com
2019-12-14 | Fix mdsyncfiletag(), take II. | Thomas Munro
The previous commit failed to consider that FileGetRawDesc() might not return a valid fd, as discovered on the build farm. Switch to using the File interface only. Back-patch to 12, like the previous commit.
2019-12-14 | Don't use _mdfd_getseg() in mdsyncfiletag(). | Thomas Munro
_mdfd_getseg() opens all segments up to the requested one. That causes problems for mdsyncfiletag(), if mdunlinkfork() has already unlinked other segment files. Open the file we want directly by name instead, if it's not already open. The consequence of this bug was a rare panic in the checkpointer, made more likely if you saturated the sync request queue so that the SYNC_FORGET_REQUEST messages for a given relation were more likely to be absorbed in separate cycles by the checkpointer. Back-patch to 12. Defect in commit 3eb77eba. Author: Thomas Munro Reported-by: Justin Pryzby Discussion: https://postgr.es/m/20191119115759.GI30362%40telsasoft.com
2019-12-13 | Fix crash when a page was split during GiST index creation. | Heikki Linnakangas
The bug was similar to the one that was fixed in commit 22251686f0. When we split page X and insert the downlink for the new page, the parent page might also need to be split. When that happens, the downlink offset number we remembered for X is no longer valid. We correctly called gistFindCorrectParent() to re-find it, but gistFindCorrectParent() doesn't do anything if the LSN of the page hasn't changed, and we stopped updating LSNs during index build in commit 9155580fd5. The buggy codepath was taken if the page was split into three or more pages, and inserting the downlink caused the parent page to split. To fix, explicitly mark the downlink offset number as invalid, to force gistFindCorrectParent() to re-find it. Fixes bug #16134 reported by Alexander Lakhin, reported again as #16162 by Andreas Kunert. Thanks to Jeff Janes, Tom Lane and Tomas Vondra for debugging. Backpatch to v12, where we stopped WAL-logging during index build. Discussion: https://www.postgresql.org/message-id/16134-0423f729671dec64%40postgresql.org Discussion: https://www.postgresql.org/message-id/16162-45d21b7b6c1a3105%40postgresql.org
2019-12-12 | Fix EXTRACT(ISOYEAR FROM timestamp) for years BC. | Tom Lane
The test cases added by commit 26ae3aa80 exposed an old oversight in timestamp[tz]_part: they didn't correct the result of date2isoyear() for BC years, so that we produced an off-by-one answer for such years. Fix that, and back-patch to all supported branches. Discussion: https://postgr.es/m/SG2PR06MB37762CAE45DB0F6CA7001EA9B6550@SG2PR06MB3776.apcprd06.prod.outlook.com
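The kind of correction described above can be sketched like this. It assumes, as a labeled assumption rather than a statement about the actual source, that the ISO-year computation works on a scale where 0 means 1 BC and -1 means 2 BC, while the reported value has no year zero and shows BC years as negative numbers. `adjust_isoyear_for_bc` is a hypothetical helper name.

```c
/*
 * Convert an ISO year on a scale that includes year 0 (assumed to mean
 * 1 BC) into the reported convention, where 1 BC is -1, 2 BC is -2, and
 * so on.  Skipping this step yields an off-by-one result for BC years.
 */
static int
adjust_isoyear_for_bc(int isoyear)
{
    if (isoyear <= 0)
        isoyear -= 1;
    return isoyear;
}
```

Years AD pass through unchanged, so only BC inputs are affected by the adjustment.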
2019-12-12 | Remove redundant function calls in timestamp[tz]_part(). | Tom Lane
The DTK_DOW/DTK_ISODOW and DTK_DOY switch cases in timestamp_part() and timestamptz_part() contained calls of timestamp2tm() that were fully redundant with the ones done just above the switch. This evidently crept in during commit 258ee1b63, which relocated that code from another place where the calls were indeed needed. Just delete the redundant calls. I (tgl) noted that our test coverage of these functions left quite a bit to be desired, so extend timestamp.sql and timestamptz.sql to cover all the branches. Back-patch to all supported branches, as the previous commit was. There's no real issue here other than some wasted cycles in some not-too-heavily-used code paths, but the test coverage seems valuable. Report and patch by Li Japin; test case adjustments by me. Discussion: https://postgr.es/m/SG2PR06MB37762CAE45DB0F6CA7001EA9B6550@SG2PR06MB3776.apcprd06.prod.outlook.com
2019-12-12 | Remove extra parenthesis from comment. | Etsuro Fujita
2019-12-10 | In pg_ctl, work around ERROR_SHARING_VIOLATION on the postmaster log file. | Tom Lane
On Windows, we use CMD.EXE to redirect the postmaster's stdout/stderr into a log file. CMD.EXE will open that file with non-sharing-friendly parameters, and the file will remain open for a short time after the postmaster has removed postmaster.pid. This can result in an ERROR_SHARING_VIOLATION failure if we attempt to start a new postmaster immediately with the same log file (e.g. during "pg_ctl restart"). This seems to explain intermittent buildfarm failures we've been seeing on Windows machines. To fix, just open and close the log file using our own pgwin32_open(), which will wait if necessary to avoid the failure. (Perhaps someday we should stop using CMD.EXE, but that would be a far more complex patch, and it doesn't seem worth the trouble ... yet.) Back-patch to v12. This only solves the problem when frontend fopen() is redirected to pgwin32_fopen(), which has only been true since commit 0ba06e0bf. Hence, no point in back-patching further, unless we care to back-patch that change too. Diagnosis and patch by Alexander Lakhin (bug #16154). Discussion: https://postgr.es/m/16154-1ccf0b537b24d5e0@postgresql.org
2019-12-10 | Fix handling of multiple AFTER ROW triggers on a foreign table. | Etsuro Fujita
AfterTriggerExecute() retrieves a fresh tuple or pair of tuples from a tuplestore and then stores the tuple(s) in the passed-in slot(s) if AFTER_TRIGGER_FDW_FETCH, while it uses the most-recently-retrieved tuple(s) stored in the slot(s) if AFTER_TRIGGER_FDW_REUSE. This was done correctly before 12, but commit ff11e7f4b broke it by mistakenly clearing the tuple(s) stored in the slot(s) in that function, leading to an assertion failure as reported in bug #16139 from Alexander Lakhin. Also, fix some other issues with the aforementioned commit in passing: * For tg_newslot, which is a slot added to the TriggerData struct by the commit to store new updated tuples, it didn't ensure the slot was NULL if there was no such tuple. * The commit failed to update the documentation about the trigger interface. Author: Etsuro Fujita Backpatch-through: 12 Discussion: https://postgr.es/m/16139-94f9ccf0db6119ec%40postgresql.org
2019-12-09 | Fix race condition in our Windows signal emulation. | Tom Lane
pg_signal_dispatch_thread() responded to the client (signal sender) and disconnected the pipe before actually setting the shared variables that make the signal visible to the backend process's main thread. In the worst case, it seems, effective delivery of the signal could be postponed for as long as the machine has any other work to do. To fix, just move the pg_queue_signal() call so that we do it before responding to the client. This essentially makes pgkill() synchronous, which is a stronger guarantee than we have on Unix. That may be overkill, but on the other hand we have not seen comparable timing bugs on any Unix platform. While at it, add some comments to this sadly underdocumented code. Problem diagnosis and fix by Amit Kapila; I just added the comments. Back-patch to all supported versions, as it appears that this can cause visible NOTIFY timing oddities on all of them, and there might be other misbehavior due to slow delivery of other signals. Discussion: https://postgr.es/m/32745.1575303812@sss.pgh.pa.us
2019-12-09 | Improve isolationtester's timeout management. | Tom Lane
isolationtester.c had a hard-wired limit of 3 minutes per test step. It now emerges that this isn't quite enough for some of the slowest buildfarm animals. This isn't the first time we've had to raise this limit (cf. 1db439ad4), so let's make it configurable. This patch raises the default to 5 minutes, and introduces an environment variable PGISOLATIONTIMEOUT that can be set if more time is needed, following the precedent of PGCTLTIMEOUT. Also, modify isolationtester so that when the timeout is hit, it explicitly reports having sent a cancel. This makes the regression failure log considerably more intelligible. (In the worst case, a timed-out test might actually be reported as "passing" without this extra output, so arguably this is a bug fix in itself.) In passing, update the README file, which had apparently not gotten touched when we added "make check" support here. Back-patch to 9.6; older versions don't have comparable timeout logic. Discussion: https://postgr.es/m/22964.1575842935@sss.pgh.pa.us
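A configurable timeout with a built-in default, as described above, might be read along these lines. This is a hedged sketch, not isolationtester's code: `step_timeout_secs` is a hypothetical helper, the value is assumed to be in seconds, and falling back to the default on malformed input is a choice made for this example only.

```c
#include <stdlib.h>

/* The 5-minute default mentioned in the commit message, in seconds. */
#define DEFAULT_STEP_TIMEOUT_SECS 300L

/*
 * Turn the raw environment value into a timeout in seconds.  An unset,
 * empty, non-numeric or non-positive value falls back to the default
 * (the fallback policy is an assumption of this sketch).
 */
static long
step_timeout_secs(const char *env_value)
{
    char       *end;
    long        secs;

    if (env_value == NULL || *env_value == '\0')
        return DEFAULT_STEP_TIMEOUT_SECS;

    secs = strtol(env_value, &end, 10);
    if (*end != '\0' || secs <= 0)
        return DEFAULT_STEP_TIMEOUT_SECS;

    return secs;
}
```

A caller would pass in the result of `getenv("PGISOLATIONTIMEOUT")`, so a slow buildfarm animal could raise the limit without a rebuild.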
2019-12-09 | Fix typos in miscinit.c. | Amit Kapila
Commit f13ea95f9e moved the description of postmaster.pid file contents from miscadmin.h to pidfile.h, but failed to update the comments in miscinit.c. Author: Hadi Moshayedi Reviewed-by: Amit Kapila Backpatch-through: 10 Discussion: https://postgr.es/m/CAK=1=WpYEM9x3LGkaxgXaxeYQjnkdW8XLsxrYRTE2Gq-H83FMw@mail.gmail.com
2019-12-04 | Fix whitespace. | Etsuro Fujita
2019-12-03 | Fix failures with TAP tests of pg_ctl on Windows | Michael Paquier
On Windows, all the hosts spawned by the TAP tests bind to 127.0.0.1. Hence, if there is a port conflict, starting a cluster would immediately fail. One of the test scripts of pg_ctl initializes a node without PostgresNode.pm, using the default port 5432. This could cause unexpected startup failures in the tests if an independent server was up and running on the same host (the reverse is also possible, though less likely). Fix this issue by properly assigning a free port to the configured node, in the same range as that used for the other nodes in the tests. Author: Michael Paquier Reviewed-by: Andrew Dunstan Discussion: https://postgr.es/m/20191202031444.GC1696@paquier.xyz Backpatch-through: 11
2019-12-01  Fix misbehavior with expression indexes on ON COMMIT DELETE ROWS tables.  Tom Lane
We implement ON COMMIT DELETE ROWS by truncating tables marked that way, which requires also truncating/rebuilding their indexes. But RelationTruncateIndexes asks the relcache for up-to-date copies of any index expressions, which may cause execution of eval_const_expressions on them, which can result in actual execution of subexpressions. This is a bad thing to have happening during ON COMMIT. Manuel Rigger reported that use of a SQL function resulted in crashes due to expectations that ActiveSnapshot would be set, which it isn't. The most obvious fix perhaps would be to push a snapshot during PreCommit_on_commit_actions, but I think that would just open the door to more problems: CommitTransaction explicitly expects that no user-defined code can be running at this point. Fortunately, since we know that no tuples exist to be indexed, there seems no need to use the real index expressions or predicates during RelationTruncateIndexes. We can set up dummy index expressions instead (we do need something that will expose the right data type, as there are places that build index tupdescs based on this), and just ignore predicates and exclusion constraints. In a green field it'd likely be better to reimplement ON COMMIT DELETE ROWS using the same "init fork" infrastructure used for unlogged relations. That seems impractical without catalog changes though, and even without that it'd be too big a change to back-patch. So for now do it like this. Per private report from Manuel Rigger. This has been broken forever, so back-patch to all supported branches.
2019-11-30  Fix off-by-one error in PGTYPEStimestamp_fmt_asc  Tomas Vondra
When using %b or %B patterns to format a date, the code was simply using tm_mon as an index into the array of month names. But that is wrong, because tm_mon is 1-based, while array indexes are 0-based. The result is that we either use the name of the next month or segfault (for December). Fix by subtracting 1 from tm_mon for both patterns, and add a regression test triggering the issue. Backpatch to all supported versions (the bug has been there far longer, since at least 2003). Reported-by: Paul Spencer Backpatch-through: 9.4 Discussion: https://postgr.es/m/16143-0d861eb8688d3fef%40postgresql.org
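The indexing fix can be illustrated with a small standalone sketch. The array and function names below are hypothetical and are not taken from the ecpg sources; the only thing carried over from the commit message is that tm_mon is 1-based while the array is 0-based.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical month table; index 0 is January. */
static const char *const month_names[12] = {
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December"
};

/*
 * %B-style lookup.  tm_mon is 1-based (1 = January), so subtract 1
 * before indexing.  Using tm_mon directly would return the next
 * month's name and read past the end of the array for December.
 */
const char *
month_name_1based(int tm_mon)
{
    if (tm_mon < 1 || tm_mon > 12)
        return NULL;
    return month_names[tm_mon - 1];
}

/*
 * %b-style lookup: copy the three-letter abbreviation into buf.
 * Returns 0 on success, -1 if tm_mon is out of range.
 */
int
month_abbrev_1based(int tm_mon, char buf[4])
{
    const char *name = month_name_1based(tm_mon);

    assert(buf != NULL);
    if (name == NULL)
        return -1;
    memcpy(buf, name, 3);       /* every month name has at least 3 letters */
    buf[3] = '\0';
    return 0;
}
```

The range check is an addition for the sketch; the commit message itself only describes subtracting 1 from tm_mon.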
2019-11-28  Remove unnecessary clauses_attnums variable  Tomas Vondra
Commit c676e659b2 reworked how choose_best_statistics() picks the best extended statistics, but failed to remove clauses_attnums which is now unnecessary. So get rid of it and backpatch to 12, same as c676e659b2. Author: Tomas Vondra Discussion: https://postgr.es/m/CA+u7OA7H5rcE2=8f263w4NZD6ipO_XOrYB816nuLXbmSTH9pQQ@mail.gmail.com Backpatch-through: 12
2019-11-28  Fix choose_best_statistics to check clauses individually  Tomas Vondra
When picking the best extended statistics object for a list of clauses, it's not enough to look at attnums extracted from the clause list as a whole. Consider for example this query with OR clauses: SELECT * FROM t WHERE (t.a = 1) OR (t.b = 1) OR (t.c = 1) with a statistics object defined on columns (a,b). Relying on attnums extracted from the whole OR clause, we'd consider the statistics usable. That does not work, as we see the conditions as a single OR-clause referencing an attribute not covered by the statistics, leading to an empty list of clauses to be estimated using the statistics and an assert failure. This changes choose_best_statistics to check which clauses are actually covered, and to use only attributes from the fully covered ones. For the previous example this means the statistics object will not be considered compatible with the OR-clause. Backpatch to 12, where MCVs were introduced. The issue does not affect older versions because functional dependencies don't handle OR clauses. Author: Tomas Vondra Reviewed-by: Dean Rasheed Reported-By: Manuel Rigger Discussion: https://postgr.es/m/CA+u7OA7H5rcE2=8f263w4NZD6ipO_XOrYB816nuLXbmSTH9pQQ@mail.gmail.com Backpatch-through: 12
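The difference between the two checks can be sketched with column sets modeled as bitmasks. This is a simplified illustration, not the planner code: AttrMask, ATTR and the function names are hypothetical, and modeling the old whole-list check as a plain overlap test is an assumption about its shape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical column set: bit n set means column n is referenced. */
typedef uint32_t AttrMask;
#define ATTR(n) ((AttrMask) 1u << (n))

/* Simplified stand-in for the whole-list check: any shared column. */
bool
attrs_overlap(AttrMask all_clause_attrs, AttrMask stat_attrs)
{
    return (all_clause_attrs & stat_attrs) != 0;
}

/* Per-clause check: every column the clause references is in the statistics. */
bool
clause_is_covered(AttrMask clause_attrs, AttrMask stat_attrs)
{
    return (clause_attrs & ~stat_attrs) == 0;
}

/*
 * Count the fully covered clauses and collect only their columns into
 * *covered_attrs.  Clauses referencing no columns are skipped.
 */
int
covered_clauses(const AttrMask *clauses, size_t nclauses,
                AttrMask stat_attrs, AttrMask *covered_attrs)
{
    int count = 0;

    assert(covered_attrs != NULL);
    *covered_attrs = 0;

    for (size_t i = 0; i < nclauses; i++)
    {
        if (clauses[i] != 0 && clause_is_covered(clauses[i], stat_attrs))
        {
            count++;
            *covered_attrs |= clauses[i];
        }
    }
    return count;
}
```

With statistics on (a,b) as ATTR(0) | ATTR(1) and the OR-clause treated as one clause referencing a, b and c, attrs_overlap() reports a match while covered_clauses() finds no covered clause, which is the mismatch the commit message describes.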
2019-11-27  Fix typo in comment.  Etsuro Fujita