2025-07-23  Fix build breakage on Solaris-alikes with late-model GCC.  (Tom Lane)

Solaris has never bothered to add "const" to the second argument of PAM conversation procs, as all other Unixen did decades ago. This resulted in an "incompatible pointer" compiler warning when building --with-pam, but had no more serious effect than that, so we never did anything about it. However, as of GCC 14, the case is an error, not a warning, by default.

To complicate matters, recent OpenIndiana (and maybe illumos in general?) *does* supply the "const" by default, so we can't just assume that platforms using our solaris template need help. What we can do, short of building a configure-time probe, is to make solaris.h #define _PAM_LEGACY_NONCONST, which causes OpenIndiana's pam_appl.h to revert to the traditional definition, and hopefully will have no effect anywhere else. Then we can use that same symbol to control whether we include "const" in the declaration of pam_passwd_conv_proc().

Bug: #18995
Reported-by: Andrew Watkins <awatkins1966@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18995-82058da9ab4337a7@postgresql.org
Backpatch-through: 13

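A minimal sketch of the declaration dance described above. The macro name _PAM_LEGACY_NONCONST is from the commit text; the PG_PAM_CONST helper is a hypothetical illustration, not necessarily the shape of the actual patch:

    #include <security/pam_appl.h>

    /*
     * On platforms using the solaris template, solaris.h is assumed to
     * have done: #define _PAM_LEGACY_NONCONST 1
     * which makes OpenIndiana's pam_appl.h fall back to the non-const
     * conversation-proc signature, so the callback must match it.
     */
    #ifdef _PAM_LEGACY_NONCONST
    #define PG_PAM_CONST            /* legacy Solaris: no const */
    #else
    #define PG_PAM_CONST const      /* POSIX / Linux-PAM style */
    #endif

    static int
    pam_passwd_conv_proc(int num_msg,
                         PG_PAM_CONST struct pam_message **msg,
                         struct pam_response **resp, void *appdata_ptr);
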
2025-07-22  ecpg: Fix NULL pointer dereference during connection lookup  (Michael Paquier)

ECPGconnect() caches established connections to the server, supporting the case of a NULL connection name when a database name is not specified by its caller. A follow-up call to ECPGget_PGconn() to get an established connection from the cached set with a non-NULL name could cause a NULL pointer dereference if a NULL connection was listed in the cache and checked for a match. At least two connections are necessary to reproduce the issue: one with a NULL name and one with a non-NULL name.

Author: Aleksander Alekseev <aleksander@tigerdata.com>
Discussion: https://postgr.es/m/CAJ7c6TNvFTPUTZQuNAoqgzaSGz-iM4XR61D7vEj5PsQXwg2RyA@mail.gmail.com
Backpatch-through: 13

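A hypothetical miniature of the fixed cache walk (the struct and function names here are illustrative, not ecpg's actual internals): entries with a NULL name must be skipped before strcmp() is applied.

    #include <string.h>

    struct connection_entry
    {
        const char *name;           /* may be NULL */
        void       *conn;
        struct connection_entry *next;
    };

    static void *
    lookup_connection(struct connection_entry *cache, const char *target)
    {
        for (struct connection_entry *e = cache; e != NULL; e = e->next)
        {
            /* guard against NULL-named entries before comparing */
            if (e->name != NULL && strcmp(e->name, target) == 0)
                return e->conn;
        }
        return NULL;
    }
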
2025-07-19  Fix infinite wait when reading a partially written WAL record  (Alexander Korotkov)

If a crash occurs while writing a WAL record that spans multiple pages, the recovery process marks the page with the XLP_FIRST_IS_OVERWRITE_CONTRECORD flag. However, logical decoding currently attempts to read the full WAL record based on its expected size before checking this flag, which can lead to an infinite wait if the remaining data is never written (e.g., no activity after the crash).

This patch updates the logic to first read the page header and check for the XLP_FIRST_IS_OVERWRITE_CONTRECORD flag before attempting to reconstruct the full WAL record. If the flag is set, decoding correctly identifies the record as incomplete and avoids waiting for WAL data that will never arrive.

Discussion: https://postgr.es/m/CAAKRu_ZCOzQpEumLFgG_%2Biw3FTa%2BhJ4SRpxzaQBYxxM_ZAzWcA%40mail.gmail.com
Discussion: https://postgr.es/m/CALDaNm34m36PDHzsU_GdcNXU0gLTfFY5rzh9GSQv%3Dw6B%2BQVNRQ%40mail.gmail.com
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Backpatch-through: 13

2025-07-17  Fix PQport to never return NULL unless the connection is NULL.  (Tom Lane)

This is the documented behavior, and it worked that way before v10. However, addition of the connhost[] array created cases where conn->connhost[conn->whichhost].port is NULL. The rest of libpq is careful to substitute DEF_PGPORT[_STR] for a null or empty port string, but we failed to do so here, leading to possibly returning NULL. As of v18 that causes psql's \conninfo command to segfault. Older psql versions avoid that, but it's pretty likely that other clients have trouble with this, so we'd better back-patch the fix.

In stable branches, just revert to our historical behavior of returning an empty string when there was no user-given port specification. However, it seems substantially more useful and indeed more correct to hand back DEF_PGPORT_STR in such cases, so let's make v18 and master do that.

Author: Daniele Varrazzo <daniele.varrazzo@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA+mi_8YTS8WPZPO0PAb2aaGLwHuQ0DEQRF0ZMnvWss4y9FwDYQ@mail.gmail.com
Backpatch-through: 13

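For client code that must also run against unpatched libpq builds, a defensive pattern like this sketch avoids passing a NULL from PQport() onward (PQconnectdb, PQstatus, and PQport are the real libpq API; the connection string and fallback value are this example's assumptions):

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("host=localhost,127.0.0.1 dbname=postgres");
        const char *port;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* Unpatched libpq can return NULL here; substitute a default. */
        port = PQport(conn);
        if (port == NULL || port[0] == '\0')
            port = "5432";
        printf("connected on port %s\n", port);

        PQfinish(conn);
        return 0;
    }
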
2025-07-17  Remove assertion from PortalRunMulti  (Álvaro Herrera)

We have an assertion to ensure that a command tag has been assigned by the time we're done executing, but if we happen to execute a command with no queries, the assertion would fail. Per discussion, rather than contort things to get a tag assigned, just remove the assertion.

Oversight in 2f9661311b83. That commit also retained a comment that explained logic that had been adjacent to it but diffused into various places, leaving none apt to keep part of the comment. Remove that part, and rewrite what remains for extra clarity.

Bug: #18984
Backpatch-through: 13
Reported-by: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michaël Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/18984-0f4778a6599ac3ae@postgresql.org

2025-07-16  Fix dumping of comments on invalid constraints on domains  (Álvaro Herrera)

We skip dumping constraints together with domains if they are invalid ('separate') so that they appear after data -- but their comments were dumped together with the domain definition, which in effect leads to the comment being dumped when the constraint does not yet exist. Delay them in the same way.

Oversight in 7eca575d1c28; backpatch all the way back.

Author: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxF_C2pe6J_+nPr6C5jf5rQnbYP8XOKr4HM8yHZtp2aQqQ@mail.gmail.com

2025-07-16  psql: Fix note on project naming in output of \copyright.  (Nathan Bossart)

This adjusts the wording to match the changes in commits 5987553fde, a233a603ba, and pgweb commit 2d764dbc08.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/aHVo791guQR6uqwT%40nathan
Backpatch-through: 13

2025-07-15  Silence uninitialized-value warnings in compareJsonbContainers().  (Tom Lane)

Because not every path through JsonbIteratorNext() sets val->type, some compilers complain that compareJsonbContainers() is comparing possibly-uninitialized values. The paths that don't set it return WJB_DONE, WJB_END_ARRAY, or WJB_END_OBJECT, so it's clear by manual inspection that the "(ra == rb)" code path is safe, and indeed we aren't seeing warnings about that. But the (ra != rb) case is much less obviously safe. In Assert-enabled builds it seems that the asserts rejecting WJB_END_ARRAY and WJB_END_OBJECT persuade gcc 15.x not to warn, which makes little sense because it's impossible to believe that the compiler can prove of its own accord that ra/rb aren't WJB_DONE here. (In fact they never will be, so the code isn't wrong, but why is there no warning?) Without Asserts, the appearance of warnings is quite unsurprising.

We discussed fixing this by converting those two Asserts into pg_assume, but that seems not very satisfactory when it's so unclear why the compiler is or isn't warning: the warning could easily reappear with some other compiler version. Let's fix it in a less magical, more future-proof way by changing JsonbIteratorNext() so that it always does set val->type. The cost of that should be pretty negligible, and it makes the function's API spec less squishy.

Reported-by: Erik Rijkers <er@xs4all.nl>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/988bf1bc-3f1f-99f3-bf98-222f1cd9dc5e@xs4all.nl
Discussion: https://postgr.es/m/0c623e8a204187b87b4736792398eaf1@postgrespro.ru
Backpatch-through: 13

2025-07-11  Fix inconsistent quoting of role names in ACLs.  (Tom Lane)

getid() and putid(), which parse and deparse role names within ACL input/output, applied isalnum() to see if a character within a role name requires quoting. They did this even for non-ASCII characters, which is problematic because the results would depend on encoding, locale, and perhaps even platform. So it's possible that putid() could elect not to quote some string that, later in some other environment, getid() will decide is not a valid identifier, causing dump/reload or similar failures.

To fix this in a way that won't risk interoperability problems with unpatched versions, make getid() treat any non-ASCII as a legitimate identifier character (hence not requiring quotes), while making putid() treat any non-ASCII as requiring quoting. We could remove the resulting excess quoting once we feel that no unpatched servers remain in the wild, but that'll be years.

A lesser problem is that getid() did the wrong thing with an input consisting of just two double quotes (""). That has to represent an empty string, but getid() read it as a single double quote instead. The case cannot arise in the normal course of events, since we don't allow empty-string role names. But let's fix it while we're here.

Although we've not heard field reports of problems with non-ASCII role names, there's clearly a hazard there, so back-patch to all supported versions.

Reported-by: Peter Eisentraut <peter@eisentraut.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3792884.1751492172@sss.pgh.pa.us
Backpatch-through: 13

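A standalone sketch of the asymmetric rule described above (the helper names are illustrative, not the actual getid()/putid() code):

    #include <stdbool.h>

    /*
     * putid()-side rule: quote anything that is not ASCII alphanumeric
     * or underscore, so non-ASCII bytes are always quoted regardless of
     * encoding or locale.
     */
    static bool
    char_needs_quoting(unsigned char c)
    {
        return !((c >= 'a' && c <= 'z') ||
                 (c >= 'A' && c <= 'Z') ||
                 (c >= '0' && c <= '9') ||
                 c == '_');
    }

    /*
     * getid()-side rule: accept any non-ASCII byte as an identifier
     * character, so unquoted names written by unpatched servers still
     * parse.
     */
    static bool
    char_is_identifier(unsigned char c)
    {
        return c >= 0x80 || !char_needs_quoting(c);
    }
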
2025-07-07  Restore the ability to run pl/pgsql expression queries in parallel.  (Tom Lane)

pl/pgsql's notion of an "expression" is very broad, encompassing any SQL SELECT query that returns a single column and no more than one row. So there are cases, for example evaluation of an aggregate function, where the query involves significant work and it'd be useful to run it with parallel workers. This used to be possible, but commits 3eea7a0c9 et al unintentionally disabled it.

The simplest fix is to make exec_eval_expr() pass maxtuples = 0 rather than 2 to exec_run_select(). This avoids the new rule that we will never use parallelism when a nonzero "count" limit is passed to ExecutorRun(). (Note that the pre-3eea7a0c9 behavior was indeed unsafe, so reverting that rule is not in the cards.)

The reason for passing 2 before was that exec_eval_expr() will throw an error if it gets more than one returned row, so we figured that as soon as we have two rows we know that will happen and we might as well stop running the query. That choice was cost-free when it was made; but disabling parallelism is far from cost-free, so now passing 2 amounts to optimizing a failure case at the expense of useful cases. An expression query that can return more than one row is certainly broken. People might now need to wait a bit longer to discover such breakage; but hopefully few will use enormously expensive cases as their first test of new pl/pgsql logic.

Author: Dipesh Dhameliya <dipeshdhameliya125@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CABgZEgdfbnq9t6xXJnmXbChNTcWFjeM_6nuig41tm327gYi2ig@mail.gmail.com
Backpatch-through: 13

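The code change itself is tiny; in the shape described above it amounts to something like the following fragment inside pl/pgsql's exec_eval_expr() (a sketch based on the commit text, not the verbatim patch):

    /*
     * Pass no row limit, so ExecutorRun() sees count = 0 and a parallel
     * plan remains usable.  Passing 2 forfeited parallelism merely to
     * stop early in the (erroneous) more-than-one-row case.
     */
    rc = exec_run_select(estate, expr, 0, NULL);
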
2025-07-04  Fix new pg_upgrade query not to rely on regnamespace  (Álvaro Herrera)

regnamespace was invented in 9.5, and pg_upgrade claims to support versions back to 9.0. But we don't need it: a simple query change, tested by Tom Lane, avoids the dependency.

Discussion: https://postgr.es/m/202507041645.afjl5rssvrgu@alvherre.pgsql

2025-07-04  pg_upgrade: check for inconsistencies in not-null constraints w/inheritance  (Álvaro Herrera)

With tables defined like this,

CREATE TABLE ip (id int PRIMARY KEY);
CREATE TABLE ic (id int) INHERITS (ip);
ALTER TABLE ic ALTER id DROP NOT NULL;

pg_upgrade fails during the schema restore phase due to this error:

ERROR: column "id" in child table must be marked NOT NULL

This can only be fixed by marking the child column as NOT NULL before the upgrade, which could take an arbitrary amount of time (because ic's data must be scanned). Have pg_upgrade's check mode warn if that condition is found, so that users know what to adjust before running the upgrade for real.

Author: Ali Akbar <the.apaan@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/CACQjQLoMsE+1pyLe98pi0KvPG2jQQ94LWJ+PTiLgVRK4B=i_jg@mail.gmail.com

2025-07-04  Disable commit timestamps during bootstrap  (Michael Paquier)

Attempting to use commit timestamps during bootstrapping leads to an assertion failure that can be reached, for example, with an initdb -c that enables track_commit_timestamp. It makes little sense to register a commit timestamp for a BootstrapTransactionId, so let's disable the activation of the module in this case.

This problem has been independently reported once by each author of this commit. Each author has proposed basically the same patch, relying on IsBootstrapProcessingMode() to skip the use of commit_ts during bootstrap. The test addition is a suggestion by me, and is applied down to v16.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Author: Andy Fan <zhihuifan1213@163.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/OSCPR01MB14966FF9E4C4145F37B937E52F5102@OSCPR01MB14966.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/87plejmnpy.fsf@163.com
Backpatch-through: 13

2025-07-03  Obtain required table lock during cross-table updates, redux.  (Tom Lane)

Commits 8319e5cb5 et al missed the fact that ATPostAlterTypeCleanup contains three calls to ATPostAlterTypeParse, and the other two also need protection against passing a relid that we don't yet have lock on. Add similar logic to those code paths, and add some test cases demonstrating the need for it.

In v18 and master, the test cases demonstrate that there's a behavioral discrepancy between stored generated columns and virtual generated columns: we disallow changing the expression of a stored column if it's used in any composite-type columns, but not that of a virtual column. Since the expression isn't actually relevant to either sort of composite-type usage, this prohibition seems unnecessary; but changing it is a matter for separate discussion. For now we are just documenting the existing behavior.

Reported-by: jian he <jian.universality@gmail.com>
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: CACJufxGKJtGNRRSXfwMW9SqVOPEMdP17BJ7DsBf=tNsv9pWU9g@mail.gmail.com
Backpatch-through: 13

2025-07-01  Make safeguard against incorrect flags for fsync more portable.  (Tom Lane)

The existing code assumed that O_RDONLY is defined as 0, but this is not required by POSIX and is not true on GNU Hurd. We can avoid the assumption by relying on O_ACCMODE to mask the fcntl() result. (Hopefully, all supported platforms define that.)

Author: Michael Banck <mbanck@gmx.net>
Co-authored-by: Samuel Thibault
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/6862e8d1.050a0220.194b8d.76fa@mx.google.com
Discussion: https://postgr.es/m/68480868.5d0a0220.1e214d.68a6@mx.google.com
Backpatch-through: 13

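The portable idiom as a self-contained illustration (the file name is arbitrary; the point is masking with O_ACCMODE instead of assuming O_RDONLY is 0):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        int     fd = open("/etc/hosts", O_RDONLY);
        int     flags;

        if (fd < 0)
            return 1;
        flags = fcntl(fd, F_GETFL);

        /*
         * Code that assumes O_RDONLY == 0 breaks on GNU Hurd, where it
         * isn't.  Masking with O_ACCMODE isolates the access-mode bits
         * portably before comparing against O_RDONLY.
         */
        if ((flags & O_ACCMODE) == O_RDONLY)
            printf("descriptor is read-only\n");

        close(fd);
        return 0;
    }
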
2025-06-29  Obtain required table lock during cross-table constraint updates.  (Tom Lane)

Sometimes a table's constraint may depend on a column of another table, so that we have to update the constraint when changing the referenced column's type. We need to have lock on the constraint's table to do that. ATPostAlterTypeCleanup believed that this case was only possible for FOREIGN KEY constraints, but it's wrong at least for CHECK and EXCLUDE constraints; and in general, we'd probably need exclusive lock to alter any sort of constraint. So just remove the contype check and acquire lock for any other table.

This prevents a "you don't have lock" assertion failure, though no ill effect is observed in production builds. We'll error out later anyway because we don't presently support physically altering column types within stored composite columns. But the catalog-munging is basically all there, so we may as well make that part work.

Bug: #18970
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Diagnosed-by: jian he <jian.universality@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18970-a7d1cfe1f8d5d8d9@postgresql.org
Backpatch-through: 13

2025-06-27  Use correct DatumGet*() function in test_shm_mq_main().  (Nathan Bossart)

This is purely cosmetic, as dsm_attach() interprets its argument as a dsm_handle (i.e., an unsigned integer), but we might as well fix it. Oversight in commit 4db3744f1f.

Author: Jianghua Yang <yjhjstz@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAAZLFmRxkUD5jRs0W3K%3DUe4_ZS%2BRcAb0PCE1S0vVJBn3sWH2UQ%40mail.gmail.com
Backpatch-through: 13

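In a background-worker entry point the corrected pattern looks roughly like this (dsm_attach(), DatumGetUInt32(), and the Datum main_arg convention are real backend APIs; the function name is illustrative):

    #include "postgres.h"
    #include "storage/dsm.h"

    void
    my_worker_main(Datum main_arg)
    {
        /*
         * A dsm_handle is an unsigned 32-bit value, so DatumGetUInt32()
         * is the matching conversion; DatumGetInt32() happened to work
         * but was the wrong macro.
         */
        dsm_segment *seg = dsm_attach(DatumGetUInt32(main_arg));

        if (seg == NULL)
            ereport(ERROR,
                    (errmsg("could not attach to dynamic shared memory segment")));
        /* ... use the segment ... */
    }
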
2025-06-25  Avoid scribbling of VACUUM options  (Michael Paquier)

This fixes two issues with the handling of VacuumParams in vacuum_rel(). This code path modifies the passed-in VacuumParams pointer to adjust the "truncate" and "index_cleanup" options for the relation being worked on, which can cause incorrect options to be used in the two following scenarios where a VacuumParams pointer is shared across multiple relations:
- Multiple relations in a single VACUUM command.
- TOAST relations vacuumed with their main relation.

The problem is avoided by providing the two callers of vacuum_rel() with copies of VacuumParams, made before the pointer is updated for the "truncate" and "index_cleanup" options.

The refactoring of the VACUUM options and parameters done in 0d831389749a did not introduce an issue, but it has encouraged the problem we are dealing with in this commit, with b84dbc8eb80b for "truncate" and a96c41feec6b for "index_cleanup", which were added a couple of years after the initial refactoring. HEAD will be improved with a different patch that hardens the uses of VacuumParams across the tree. This cannot be backpatched as it introduces an ABI breakage.

The backend portion of the patch has been authored by Nathan, while I have implemented the tests. The tests rely on injection points to check the option values, making them faster and more reliable than the tests originally proposed by Shihao, and they also provide more coverage. This part can only be backpatched down to v17.

Reported-by: Shihao Zhong <zhong950419@gmail.com>
Author: Nathan Bossart <nathandbossart@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAGRkXqTo+aK=GTy5pSc-9cy8H2F2TJvcrZ-zXEiNJj93np1UUw@mail.gmail.com
Backpatch-through: 13

2025-06-21  Doc: improve documentation about width_bucket().  (Tom Lane)

Specify whether the bucket bounds are inclusive or exclusive, and improve some other vague language. Explain the behavior that occurs when the "low" bound is greater than the "high" bound.

Make width_bucket_numeric's comment more like that for width_bucket_float8, in particular noting that infinite bounds are rejected (since they became possible in v14).

Reported-by: Ben Peachey Higdon <bpeacheyhigdon@gmail.com>
Author: Robert Treat <rob@xzilla.net>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/2BD74F86-5B89-4AC1-8F13-23CED3546AC1@gmail.com
Backpatch-through: 13

2025-06-20  Use SnapshotDirty when checking for conflicting index names.  (Tom Lane)

While choosing an autogenerated name for an index, look for pre-existing relations using a SnapshotDirty snapshot, instead of the previous behavior that considered only committed-good pg_class rows. This allows us to detect and avoid conflicts against indexes that are still being built.

It's still possible to fail due to a race condition, but the window is now just the amount of time that it takes DefineIndex to validate all its parameters, call smgrcreate(), and enter the index's pg_class row. Formerly the race window covered the entire time needed to create and fill an index, which could be very long if the table is large. Worse, if the conflicting index creation is part of a larger transaction, it wouldn't be visible till COMMIT. So this isn't a complete solution, but it should greatly ameliorate the problem, and the patch is simple enough to be back-patchable.

It might at some point be useful to do the same for pg_constraint entries (cf. ChooseConstraintName, ConstraintNameExists, and related functions). However, in the absence of field complaints, I'll leave that alone for now. The relation-name test should be good enough for index-based constraints, while foreign-key constraints seem to be okay since they require exclusive locks to create.

Bug: #18959
Reported-by: Maximilian Chrzan <maximilian.chrzan@here.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/18959-f63b53b864bb1417@postgresql.org
Backpatch-through: 13

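The core of the technique, as a hedged fragment (InitDirtySnapshot(), systable_beginscan(), and ClassNameNspIndexId are real backend APIs; pg_class_rel and the scan-key setup are abbreviated assumptions, not the verbatim patch):

    /*
     * Probe pg_class for a relation of the candidate name using a dirty
     * snapshot, so rows inserted by uncommitted CREATE INDEX are seen.
     */
    SnapshotData    dirtySnapshot;
    SysScanDesc     scan;
    ScanKeyData     key[2];
    bool            collides;

    InitDirtySnapshot(dirtySnapshot);
    /* ... initialize key[0]/key[1] on relname and relnamespace ... */
    scan = systable_beginscan(pg_class_rel, ClassNameNspIndexId, true,
                              &dirtySnapshot, 2, key);
    collides = HeapTupleIsValid(systable_getnext(scan));
    systable_endscan(scan);
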
2025-06-16  Fix re-distributing previously distributed invalidation messages during logical decoding.  (Masahiko Sawada)

Commit 4909b38af0 introduced logic to distribute invalidation messages from catalog-modifying transactions to all concurrent in-progress transactions. However, since each transaction distributes not only its original invalidation messages but also previously distributed messages to other transactions, this leads to an exponential increase in allocation request size for invalidation messages, ultimately causing memory allocation failure.

This commit fixes this issue by tracking distributed invalidation messages separately per decoded transaction and not redistributing these messages to other in-progress transactions. The maximum size of distributed invalidation messages that one transaction can store is limited to MAX_DISTR_INVAL_MSG_PER_TXN (8MB). Once the size of the distributed invalidation messages exceeds this threshold, we invalidate all caches in locations where distributed invalidation messages need to be executed.

Back-patch to all supported versions where we introduced the fix by commit 4909b38af0.

Note that this commit adds two new fields to ReorderBufferTXN to store the distributed transactions. This change breaks ABI compatibility in back branches, affecting third-party extensions that depend on the size of the ReorderBufferTXN struct, though this scenario seems unlikely. Additionally, it adds a new flag to the txn_flags field of ReorderBufferTXN to indicate distributed invalidation message overflow. This should not affect existing implementations, as it is unlikely that third-party extensions use unused bits in the txn_flags field.

Bug: #18938 #18942
Author: vignesh C <vignesh21@gmail.com>
Reported-by: Duncan Sands <duncan.sands@deepbluecap.com>
Reported-by: John Hutchins <john.hutchins@wicourts.gov>
Reported-by: Laurence Parry <greenreaper@hotmail.com>
Reported-by: Max Madden <maxmmadden@gmail.com>
Reported-by: Braulio Fdo Gonzalez <brauliofg@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/680bdaf6-f7d1-4536-b580-05c2760c67c6@deepbluecap.com
Discussion: https://postgr.es/m/18942-0ab1e5ae156613ad@postgresql.org
Discussion: https://postgr.es/m/18938-57c9a1c463b68ce0@postgresql.org
Discussion: https://postgr.es/m/CAD1FGCT2sYrP_70RTuo56QTizyc+J3wJdtn2gtO3VttQFpdMZg@mail.gmail.com
Discussion: https://postgr.es/m/CANO2=B=2BT1hSYCE=nuuTnVTnjidMg0+-FfnRnqM6kd23qoygg@mail.gmail.com
Backpatch-through: 13

2025-06-14  Keep WAL segments by the flushed value of the slot's restart LSN  (Alexander Korotkov)

The patch fixes the issue with the unexpected removal of old WAL segments after checkpoint, followed by an immediate restart. The issue occurs when a slot is advanced after the start of the checkpoint and before old WAL segments are removed at the end of the checkpoint.

The idea of the patch is to get the minimal restart_lsn at the beginning of checkpoint (or restart point) creation and use this value when calculating the oldest LSN for WAL segments removal at the end of checkpoint. This idea was proposed by Tomas Vondra in the discussion.

Unlike 291221c46575, this fix doesn't affect ABI and is intended for back branches.

Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 13

2025-06-11  Make _bt_killitems drop pins it acquired itself.  (Peter Geoghegan)

Teach nbtree's _bt_killitems to leave the so->currPos page that it sets LP_DEAD items on in whatever state it was in when _bt_killitems was called. In particular, make sure that so->dropPin scans don't acquire a pin whose reference is saved in so->currPos.buf.

Allowing _bt_killitems to change so->currPos.buf like this is wrong. The immediate consequence of allowing it is that code in _bt_steppage (that copies so->currPos into so->markPos) will behave as if the scan is a !so->dropPin scan. so->markPos will therefore retain the buffer pin indefinitely, even though _bt_killitems only needs to acquire a pin (along with a lock) for long enough to mark known-dead items LP_DEAD.

This issue came to light following a report of a failure of an assertion from recent commit e6eed40e. The test case in question involves the use of mark and restore. An initial call to _bt_killitems takes place that leaves so->currPos.buf in a state that is inconsistent with the scan being so->dropPin. A subsequent call to _bt_killitems for the same position (following so->currPos being saved in so->markPos, and then restored as so->currPos) resulted in the failure of an assertion that tests that so->currPos.buf is InvalidBuffer when the scan is so->dropPin (non-assert builds got a "resource was not closed" WARNING instead).

The same problem exists on earlier releases, though the issue is far more subtle there. Recent commit e6eed40e introduced the so->dropPin field as a partial replacement for testing so->currPos.buf directly. Earlier releases won't get an assertion failure (or buffer pin leak), but they will allow the second _bt_killitems call from the test case to behave as if a buffer pin was consistently held since the original call to _bt_readpage. This is wrong; there will have been an initial window during which no pin was held on the so->currPos page, and yet the second _bt_killitems call will neglect to check if so->currPos.lsn continues to match the page's now-current LSN.

As a result of all this, it's just about possible that _bt_killitems will set the wrong items LP_DEAD (on release branches). This could only happen with merge joins (the sole user of nbtree mark/restore support), when a concurrently inserted index tuple used a recently-recycled TID (and only when the new tuple was inserted onto the same page as a distinct concurrently-removed tuple with the same TID). This is exactly the scenario that _bt_killitems' check of the page's now-current LSN against the LSN stashed in currPos was supposed to prevent.

A follow-up commit will make nbtree completely stop conditioning whether or not a position's pin needs to be dropped on whether the 'buf' field is set. All call sites that might need to drop a still-held pin will be taught to rely on the scan-level so->dropPin field recently introduced by commit e6eed40e. That will make bugs of the same general nature as this one impossible (or make them much easier to detect, at least).

Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/545be1e5-3786-439a-9257-a90d30f8b849@gmail.com
Backpatch-through: 13

2025-06-10  Don't reduce output request size on non-Unix-socket connections.  (Tom Lane)

Traditionally, libpq's pqPutMsgEnd has rounded down the amount-to-send to be a multiple of 8K when it is eagerly writing some data. This still seems like a good idea when sending through a Unix socket, as pipes typically have a buffer size of 8K or some fraction/multiple of that. But there's not much argument for it on a TCP connection, since (a) standard MTU values are not commensurate with that, and (b) the kernel typically applies its own packet splitting/merging logic.

Worse, our SSL and GSSAPI code paths both have API stipulations that if they fail to send all the data that was offered in the previous write attempt, we mustn't offer less data in the next attempt; else we may get "SSL error: bad length" or "GSSAPI caller failed to retransmit all data needing to be retried". The previous write attempt might've been pqFlush attempting to send everything in the buffer, so pqPutMsgEnd can't safely write less than the full buffer contents. (Well, we could add some more state to track exactly how much the previous write attempt was, but there's little value evident in such extra complication.) Hence, apply the round-down only on AF_UNIX sockets, where we never use SSL or GSSAPI.

Interestingly, we had a very closely related bug report before, which I attempted to fix in commit d053a879b. But the test case we had then seemingly didn't trigger this pqFlush-then-pqPutMsgEnd scenario, or at least we failed to recognize this variant of the bug.

Bug: #18907
Reported-by: Dorjpalam Batbaatar <htgn.dbat.95@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18907-d41b9bcf6f29edda@postgresql.org
Backpatch-through: 13

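The resulting policy can be pictured as follows (a simplified standalone sketch under the stated assumptions; libpq's real code operates on its internal output buffer rather than a helper like this):

    #include <stdbool.h>
    #include <stddef.h>

    /*
     * Round the eager-write size down to an 8K boundary only for
     * Unix-domain sockets: pipes like 8K-sized writes, while SSL and
     * GSSAPI (TCP-only) must never be offered less data than the
     * previous, partially failed write attempt.
     */
    static size_t
    choose_write_len(size_t avail, bool is_unix_socket)
    {
        if (is_unix_socket && avail >= 8192)
            avail -= avail % 8192;
        return avail;
    }
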
2025-06-02Disallow "=" in names of reloptions and foreign-data options.Tom Lane
We store values for these options as array elements with the syntax "name=value", hence a name containing "=" confuses matters when it's time to read the array back in. Since validation of the options is often done (long) after this conversion to array format, that leads to confusing and off-point error messages. We can improve matters by rejecting names containing "=" up-front. (Probably a better design would have involved pairs of array elements, but it's too late now --- and anyway, there's no evident use-case for option names like this. We already reject such names in some other contexts such as GUCs.) Reported-by: Chapman Flack <jcflack@acm.org> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Chapman Flack <jcflack@acm.org> Discussion: https://postgr.es/m/6830EB30.8090904@acm.org Backpatch-through: 13
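A minimal illustration of the up-front rule (a hypothetical helper, not the actual reloptions code):

    #include <stdbool.h>
    #include <string.h>

    /*
     * Options are stored as "name=value" array elements, so a name that
     * itself contains '=' cannot be split back apart unambiguously.
     * Reject such names before they ever reach the array representation.
     */
    static bool
    option_name_is_valid(const char *name)
    {
        return name[0] != '\0' && strchr(name, '=') == NULL;
    }
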
2025-06-01  Run pgindent on the previous commit.  (Tom Lane)

Clean up after rearranging PG_TRY blocks.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2954090.1748723636@sss.pgh.pa.us
Backpatch-through: 13

2025-06-01  Fix edge-case resource leaks in PL/Python error reporting.  (Tom Lane)

PLy_elog_impl and its subroutine PLy_traceback intended to avoid leaking any PyObject reference counts, but their coverage of the matter was sadly incomplete. In particular, out-of-memory errors in most of the string-construction subroutines could lead to reference count leaks, because those calls were outside the PG_TRY blocks responsible for dropping reference counts.

Fix by (a) adjusting the scopes of the PG_TRY blocks, and (b) moving the responsibility for releasing the reference counts of the traceback-stack objects to PLy_elog_impl. This requires some additional "volatile" markers, but not too many.

In passing, fix an ancient thinko: use of the "e_module_o" PyObject was guarded by "if (e_type_s)", where surely "if (e_module_o)" was meant. This would only have visible consequences if the "__name__" attribute were present but the "__module__" attribute wasn't, which apparently never happens; but someday it might.

Rearranging the PG_TRY blocks requires indenting a fair amount of code one more tab stop, which I'll do separately for clarity.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2954090.1748723636@sss.pgh.pa.us
Backpatch-through: 13

2025-05-30  Ensure we have a snapshot when updating various system catalogs.  (Nathan Bossart)

A few places that access system catalogs don't set up an active snapshot before potentially accessing their TOAST tables. To fix, push an active snapshot just before each section of code that might require accessing one of these TOAST tables, and pop it shortly afterwards. While at it, this commit adds some rather strict assertions in an attempt to prevent such issues in the future.

Commit 16bf24e0e4 recently removed pg_replication_origin's TOAST table in order to fix the same problem for that catalog. On the back-branches, those bugs are left in place. We cannot easily remove a catalog's TOAST table on released major versions, and only replication origins with extremely long names are affected. Given the low severity of the issue, fixing older versions doesn't seem worth the trouble of significantly modifying the patch.

Also, on v13 and v14, the aforementioned strict assertions have been omitted because commit 2776922201, which added HaveRegisteredOrActiveSnapshot(), was not back-patched. While we could probably back-patch it now, I've opted against it because it seems unlikely that new TOAST snapshot issues will be introduced in the oldest supported versions.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/18127-fe54b6a667f29658%40postgresql.org
Discussion: https://postgr.es/m/18309-c0bf914950c46692%40postgresql.org
Discussion: https://postgr.es/m/ZvMSUPOqUU-VNADN%40nathan
Backpatch-through: 13

2025-05-30  Allow larger packets during GSSAPI authentication exchange.  (Tom Lane)

Our GSSAPI code only allows packet sizes up to 16kB. However it emerges that during authentication, larger packets might be needed; various authorities suggest 48kB or 64kB as the maximum packet size. This limitation caused login failure for AD users who belong to many AD groups. To add insult to injury, we gave an unintelligible error message, typically "GSSAPI context establishment error: The routine must be called again to complete its function: Unknown error".

As noted in code comments, the 16kB packet limit is effectively a protocol constant once we are doing normal data transmission: the GSSAPI code splits the data stream at those points, and if we change the limit then we will have cross-version compatibility problems due to the receiver's buffer being too small in some combinations. However, during the authentication exchange the packet sizes are not determined by us, but by the underlying GSSAPI library. So we might as well just try to send what the library tells us to. An unpatched recipient will fail on a packet larger than 16kB, but that's not worse than the sender failing without even trying. So this doesn't introduce any meaningful compatibility problem.

We still need a buffer size limit, but we can easily make it be 64kB rather than 16kB until transport negotiation is complete. (Larger values were discussed, but don't seem likely to add anything.)

Reported-by: Chris Gooch <cgooch@bamfunds.com>
Fix-suggested-by: Jacob Champion <jacob.champion@enterprisedb.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/DS0PR22MB5971A9C8A3F44BCC6293C4DABE99A@DS0PR22MB5971.namprd22.prod.outlook.com
Backpatch-through: 13

2025-05-31  Make XactLockTableWait() and ConditionalXactLockTableWait() interruptable more.  (Fujii Masao)

Previously, XactLockTableWait() and ConditionalXactLockTableWait() could enter a non-interruptible loop when they successfully acquired a lock on a transaction but the transaction still appeared to be running. Since this loop continued until the transaction completed, it could result in long, uninterruptible waits.

Although this scenario is generally unlikely since XactLockTableWait() and ConditionalXactLockTableWait() can basically acquire a transaction lock only when the transaction is not running, it can occur in a hot standby. In such cases, the transaction may still appear active due to the KnownAssignedXids list, even while no lock on the transaction exists. For example, this situation can happen when creating a logical replication slot on a standby.

The cause of the non-interruptible loop was the absence of CHECK_FOR_INTERRUPTS() within it. This commit adds CHECK_FOR_INTERRUPTS() to the loop in both functions, ensuring they can be interrupted safely.

Back-patch to all supported branches.

Author: Kevin K Biju <kevinkbiju@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAM45KeELdjhS-rGuvN=ZLJ_asvZACucZ9LZWVzH7bGcD12DDwg@mail.gmail.com
Backpatch-through: 13

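A simplified sketch of the fixed wait loop (CHECK_FOR_INTERRUPTS() and TransactionIdIsInProgress() are real backend APIs; the locking details and the ConditionalXactLockTableWait() variant are elided):

    for (;;)
    {
        /* (acquire and immediately release the xid's transaction lock) */

        if (!TransactionIdIsInProgress(xid))
            break;              /* transaction is truly gone */

        /*
         * On a hot standby the xid can stay "in progress" via the
         * KnownAssignedXids list with no transaction lock to block on,
         * so this loop can spin for a long time; the added call below
         * is what makes it cancellable.
         */
        CHECK_FOR_INTERRUPTS();
    }
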
2025-05-28  Adjust regex for test with opening parenthesis in character classes  (Michael Paquier)

As written, the test was throwing an error because of an unbalanced parenthesis. The regex used in the test is adjusted to not fail and to test the case of an opening parenthesis in a character class after some nested square brackets.

Oversight in d46911e584d4.

Discussion: https://postgr.es/m/16ab039d1af455652bdf4173402ddda145f2c73b.camel@cybertec.at

2025-05-28  Fix conversion of SIMILAR TO regexes for character classes  (Michael Paquier)

The code that translates SIMILAR TO pattern matching expressions to POSIX-style regular expressions did not consider that square brackets can be nested. For example, in an expression like [[:alpha:]%_], the logic replaced the placeholders '_' and '%' but it should not.

This commit fixes the conversion logic by tracking the nesting level of square brackets marking character class areas, while considering that in expressions like []] or [^]] the first closing square bracket is a regular character. Multiple tests are added to show how the conversions should or should not be applied while in a character class area, with specific cases added for all the characters converted outside character classes, like an opening parenthesis '(', dollar sign '$', etc.

Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/16ab039d1af455652bdf4173402ddda145f2c73b.camel@cybertec.at
Backpatch-through: 13

2025-05-19  Fix deparsing FETCH FIRST <expr> ROWS WITH TIES  (Heikki Linnakangas)

In the grammar, <expr> is a c_expr, which accepts only a limited set of simple constants and expressions without parens. The deparsing logic didn't quite match the grammar rule, and failed to use parens e.g. for "5::bigint". To fix, always surround the expression with parens.

Would be nice to omit the parens in simple cases, but unfortunately it's non-trivial to detect such simple cases. Even if the expression is a simple literal 123 in the original query, after parse analysis it becomes a FuncExpr with COERCE_IMPLICIT_CAST rather than a simple Const.

Reported-by: yonghao lee
Backpatch-through: 13
Discussion: https://www.postgresql.org/message-id/18929-077d6b7093b176e2@postgresql.org

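In deparsing terms the fix boils down to unconditionally parenthesizing the limit expression, roughly like this fragment (appendStringInfoString() and get_rule_expr() are real backend routines; the surrounding ruleutils.c context is abbreviated):

    /*
     * FETCH FIRST (expr) ROWS WITH TIES: the grammar's c_expr slot
     * cannot take e.g. "5::bigint" bare, so always emit the parens.
     */
    appendStringInfoString(buf, " FETCH FIRST (");
    get_rule_expr(limitCount, context, false);
    appendStringInfoString(buf, ") ROWS WITH TIES");
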
2025-05-19  Don't retreat slot's confirmed_flush LSN.  (Amit Kapila)

Prevent moving the confirmed_flush backwards, as this could lead to data duplication issues caused by replicating already replicated changes.

This can happen when a client acknowledges an LSN it doesn't have to do anything for, and thus didn't store persistently. After a restart, the client can send the prior LSN that it stored persistently as an acknowledgement, but we need to ignore such an LSN to avoid retreating the confirmed_flush LSN.

Diagnosed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Author: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Tested-by: Nisha Moond <nisha.moond412@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/CAJpy0uDZ29P=BYB1JDWMCh-6wXaNqMwG1u1mB4=10Ly0x7HhwQ@mail.gmail.com
Discussion: https://postgr.es/m/OS0PR01MB57164AB5716AF2E477D53F6F9489A@OS0PR01MB5716.jpnprd01.prod.outlook.com

2025-05-18  Make our usage of memset_s() conform strictly to the C11 standard.  (Tom Lane)

Per the letter of the C11 standard, one must #define __STDC_WANT_LIB_EXT1__ as 1 before including <string.h> in order to have access to memset_s(). It appears that many platforms are lenient about this, because we weren't doing it and yet the code appeared to work anyway. But we now find that with -std=c11, macOS is strict and doesn't declare memset_s, leading to compile failures since we try to use it anyway. (Given the lack of prior reports, perhaps this is new behavior in the latest SDK? No matter, we're clearly in the wrong.)

In addition to the immediate problem, which could be fixed merely by adding the needed #define to explicit_bzero.c, it seems possible that our configure-time probe for memset_s() could fail in case a platform implements the function in some odd way due to this spec requirement. This concern can be fixed in largely the same way that we dealt with strchrnul() in 6da2ba1d8: switch to using a declaration-based configure probe instead of a does-it-link probe.

Back-patch to v13 where we started using memset_s().

Reported-by: Lakshmi Narayana Velayudam <dev.narayana.v@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAA4pTnLcKGG78xeOjiBr5yS7ZeE-Rh=FaFQQGOO=nPzA1L8yEA@mail.gmail.com
Backpatch-through: 13

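Per C11 Annex K, the feature-test macro must precede <string.h> for memset_s() to be declared, e.g.:

    #define __STDC_WANT_LIB_EXT1__ 1    /* must come before <string.h> */
    #include <string.h>

    /*
     * Zero a buffer in a way the compiler may not optimize away.
     * (memset_s is only available where the implementation defines
     * __STDC_LIB_EXT1__, which is exactly what a declaration-based
     * configure probe can detect.)
     */
    void
    wipe(void *buf, size_t len)
    {
        memset_s(buf, len, 0, len);
    }
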
2025-05-11  Fix comment of tsquerysend()  (Álvaro Herrera)

The comment describes the order in which fields are sent, and it had one of the fields in the wrong place. This has been wrong since e6dbcb72fafa (2008), so backpatch all the way back.

Author: Emre Hasegeli <emre@hasegeli.com>
Discussion: https://postgr.es/m/CAE2gYzzf38bR_R=izhpMxAmqHXKeM5ajkmukh4mNs_oXfxcMCA@mail.gmail.com

2025-05-05  With GB18030, prevent SIGSEGV from reading past end of allocation.  (Noah Misch)

With GB18030 as source encoding, applications could crash the server via SQL functions convert() or convert_from(). Applications themselves could crash after passing unterminated GB18030 input to libpq functions PQescapeLiteral(), PQescapeIdentifier(), PQescapeStringConn(), or PQescapeString(). Extension code could crash by passing unterminated GB18030 input to jsonapi.h functions. All those functions have been intended to handle untrusted, unterminated input safely.

A crash required allocating the input such that the last byte of the allocation was the last byte of a virtual memory page. Some malloc() implementations take measures against that, making the SIGSEGV hard to reach.

Back-patch to v13 (all supported versions).

Author: Noah Misch <noah@leadboat.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 13
Security: CVE-2025-4207

2025-05-05  Refactor test_escape.c for additional ways of testing.  (Noah Misch)

Start the file with static functions not specific to pe_test_vectors tests. This way, new tests can use them without disrupting the file's layout. Change report_result() PQExpBuffer arguments to plain strings.

Back-patch to v13 (all supported versions), for the next commit.

Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 13
Security: CVE-2025-4207

2025-05-05  Translation updates  (Peter Eisentraut)

Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 5385d44a7a0e6f93326380101d0605cf1ca78890

2025-05-02  Handle self-referencing FKs correctly in partitioned tables  (Álvaro Herrera)

For self-referencing foreign keys in partitioned tables, we weren't handling creation of pg_constraint rows during CREATE TABLE PARTITION AS as well as ALTER TABLE ATTACH PARTITION. This is an old bug -- mostly, we broke this in 614a406b4ff1 while trying to fix it (so 12.13, 13.9, 14.6 and 15.0 and up all behave incorrectly). This commit reverts part of that with additional fixes for full correctness, and installs more tests to verify the parts we broke, not just the catalog contents but also the user-visible behavior.

Backpatch to all live branches. In branches 13 and 14, commit 46a8c27a7226 changed the behavior during DETACH to drop a FK constraint rather than trying to repair it, because the complete fix of repairing catalog constraints was problematic due to lack of previous fixes. For this reason, the test behavior in those branches is a bit different. However, as best as I can tell, the fix works correctly there.

In release notes we have to recommend that all self-referencing foreign keys on partitioned tables be recreated if partitions have been created or attached after the FK was created, keeping in mind that violating rows might already be present on the referencing side.

Reported-by: Guillaume Lelarge <guillaume@lelarge.info>
Reported-by: Matthew Gabeler-Lee <fastcat@gmail.com>
Reported-by: Luca Vallisa <luca.vallisa@gmail.com>
Discussion: https://postgr.es/m/CAECtzeWHCA+6tTcm2Oh2+g7fURUJpLZb-=pRXgeWJ-Pi+VU=_w@mail.gmail.com
Discussion: https://postgr.es/m/18156-a44bc7096f0683e6@postgresql.org
Discussion: https://postgr.es/m/CAAT=myvsiF-Attja5DcWoUWh21R12R-sfXECY2-3ynt8kaOqjw@mail.gmail.com

2025-05-01  Add missing newlines to PQescapeInternal() messages pre-v16.  (Tom Lane)

While back-patching 9f45e6a91, I neglected that the convention in pre-v16 libpq was to include a trailing newline in error message strings (since then, we add those separately). Add them now.

Reported-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/a9c837ad-d507-4607-94e4-c5743a8f49e0@eisentraut.org
Backpatch-through: 13-15

2025-04-30  Update time zone data files to tzdata release 2025b.  (Tom Lane)

DST law changes in Chile: there is a new time zone America/Coyhaique for Chile's Aysén Region, to account for it changing to UTC-03 year-round and thus diverging from America/Santiago.

Historical corrections for Iran.

Backpatch-through: 13

2025-04-28  Fix xmin advancement during fast_forward decoding.  (Amit Kapila)

During logical decoding, we advance catalog_xmin of the logical slot too early in fast_forward mode, resulting in required catalog data being removed by vacuum. This mode is normally used to advance the slot without processing the changes, but we still can't let the slot's xmin advance to an incorrect value.

Commit f49a80c481 fixed a similar issue where the logical slot's catalog_xmin was getting advanced prematurely during non-fast-forward mode. During xl_running_xacts processing, instead of directly advancing the slot's xmin to the oldest running xid in the record, it allowed the xmin to be held back for snapshots that can be used for not-yet-replayed transactions, as those might consider older txns as running too. However, it missed the fact that the same problem can happen during fast_forward mode decoding, as we won't build a base snapshot in that mode, and the future call to get_changes from the same slot can miss seeing the required catalog changes, leading to incorrect results.

This commit allows building the base snapshot even in fast_forward mode to prevent the early advancement of xmin.

Reported-by: Amit Kapila <amit.kapila16@gmail.com>
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/CAA4eK1LqWncUOqKijiafe+Ypt1gQAQRjctKLMY953J79xDBgAg@mail.gmail.com
Discussion: https://postgr.es/m/OS0PR01MB57163087F86621D44D9A72BF94BB2@OS0PR01MB5716.jpnprd01.prod.outlook.com

2025-04-24  Fix data loss in logical replication.  (Amit Kapila)

This commit is a backpatch of commit 4909b38af0 for 13.

Data loss can happen when DDLs like ALTER PUBLICATION ... ADD TABLE ... or ALTER TYPE ... that don't take a strong lock on table happen concurrently to DMLs on the tables involved in the DDL. This happens because logical decoding doesn't distribute invalidations to concurrent transactions and those transactions use stale cache data to decode the changes. The problem becomes bigger because we keep using the stale cache even after those in-progress transactions are finished and skip the changes required to be sent to the client.

This commit fixes the issue by distributing invalidation messages from catalog-modifying transactions to all concurrent in-progress transactions. This allows the necessary rebuild of the catalog cache when decoding new changes after concurrent DDL.

The fix for 13 is different from what we did in branches 14 and above, such that for 13, the concurrent DDL changes (from DDL types mentioned earlier) will be visible for any newly started transactions. To make them visible in concurrent transactions, we need to introduce a new change type REORDER_BUFFER_CHANGE_INVALIDATION, already present in branches 14 and greater. We decided not to take the risk of a bigger change and fix the issue partially in 13.

Reported-by: hubert depesz lubaczewski <depesz@depesz.com>
Reported-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Tested-by: Benoit Lobréau <benoit.lobreau@dalibo.com>
Discussion: https://postgr.es/m/de52b282-1166-1180-45a2-8d8917ca74c6@enterprisedb.com
Discussion: https://postgr.es/m/CAD21AoAenVqiMjpN-PvGHL1N9DWnHSq673bfgr6phmBUzx=kLQ@mail.gmail.com
Discussion: https://postgr.es/m/CAD21AoAhU3kp8shYqP=ExiFDZ9sZxpFb5WzLa0p+vEe5j+7CWQ@mail.gmail.com

2025-04-20  Test restartpoints in archive recovery.  (Noah Misch)

v14 commit 1f95181b44c843729caaa688f74babe9403b5850 and its v13 equivalent caused timing-dependent failures in archive recovery, at restartpoints. The symptom was "invalid magic number 0000 in log segment X, offset 0", "unexpected pageaddr X in log segment Y, offset 0" [X < Y], or an assertion failure. Commit 3635a0a35aafd3bfa80b7a809bc6e91ccd36606a and predecessors back-patched v15 changes to fix that.

This test reproduces the problem probabilistically, typically in less than 1000 iterations of the test. Hence, buildfarm and CI runs would have surfaced enough failures to get attention within a day.

Reported-by: Arun Thirupathi <arunth@google.com>
Discussion: https://postgr.es/m/20250306193013.36.nmisch@google.com
Backpatch-through: 13

2025-04-20  Avoid ERROR at ON COMMIT DELETE ROWS after relhassubclass=f.  (Noah Misch)

Commit 7102070329d8147246d2791321f9915c3b5abf31 fixed a similar bug, but it missed the case of database-wide ANALYZE ("use_own_xacts" mode). Commit a07e03fd8fa7daf4d1356f7cb501ffe784ea6257 changed consequences from silent discard of a pg_class stats (relpages et al.) update to ERROR "tuple to be updated was already modified". Losing a relpages update of an ON COMMIT DELETE ROWS table was negligible, but a COMMIT-time error isn't negligible.

Back-patch to v13 (all supported versions).

Reported-by: Richard Guo <guofenglinux@gmail.com>
Reported-by: Robins Tharakan <tharakan@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-XwMKMKJ_GT=p3_-_=j9rQSEs1FbDFUnW9zHuKPsPNEQ@mail.gmail.com
Backpatch-through: 13

2025-04-16  Fix pg_dump --clean with partitioned indexes.  (Tom Lane)

We'd try to drop the partitions of a partitioned index separately, which is disallowed by the backend, leading to an error during restore. While the error is harmless, it causes problems if you try to use --single-transaction mode.

Fortunately, there seems no need to do a DROP at all, since the partition will go away silently when we drop either the parent index or the partition's table. So just make the DROP conditional on not being a partition.

Reported-by: jian he <jian.universality@gmail.com>
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxF0QSdkjFKF4di-JGWN6CSdQYEAhGPmQJJCdkSZtd=oLg@mail.gmail.com
Backpatch-through: 13

2025-04-12  Fix GIN's shimTriConsistentFn to not corrupt its input.  (Tom Lane)

Commit 0f21db36d made an assumption that GIN triConsistentFns would not modify their input entryRes[] arrays. But in fact, the "shim" triConsistentFn that we use for opclasses that don't supply their own did exactly that, potentially leading to wrong answers from a GIN index search. Through bad luck, none of the test cases that we have for such opclasses exposed the bug.

One response to this could be that the assumption of consistency check functions not modifying entryRes[] arrays is a bad one, but it still seems reasonable to me. Notably, shimTriConsistentFn is itself assuming that with respect to the underlying boolean consistentFn, so it's sure being self-centered in supposing that it gets to do so. Fortunately, it's quite simple to fix shimTriConsistentFn to restore the entry-time state of entryRes[], so let's do that instead.

This issue doesn't affect any core GIN opclasses, since they all supply their own triConsistentFns. It does affect contrib modules btree_gin, hstore, and intarray.

Along the way, I (tgl) noticed that shimTriConsistentFn failed to pick up on a "recheck" flag returned by its first call to the boolean consistentFn. This may be only a latent problem, since it would be unlikely for a consistentFn to set recheck for the all-false case and not any other cases. (Indeed, none of our contrib modules do that.) Nonetheless, it's formally wrong.

Reported-by: Vinod Sridharan <vsridh90@gmail.com>
Author: Vinod Sridharan <vsridh90@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFMdLD7XzsXfi1+DpTqTgrD8XU0i2C99KuF=5VHLWjx4C1pkcg@mail.gmail.com
Backpatch-through: 13

2025-04-11  Fix race with synchronous_standby_names at startup  (Michael Paquier)

synchronous_standby_names cannot be reloaded safely by backends, and the checkpointer is in charge of updating a state in shared memory if the GUC is enabled in WalSndCtl, to let the backends know if they should wait or not for a given LSN. This provides a strict control on the timing of the waiting queues if the GUC is enabled or disabled, then reloaded. The checkpointer is also in charge of waking up the backends that could be waiting for a LSN when the GUC is disabled.

This logic had a race condition at startup, where it would be possible for backends to not wait for a LSN even if synchronous_standby_names is enabled. This would cause visibility issues with transactions that should have been waited for but were not. The problem lasts until the checkpointer does its initial update of the shared memory state when it loads synchronous_standby_names.

In order to take care of this problem, the shared memory state in WalSndCtl is extended to detect if it has been initialized by the checkpointer, and not only check if synchronous_standby_names is defined. In WalSndCtlData, sync_standbys_defined is renamed to sync_standbys_status, a bits8 able to know about two states:
- If the shared memory state has been initialized. This flag is set by the checkpointer at startup once, and never removed.
- If synchronous_standby_names is known as defined in the shared memory state. This is the same as the previous sync_standbys_defined in WalSndCtl.

This method gives a way for backends to decide what they should do until the shared memory area is initialized, and they now ultimately fall back to a check on the GUC value in this case, which is the best thing that can be done. Fortunately, SyncRepUpdateSyncStandbysDefined() is called immediately by the checkpointer when this process starts, so the window is very narrow.

It is possible to enlarge the problematic window by making the checkpointer wait at the beginning of SyncRepUpdateSyncStandbysDefined() with a hardcoded sleep, for example, and doing so has shown that a 2PC visibility test is indeed failing. On machines slow enough, this bug would cause spurious failures.

In 17~, we have looked at the possibility of adding an injection point to have a reproducible test, but as the problematic window happens at early startup, we would need to invent a way to make an injection point optionally persistent across restarts when attached, something that would be fine for this case as it would involve the checkpointer.

This issue is quite old, and can be reproduced on all the stable branches.

Author: Melnikov Maksim <m.melnikov@postgrespro.ru>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/163fcbec-900b-4b07-beaa-d2ead8634bec@postgrespro.ru
Backpatch-through: 13

2025-04-06  Reset InstallXLogFileSegmentActive after walreceiver self-initiated exit.  (Noah Misch)

After commit cc2c7d65fc27e877c9f407587b0b92d46cd6dd16 added this flag, failure to reset it caused assertion failures. In non-assert builds, it made the system fail to achieve the objectives listed in that commit; chiefly, we might emit a spurious log message. Back-patch to v15, where that commit first appeared.

Bharath Rupireddy and Kyotaro Horiguchi. Reviewed by Dilip Kumar, Nathan Bossart and Michael Paquier. Reported by Dilip Kumar.

This commit has been applied as of b4f584f9d2a1 in v15 and newer versions. This is required on stable branches of v13 and v14 to fix a regression reported by Noah Misch, introduced by 1f95181b44c8, causing spurious failures in archive recovery (neither streaming nor archive recovery) with concurrent restartpoints. The backpatched versions of the patches have been aligned on these branches by me. Tests have been conducted by both of us.

Discussion: https://postgr.es/m/CAFiTN-sE3ry=ycMPVtC+Djw4Fd7gbUGVv_qqw6qfzp=JLvqT3g@mail.gmail.com
Discussion: https://postgr.es/m/20250306193013.36.nmisch@google.com
Backpatch-through: 13