path: root/src/backend
Age | Commit message | Author
23 hours | Fix segfault from releasing locks in detached DSM segments (origin/REL_16_STABLE) | Amit Langote
If a FATAL error occurs while holding a lock in a DSM segment (such as a dshash lock) and the process is not in a transaction, a segmentation fault can occur during process exit. The problem sequence is:

1. Process acquires a lock in a DSM segment (e.g., via dshash)
2. FATAL error occurs outside transaction context
3. proc_exit() begins, calling before_shmem_exit callbacks
4. dsm_backend_shutdown() detaches all DSM segments
5. Later, on_shmem_exit callbacks run
6. ProcKill() calls LWLockReleaseAll()
7. Segfault: the lock being released is in unmapped memory

This only manifests outside transaction contexts because AbortTransaction() calls LWLockReleaseAll() during transaction abort, releasing locks before DSM cleanup. Background workers and other non-transactional code paths are vulnerable.

Fix by calling LWLockReleaseAll() unconditionally at the start of shmem_exit(), before any callbacks run. Releasing locks before callbacks prevents the segfault - locks must be released before dsm_backend_shutdown() detaches their memory. This is safe because after an error, held locks are protecting potentially inconsistent data anyway, and callbacks can acquire fresh locks if needed.

Also add a comment noting that LWLockReleaseAll() must be safe to call before LWLock initialization (which it is, since num_held_lwlocks will be 0), plus an Assert for the post-condition.

This fix aligns with the original design intent from commit 001a573a2, which noted that backends must clean up shared memory state (including releasing lwlocks) before unmapping dynamic shared memory segments.

Reported-by: Rahila Syed <rahilasyed90@gmail.com>
Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAH2L28uSvyiosL+kaic9249jRVoQiQF6JOnaCitKFq=xiFzX3g@mail.gmail.com
Backpatch-through: 14
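A rough sketch of the ordering described above (illustrative only; the rest of shmem_exit() is elided and unchanged):

    void
    shmem_exit(int code)
    {
        /*
         * Release any LWLocks we still hold before running callbacks: some
         * of those locks may live in DSM segments that dsm_backend_shutdown(),
         * run from the before_shmem_exit callbacks, is about to unmap.
         * Safe even before LWLock initialization, since with no locks held
         * LWLockReleaseAll() is a no-op.
         */
        LWLockReleaseAll();

        /* ... run before_shmem_exit callbacks, DSM detach, and
         * on_shmem_exit callbacks as before ... */
    }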
32 hours | Fix 'unexpected data beyond EOF' on replica restart | Heikki Linnakangas
On restart, a replica can fail with an error like 'unexpected data beyond EOF in block 200 of relation T/D/R'. These are the steps to reproduce it:

- A relation has a size of 400 blocks.
- Blocks 201 to 400 are empty.
- Block 200 has two rows.
- Blocks 100 to 199 are empty.
- A restartpoint is done
- Vacuum truncates the relation to 200 blocks
- A FPW deletes a row in block 200
- A checkpoint is done
- A FPW deletes the last row in block 200
- Vacuum truncates the relation to 100 blocks
- The replica restarts

When the replica restarts:

- The relation on disk starts at 100 blocks, because all the truncations were applied before restart.
- The first truncate to 200 blocks is replayed. It silently fails, but it will still (incorrectly!) update the cache size to 200 blocks
- The first FPW on block 200 is applied. XLogReadBufferForRead relies on the cached size and incorrectly assumes that the page already exists in the file, and thus won't extend the relation.
- The online checkpoint record is replayed, calling smgrdestroyall which causes the cached size to be discarded
- The second FPW on block 200 is applied. This time, the detected size is 100 blocks, an extend is attempted. However, the block 200 is already present in the buffer cache due to the first FPW. This triggers the 'unexpected data beyond EOF'.

To fix, update the cached size in SmgrRelation with the current size rather than the requested new size, when the requested new size is greater.

Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://www.postgresql.org/message-id/CAO6_Xqrv-snNJNhbj1KjQmWiWHX3nYGDgAc=vxaZP3qc4g1Siw@mail.gmail.com
Backpatch-through: 14
36 hours | Add check for invalid offset at multixid truncation | Heikki Linnakangas
If a multixid with zero offset is left behind after a crash, and that multixid later becomes the oldest multixid, truncation might try to look up its offset and read the zero value. In the worst case, we might incorrectly use the zero offset to truncate valid SLRU segments that are still needed.

I'm not sure if that can happen in practice, or if there are some other lower-level safeguards or incidental reasons that prevent the caller from passing an unwritten multixid as the oldest multi. But better safe than sorry, so let's add an explicit check for it.

In stable branches, we should perhaps do the same check for 'oldestOffset', i.e. the offset of the old oldest multixid (in master, 'oldestOffset' is gone). But if the old oldest multixid has an invalid offset, the damage has been done already, and we would never advance past that point. It's not clear what we should do in that case. The check that this commit adds will prevent such a multixid with an invalid offset from becoming the oldest multixid in the first place, which seems enough for now.

Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://www.postgresql.org/message-id/000301b2-5b81-4938-bdac-90f6eb660843@iki.fi
Backpatch-through: 14
8 days | Fix possible incorrect column reference in ERROR message | David Rowley
When creating a partition for a RANGE partitioned table, the reporting of errors relating to converting the specified range values into constant values for the partition key's type could display the name of a previous partition key column when an earlier range was specified as MINVALUE or MAXVALUE. This was caused by the code not correctly incrementing the index that tracks which partition key the foreach loop was working on after processing MINVALUE/MAXVALUE ranges. Fix by using foreach_current_index() to ensure the index variable is always set to the List element being worked on. Author: myzhen <zhenmingyang@yeah.net> Reviewed-by: zhibin wang <killerwzb@gmail.com> Discussion: https://postgr.es/m/273cab52.978.19b96fc75e7.Coremail.zhenmingyang@yeah.net Backpatch-through: 14
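A minimal sketch of the pattern the fix relies on; the variable names here are illustrative, not the actual range-bound transformation code:

    ListCell   *lc;

    foreach(lc, bound_datums)
    {
        /*
         * foreach_current_index() always reflects the List cell being
         * processed, so skipping a MINVALUE/MAXVALUE entry with "continue"
         * cannot leave the partition key index pointing at a previous column.
         */
        int         keyidx = foreach_current_index(lc);

        /* ... convert the bound value, reporting errors against key column keyidx ... */
    }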
9 days | Prevent invalidation of newly created replication slots. | Amit Kapila
A race condition could cause a newly created replication slot to become invalidated between WAL reservation and a checkpoint. Previously, if the required WAL was removed, we retried the reservation process. However, the slot could still be invalidated before the retry if the WAL was not yet removed but the checkpoint advanced the redo pointer beyond the slot's intended restart LSN and computed the minimum LSN that needs to be preserved for the slots. The fix is to acquire an exclusive lock on ReplicationSlotAllocationLock during WAL reservation, and a shared lock during the minimum LSN calculation at checkpoints to serialize the process. This ensures that, if WAL reservation occurs first, the checkpoint waits until restart_lsn is updated before calculating the minimum LSN. If the checkpoint runs first, subsequent WAL reservations pick a position at or after the latest checkpoint's redo pointer. We used a similar fix in HEAD (via commit 006dd4b2e5) and 18. The difference is that in 17 and prior branches we need to additionally handle the race condition with slot's minimum LSN computation during checkpoints. Reported-by: suyu.cmj <mengjuan.cmj@alibaba-inc.com> Author: Hou Zhijie <houzj.fnst@fujitsu.com> Author: vignesh C <vignesh21@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 14 Discussion: https://postgr.es/m/5e045179-236f-4f8f-84f1-0f2566ba784c.mengjuan.cmj@alibaba-inc.com
11 days | Fix issue with EVENT TRIGGERS and ALTER PUBLICATION | David Rowley
When processing the "publish" options of an ALTER PUBLICATION command, we call SplitIdentifierString() to split the options into a List of strings. Since SplitIdentifierString() overwrites the delimiter characters with NULs, this would overwrite memory belonging to the AlterPublicationStmt. Later in AlterPublicationOptions(), the modified AlterPublicationStmt is copied for event triggers, which would result in the event trigger only seeing the first "publish" option rather than all options that were specified in the command.

To fix this, make a copy of the string before passing it to SplitIdentifierString().

Here we also adjust a similar case in the pgoutput plugin. There are no known issues caused by SplitIdentifierString() here, so this is being done out of paranoia.

Thanks to Henson Choi for putting together an example case showing the ALTER PUBLICATION issue.

Author: sunil s <sunilfeb26@gmail.com>
Reviewed-by: Henson Choi <assam258@gmail.com>
Reviewed-by: zengman <zengman@halodbtech.com>
Backpatch-through: 14
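A minimal sketch of the idea, with illustrative variable names (SplitIdentifierString() writes NULs over the delimiters in the string it is handed):

    char       *copied_opts = pstrdup(publish_value);   /* don't let it scribble on the parse tree's string */
    List       *publish_list = NIL;

    if (!SplitIdentifierString(copied_opts, ',', &publish_list))
        elog(ERROR, "invalid list syntax in \"publish\" option");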
11 days | Honor GUC settings specified in CREATE SUBSCRIPTION CONNECTION. | Fujii Masao
Prior to v15, GUC settings supplied in the CONNECTION clause of CREATE SUBSCRIPTION were correctly passed through to the publisher's walsender. For example: CREATE SUBSCRIPTION mysub CONNECTION 'options=''-c wal_sender_timeout=1000''' PUBLICATION ... would cause wal_sender_timeout to take effect on the publisher's walsender. However, commit f3d4019da5d changed the way logical replication connections are established, forcing the publisher's relevant GUC settings (datestyle, intervalstyle, extra_float_digits) to override those provided in the CONNECTION string. As a result, from v15 through v18, GUC settings in the CONNECTION string were always ignored. This regression prevented per-connection tuning of logical replication. For example, using a shorter timeout for walsender connecting to a nearby subscriber and a longer one for walsender connecting to a remote subscriber. This commit restores the intended behavior by ensuring that GUC settings in the CONNECTION string are again passed through and applied by the walsender, allowing per-connection configuration. Backpatch to v15, where the regression was introduced. Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Chao Li <lic@highgo.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Japin Li <japinli@hotmail.com> Discussion: https://postgr.es/m/CAHGQGwGYV+-abbKwdrM2UHUe-JYOFWmsrs6=QicyJO-j+-Widw@mail.gmail.com Backpatch-through: 15
2025-12-31 | jit: Fix jit_profiling_support when unavailable. | Thomas Munro
jit_profiling_support=true captures profile data for Linux perf. On other platforms, LLVMCreatePerfJITEventListener() returns NULL and the attempt to register the listener would crash. Fix by ignoring the setting in that case. The documentation already says that it only has an effect if perf support is present, and we already did the same for older LLVM versions that lacked support. No field reports, unsurprisingly for an obscure developer-oriented setting. Noticed in passing while working on commit 1a28b4b4. Backpatch-through: 14 Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKGJgB6gvrdDohgwLfCwzVQm%3DVMtb9m0vzQn%3DCwWn-kwG9w%40mail.gmail.com
2025-12-30 | Fix a race condition in updating procArray->replication_slot_xmin. | Masahiko Sawada
Previously, ReplicationSlotsComputeRequiredXmin() computed the oldest xmin across all slots without holding ProcArrayLock (when already_locked is false), acquiring the lock just before updating the replication slot xmin. This could lead to a race condition: if a backend created a new slot and updates the global replication slot xmin, another backend concurrently running ReplicationSlotsComputeRequiredXmin() could overwrite that update with an invalid or stale value. This happens because the concurrent backend might have computed the aggregate xmin before the new slot was accounted for, but applied the update after the new slot had already updated the global value. In the reported failure, a walsender for an apply worker computed InvalidTransactionId as the oldest xmin and overwrote a valid replication slot xmin value computed by a walsender for a tablesync worker. Consequently, the tablesync worker computed a transaction ID via GetOldestSafeDecodingTransactionId() effectively without considering the replication slot xmin. This led to the error "cannot build an initial slot snapshot as oldest safe xid %u follows snapshot's xmin %u", which was an assertion failure prior to commit 240e0dbacd3. To fix this, we acquire ReplicationSlotControlLock in exclusive mode during slot creation to perform the initial update of the slot xmin. In ReplicationSlotsComputeRequiredXmin(), we hold ReplicationSlotControlLock in shared mode until the global slot xmin is updated in ProcArraySetReplicationSlotXmin(). This prevents concurrent computations and updates of the global xmin by other backends during the initial slot xmin update process, while still permitting concurrent calls to ReplicationSlotsComputeRequiredXmin(). Backpatch to all supported versions. Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Pradeep Kumar <spradeepkumar29@gmail.com> Reviewed-by: Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAA4eK1L8wYcyTPxNzPGkhuO52WBGoOZbT0A73Le=ZUWYAYmdfw@mail.gmail.com Backpatch-through: 14
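A sketch of the locking shape described above; the variable names are illustrative and the real function carries more detail:

    LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);

    /* ... scan all slots, computing agg_xmin and agg_catalog_xmin ... */

    /* publish the result while still holding the lock */
    ProcArraySetReplicationSlotXmin(agg_xmin, agg_catalog_xmin, already_locked);

    LWLockRelease(ReplicationSlotControlLock);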
2025-12-30 | jit: Remove -Wno-deprecated-declarations in 18+. | Thomas Munro
REL_18_STABLE and master have commit ee485912, so they always use the newer LLVM opaque pointer functions. Drop -Wno-deprecated-declarations (commit a56e7b660) for code under jit/llvm in those branches, to catch any new deprecation warnings that arrive in future versions of LLVM.

Older branches continue to use functions marked deprecated in LLVM 14 and 15 (i.e. they switched to the newer functions only for LLVM 16+), as a precaution against unforeseen compatibility problems with bitcode already shipped. In those branches, the comment about warning suppression is updated to explain that situation better. In theory we could suppress warnings only for LLVM 14 and 15 specifically, but that isn't done here.

Backpatch-through: 14
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1407185.1766682319%40sss.pgh.pa.us
2025-12-27 | Fix pg_stat_get_backend_activity() to use multi-byte truncated result | Michael Paquier
pg_stat_get_backend_activity() calls pgstat_clip_activity() to ensure that the reported query string is correctly truncated when it finishes with an incomplete multi-byte sequence. However, the result returned by the function was not what pgstat_clip_activity() generated, but the non-truncated, original, contents from PgBackendStatus.st_activity_raw. Oversight in 54b6cd589ac2, so backpatch all the way down. Author: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAEoWx2mDzwc48q2EK9tSXS6iJMJ35wvxNQnHX+rXjy5VgLvJQw@mail.gmail.com Backpatch-through: 14
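A minimal sketch of the corrected return path (simplified; "beentry" is obtained as before):

    char       *clipped = pgstat_clip_activity(beentry->st_activity_raw);

    /* return the truncated copy, not the raw st_activity_raw buffer */
    PG_RETURN_TEXT_P(cstring_to_text(clipped));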
2025-12-24 | Don't advance origin during apply failure. | Amit Kapila
The logical replication parallel apply worker could incorrectly advance the origin progress during an error or failed apply. This behavior risks transaction loss because such transactions will not be resent by the server. Commit 3f28b2fcac addressed a similar issue for both the apply worker and the table sync worker by registering a before_shmem_exit callback to reset origin information. This prevents the worker from advancing the origin during transaction abortion on shutdown. This patch applies the same fix to the parallel apply worker, ensuring consistent behavior across all worker types. As with 3f28b2fcac, we are backpatching through version 16, since parallel apply mode was introduced there and the issue only occurs when changes are applied before the transaction end record (COMMIT or ABORT) is received. Author: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 16 Discussion: https://postgr.es/m/TY4PR01MB169078771FB31B395AB496A6B94B4A@TY4PR01MB16907.jpnprd01.prod.outlook.com Discussion: https://postgr.es/m/TYAPR01MB5692FAC23BE40C69DA8ED4AFF5B92@TYAPR01MB5692.jpnprd01.prod.outlook.com
2025-12-23 | Fix bug in following update chain when locking a heap tuple | Heikki Linnakangas
After waiting for a concurrent updater to finish, heap_lock_tuple() followed the update chain to lock all tuple versions. However, when stepping from the initial tuple to the next one, it failed to check that the next tuple's XMIN matches the initial tuple's XMAX. That's an important check whenever following an update chain, and the recursive part that follows the chain did it, but the initial step missed it. Without the check, if the updating transaction aborts, the updated tuple is vacuumed away and replaced by an unrelated tuple, the unrelated tuple might get incorrectly locked. Author: Jasper Smit <jasper.smit@servicenow.com> Discussion: https://www.postgresql.org/message-id/CAOG+RQ74x0q=kgBBQ=mezuvOeZBfSxM1qu_o0V28bwDz3dHxLw@mail.gmail.com Backpatch-through: 14
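A hedged sketch of the missing sanity check; the variable names and return value are illustrative, not the actual heap_lock_tuple() code:

    /*
     * When stepping from the just-locked tuple to the next version in its
     * update chain, the next version's xmin must match the prior version's
     * updating XID; otherwise the "next" tuple is unrelated (the update
     * aborted and the slot was reused) and must not be locked.
     */
    if (!TransactionIdEquals(HeapTupleHeaderGetXmin(nexttup.t_data),
                             HeapTupleHeaderGetUpdateXid(mytup.t_data)))
        return TM_Ok;   /* chain is broken; nothing more to lock */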
2025-12-23 | Fix orphaned origin in shared memory after DROP SUBSCRIPTION | Michael Paquier
Since ce0fdbfe9722, a replication slot and an origin are created by each tablesync worker, whose information is stored in both a catalog and shared memory (once the origin is set up in the latter case). The transaction where the origin is created is the same as the one that runs the initial COPY, with the catalog state of the origin becoming visible for other sessions only once the COPY transaction has committed. The catalog state is coupled with a state in shared memory, initialized at the same time as the origin created in the catalogs. Note that the transaction doing the initial data sync can take a long time, time that depends on the amount of data to transfer from a publication node to its subscriber node. Now, when a DROP SUBSCRIPTION is executed, all its workers are stopped with the origins removed. The removal of each origin relies on a catalog lookup. A worker still running the initial COPY would fail its transaction, with the catalog state of the origin rolled back while the shared memory state remains around. The session running the DROP SUBSCRIPTION should be in charge of cleaning up the catalog and the shared memory state, but as there is no data in the catalogs the shared memory state is not removed. This issue would leave orphaned origin data in shared memory, leading to a confusing state as it would still show up in pg_replication_origin_status. Note that this shared memory data is sticky, being flushed on disk in replorigin_checkpoint at checkpoint. This prevents other origins from reusing a slot position in the shared memory data. To address this problem, the commit moves the creation of the origin at the end of the transaction that precedes the one executing the initial COPY, making the origin immediately visible in the catalogs for other sessions, giving DROP SUBSCRIPTION a way to know about it. A different solution would have been to clean up the shared memory state using an abort callback within the tablesync worker. The solution of this commit is more consistent with the apply worker that creates an origin in a short transaction. A test is added in the subscription test 004_sync.pl, which was able to display the problem. The test fails when this commit is reverted. Reported-by: Tenglong Gu <brucegu@amazon.com> Reported-by: Daisuke Higuchi <higudai@amazon.com> Analyzed-by: Michael Paquier <michael@paquier.xyz> Author: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/aUTekQTg4OYnw-Co@paquier.xyz Backpatch-through: 14
2025-12-19 | Add guard to prevent recursive memory context logging. | Fujii Masao
Previously, if memory context logging was triggered repeatedly and rapidly while a previous request was still being processed, it could result in recursive calls to ProcessLogMemoryContextInterrupt(). This could lead to infinite recursion and potentially crash the process. This commit adds a guard to prevent such recursion. If ProcessLogMemoryContextInterrupt() is already in progress and logging memory contexts, subsequent calls will exit immediately, avoiding unintended recursive calls. While this scenario is unlikely in practice, it's not impossible. This change adds a safety check to prevent such failures. Back-patch to v14, where memory context logging was introduced. Reported-by: Robert Haas <robertmhaas@gmail.com> Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Artem Gavrilov <artem.gavrilov@percona.com> Discussion: https://postgr.es/m/CA+TgmoZMrv32tbNRrFTvF9iWLnTGqbhYSLVcrHGuwZvCtph0NA@mail.gmail.com Backpatch-through: 14
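A minimal sketch of such a guard, assuming a simple static flag; the committed code may differ in detail:

    void
    ProcessLogMemoryContextInterrupt(void)
    {
        static bool in_progress = false;

        LogMemoryContextPending = false;

        if (in_progress)
            return;             /* already dumping contexts; ignore re-entry */
        in_progress = true;

        PG_TRY();
        {
            /* ... walk and log the memory context tree ... */
        }
        PG_FINALLY();
        {
            in_progress = false;
        }
        PG_END_TRY();
    }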
2025-12-18 | Do not emit WAL for unlogged BRIN indexes | Heikki Linnakangas
Operations on unlogged relations should not be WAL-logged. The brin_initialize_empty_new_buffer() function didn't get the memo. The function is only called when a concurrent update to a brin page uses up space that we're just about to insert to, which makes it pretty hard to hit. If you do manage to hit it, a full-page WAL record is erroneously emitted for the unlogged index. If you then crash, crash recovery will fail on that record with an error like this: FATAL: could not create file "base/5/32819": File exists Author: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://www.postgresql.org/message-id/CALdSSPhpZXVFnWjwEBNcySx_vXtXHwB2g99gE6rK0uRJm-3GgQ@mail.gmail.com Backpatch-through: 14
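A minimal sketch of the guard, mirroring other BRIN write paths (argument names are illustrative):

    /* only WAL-log the new empty page for relations that need WAL */
    if (RelationNeedsWAL(idxrel))
        log_newpage_buffer(buffer, true);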
2025-12-16 | Assert lack of hazardous buffer locks before possible catalog read. | Noah Misch
Commit 0bada39c83a150079567a6e97b1a25a198f30ea3 fixed a bug of this kind, which existed in all branches for six days before detection. While the probability of reaching the trouble was low, the disruption was extreme. No new backends could start, and service restoration needed an immediate shutdown. Hence, add this to catch the next bug like it.

The new check in RelationIdGetRelation() suffices to make autovacuum detect the bug in commit 243e9b40f1b2dd09d6e5bf91ebf6e822a2cd3704 that led to commit 0bada39. This also adds checks in a number of similar places, replacing each Assert(IsTransactionState()) that pertained to a conditional catalog read.

Back-patch to v14 - v17. This is a back-patch of commit f4ece891fc2f3f96f0571720a1ae30db8030681b (from before v18 branched) to all supported branches, to accompany the back-patch of commits 243e9b4 and 0bada39. For catalog indexes, the bttextcmp() behavior that motivated IsCatalogTextUniqueIndexOid() was v18-specific. Hence, this back-patch doesn't need that or its correction from commit 4a4ee0c2c1e53401924101945ac3d517c0a8a559.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/20250410191830.0e.nmisch@google.com
Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com
Backpatch-through: 14-17
2025-12-16 | WAL-log inplace update before revealing it to other sessions. | Noah Misch
A buffer lock won't stop a reader having already checked tuple visibility. If a vac_update_datfrozenxid() and then a crash happened during inplace update of a relfrozenxid value, datfrozenxid could overtake relfrozenxid. That could lead to "could not access status of transaction" errors.

Back-patch to v14 - v17. This is a back-patch of commits:

- 8e7e672cdaa6bfec85d4d5dd9be84159df23bb41 (main change, on master, before v18 branched)
- 818013665259d4242ba641aad705ebe5a3e2db8e (defect fix, on master, before v18 branched)

It reverses commit bc6bad88572501aecaa2ac5d4bc900ac0fd457d5, my revert of the original back-patch. In v14, this also back-patches the assertion removal from commit 7fcf2faf9c7dd473208fd6d5565f88d7f733782b.

Discussion: https://postgr.es/m/20240620012908.92.nmisch@google.com
Backpatch-through: 14-17
2025-12-16 | For inplace update, send nontransactional invalidations. | Noah Misch
The inplace update survives ROLLBACK. The inval didn't, so another backend's DDL could then update the row without incorporating the inplace update. In the test this fixes, a mix of CREATE INDEX and ALTER TABLE resulted in a table with an index, yet relhasindex=f. That is a source of index corruption.

Back-patch to v14 - v17. This is a back-patch of commits:

- 243e9b40f1b2dd09d6e5bf91ebf6e822a2cd3704 (main change, on master, before v18 branched)
- 0bada39c83a150079567a6e97b1a25a198f30ea3 (defect fix, on master, before v18 branched)
- bae8ca82fd00603ebafa0658640d6e4dfe20af92 (cosmetics from post-commit review, on REL_18_STABLE)

It reverses commit c1099dd745b0135960895caa8892a1873ac6cbe5, my revert of the original back-patch of 243e9b4. This back-patch omits the non-comment heap_decode() changes. I find those changes removed harmless code that was last necessary in v13. See discussion thread for details. The back branches aren't the place to remove such code.

Like the original back-patch, this doesn't change WAL, because these branches use end-of-recovery SIResetAll(). All branches change the ABI of extern function PrepareToInvalidateCacheTuple(). No PGXN extension calls that, and there's no apparent use case in extensions. Expect ".abi-compliance-history" edits to follow.

Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Surya Poondla <s_poondla@apple.com>
Reviewed-by: Ilyasov Ian <ianilyasov@outlook.com>
Reviewed-by: Nitin Motiani <nitinmotiani@google.com> (in earlier versions)
Reviewed-by: Andres Freund <andres@anarazel.de> (in earlier versions)
Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com
Backpatch-through: 14-17
2025-12-16 | Reorder two functions in inval.c | Michael Paquier
This file separates public and static functions with a separator comment, but two routines were not defined in a location reflecting that, so reorder them. Back-patch commit c2bdd2c5b1d48a7e39e1a8d5e1d90b731b53c4c9 to v15 - v16. This avoids merge conflicts in the next commit, which modifies a function this moved. Exclude v14, which is so different that the merge conflict savings would be immaterial. Author: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAJ7c6TMX2dd0g91UKvcC+CVygKQYJkKJq1+ZzT4rOK42+b53=w@mail.gmail.com Backpatch-through: 15-16
2025-12-16 | Switch memory contexts in ReinitializeParallelDSM. | Robert Haas
We already do this in CreateParallelContext, InitializeParallelDSM, and LaunchParallelWorkers. I suspect the reason why the matching logic was omitted from ReinitializeParallelDSM is that I failed to realize that any memory allocation was happening here -- but shm_mq_attach does allocate, which could result in a shm_mq_handle being allocated in a shorter-lived context than the ParallelContext which points to it. That could result in a crash if the shorter-lived context is freed before the parallel context is destroyed. As far as I am currently aware, there is no way to reach a crash using only code that is present in core PostgreSQL, but extensions could potentially trip over this. Fixing this in the back-branches appears low-risk, so back-patch to all supported versions. Author: Jakub Wartak <jakub.wartak@enterprisedb.com> Co-authored-by: Jeevan Chalke <jeevan.chalke@enterprisedb.com> Backpatch-through: 14 Discussion: http://postgr.es/m/CAKZiRmwfVripa3FGo06=5D1EddpsLu9JY2iJOTgbsxUQ339ogQ@mail.gmail.com
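A sketch of the fix's shape, mirroring what CreateParallelContext() and friends already do (details of the function body are elided):

    void
    ReinitializeParallelDSM(ParallelContext *pcxt)
    {
        MemoryContext oldcontext;

        /* shm_mq_attach() below allocates; keep those allocations in a
         * context that lives at least as long as the ParallelContext */
        oldcontext = MemoryContextSwitchTo(TopTransactionContext);

        /* ... reset shared state and reattach the error queues ... */

        MemoryContextSwitchTo(oldcontext);
    }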
2025-12-16 | Fail recovery when missing redo checkpoint record without backup_label | Michael Paquier
This commit adds an extra check at the beginning of recovery to ensure that the redo record of a checkpoint exists before attempting WAL replay, logging a PANIC if the redo record referenced by the checkpoint record could not be found. This is the same level of failure as when a checkpoint record is missing. This check is added when a cluster is started without a backup_label, after retrieving its checkpoint record. The redo LSN used for the check is retrieved from the checkpoint record successfully read. In the case where a backup_label exists, the startup process already fails if the redo record cannot be found after reading a checkpoint record at the beginning of recovery.

Previously, the presence of the redo record was not checked. If the redo and checkpoint records were located on different WAL segments, it would be possible to miss an entire range of WAL records that should have been replayed but were just ignored. The consequences of missing the redo record depend on the version dealt with, these becoming worse the older the version used:

- On HEAD, v18 and v17, recovery fails with a pointer dereference at the beginning of the redo loop, as the redo record is expected but cannot be found. These versions are good students, because we detect a failure before doing anything, even if the failure is misleading in the shape of a segmentation fault, giving no information that the redo record is missing.
- In v16 and v15, problems show at the end of recovery within FinishWalRecovery(), the startup process using a buggy LSN to decide from where to start writing WAL. The cluster gets corrupted, still it is noisy about it.
- v14 and older versions are worse: a cluster gets corrupted but it is entirely silent about the matter. The missing redo record causes the startup process to skip recovery entirely, because a missing record is treated the same as no redo being required at all. This leads to data loss, as everything is missed between the redo record and the checkpoint record. Note that I have tested that down to 9.4, reproducing the issue with a version of the author's reproducer slightly modified. The code is wrong since at least 9.2, but I did not look at the exact point of origin.

This problem has been found by debugging a cluster where the WAL segment including the redo record was missing due to an operator error, leading to a crash, based on an investigation in v15.

Requesting archive recovery with the creation of a recovery.signal or a standby.signal even without a backup_label would mitigate the issue: if the record cannot be found in pg_wal/, the missing segment can be retrieved with a restore_command when checking that the redo record exists. This was already the case without this commit, where recovery would re-fetch the WAL segment that includes the redo record. The check introduced by this commit causes the segment to be retrieved earlier, to make sure that the redo record can be found.

On HEAD, the code will be slightly changed in a follow-up commit to not rely on a PANIC, to include a test able to emulate the original problem. This is a minimal backpatchable fix, kept separated for clarity.

Reported-by: Andres Freund <andres@anarazel.de>
Analyzed-by: Andres Freund <andres@anarazel.de>
Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com>
Discussion: https://postgr.es/m/20231023232145.cmqe73stvivsmlhs@awork3.anarazel.de
Discussion: https://postgr.es/m/CAMm1aWaaJi2w49c0RiaDBfhdCL6ztbr9m=daGqiOuVdizYWYaA@mail.gmail.com
Backpatch-through: 14
2025-12-15 | Clarify comment on multixid offset wraparound check | Heikki Linnakangas
Coverity complained that offset cannot be 0 here because there's an explicit check for "offset == 0" earlier in the function, but it didn't see the possibility that offset could've wrapped around to 0. The code is correct, but clarify the comment about it. The same code exists in backbranches in the server GetMultiXactIdMembers() function and in 'master' in the pg_upgrade GetOldMultiXactIdSingleMember function. In backbranches Coverity didn't complain about it because the check was merely an assertion, but change the comment in all supported branches for consistency. Per Tom Lane's suggestion. Discussion: https://www.postgresql.org/message-id/1827755.1765752936@sss.pgh.pa.us
2025-12-11 | Fix allocation formula in llvmjit_expr.c | Michael Paquier
An array of LLVMBasicBlockRef is allocated with the size used for an element being "LLVMBasicBlockRef *" rather than "LLVMBasicBlockRef". LLVMBasicBlockRef is itself a pointer type, so this did not directly cause a problem because both have the same size; still, it is incorrect.

This issue has been spotted while reviewing a different patch, and exists since 2a0faed9d702, so backpatch all the way down.

Discussion: https://postgr.es/m/CA+hUKGLngd9cKHtTUuUdEo2eWEgUcZ_EQRbP55MigV2t_zTReg@mail.gmail.com
Backpatch-through: 14
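Illustrative of the formula only, not the exact statement in llvmjit_expr.c ("nblocks" is a stand-in for the real count):

    LLVMBasicBlockRef *opblocks;

    /* the element size must name the array's element type, not a pointer to it */
    opblocks = palloc(sizeof(LLVMBasicBlockRef) * nblocks);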
2025-12-09 | Doc: fix typo in hash index documentation | David Rowley
Plus a similar fix to the README. Backpatch as far back as the sgml issue exists. The README issue does exist in v14, but that seems unlikely to harm anyone. Author: David Geier <geidav.pg@gmail.com> Discussion: https://postgr.es/m/ed3db7ea-55b4-4809-86af-81ad3bb2c7d3@gmail.com Backpatch-through: 15
2025-12-05 | Fix setting next multixid's offset at offset wraparound | Heikki Linnakangas
In commit 789d65364c, we started updating the next multixid's offset too when recording a multixid, so that it can always be used to calculate the number of members. I got it wrong at offset wraparound: we need to skip over offset 0. Fix that. Discussion: https://www.postgresql.org/message-id/d9996478-389a-4340-8735-bfad456b313c@iki.fi Backpatch-through: 14
2025-12-03 | Set next multixid's offset when creating a new multixid | Heikki Linnakangas
With this commit, the next multixid's offset will always be set on the offsets page, by the time that a backend might try to read it, so we no longer need the waiting mechanism with the condition variable. In other words, this eliminates "corner case 2" mentioned in the comments.

The waiting mechanism was broken in a few scenarios:

- When nextMulti was advanced without WAL-logging the next multixid. For example, if a later multixid was already assigned and WAL-logged before the previous one was WAL-logged, and then the server crashed. In that case the next offset would never be set in the offsets SLRU, and a query trying to read it would get stuck waiting for it. Same thing could happen if pg_resetwal was used to forcibly advance nextMulti.
- In hot standby mode, a deadlock could happen where one backend waits for the next multixid assignment record, but WAL replay is not advancing because of a recovery conflict with the waiting backend.

The old TAP test used carefully placed injection points to exercise the old waiting code, but now that the waiting code is gone, much of the old test is no longer relevant. Rewrite the test to reproduce the IPC/MultixactCreation hang after crash recovery instead, and to verify that previously recorded multixids stay readable.

Backpatch to all supported versions. In back-branches, we still need to be able to read WAL that was generated before this fix, so in the back-branches this includes a hack to initialize the next offsets page when replaying XLOG_MULTIXACT_CREATE_ID for the last multixid on a page. On 'master', bump XLOG_PAGE_MAGIC instead to indicate that the WAL is not compatible.

Author: Andrey Borodin <amborodin@acm.org>
Reviewed-by: Dmitry Yurichev <dsy.075@yandex.ru>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Ivan Bykov <i.bykov@modernsys.ru>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/172e5723-d65f-4eec-b512-14beacb326ce@yandex.ru
Backpatch-through: 14
2025-11-29 | Avoid rewriting data-modifying CTEs more than once. | Dean Rasheed
Formerly, when updating an auto-updatable view, or a relation with rules, if the original query had any data-modifying CTEs, the rewriter would rewrite those CTEs multiple times as RewriteQuery() recursed into the product queries. In most cases that was harmless, because RewriteQuery() is mostly idempotent. However, if the CTE involved updating an always-generated column, it would trigger an error because any subsequent rewrite would appear to be attempting to assign a non-default value to the always-generated column. This could perhaps be fixed by attempting to make RewriteQuery() fully idempotent, but that looks quite tricky to achieve, and would probably be quite fragile, given that more generated-column-type features might be added in the future. Instead, fix by arranging for RewriteQuery() to rewrite each CTE exactly once (by tracking the number of CTEs already rewritten as it recurses). This has the advantage of being simpler and more efficient, but it does make RewriteQuery() dependent on the order in which rewriteRuleAction() joins the CTE lists from the original query and the rule action, so care must be taken if that is ever changed. Reported-by: Bernice Southey <bernice.southey@gmail.com> Author: Bernice Southey <bernice.southey@gmail.com> Author: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/CAEDh4nyD6MSH9bROhsOsuTqGAv_QceU_GDvN9WcHLtZTCYM1kA@mail.gmail.com Backpatch-through: 14
2025-11-27 | Allow indexscans on partial hash indexes with implied quals. | Tom Lane
Normally, if a WHERE clause is implied by the predicate of a partial index, we drop that clause from the set of quals used with the index, since it's redundant to test it if we're scanning that index. However, if it's a hash index (or any !amoptionalkey index), this could result in dropping all available quals for the index's first key, preventing us from generating an indexscan. It's fair to question the practical usefulness of this case. Since hash only supports equality quals, the situation could only arise if the index's predicate is "WHERE indexkey = constant", implying that the index contains only one hash value, which would make hash a really poor choice of index type. However, perhaps there are other !amoptionalkey index AMs out there with which such cases are more plausible. To fix, just don't filter the candidate indexquals this way if the index is !amoptionalkey. That's a bit hokey because it may result in testing quals we didn't need to test, but to do it more accurately we'd have to redundantly identify which candidate quals are actually usable with the index, something we don't know at this early stage of planning. Doesn't seem worth the effort. Reported-by: Sergei Glukhov <s.glukhov@postgrespro.ru> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/e200bf38-6b45-446a-83fd-48617211feff@postgrespro.ru Backpatch-through: 14
2025-11-24 | lwlock: Fix, currently harmless, bug in LWLockWakeup() | Andres Freund
Accidentally, the code in LWLockWakeup() checked the list of to-be-woken-up processes to see if LW_FLAG_HAS_WAITERS should be unset. That means that HAS_WAITERS would not get unset immediately, but only during the next, unnecessary, call to LWLockWakeup().

Luckily, as the code stands, this is just a small efficiency issue. However, if there were (as in a patch of mine) a case in which LWLockWakeup() would not find any backend to wake, despite the wait list not being empty, we'd wrongly unset LW_FLAG_HAS_WAITERS, leading to potential hangs.

While the consequences in the backbranches are limited, the code as-is is confusing, and it is possible that there are workloads where the additional wait list lock acquisitions hurt, therefore backpatch.

Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
Backpatch-through: 14
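A sketch of the corrected intent; the real code clears the flag while manipulating lock->state in its atomic state loop, so this is only an approximation:

    /* decide based on the waiters still queued on the lock,
     * not on the local list of processes picked for wakeup this time */
    if (proclist_is_empty(&lock->waiters))
        pg_atomic_fetch_and_u32(&lock->state, ~LW_FLAG_HAS_WAITERS);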
2025-11-22 | jit: Adjust AArch64-only code for LLVM 21. | Thomas Munro
LLVM 21 changed the arguments of RTDyldObjectLinkingLayer's constructor, breaking compilation with the backported SectionMemoryManager from commit 9044fc1d. https://github.com/llvm/llvm-project/commit/cd585864c0bbbd74ed2a2b1ccc191eed4d1c8f90 Backpatch-through: 14 Author: Holger Hoffstätte <holger@applied-asynchrony.com> Reviewed-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> Discussion: https://postgr.es/m/d25e6e4a-d1b4-84d3-2f8a-6c45b975f53d%40applied-asynchrony.com
2025-11-18 | Don't allow CTEs to determine semantic levels of aggregates. | Tom Lane
The fix for bug #19055 (commit b0cc0a71e) allowed CTE references in sub-selects within aggregate functions to affect the semantic levels assigned to such aggregates. It turns out this broke some related cases, leading to assertion failures or strange planner errors such as "unexpected outer reference in CTE query". After experimenting with some alternative rules for assigning the semantic level in such cases, we've come to the conclusion that changing the level is more likely to break things than be helpful. Therefore, this patch undoes what b0cc0a71e changed, and instead installs logic to throw an error if there is any reference to a CTE that's below the semantic level that standard SQL rules would assign to the aggregate based on its contained Var and Aggref nodes. (The SQL standard disallows sub-selects within aggregate functions, so it can't reach the troublesome case and hence has no rule for what to do.) Perhaps someone will come along with a legitimate query that this logic rejects, and if so probably the example will help us craft a level-adjustment rule that works better than what b0cc0a71e did. I'm not holding my breath for that though, because the previous logic had been there for a very long time before bug #19055 without complaints, and that bug report sure looks to have originated from fuzzing not from real usage. Like b0cc0a71e, back-patch to all supported branches, though sadly that no longer includes v13. Bug: #19106 Reported-by: Kamil Monicz <kamil@monicz.dev> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/19106-9dd3668a0734cd72@postgresql.org Backpatch-through: 14
2025-11-17 | Define PS_USE_CLOBBER_ARGV on GNU/Hurd. | Thomas Munro
Until d2ea2d310dfdc40328aca5b6c52225de78432e01, the PS_USE_PS_STRINGS option was used on the GNU/Hurd. As this option got removed and PS_USE_CLOBBER_ARGV appears to work fine nowadays on the Hurd, define this one to re-enable process title changes on this platform. In the 14 and 15 branches, the existing test for __hurd__ (added 25 years ago by commit 209aa77d, removed in 16 by the above commit) is left unchanged for now as it was activating slightly different code paths and would need investigation by a Hurd user. Author: Michael Banck <mbanck@debian.org> Discussion: https://postgr.es/m/CA%2BhUKGJMNGUAqf27WbckYFrM-Mavy0RKJvocfJU%3DJ2XcAZyv%2Bw%40mail.gmail.com Backpatch-through: 16
2025-11-14 | Add note about CreateStatistics()'s selective use of check_rights. | Nathan Bossart
Commit 5e4fcbe531 added a check_rights parameter to this function for use by ALTER TABLE commands that re-create statistics objects. However, we intentionally ignore check_rights when verifying relation ownership because this function's lookup could return a different answer than the caller's. This commit adds a note to this effect so that we remember it down the road. Reviewed-by: Noah Misch <noah@leadboat.com> Backpatch-through: 14
2025-11-12 | Clear 'xid' in dummy async notify entries written to fill up pages | Heikki Linnakangas
Before we started to freeze async notify entries (commit 8eeb4a0f7c), no one looked at the 'xid' on an entry with invalid 'dboid'. But now we might actually need to freeze it later. Initialize them with InvalidTransactionId to begin with, to avoid that work later. Álvaro pointed this out in review of commit 8eeb4a0f7c, but I forgot to include this change there. Author: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://www.postgresql.org/message-id/202511071410.52ll56eyixx7@alvherre.pgsql Backpatch-through: 14
2025-11-12 | Fix remaining race condition with CLOG truncation and LISTEN/NOTIFY | Heikki Linnakangas
Previous commit fixed a bug where VACUUM would truncate the CLOG that's still needed to check the commit status of XIDs in the async notify queue, but as mentioned in the commit message, it wasn't a full fix. If a backend is executing asyncQueueReadAllNotifications() and has just made a local copy of an async SLRU page which contains old XIDs, vacuum can concurrently truncate the CLOG covering those XIDs, and the backend still gets an error when it calls TransactionIdDidCommit() on those XIDs in the local copy. This commit fixes that race condition. To fix, hold the SLRU bank lock across the TransactionIdDidCommit() calls in NOTIFY processing. Per Tom Lane's idea. Backpatch to all supported versions. Reviewed-by: Joel Jacobson <joel@compiler.org> Reviewed-by: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> Discussion: https://www.postgresql.org/message-id/2759499.1761756503@sss.pgh.pa.us Backpatch-through: 14
2025-11-12 | Fix bug where we truncated CLOG that was still needed by LISTEN/NOTIFY | Heikki Linnakangas
The async notification queue contains the XID of the sender, and when processing notifications we call TransactionIdDidCommit() on the XID. But we had no safeguards to prevent the CLOG segments containing those XIDs from being truncated away. As a result, if a backend didn't for some reason process its notifications for a long time, or when a new backend issued LISTEN, you could get an error like:

    test=# listen c21;
    ERROR: 58P01: could not access status of transaction 14279685
    DETAIL: Could not open file "pg_xact/000D": No such file or directory.
    LOCATION: SlruReportIOError, slru.c:1087

To fix, make VACUUM "freeze" the XIDs in the async notification queue before truncating the CLOG. Old XIDs are replaced with FrozenTransactionId or InvalidTransactionId.

Note: This commit is not a full fix. A race condition remains, where a backend is executing asyncQueueReadAllNotifications() and has just made a local copy of an async SLRU page which contains old XIDs, while vacuum concurrently truncates the CLOG covering those XIDs. When the backend then calls TransactionIdDidCommit() on those XIDs from the local copy, you still get the error. The next commit will fix that remaining race condition.

This was first reported by Sergey Zhuravlev in 2021, with many other people hitting the same issue later. Thanks to:

- Alexandra Wang, Daniil Davydov, Andrei Varashen and Jacques Combrink for investigating and providing reproducible test cases,
- Matheus Alcantara and Arseniy Mukhin for review and earlier proposed patches to fix this,
- Álvaro Herrera and Masahiko Sawada for reviews,
- Yura Sokolov aka funny-falcon for the idea of marking transactions as committed in the notification queue, and
- Joel Jacobson for the final patch version.

I hope I didn't forget anyone. Backpatch to all supported versions. I believe the bug goes back all the way to commit d1e027221d, which introduced the SLRU-based async notification queue.

Discussion: https://www.postgresql.org/message-id/16961-25f29f95b3604a8a@postgresql.org
Discussion: https://www.postgresql.org/message-id/18804-bccbbde5e77a68c2@postgresql.org
Discussion: https://www.postgresql.org/message-id/CAK98qZ3wZLE-RZJN_Y%2BTFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw@mail.gmail.com
Backpatch-through: 14
2025-11-12 | Escalate ERRORs during async notify processing to FATAL | Heikki Linnakangas
Previously, if async notify processing encountered an error, we would report the error to the client and advance our read position past the offending entry to prevent trying to process it over and over again. Trying to continue after an error has a few problems however:

- We have no way of telling the client that a notification was lost. They get an ERROR, but that doesn't tell you much. As such, it's not clear if keeping the connection alive after losing a notification is a good thing. Depending on the application logic, missing a notification could cause the application to get stuck waiting, for example.
- If the connection is idle, PqCommReadingMsg is set and any ERROR is turned into FATAL anyway.
- We bailed out of the notification processing loop on first error without processing any subsequent notifications. The subsequent notifications would not be processed until another notify interrupt arrives. For example, if there were two notifications pending, and processing the first one caused an ERROR, the second notification would not be processed until someone sent a new NOTIFY.

This commit changes the behavior so that any ERROR while processing async notifications is turned into FATAL, causing the client connection to be terminated. That makes the behavior more consistent as that's what happened in idle state already, and terminating the connection is a clear signal to the application that it might've missed some notifications.

The reason to do this now is that the next commits will change the notification processing code in a way that would make it harder to skip over just the offending notification entry on error.

Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Discussion: https://www.postgresql.org/message-id/fedbd908-4571-4bbe-b48e-63bfdcc38f64@iki.fi
Backpatch-through: 14
2025-11-12 | Fix range for commit_siblings in sample conf | Daniel Gustafsson
The range for commit_siblings was incorrectly listed as starting on 1 instead of 0 in the sample configuration file. Backpatch down to all supported branches. Author: Man Zeng <zengman@halodbtech.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/tencent_53B70BA72303AE9C6889E78E@qq.com Backpatch-through: 14
2025-11-10 | Check for CREATE privilege on the schema in CREATE STATISTICS. | Nathan Bossart
This omission allowed table owners to create statistics in any schema, potentially leading to unexpected naming conflicts. For ALTER TABLE commands that require re-creating statistics objects, skip this check in case the user has since lost CREATE on the schema. The addition of a second parameter to CreateStatistics() breaks ABI compatibility, but we are unaware of any impacted third-party code. Reported-by: Jelte Fennema-Nio <postgres@jeltef.nl> Author: Jelte Fennema-Nio <postgres@jeltef.nl> Co-authored-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Security: CVE-2025-12817 Backpatch-through: 13
2025-11-10 | Translation updates | Peter Eisentraut
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: d3bc33cce36158257311e5cfa36c97209f37dedc
2025-11-06 | Disallow generated columns in COPY WHERE clause | Peter Eisentraut
Stored generated columns are not yet computed when the filtering happens, so we need to prohibit them to avoid incorrect behavior. Co-authored-by: jian he <jian.universality@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CACJufxHb8YPQ095R_pYDr77W9XKNaXg5Rzy-WP525mkq+hRM3g@mail.gmail.com
2025-11-06 | Update obsolete comment in ExecScanReScan(). | Etsuro Fujita
Commit 27cc7cd2b removed the epqScanDone flag from the EState struct, and instead added an equivalent flag named relsubs_done to the EPQState struct; but it failed to update this comment. Author: Etsuro Fujita <etsuro.fujita@gmail.com> Discussion: https://postgr.es/m/CAPmGK152zJ3fU5avDT5udfL0namrDeVfMTL3dxdOXw28SOrycg%40mail.gmail.com Backpatch-through: 13
2025-11-05 | Avoid possible crash within libsanitizer. | Tom Lane
We've successfully used libsanitizer for awhile with the undefined and alignment sanitizers, but with some other sanitizers (at least thread and hwaddress) it crashes due to internal recursion before it's fully initialized itself. It turns out that that's due to the "__ubsan_default_options" hack installed by commit f686ae82f, and we can fix it by ensuring that __ubsan_default_options is built without any sanitizer instrumentation hooks. Reported-by: Emmanuel Sibi <emmanuelsibi.mec@gmail.com> Reported-by: Alexander Lakhin <exclusion@gmail.com> Diagnosed-by: Emmanuel Sibi <emmanuelsibi.mec@gmail.com> Fix-suggested-by: Jacob Champion <jacob.champion@enterprisedb.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/F7543B04-E56C-4D68-A040-B14CCBAD38F1@gmail.com Discussion: https://postgr.es/m/dbf77bf7-6e54-ed8a-c4ae-d196eeb664ce@gmail.com Backpatch-through: 16
2025-11-04 | jit: Fix accidentally-harmless type confusion | Andres Freund
In 2a0faed9d702, which added JIT compilation support for expressions, I accidentally used sizeof(LLVMBasicBlockRef *) instead of sizeof(LLVMBasicBlockRef) as part of computing the size of an allocation. That turns out to have no real negative consequences due to LLVMBasicBlockRef being a pointer itself (and thus having the same size). It still is wrong and confusing, so fix it. Reported by coverity. Backpatch-through: 13
2025-11-04 | Fix snapshot handling bug in recent BRIN fix | Álvaro Herrera
Commit a95e3d84c0e0 added ActiveSnapshot push+pop when processing work-items (BRIN autosummarization), but forgot to handle the case of a transaction failing during the run, which drops the snapshot untimely. Fix by making the pop conditional on an element being actually there. Author: Álvaro Herrera <alvherre@kurilemu.de> Backpatch-through: 13 Discussion: https://postgr.es/m/202511041648.nofajnuddmwk@alvherre.pgsql
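A minimal sketch of the conditional pop (the surrounding work-item processing is elided):

    /* an aborted work-item transaction has already cleared the snapshot stack */
    if (ActiveSnapshotSet())
        PopActiveSnapshot();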
2025-11-04 | Backpatch: Fix warnings about declaration of environ on MinGW | Andres Freund
Backpatch commit 7bc9a8bdd2d to 13-17. The motivation for backpatching is that we want to update CI to Debian Trixie. Trixie contains a newer mingw installation, which would trigger the warning addressed by 7bc9a8bdd2d. The risk of backpatching seems fairly low, given that it did not cause issues in the branches the commit is already present. While CI is not present in 13-14, it seems better to be consistent across branches. Author: Thomas Munro <tmunro@postgresql.org> Discussion: https://postgr.es/m/o5yadhhmyjo53svzwvaocww6zkrp63i4f32cw3treuh46pxtza@hyqio5b2tkt6 Backpatch-through: 13
2025-11-04 | Tighten check for generated column in partition key expression | Peter Eisentraut
A generated column may end up being part of the partition key expression, if it's specified as an expression e.g. "(<generated column name>)" or if the partition key expression contains a whole-row reference, even though we do not allow a generated column to be part of partition key expression. Fix this hole. Co-authored-by: jian he <jian.universality@gmail.com> Co-authored-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Discussion: https://www.postgresql.org/message-id/flat/CACJufxF%3DWDGthXSAQr9thYUsfx_1_t9E6N8tE3B8EqXcVoVfQw%40mail.gmail.com
2025-11-04 | BRIN autosummarization may need a snapshot | Álvaro Herrera
It's possible to define BRIN indexes on functions that require a snapshot to run, but the autosummarization feature introduced by commit 7526e10224f0 fails to provide one. This causes autovacuum to leave a BRIN placeholder tuple behind after a failed work-item execution, making such indexes less efficient. Repair by obtaining a snapshot prior to running the task, and add a test to verify this behavior. Author: Álvaro Herrera <alvherre@kurilemu.de> Reported-by: Giovanni Fabris <giovanni.fabris@icon.it> Reported-by: Arthur Nascimento <tureba@gmail.com> Backpatch-through: 13 Discussion: https://postgr.es/m/202511031106.h4fwyuyui6fz@alvherre.pgsql
2025-11-04 | Fix unconditional WAL receiver shutdown during stream-archive transition | Michael Paquier
Commit b4f584f9d2a1 (affecting v15~, later backpatched down to 13 as of 3635a0a35aaf) introduced an unconditional WAL receiver shutdown when switching from streaming to archive WAL sources. This causes problems during a timeline switch, when a WAL receiver enters WALRCV_WAITING state but remains alive, waiting for instructions. The unconditional shutdown can break some monitoring scenarios as the WAL receiver gets repeatedly terminated and re-spawned, causing pg_stat_wal_receiver.status to show a "streaming" instead of "waiting" status, masking the fact that the WAL receiver is waiting for a new TLI and a new LSN to be able to continue streaming.

This commit changes the WAL receiver behavior so that the shutdown becomes conditional, with InstallXLogFileSegmentActive always being reset to prevent the regression fixed by b4f584f9d2a1: only terminate the WAL receiver when it is actively streaming (WALRCV_STREAMING, WALRCV_STARTING, or WALRCV_RESTARTING). When in WALRCV_WAITING state, just reset the InstallXLogFileSegmentActive flag to allow archive restoration without killing the process.

WALRCV_STOPPED and WALRCV_STOPPING are not reachable states in this code path. For the latter, the startup process is the one in charge of setting WALRCV_STOPPING via ShutdownWalRcv(), waiting for the WAL receiver to reach a WALRCV_STOPPED state after switching walRcvState, so WaitForWALToBecomeAvailable() cannot be reached while a WAL receiver is in a WALRCV_STOPPING state.

A regression test is added to check that a WAL receiver is not stopped on a timeline jump; the test fails when the fix of this commit is reverted.

Reported-by: Ryan Bird <ryanzxg@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/19093-c4fff49a608f82a0@postgresql.org
Backpatch-through: 13