user/sven/postgresql.git

Age	Commit message (Collapse)	Author
8 days	Add the MODE option to the WAIT FOR LSN command	Alexander Korotkov
	This commit extends the WAIT FOR LSN command with an optional MODE option in the WITH clause that specifies which LSN type to wait for: WAIT FOR LSN '<lsn>' [WITH (MODE '<mode>', ...)] where mode can be: - 'standby_replay' (default): Wait for WAL to be replayed to the specified LSN, - 'standby_write': Wait for WAL to be written (received) to the specified LSN, - 'standby_flush': Wait for WAL to be flushed to disk at the specified LSN, - 'primary_flush': Wait for WAL to be flushed to disk on the primary server. The default mode is 'standby_replay', matching the original behavior when MODE is not specified. This follows the pattern used by COPY and EXPLAIN commands, where options are specified as string values in the WITH clause. Modes are explicitly named to distinguish between primary and standby operations: - Standby modes ('standby_replay', 'standby_write', 'standby_flush') can only be used during recovery (on a standby server), - Primary mode ('primary_flush') can only be used on a primary server. The 'standby_write' and 'standby_flush' modes are useful for scenarios where applications need to ensure WAL has been received or persisted on the standby without necessarily waiting for replay to complete. The 'primary_flush' mode allows waiting for WAL to be flushed on the primary server. This commit also includes includes: - Documentation updates for the new syntax and mode descriptions, - Test coverage for all four modes, including error cases and concurrent waiters, - Wakeup logic in walreceiver for standby write/flush waiters, - Wakeup logic in WAL writer for primary flush waiters. Discussion: https://postgr.es/m/CABPTF7UiArgW-sXj9CNwRzUhYOQrevLzkYcgBydmX5oDes1sjg%40mail.gmail.com Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@kurilemu.de>
9 days	Fix typos and inconsistencies in code and comments	Michael Paquier
	This change is a cocktail of harmonization of function argument names, grammar typos, renames for better consistency and unused code (see ltree). All of these have been spotted by the author. Author: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/b2c0d0b7-3944-487d-a03d-d155851958ff@gmail.com
10 days	Improve a couple of error messages.	Tom Lane
	Change "function" to "function or procedure" in PreventInTransactionBlock, and improve grammar of ExecWaitStmt's complaint about having an active snapshot. Author: Pavel Stehule <pavel.stehule@gmail.com> Reviewed-by: Andreas Karlsson <andreas@proxel.se> Reviewed-by: Marcos Pegoraro <marcos@f10.com.br> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAFj8pRCveWPR06bbad9GnMb0Kcr6jnXPttv9XOaOB+oFCD1Tsg@mail.gmail.com
12 days	Update copyright for 2026	Bruce Momjian
	Backpatch-through: 14
2025-12-23	Toggle logical decoding dynamically based on logical slot presence.	Masahiko Sawada
	Previously logical decoding required wal_level to be set to 'logical' at server start. This meant that users had to incur the overhead of logical-level WAL logging even when no logical replication slots were in use. This commit adds functionality to automatically control logical decoding availability based on logical replication slot presence. The newly introduced module logicalctl.c allows logical decoding to be dynamically activated when needed when wal_level is set to 'replica'. When the first logical replication slot is created, the system automatically increases the effective WAL level to maintain logical-level WAL records. Conversely, after the last logical slot is dropped or invalidated, it decreases back to 'replica' WAL level. While activation occurs synchronously right after creating the first logical slot, deactivation happens asynchronously through the checkpointer process. This design avoids a race condition at the end of recovery; a concurrent deactivation could happen while the startup process enables logical decoding at the end of recovery, but WAL writes are still not permitted until recovery fully completes. The checkpointer will handle it after recovery is done. Asynchronous deactivation also avoids excessive toggling of the logical decoding status in workloads that repeatedly create and drop a single logical slot. On the other hand, this lazy approach can delay changes to effective_wal_level and the disabling logical decoding, especially when the checkpointer is busy with other tasks. We chose this lazy approach in all deactivation paths to keep the implementation simple, even though laziness is strictly required only for end-of-recovery cases. Future work might address this limitation either by using a dedicated worker instead of the checkpointer, or by implementing synchronous waiting during slot drops if workloads are significantly affected by the lazy deactivation of logical decoding. The effective WAL level, determined internally by XLogLogicalInfo, is allowed to change within a transaction until an XID is assigned. Once an XID is assigned, the value becomes fixed for the remainder of the transaction. This behavior ensures that the logging mode remains consistent within a writing transaction, similar to the behavior of GUC parameters. A new read-only GUC parameter effective_wal_level is introduced to monitor the actual WAL level in effect. This parameter reflects the current operational WAL level, which may differ from the configured wal_level setting. Bump PG_CONTROL_VERSION as it adds a new field to CheckPoint struct. Reviewed-by: Shveta Malik <shveta.malik@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/CAD21AoCVLeLYq09pQPaWs+Jwdni5FuJ8v2jgq-u9_uFbcp6UbA@mail.gmail.com
2025-12-18	Fix intermittent BF failure in 040_standby_failover_slots_sync.	Amit Kapila
	Commit 0d2d4a0ec3 introduced a test that verifies replication slot synchronization to a standby server via SQL API. However, the test did not configure synchronized_standby_slots. Without this setting, logical failover slots can advance beyond the physical replication slot, causing intermittent synchronization failures. Author: Hou Zhijie <houzj.fnst@fujitsu.com> Discussion: https://postgr.es/m/TY4PR01MB16907DF70205308BE918E0D4494ABA@TY4PR01MB16907.jpnprd01.prod.outlook.com
2025-12-16	Add TAP test to check recovery when redo LSN is missing	Michael Paquier
	This commit provides test coverage for dc7c77f825d7, where the redo record and the checkpoint record finish on different WAL segments with the start of recovery able to detect that the redo record is missing. This test uses a wait injection point done in the critical section of a checkpoint, method that requires not one but actually two wait injection points to avoid any memory allocations within the critical section of the checkpoint: - Checkpoint run with a background psql. - One first wait point is run by the checkpointer before the critical section, allocating the shared memory required by the DSM registry for the wait machinery in the library injection_points. - First point is woken up. - Second wait point is loaded before the critical section, allocating the memory to build the path to the library loaded, then run in the critical section once the checkpoint redo record has been logged. - WAL segment is switched while waiting on the second point. - Checkpoint completes. - Stop cluster with immediate mode. - The segment that includes the redo record is removed. - Start, recovery fails as the redo record cannot be found. The error message introduced in dc7c77f825d7 is now reduced to a FATAL, meaning that the information is still provided while being able to use a test for it. Nitin has provided a basic version of the test, that I have enhanced to make it portable with two points. Without dc7c77f825d7, the cluster crashes in this test, not on a PANIC but due to the pointer dereference at the beginning of recovery, failure mentioned in the other commit. Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAMm1aWaaJi2w49c0RiaDBfhdCL6ztbr9m=daGqiOuVdizYWYaA@mail.gmail.com
2025-12-15	Add retry logic to pg_sync_replication_slots().	Amit Kapila
	Previously, pg_sync_replication_slots() would finish without synchronizing slots that didn't meet requirements, rather than failing outright. This could leave some failover slots unsynchronized if required catalog rows or WAL segments were missing or at risk of removal, while the standby continued removing needed data. To address this, the function now waits for the primary slot to advance to a position where all required data is available on the standby before completing synchronization. It retries cyclically until all failover slots that existed on the primary at the start of the call are synchronized. Slots created after the function begins are not included. If the standby is promoted during this wait, the function exits gracefully and the temporary slots will be removed. Author: Ajin Cherian <itsajin@gmail.com> Author: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Shveta Malik <shveta.malik@gmail.com> Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Yilin Zhang <jiezhilove@126.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
2025-12-06	Improve error reporting of recovery test 027_stream_regress	Michael Paquier
	Previously, the 027_stream_regress test reported the full contents of regression.diffs upon a test failure, when the standby and the primary were still alive. If a test fails quite badly, the amount of information reported can be really high, bloating the reports in the buildfarm, the CI, or even local runs. In most cases, we have noticed that having all this information is not necessary when attempting to identify the source of a problem in this test. This commit changes the situation by including the head and tail of regression.diffs in the reports generated on failure rather than its full contents, building upon b93f4e2f98b3 to optionally control the size of the reports with the new environment variable PG_TEST_FILE_READ_LINES. This will perhaps require some more tuning, but the hope is to reduce some of the buildfarm report bloat while making the information good enough to deduce what is happening when something is going wrong, be it in the buildfarm or some tests run in the CI, at least. Suggested-by: Andres Freund <andres@anarazel.de> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAN55FZ1D6KXvjSs7YGsDeadqCxNF3UUhjRAfforzzP0k-cE=bA@mail.gmail.com
2025-12-03	Rename BUFFERPIN wait event class to BUFFER	Andres Freund
	In an upcoming patch more wait events will be added to the wait event class (for buffer locking), making the current name too specific. Alternatively we could introduce a dedicated wait event class for those, but it seems somewhat confusing to have a BUFFERPIN and a BUFFER wait event class. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2025-11-28	Add slotsync_skip_reason column to pg_replication_slots view.	Amit Kapila
	Introduce a new column, slotsync_skip_reason, in the pg_replication_slots view. This column records the reason why the last slot synchronization was skipped. It is primarily relevant for logical replication slots on standby servers where the 'synced' field is true. The value is NULL when synchronization succeeds. Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com> Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com
2025-11-26	Fix test failure caused by commit 76b78721ca.	Amit Kapila
	The test failed because it assumed that a newly created logical replication slot could be synced to the standby by the slotsync worker. However, the presence of an existing physical slot caused the new logical slot to use a non-latest xmin. On the standby, the DDL had already been replayed, advancing xmin, which led to the slotsync worker failing to sync the lagging logical slot. To resolve this, we moved the slot sync statistics tests to run after the tests that do not require the newly created slot to be sync-ready. As per buildfarm. Author: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/OSCPR01MB14966FE0BFB6C212298BFFEDEF5D1A@OSCPR01MB14966.jpnprd01.prod.outlook.com
2025-11-25	Add slotsync skip statistics.	Amit Kapila
	This patch adds two new columns to the pg_stat_replication_slots view: slotsync_skip_count - the total number of times a slotsync operation was skipped. slotsync_skip_at - the timestamp of the most recent skip. These additions provide better visibility into replication slot synchronization behavior. A future patch will introduce the slotsync_skip_reason column in pg_replication_slots to capture the reason for skip. Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com
2025-11-05	Implement WAIT FOR command	Alexander Korotkov
	WAIT FOR is to be used on standby and specifies waiting for the specific WAL location to be replayed. This option is useful when the user makes some data changes on primary and needs a guarantee to see these changes are on standby. WAIT FOR needs to wait without any snapshot held. Otherwise, the snapshot could prevent the replay of WAL records, implying a kind of self-deadlock. This is why separate utility command seems appears to be the most robust way to implement this functionality. It's not possible to implement this as a function. Previous experience shows that stored procedures also have limitation in this aspect. Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: jian he <jian.universality@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
2025-11-05	Fix timing-dependent failure in recovery test 004_timeline_switch	Michael Paquier
	The test introduced by 17b2d5ec759c verifies that a WAL receiver survives across a timeline jump by searching the server logs for termination messages. However, it called restart() before the timeline switch, which kills the WAL receiver and may log the exact message being checked, hence failing the test. As TAP tests reuse the same log file across restarts, a rotate_logfile() is used before the restart so as the log matching check is not impacted by log entries generated by a previous shutdown. Recent changes to file handle inheritance altered I/O timing enough to make this fail consistently while testing another patch. While on it, this adds an extra check based on a PID comparison. This test may lead to false positives as it could be possible that the WAL receiver has processed a timeline jump before the initial PID is grabbed, but it should be good enough in most cases. Like 17b2d5ec759c, backpatch down to v13. Author: Bryan Green <dbryan.green@gmail.com> Co-authored-by: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/9d00b597-d64a-4f1e-802e-90f9dc394c70@gmail.com Backpatch-through: 13
2025-11-04	Fix unconditional WAL receiver shutdown during stream-archive transition	Michael Paquier
	Commit b4f584f9d2a1 (affecting v15~, later backpatched down to 13 as of 3635a0a35aaf) introduced an unconditional WAL receiver shutdown when switching from streaming to archive WAL sources. This causes problems during a timeline switch, when a WAL receiver enters WALRCV_WAITING state but remains alive, waiting for instructions. The unconditional shutdown can break some monitoring scenarios as the WAL receiver gets repeatedly terminated and re-spawned, causing pg_stat_wal_receiver.status to show a "streaming" instead of "waiting" status, masking the fact that the WAL receiver is waiting for a new TLI and a new LSN to be able to continue streaming. This commit changes the WAL receiver behavior so as the shutdown becomes conditional, with InstallXLogFileSegmentActive being always reset to prevent the regression fixed by b4f584f9d2a1: only terminate the WAL receiver when it is actively streaming (WALRCV_STREAMING, WALRCV_STARTING, or WALRCV_RESTARTING). When in WALRCV_WAITING state, just reset InstallXLogFileSegmentActive flag to allow archive restoration without killing the process. WALRCV_STOPPED and WALRCV_STOPPING are not reachable states in this code path. For the latter, the startup process is the one in charge of setting WALRCV_STOPPING via ShutdownWalRcv(), waiting for the WAL receiver to reach a WALRCV_STOPPED state after switching walRcvState, so WaitForWALToBecomeAvailable() cannot be reached while a WAL receiver is in a WALRCV_STOPPING state. A regression test is added to check that a WAL receiver is not stopped on timeline jump, that fails when the fix of this commit is reverted. Reported-by: Ryan Bird <ryanzxg@gmail.com> Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/19093-c4fff49a608f82a0@postgresql.org Backpatch-through: 13
2025-11-01	Remove unused variable in recovery/t/006_logical_decoding.pl.	Tom Lane
	Author: Daniil Davydov <3danissimo@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CAJDiXggmZWew8+SY_9o0atpmaJmPTL25wdz07MrDoqCkp4D1ug@mail.gmail.com
2025-10-31	Add test tracking WAL receiver shutdown for primary_conninfo updates	Michael Paquier
	The test introduced by this commit checks that a reload of primary_conninfo leads to a WAL receiver restarted, by looking at the request generated in the server logs. This is something for what there was no coverage. This has come up for a different patch, while discussing a regression where a WAL receiver should not be stopped while waiting for a new position to stream, like at the end of a timeline. In the case of the other patch, we want to check that this log entry is not generated, but if the error message is reworded the test would become silently broken. The test of this commit ensures that we at least keep track the log message format, for a supported scenario. Extracted from a larger patch by the same author. Author: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/aQKlC1v2_MXGV6_9@paquier.xyz
2025-10-22	Add copyright notice to vacuum_horizon_floor.pl test.	Masahiko Sawada
	Fix oversight in commit 303ba0573, which was backpatched through 14. Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAD21AoBeFdTJcwUfUYPcEgONab3TS6i1PB9S5cSXcBAmdAdQKw%40mail.gmail.com Backpatch-through: 14
2025-10-17	Improve TAP tests by replacing ok() with better Test::More functions	Michael Paquier
	The TAP tests whose ok() calls are changed in this commit were relying on perl operators, rather than equivalents available in Test::More. For example, rather than the following: ok($data =~ qr/expr/m, "expr matching"); ok($data !~ qr/expr/m, "expr not matching"); The new test code uses this equivalent: like($data, qr/expr/m, "expr matching"); unlike($data, qr/expr/m, "expr not matching"); A huge benefit of the new formulation is that it is possible to know about the values we are checking if a failure happens, making debugging easier, should the test runs happen in the buildfarm, in the CI or locally. This change leads to more test code overall as perltidy likes to make the code pretty the way it is in this commit. Author: Sadhuprasad Patro <b.sadhu@gmail.com> Discussion: https://postgr.es/m/CAFF0-CHhwNx_Cv2uy7tKjODUbeOgPrJpW4Rpf1jqB16_1bU2sg@mail.gmail.com
2025-10-17	Fix matching check in recovery test 042_low_level_backup	Michael Paquier
	042_low_level_backup compared the result of a query two times with a comparison operator based on an integer, while the result should be compared with a string. The outcome of the tests is currently not impacted by this change. However, it could be possible that the tests fail to detect future issues if the query results become different, for some reason. Oversight in 99b4a63bef94. Author: Sadhuprasad Patro <b.sadhu@gmail.com> Discussion: https://postgr.es/m/CAFF0-CHhwNx_Cv2uy7tKjODUbeOgPrJpW4Rpf1jqB16_1bU2sg@mail.gmail.com Backpatch-through: 17
2025-09-25	Fix comments in recovery tests	Daniel Gustafsson
	Commit 4464fddf removed the large insertions but missed to remove all the comments referring to them. Also remove a superfluous ')' in another comment. Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/OSCPR01MB149663A99DAF2826BE691C23DF51FA@OSCPR01MB14966.jpnprd01.prod.outlook.com
2025-08-21	Apply some fat commas to commands of TAP tests	Michael Paquier
	This is similar to 19c6e92b13b2, in order to keep the style used in the scripts consistent for the option names and values used in commands. The places updated in this commit have been added recently in 71ea0d679543. These changes are cosmetic; there is no need for a backpatch.
2025-08-11	Restrict psql meta-commands in plain-text dumps.	Nathan Bossart
	A malicious server could inject psql meta-commands into plain-text dump output (i.e., scripts created with pg_dump --format=plain, pg_dumpall, or pg_restore --file) that are run at restore time on the machine running psql. To fix, introduce a new "restricted" mode in psql that blocks all meta-commands (except for \unrestrict to exit the mode), and teach pg_dump, pg_dumpall, and pg_restore to use this mode in plain-text dumps. While at it, encourage users to only restore dumps generated from trusted servers or to inspect it beforehand, since restoring causes the destination to execute arbitrary code of the source superusers' choice. However, the client running the dump and restore needn't trust the source or destination superusers. Reported-by: Martin Rakhmanov Reported-by: Matthieu Denais <litezeraw@gmail.com> Reported-by: RyotaK <ryotak.mail@gmail.com> Suggested-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Security: CVE-2025-8714 Backpatch-through: 13
2025-08-04	Minor test fixes in 035_standby_logical_decoding.pl	Melanie Plageman
	Import usleep, which, due to an oversight in oversight in commit 48796a98d5ae was used but not imported. Correct the comparison string used in two logfile checks. Previously, it was incorrect and thus the test could never have failed. Also wordsmith a comment to make it clear when hot_standby_feedback is meant to be on during the test scenarios. Reported-by: Melanie Plageman <melanieplageman@gmail.com> Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_YO2mEm%3DZWZKPjTMU%3DgW5Y83_KMi_1cr51JwavH0ctd7w%40mail.gmail.com Backpatch-through: 16
2025-07-29	Add regression test for background worker restart after crash.	Fujii Masao
	Previously, if a background worker crashed and the server restarted with restart_after_crash enabled, the worker was not restarted as expected. This issue was fixed by commit b5d084c5353, which ensures that background workers without the never-restart flag are correctly restarted after a crash-and-restart cycle. To guard against regressions, this commit adds a test that verifies a background worker successfully restarts in such a scenario. Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: ChangAo Chen <cca5507@qq.com> Discussion: https://postgr.es/m/CAHGQGwHF-PdUOgiXCH_8K5qBm8b13h0Qt=dSoFXZybXQdbf-tw@mail.gmail.com
2025-07-19	Reintroduce test 046_checkpoint_logical_slot	Alexander Korotkov
	This commit is only for HEAD and v18, where the test has been removed. It also incorporates improvements below to stability and coverage of the original test, which were already backpatched to v17. - Add one pg_logical_emit_message() call to force the creation of a record that spawns across two pages. - Make the logic wait for the checkpoint completion. Author: Alexander Korotkov <akorotkov@postgresql.org> Co-authored-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Backpatch-through: 18
2025-07-19	Improve the stability of the recovery test 047_checkpoint_physical_slot	Alexander Korotkov
	Currently, the comments in 047_checkpoint_physical_slot. It shows an incomplete intention to wait for checkpoint completion before performing an immediate database stop. However, an immediate node stop can occur both before and after checkpoint completion. Both cases should work correctly. But we would like the test to be more stable and deterministic. This is why this commit makes this test explicitly wait for the checkpoint completion log message. Discussion: https://postgr.es/m/CAPpHfdurV-j_e0pb%3DUFENAy3tyzxfF%2ByHveNDNQk2gM82WBU5A%40mail.gmail.com Discussion: https://postgr.es/m/aHXLep3OaX_vRTNQ%40paquier.xyz Author: Alexander Korotkov <akorotkov@postgresql.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Backpatch-through: 17
2025-07-19	Check status of nodes after regression test run in 027_stream_regress	Michael Paquier
	This commit improves the recovery TAP test 027_stream_regress so as regression diffs are printed only if both the primary and the standby are still alive after the main regression test suite finishes, relying on d4c9195eff41 to do the job. Particularly, a crash of the primary could scribble the contents reported with mostly useless data, as the diffs would refer to query that failed to run, not necessarily the cause of the crash. Suggested-by: Andres Freund <andres@anarazel.de> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/CAN55FZ1D6KXvjSs7YGsDeadqCxNF3UUhjRAfforzzP0k-cE=bA@mail.gmail.com
2025-07-11	Rename CHECKPOINT_IMMEDIATE to CHECKPOINT_FAST.	Nathan Bossart
	The new name more accurately reflects the effects of this flag on a requested checkpoint. Checkpoint-related log messages (i.e., those controlled by the log_checkpoints configuration parameter) will now say "fast" instead of "immediate", too. Likewise, references to "immediate" checkpoints in the documentation have been updated to say "fast". This is preparatory work for a follow-up commit that will add a MODE option to the CHECKPOINT command. Author: Christoph Berg <myon@debian.org> Discussion: https://postgr.es/m/aDnaKTEf-0dLiEfz%40msg.df7cb.de
2025-07-07	Standardize LSN formatting by zero padding	Álvaro Herrera
	This commit standardizes the output format for LSNs to ensure consistent representation across various tools and messages. Previously, LSNs were inconsistently printed as `%X/%X` in some contexts, while others used zero-padding. This often led to confusion when comparing. To address this, the LSN format is now uniformly set to `%X/%08X`, ensuring the lower 32-bit part is always zero-padded to eight hexadecimal digits. Author: Japin Li <japinli@hotmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/ME0P300MB0445CA53CA0E4B8C1879AF84B641A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
2025-07-03	Improve checks for GUC recovery_target_timeline	Michael Paquier
	Currently check_recovery_target_timeline() converts any value that is not "current", "latest", or a valid integer to 0. So, for example, the following configuration added to postgresql.conf followed by a startup: recovery_target_timeline = 'bogus' recovery_target_timeline = '9999999999' ... results in the following error patterns: FATAL: 22023: recovery target timeline 0 does not exist FATAL: 22023: recovery target timeline 1410065407 does not exist This is confusing, because the server does not reflect the intention of the user, and just reports incorrect data unrelated to the GUC. The origin of the problem is that we do not perform a range check in the GUC value passed-in for recovery_target_timeline. This commit improves the situation by using strtou64() and by providing stricter range checks. Some test cases are added for the cases of an incorrect, an upper-bound and a lower-bound timeline value, checking the sanity of the reports based on the contents of the server logs. Author: David Steele <david@pgmasters.net> Discussion: https://postgr.es/m/e5d472c7-e9be-4710-8dc4-ebe721b62cea@pgbackrest.org
2025-06-29	Run pgperltidy	Joe Conway
	This is required before the creation of a new branch. pgindent is clean, as well as is reformat-dat-files. perltidy version is v20230309, as documented in pgindent's README.
2025-06-24	Test that vacuum removes tuples older than OldestXmin	Melanie Plageman
	If vacuum fails to prune a tuple killed before OldestXmin, it will decide to freeze its xmax and later error out in pre-freeze checks. Add a test reproducing this scenario to the recovery suite which creates a table on a primary, updates the table to generate dead tuples for vacuum, and then, during the vacuum, uses a replica to force GlobalVisState->maybe_needed on the primary to move backwards and precede the value of OldestXmin set at the beginning of vacuuming the table. This test is coverage for a case fixed in 83c39a1f7f3. The test was originally committed to master in aa607980aee but later reverted in efcbb76efe4 due to test instability. The test requires multiple index passes. In Postgres 17+, vacuum uses a TID store for the dead TIDs that is very space efficient. With the old minimum maintenance_work_mem of 1 MB, it required a large number of dead rows to generate enough dead TIDs to force multiple index vacuuming passes. Once the source code changes were made to allow a minimum maintenance_work_mem value of 64kB, the test could be made much faster and more stable. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAAKRu_ZJBkidusDut6i%3DbDCiXzJEp93GC1%2BNFaZt4eqanYF3Kw%40mail.gmail.com Backpatch-through: 17
2025-06-23	Temporarily remove 046_checkpoint_logical_slot.pl	Alexander Korotkov
	This new test was intended to check the handling of the replication slot's restart lsn fixed in ca307d5cec90. However, it also reveals another issue related to logical decoding. This commit temporarily removes this test to keep the buildfarm and CFbot green and avoid distorting others' work. This test will be restored once we investigate and fix the issue. Discussion: https://postgr.es/m/CAAKRu_ZCOzQpEumLFgG_%2Biw3FTa%2BhJ4SRpxzaQBYxxM_ZAzWcA%40mail.gmail.com
2025-06-20	Improve runtime and output of tests for replication slots checkpointing.	Alexander Korotkov
	The TAP tests that verify logical and physical replication slot behavior during checkpoints (046_checkpoint_logical_slot.pl and 047_checkpoint_physical_slot.pl) inserted two batches of 2 million rows each, generating approximately 520 MB of WAL. On slow machines, or when compiled with '-DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE', this caused the tests to run for 8-9 minutes and occasionally time out, as seen on the buildfarm animal prion. This commit modifies the mentioned tests to utilize the $node->advance_wal() function, thereby reducing runtime. Once we do not use the generated data, the proposed function is a good alternative, which cuts the total wall-clock run time. While here, remove superfluous '\n' characters from several note() calls; these appeared literally in the build-farm logs and looked odd. Also, remove excessive 'shared_preload_libraries' GUC from the config and add a check for 'injection_points' extension availability. Reported-by: Alexander Lakhin <exclusion@gmail.com> Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Vitaly Davydov <v.davydov@postgrespro.ru> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Discussion: https://postgr.es/m/fbc5d94e-6fbd-4a64-85d4-c9e284a58eb2%40gmail.com Backpatch-through: 17
2025-06-14	Add TAP tests to check replication slot advance during the checkpoint	Alexander Korotkov
	The new tests verify that logical and physical replication slots are still valid after an immediate restart on checkpoint completion when the slot was advanced during the checkpoint. This commit introduces two new injection points to make these tests possible: * checkpoint-before-old-wal-removal - triggered in the checkpointer process just before old WAL segments cleanup; * logical-replication-slot-advance-segment - triggered in LogicalConfirmReceivedLocation() when restart_lsn was changed enough to point to the next WAL segment. Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497 Author: Vitaly Davydov <v.davydov@postgrespro.ru> Author: Tomas Vondra <tomas@vondra.me> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 17
2025-05-22	Replace deprecated log_connections values in docs and tests	Melanie Plageman
	9219093cab2607f modularized log_connections output to allow more granular control over which aspects of connection establishment are logged. It converted the boolean log_connections GUC into a list of strings and deprecated previously supported boolean-like values on, off, true, false, 1, 0, yes, and no. Those values still work, but they are supported mainly for backwards compatability. As such, documented examples of log_connections should not use these deprecated values. Update references in the docs to deprecated log_connections values. Many of the tests use log_connections. This commit also updates the tests to use the new values of log_connections. In some of the tests, the updated log_connections value covers a narrower set of aspects (e.g. the 'authentication' aspect in the tests in src/test/authentication and the 'receipt' aspect in src/test/postmaster). In other cases, the new value for log_connections is a superset of the previous included aspects (e.g. 'all' in src/test/kerberos/t/001_auth.pl). Reported-by: Peter Eisentraut <peter@eisentraut.org> Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Discussion: https://postgr.es/m/e1586594-3b69-4aea-87ce-73a7488cdc97%40eisentraut.org
2025-04-20	Test restartpoints in archive recovery.	Noah Misch
	v14 commit 1f95181b44c843729caaa688f74babe9403b5850 and its v13 equivalent caused timing-dependent failures in archive recovery, at restartpoints. The symptom was "invalid magic number 0000 in log segment X, offset 0", "unexpected pageaddr X in log segment Y, offset 0" [X < Y], or an assertion failure. Commit 3635a0a35aafd3bfa80b7a809bc6e91ccd36606a and predecessors back-patched v15 changes to fix that. This test reproduces the problem probabilistically, typically in less than 1000 iterations of the test. Hence, buildfarm and CI runs would have surfaced enough failures to get attention within a day. Reported-by: Arun Thirupathi <arunth@google.com> Discussion: https://postgr.es/m/20250306193013.36.nmisch@google.com Backpatch-through: 13
2025-04-19	Fix typos and grammar in the code	Michael Paquier
	The large majority of these have been introduced by recent commits done in the v18 development cycle. Author: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/9a7763ab-5252-429d-a943-b28941e0e28b@gmail.com
2025-04-08	Stabilize 035_standby_logical_decoding.pl.	Amit Kapila
	Some tests try to invalidate logical slots on the standby server by running VACUUM on the primary. The problem is that xl_running_xacts was getting generated and replayed before the VACUUM command, leading to the advancement of the active slot's catalog_xmin. Due to this, active slots were not getting invalidated, leading to test failures. We fix it by skipping the generation of xl_running_xacts for the required tests with the help of injection points. As the required interface for injection points was not present in back branches, we fixed the failing tests in them by disallowing the slot to become active for the required cases (where rows_removed conflict could be generated). Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 16, where it was introduced Discussion: https://postgr.es/m/Z6oQXc8LmiTLfwLA@ip-10-97-1-34.eu-west-3.compute.internal
2025-04-08	Flush the IO statistics of active WAL senders more frequently	Michael Paquier
	WAL senders do not flush their statistics until they exit, limiting the monitoring possible for live processes. This is penalizing when WAL senders are running for a long time, like in streaming or logical replication setups, because it is not possible to know the amount of IO they generate while running. This commit makes WAL senders more aggressive with their statistics flush, using an internal of 1 second, with the flush timing calculated based on the existing GetCurrentTimestamp() done before the sleeps done to wait for some activity. Note that the sleep done for logical and physical WAL senders happens in two different code paths, so the stats flushes need to happen in these two places. One test is added for the physical WAL sender case, and one for the logical WAL sender case. This can be done in a stable fashion by relying on the WAL generated by the TAP tests in combination with a stats reset while a server is running, but only on HEAD as WAL data has been added to pg_stat_io in a051e71e28a1. This issue exists since a9c70b46dbe and the introduction of pg_stat_io, so backpatch down to v16. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/Z73IsKBceoVd4t55@ip-10-97-1-34.eu-west-3.compute.internal Backpatch-through: 16
2025-04-04	Fix logical decoding test to correctly check slot removal on standby.	Fujii Masao
	The regression test for logical decoding verifies whether a logical slot is correctly dropped on a standby when its associated database is dropped. However, the test mistakenly retrieved slot information from the primary instead of the standby, causing incorrect behavior. This commit fixes the issue by ensuring the test correctly checks the slot on the standby. Back-patch to all supported versions. Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/1fdfd020-a509-403c-bd8f-a04664aba148@oss.nttdata.com Backpatch-through: 13
2025-04-04	Fix logical decoding regression tests to correctly check slot existence.	Fujii Masao
	The regression tests for logical decoding verify whether a logical slot exists or has been dropped. Previously, these tests attempted to retrieve "slot_name" from the result of slot(), but since "slot_name" was not included in the result, slot()->{'slot_name'} always returned undef, leading to incorrect behavior. This commit fixes the issue by checking the "plugin" field in the result of slot() instead, ensuring the tests properly verify slot existence. Back-patch to all supported versions. Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/OSCPR01MB149667EC4E738769CA80B7EA5F5AE2@OSCPR01MB14966.jpnprd01.prod.outlook.com Backpatch-through: 13
2025-04-03	Restrict copying of invalidated replication slots.	Masahiko Sawada
	Previously, invalidated logical and physical replication slots could be copied using the pg_copy_logical_replication_slot and pg_copy_physical_replication_slot functions. Replication slots that were invalidated for reasons other than WAL removal retained their restart_lsn. This meant that a new slot copied from an invalidated slot could have a restart_lsn pointing to a WAL segment that might have already been removed. This commit restricts the copying of invalidated replication slots. Backpatch to v16, where slots could retain their restart_lsn when invalidated for reasons other than WAL removal. For v15 and earlier, this check is not required since slots can only be invalidated due to WAL removal, and existing checks already handle this issue. Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CANhcyEU65aH0VYnLiu%3DOhNNxhnhNhwcXBeT-jvRe1OiJTo_Ayg%40mail.gmail.com Backpatch-through: 16
2025-04-03	Fix slot synchronization for two_phase enabled slots.	Amit Kapila
	The issue is that the transactions prepared before two-phase decoding is enabled can fail to replicate to the subscriber after being committed on a promoted standby following a failover. This is because the two_phase_at field of a slot, which tracks the LSN from which two-phase decoding starts, is not synchronized to standby servers. Without two_phase_at, the logical decoding might incorrectly identify prepared transaction as already replicated to the subscriber after promotion of standby server, causing them to be skipped. To address the issue on HEAD, the two_phase_at field of the slot is exposed by the pg_replication_slots view and allows the slot synchronization to copy this value to the corresponding synced slot on the standby server. This bug is likely to occur if the user toggles the two_phase option to true after initial slot creation. Given that altering the two_phase option of a replication slot is not allowed in PostgreSQL 17, this bug is less likely to occur. We can't change the view/function definition in backbranch so we can't push the same fix but we are brainstorming an appropriate solution for PG17. Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/TYAPR01MB5724CC7C288535BBCEEE65DA94A72@TYAPR01MB5724.jpnprd01.prod.outlook.com
2025-03-17	Apply more consistent style for command options in TAP tests	Michael Paquier
	This commit reshapes the grammar of some commands to apply a more consistent style across the board, following rules similar to ce1b0f9da03e: - Elimination of some pointless used-once variables. - Use of long options, to self-document better the options used. - Use of fat commas to link option names and their assigned values, including redirections, so as perltidy can be tricked to put them together. Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://postgr.es/m/87jz8rzf3h.fsf@wibble.ilmari.org
2025-03-05	Fix some gaps in pg_stat_io with WAL receiver and WAL summarizer	Michael Paquier
	The WAL receiver and WAL summarizer processes gain each one a call to pgstat_report_wal(), to make sure that they report their WAL statistics to pgstats, gathering data for pg_stat_io. In the WAL receiver, the stats reports are timed with status updates sent to the primary, that depend on wal_receiver_status_interval and wal_receiver_timeout. This is a conservative choice, but perhaps we could be more aggressive with the frequency of the stats reports. An interesting historical fact is that the WAL receiver does writes and syncs of WAL, but it has never reported its statistics to pgstats in pg_stat_wal. In the WAL summarizer, the stats reports are done each time the process waits for WAL. While on it, pg_stat_io is adjusted so as these two processes do not report any rows when IOObject is not WAL, making the view easier to use with less rows. Two tests are added in TAP, checking statistics for the WAL summarizer and the WAL receiver. Status updates in the WAL receiver are currently possible in the recovery test 001_stream_rep.pl. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/Z8UKZyVSHUUQJHNb@paquier.xyz
2025-02-24	Fix assertion when decoding XLOG_PARAMETER_CHANGE on promoted primary.	Masahiko Sawada
	When a standby replays an XLOG_PARAMETER_CHANGE record that lowers wal_level below logical, we invalidate all logical slots in hot standby mode. However, if this record was replayed while not in hot standby mode, logical slots could remain valid even after promotion, potentially causing an assertion failure during WAL record decoding. To fix this issue, this commit adds a check for hot_standby status when restoring a logical replication slot on standbys. This check ensures that logical slots are invalidated when they become incompatible due to insufficient wal_level during recovery. Backpatch to v16 where logical decoding on standby was introduced. Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/CAD21AoABoFwGY_Rh2aeE6tEq3HkJxf0c6UeOXn4VV9v6BAQPSw%40mail.gmail.com Backpatch-through: 16
2025-02-20	Transfer statistics during pg_upgrade.	Jeff Davis
	Add support to pg_dump for dumping stats, and use that during pg_upgrade so that statistics are transferred during upgrade. In most cases this removes the need for a costly re-analyze after upgrade. Some statistics are not transferred, such as extended statistics or statistics with a custom stakind. Now pg_dump accepts the options --schema-only, --no-schema, --data-only, --no-data, --statistics-only, and --no-statistics; which allow all combinations of schema, data, and/or stats. The options are named this way to preserve compatibility with the previous --schema-only and --data-only options. Statistics are in SECTION_DATA, unless the object itself is in SECTION_POST_DATA. The stats are represented as calls to pg_restore_relation_stats() and pg_restore_attribute_stats(). Author: Corey Huinker, Jeff Davis Reviewed-by: Jian He Discussion: https://postgr.es/m/CADkLM=fzX7QX6r78fShWDjNN3Vcr4PVAnvXxQ4DiGy6V=0bCUA@mail.gmail.com Discussion: https://postgr.es/m/CADkLM%3DcB0rF3p_FuWRTMSV0983ihTRpsH%2BOCpNyiqE7Wk0vUWA%40mail.gmail.com