Keep WAL segments by the flushed value of the slot's restart LSN

The patch fixes the issue with the unexpected removal of old WAL segments
after checkpoint, followed by an immediate restart.  The issue occurs when a
slot is advanced after the start of the checkpoint and before old WAL segments
are removed at the end of the checkpoint.

The idea of the patch is to get the minimal restart_lsn at the beginning of
checkpoint (or restart point) creation and use this value when calculating
the oldest LSN for WAL segments removal at the end of checkpoint.  This idea
was proposed by Tomas Vondra in the discussion.

Unlike 291221c46575, this fix doesn't affect ABI and is intended for back
branches.

Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 13
author: Alexander Korotkov <akorotkov@postgresql.org> 2025-06-14 03:33:15 +0300
committer: Alexander Korotkov <akorotkov@postgresql.org> 2025-06-14 03:52:45 +0300
commit: 2090edc6f32f652a2c995ca5f7e65748ae1e4c5d (patch)
tree: d578f2083105280ea90a81ed45644fd7611f7545 /src/backend/replication/logical/logical.c
parent: 40aa5ddea1c02bcd098bf66d2a3e16faeec94aea (diff)
1 files changed, 9 insertions, 1 deletions
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 97b6aa899ee..4407df84a1c 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1897,7 +1897,15 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
 
 		SpinLockRelease(&MyReplicationSlot->mutex);
 
-		/* first write new xmin to disk, so we know what's up after a crash */
+		/*
+		 * First, write new xmin and restart_lsn to disk so we know what's up
+		 * after a crash.  Even when we do this, the checkpointer can see the
+		 * updated restart_lsn value in the shared memory; then, a crash can
+		 * happen before we manage to write that value to the disk.  Thus,
+		 * checkpointer still needs to make special efforts to keep WAL
+		 * segments required by the restart_lsn written to the disk.  See
+		 * CreateCheckPoint() and CreateRestartPoint() for details.
+		 */
 		if (updated_xmin || updated_restart)
 		{
 			ReplicationSlotMarkDirty();
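The approach in the commit message (take the minimal restart_lsn when the checkpoint starts, and use that value when deciding which WAL segments to remove at the end) can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the code touched by the patch: Slot, Lsn, SEGMENT_SIZE and every function below are hypothetical names invented for the illustration, and the concurrent slot advance is simulated by a plain function call in the middle of the checkpoint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's real types. */
typedef uint64_t Lsn;

/* Assumed segment size for the model (16 MB). */
#define SEGMENT_SIZE ((Lsn) 16 * 1024 * 1024)

typedef struct Slot
{
	Lsn			restart_lsn;	/* in-memory value, may run ahead of disk */
	Lsn			flushed_lsn;	/* value last written to disk */
} Slot;

/* Minimum in-memory restart_lsn over all slots; n must be > 0. */
static Lsn
min_restart_lsn(const Slot *slots, size_t n)
{
	Lsn			min;

	assert(n > 0);
	min = slots[0].restart_lsn;
	for (size_t i = 1; i < n; i++)
	{
		if (slots[i].restart_lsn < min)
			min = slots[i].restart_lsn;
	}
	return min;
}

/* Number of the oldest segment that must be kept to retain keep_lsn. */
static uint64_t
oldest_segment_to_keep(Lsn keep_lsn)
{
	return keep_lsn / SEGMENT_SIZE;
}

/* Advance a slot in shared memory only, as if a crash preceded the flush. */
static void
advance_in_memory(Slot *slot, Lsn new_lsn)
{
	slot->restart_lsn = new_lsn;
}

/*
 * Model one checkpoint during which slots[0] is advanced to advance_to.
 * With use_snapshot != 0 the removal cutoff comes from the minimum taken
 * at checkpoint start; otherwise it is recomputed at the end, which is
 * the behaviour that loses WAL still needed by the on-disk slot state.
 */
static uint64_t
checkpoint_oldest_kept(Slot *slots, size_t n, Lsn advance_to, int use_snapshot)
{
	Lsn			at_start = min_restart_lsn(slots, n);
	Lsn			at_end;

	advance_in_memory(&slots[0], advance_to);	/* simulated concurrent advance */
	at_end = min_restart_lsn(slots, n);

	return oldest_segment_to_keep(use_snapshot ? at_start : at_end);
}
```

In this model, a single slot whose flushed restart LSN lies in segment 2 and which is advanced in memory into segment 5 during the checkpoint yields a cutoff of segment 5 when the minimum is recomputed at the end, so segments 2 through 4 would be removed even though a restart would still read the slot from disk at segment 2. With the value taken at checkpoint start, the cutoff stays at segment 2.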