author     Thomas Munro <tmunro@postgresql.org>  2024-12-03 09:27:05 +1300
committer  Thomas Munro <tmunro@postgresql.org>  2024-12-03 10:13:49 +1300
commit     1168acbca475d48f745b345f69389df11f2396aa
tree       23f447f0d09fa1919420ad2b067ddbaffd6ee9f7
parent     e359cbb846788c584767288497beaa788a0f1b3e
RelationTruncate() must set DELAY_CHKPT_START.

Previously, it set only DELAY_CHKPT_COMPLETE. That was important, because it meant that if the XLOG_SMGR_TRUNCATE record preceded an XLOG_CHECKPOINT_ONLINE record in the WAL, then the truncation would also happen on disk before the XLOG_CHECKPOINT_ONLINE record was written.

However, it didn't guarantee that the sync request for the truncation was processed before the XLOG_CHECKPOINT_ONLINE record was written. By setting DELAY_CHKPT_START, we guarantee that if an XLOG_SMGR_TRUNCATE record is written to WAL before the redo pointer of a concurrent checkpoint, the sync request queued by that operation must be processed by that checkpoint, rather than being left for the following one.

This is a refinement of commit 412ad7a5563.

Back-patch to all supported releases, like that commit.

Author: Robert Haas <robertmhaas@gmail.com>
Reported-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2B-2rjGZC2kwqr2NMLBcEBp4uf59QT1advbWYF_uc%2B0Aw%40mail.gmail.com
-rw-r--r-- src/backend/catalog/storage.c 36
1 files changed, 26 insertions, 10 deletions
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index b823f508084..3afebc08995 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -326,20 +326,35 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
RelationPreTruncate(rel);
/*
- * Make sure that a concurrent checkpoint can't complete while truncation
- * is in progress.
+ * The code which follows can interact with concurrent checkpoints in two
+ * separate ways.
*
- * The truncation operation might drop buffers that the checkpoint
- * otherwise would have flushed. If it does, then it's essential that
- * the files actually get truncated on disk before the checkpoint record
- * is written. Otherwise, if reply begins from that checkpoint, the
+ * First, the truncation operation might drop buffers that the checkpoint
+ * otherwise would have flushed. If it does, then it's essential that the
+ * files actually get truncated on disk before the checkpoint record is
+ * written. Otherwise, if reply begins from that checkpoint, the
* to-be-truncated blocks might still exist on disk but have older
- * contents than expected, which can cause replay to fail. It's OK for
- * the blocks to not exist on disk at all, but not for them to have the
- * wrong contents.
+ * contents than expected, which can cause replay to fail. It's OK for the
+ * blocks to not exist on disk at all, but not for them to have the wrong
+ * contents. For this reason, we need to set DELAY_CHKPT_COMPLETE while
+ * this code executes.
+ *
+ * Second, the call to smgrtruncate() below will in turn call
+ * RegisterSyncRequest(). We need the sync request created by that call to
+ * be processed before the checkpoint completes. CheckPointGuts() will
+ * call ProcessSyncRequests(), but if we register our sync request after
+ * that happens, then the WAL record for the truncation could end up
+ * preceding the checkpoint record, while the actual sync doesn't happen
+ * until the next checkpoint. To prevent that, we need to set
+ * DELAY_CHKPT_START here. That way, if the XLOG_SMGR_TRUNCATE precedes
+ * the redo pointer of a concurrent checkpoint, we're guaranteed that the
+ * corresponding sync request will be processed before the checkpoint
+ * completes.
*/
+ Assert(!MyProc->delayChkpt);
+ MyProc->delayChkpt = true; /* DELAY_CHKPT_START */
Assert(!MyProc->delayChkptEnd);
- MyProc->delayChkptEnd = true;
+ MyProc->delayChkptEnd = true; /* DELAY_CHKPT_COMPLETE */
/*
* We WAL-log the truncation before actually truncating, which means
@@ -387,6 +402,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
smgrtruncate(RelationGetSmgr(rel), forks, nforks, blocks);
/* We've done all the critical work, so checkpoints are OK now. */
+ MyProc->delayChkpt = false;
MyProc->delayChkptEnd = false;
/*