author Peter Geoghegan <pg@bowt.ie> 2021-03-21 15:25:39 -0700
committer Peter Geoghegan <pg@bowt.ie> 2021-03-21 15:25:39 -0700
commit 9dd963ae2534e9614f0abeccaafbd39f1b93ff8a
tree 0f3d448f2e5ad78b14eca30a2d90273676fbeaf0 /src/include/access/nbtree.h
parent 4d399a6fbeb720b34d33441330910b7d853f703d
Recycle nbtree pages deleted during same VACUUM.
Maintain a simple array of metadata about pages that were deleted during nbtree VACUUM's current btvacuumscan() call. Use this metadata at the end of btvacuumscan() to attempt to place newly deleted pages in the FSM without further delay. It might not yet be safe to place any of the pages in the FSM by then (they may not be deemed recyclable), but we have little to lose and plenty to gain by trying. In practice there is a very good chance that this will work out when vacuuming larger indexes, where scanning the index naturally takes quite a while.

This commit doesn't change the page recycling invariants; it merely improves the efficiency of page recycling within the confines of the existing design. Recycle safety is a part of nbtree's implementation of what Lanin & Shasha call "the drain technique". The design happens to use transaction IDs (they're stored in deleted pages), but that in itself doesn't align the cutoff for recycle safety to any of the XID-based cutoffs used by VACUUM (e.g., OldestXmin). All that matters is whether or not _other_ backends might be able to observe various inconsistencies in the tree structure (that they cannot just detect and recover from by moving right). Recycle safety is purely a question of maintaining the consistency (or the apparent consistency) of a physical data structure.

Note that running a simple serial test case involving a large range DELETE followed by a VACUUM VERBOSE will probably show that any newly deleted nbtree pages are not yet reusable/recyclable. This is expected in the absence of even one concurrent XID assignment. It is an old implementation restriction. In practice it's unlikely to be the thing that makes recycling remain unsafe, at least with larger indexes, where recycling newly deleted pages during the same VACUUM actually matters.

An important high-level goal of this commit (as well as related recent commits e5d8a999 and 9f3665fb) is to make expensive deferred cleanup operations in index AMs rare in general. If index vacuuming frequently depends on the next VACUUM operation finishing off work that the current operation started, then the general behavior of index vacuuming is hard to predict. This is relevant to ongoing work that adds a vacuumlazy.c mechanism to skip index vacuuming in certain cases. Anything that makes the real world behavior of index vacuuming simpler and more linear will also make top-down modeling in vacuumlazy.c more robust.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wzk76_P=67iUscb1UN44-gyZL-KgpsXbSxq_bdcMa7Q+wQ@mail.gmail.com
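As a rough illustration of the bookkeeping described in the commit message, the following standalone C sketch models the pending-page array outside of PostgreSQL. Everything in it is an assumption made for illustration: the type names (`PendingFSM`, `VacState`, `FullXid`), the helper `is_recyclable()` (a simplified stand-in for GlobalVisCheckRemovableFullXid()), the initial size of 256 elements, and the early exit in the finalize step, which assumes entries were appended in nondecreasing safexid order. It is not the code added to nbtpage.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for PostgreSQL's BlockNumber and FullTransactionId. */
typedef uint32_t BlockNumber;
typedef uint64_t FullXid;

typedef struct PendingFSM
{
    BlockNumber target;     /* page deleted by the current scan */
    FullXid     safexid;    /* XID stamped on the deleted page */
} PendingFSM;

typedef struct VacState
{
    int         bufsize;        /* allocated elements */
    int         maxbufsize;     /* cap on bufsize (memory budget) */
    PendingFSM *pendingpages;   /* one entry per newly deleted page */
    int         npendingpages;  /* number of valid entries */
} VacState;

/* Allocate a small initial array; maxbufsize caps later growth. */
static void
pendingfsm_init(VacState *vs, int maxbufsize)
{
    assert(maxbufsize > 0);
    vs->bufsize = maxbufsize < 256 ? maxbufsize : 256;   /* 256 is arbitrary */
    vs->maxbufsize = maxbufsize;
    vs->pendingpages = malloc(sizeof(PendingFSM) * vs->bufsize);
    assert(vs->pendingpages != NULL);
    vs->npendingpages = 0;
}

/*
 * Remember a newly deleted page.  Returns 1 if saved, 0 if the entry was
 * dropped because the memory budget is exhausted (losing an entry only
 * delays recycling of that page in this model).
 */
static int
pendingfsm_add(VacState *vs, BlockNumber target, FullXid safexid)
{
    if (vs->npendingpages == vs->bufsize)
    {
        int         newsize;
        PendingFSM *grown;

        if (vs->bufsize >= vs->maxbufsize)
            return 0;
        newsize = vs->bufsize * 2;
        if (newsize > vs->maxbufsize)
            newsize = vs->maxbufsize;
        grown = realloc(vs->pendingpages, sizeof(PendingFSM) * newsize);
        assert(grown != NULL);
        vs->pendingpages = grown;
        vs->bufsize = newsize;
    }
    vs->pendingpages[vs->npendingpages].target = target;
    vs->pendingpages[vs->npendingpages].safexid = safexid;
    vs->npendingpages++;
    return 1;
}

/* Simplified assumption: a page is safe once safexid precedes the horizon. */
static int
is_recyclable(FullXid safexid, FullXid horizon)
{
    return safexid < horizon;
}

/*
 * End of scan: copy the block numbers that are already recyclable into
 * 'out' (standing in for the FSM) and return how many there were.  Stops
 * at the first unsafe entry, assuming nondecreasing safexid order.
 */
static int
pendingfsm_finalize(VacState *vs, FullXid horizon, BlockNumber *out)
{
    int         nrecycled = 0;

    for (int i = 0; i < vs->npendingpages; i++)
    {
        if (!is_recyclable(vs->pendingpages[i].safexid, horizon))
            break;
        out[nrecycled++] = vs->pendingpages[i].target;
    }
    free(vs->pendingpages);
    vs->pendingpages = NULL;
    vs->npendingpages = 0;
    vs->bufsize = 0;
    return nrecycled;
}
```

In this model a scan would call pendingfsm_add() once per deleted page and pendingfsm_finalize() once at the end. Pages rejected at that point are simply left for a later VACUUM, which mirrors the "little to lose and plenty to gain" reasoning in the message.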
Diffstat (limited to 'src/include/access/nbtree.h')
-rw-r--r--  src/include/access/nbtree.h  28
1 files changed, 25 insertions, 3 deletions
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 88eccfcb732..a645c42e685 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -279,7 +279,8 @@ BTPageGetDeleteXid(Page page)
* Is an existing page recyclable?
*
* This exists to centralize the policy on which deleted pages are now safe to
- * re-use.
+ * re-use. However, _bt_pendingfsm_finalize() duplicates some of the same
+ * logic because it doesn't work directly with pages -- keep the two in sync.
*
* Note: PageIsNew() pages are always safe to recycle, but we can't deal with
* them here (caller is responsible for that case themselves). Caller might
@@ -305,6 +306,10 @@ BTPageIsRecyclable(Page page)
* For that check if the deletion XID could still be visible to
* anyone. If not, then no scan that's still in progress could have
* seen its downlink, and we can recycle it.
+ *
+ * XXX: If we had the heap relation we could be more aggressive about
+ * recycling deleted pages in non-catalog relations. For now we just
+ * pass NULL. That is at least simple and consistent.
*/
return GlobalVisCheckRemovableFullXid(NULL, BTPageGetDeleteXid(page));
}
@@ -313,9 +318,15 @@ BTPageIsRecyclable(Page page)
}
/*
- * BTVacState is private nbtree.c state used during VACUUM. It is exported
- * for use by page deletion related code in nbtpage.c.
+ * BTVacState and BTPendingFSM are private nbtree.c state used during VACUUM.
+ * They are exported for use by page deletion related code in nbtpage.c.
*/
+typedef struct BTPendingFSM
+{
+ BlockNumber target; /* Page deleted by current VACUUM */
+ FullTransactionId safexid; /* Page's BTDeletedPageData.safexid */
+} BTPendingFSM;
+
typedef struct BTVacState
{
IndexVacuumInfo *info;
@@ -324,6 +335,14 @@ typedef struct BTVacState
void *callback_state;
BTCycleId cycleid;
MemoryContext pagedelcontext;
+
+ /*
+ * _bt_pendingfsm_finalize() state
+ */
+ int bufsize; /* pendingpages space (in # elements) */
+ int maxbufsize; /* max bufsize that respects work_mem */
+ BTPendingFSM *pendingpages; /* One entry per newly deleted page */
+ int npendingpages; /* current # valid pendingpages */
} BTVacState;
/*
@@ -1195,6 +1214,9 @@ extern void _bt_delitems_delete_check(Relation rel, Buffer buf,
Relation heapRel,
TM_IndexDeleteOp *delstate);
extern void _bt_pagedel(Relation rel, Buffer leafbuf, BTVacState *vstate);
+extern void _bt_pendingfsm_init(Relation rel, BTVacState *vstate,
+ bool cleanuponly);
+extern void _bt_pendingfsm_finalize(Relation rel, BTVacState *vstate);
/*
* prototypes for functions in nbtsearch.c