| field | value | date |
|---|---|---|
| author | Peter Geoghegan <pg@bowt.ie> | 2021-03-21 15:25:39 -0700 |
| committer | Peter Geoghegan <pg@bowt.ie> | 2021-03-21 15:25:39 -0700 |
| commit | 9dd963ae2534e9614f0abeccaafbd39f1b93ff8a | |
| tree | 0f3d448f2e5ad78b14eca30a2d90273676fbeaf0 /src/include/access/nbtree.h | |
| parent | 4d399a6fbeb720b34d33441330910b7d853f703d | |
Recycle nbtree pages deleted during same VACUUM.
Maintain a simple array of metadata about pages that were deleted during
nbtree VACUUM's current btvacuumscan() call. Use this metadata at the
end of btvacuumscan() to attempt to place newly deleted pages in the FSM
without further delay. It might not yet be safe to place any of the
pages in the FSM by then (they may not be deemed recyclable), but we
have little to lose and plenty to gain by trying. In practice there is
a very good chance that this will work out when vacuuming larger
indexes, where scanning the index naturally takes quite a while.
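A minimal sketch of the "remember, then try at the end" idea, in plain C. It assumes simplified stand-in types (PostgreSQL's real code uses BlockNumber, FullTransactionId and the FSM API), models the free space map as an output array, and models recycle safety as a comparison against a caller-supplied horizon. The names `PendingPage`, `page_is_recyclable()` and `finalize_pending()` are hypothetical and are not the functions added by this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins: PostgreSQL's FullTransactionId is really a struct
 * wrapping a 64-bit value, and the FSM is not an array. */
typedef uint32_t BlockNumber;
typedef uint64_t FullXid;

typedef struct PendingPage
{
	BlockNumber target;		/* page deleted during this VACUUM */
	FullXid		safexid;	/* XID stamped on the page when it was deleted */
} PendingPage;

/* Hypothetical recycle-safety test: safe once safexid precedes the horizon. */
bool
page_is_recyclable(FullXid safexid, FullXid horizon)
{
	return safexid < horizon;
}

/*
 * Called once the scan is over: copy every remembered page that is already
 * safe into fsm_out and return how many were copied.  Pages that are not yet
 * safe are skipped; they stay deleted and a future VACUUM can recycle them.
 */
size_t
finalize_pending(const PendingPage *pending, size_t npending,
				 FullXid horizon, BlockNumber *fsm_out)
{
	size_t		nfree = 0;

	assert(npending == 0 || (pending != NULL && fsm_out != NULL));
	for (size_t i = 0; i < npending; i++)
	{
		if (page_is_recyclable(pending[i].safexid, horizon))
			fsm_out[nfree++] = pending[i].target;
	}
	return nfree;
}
```

Skipping (rather than failing on) a not-yet-safe page mirrors the "little to lose" framing: the worst case is that the page waits for a later VACUUM, exactly as it would without the optimization.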
This commit doesn't change the page recycling invariants; it merely
improves the efficiency of page recycling within the confines of the
existing design. Recycle safety is a part of nbtree's implementation of
what Lanin & Shasha call "the drain technique". The design happens to
use transaction IDs (they're stored in deleted pages), but that in
itself doesn't align the cutoff for recycle safety to any of the
XID-based cutoffs used by VACUUM (e.g., OldestXmin). All that matters
is whether or not _other_ backends might be able to observe various
inconsistencies in the tree structure (that they cannot just detect and
recover from by moving right). Recycle safety is purely a question of
maintaining the consistency (or the apparent consistency) of a physical
data structure.
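The drain idea can be illustrated with a toy model, offered only as a simplification: each in-progress index scan remembers the next-XID value it observed when it started, and a deleted page becomes reusable only once every scan that might have seen its downlink has finished. The names `oldest_horizon()` and `drained()` are hypothetical; PostgreSQL's real check goes through `GlobalVisCheckRemovableFullXid()`, as the diff shows, and its horizons are computed differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t FullXid;

/*
 * Conservative horizon: the smallest next-XID value observed by any scan
 * still in progress, or next_xid itself when no scan is running.
 */
FullXid
oldest_horizon(const FullXid *scan_start, size_t nscans, FullXid next_xid)
{
	FullXid		horizon = next_xid;

	for (size_t i = 0; i < nscans; i++)
	{
		if (scan_start[i] < horizon)
			horizon = scan_start[i];
	}
	return horizon;
}

/*
 * A deleted page is "drained" once no in-progress scan can predate the
 * deletion, i.e. the horizon has moved past the page's safexid.  Scans that
 * started after the deletion but before any new XID was assigned are
 * (conservatively) still treated as blocking.
 */
bool
drained(FullXid safexid, const FullXid *scan_start, size_t nscans,
		FullXid next_xid)
{
	return safexid < oldest_horizon(scan_start, nscans, next_xid);
}
```

Only other backends' possible views of the tree matter here; nothing in the check refers to VACUUM's own cutoffs.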
Note that running a simple serial test case involving a large range
DELETE followed by a VACUUM VERBOSE will probably show that any newly
deleted nbtree pages are not yet reusable/recyclable. This is expected
in the absence of even one concurrent XID assignment. It is an old
implementation restriction. In practice it's unlikely to be the thing
that makes recycling remain unsafe, at least with larger indexes, where
recycling newly deleted pages during the same VACUUM actually matters.
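Why a serial DELETE plus VACUUM VERBOSE tends to report the pages as not yet recyclable can be shown with a toy XID counter. The sketch assumes, as a simplification, that deletion stamps a page with the next unassigned XID and that the removal horizon can never move past the next unassigned XID. Under those assumptions, the page only becomes recyclable after at least one more XID is assigned. All names here are hypothetical, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t FullXid;

/* Toy transaction-ID counter (hypothetical; not PostgreSQL's API). */
typedef struct XidCounter
{
	FullXid		next_xid;		/* XID that will be assigned next */
} XidCounter;

FullXid
assign_xid(XidCounter *c)
{
	return c->next_xid++;
}

/* Deletion stamps the page with the counter's current next XID. */
FullXid
delete_page(const XidCounter *c)
{
	return c->next_xid;
}

/* With no other sessions, the horizon can be no later than next_xid. */
bool
recyclable_when_idle(FullXid safexid, const XidCounter *c)
{
	return safexid < c->next_xid;
}
```

Usage: with `XidCounter c = {1000}`, `delete_page(&c)` yields 1000, which is not below `next_xid` (also 1000); after a single `assign_xid(&c)` the counter reaches 1001 and the page passes the check.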
An important high-level goal of this commit (as well as related recent
commits e5d8a999 and 9f3665fb) is to make expensive deferred cleanup
operations in index AMs rare in general. If index vacuuming frequently
depends on the next VACUUM operation finishing off work that the current
operation started, then the general behavior of index vacuuming is hard
to predict. This is relevant to ongoing work that adds a vacuumlazy.c
mechanism to skip index vacuuming in certain cases. Anything that makes
the real world behavior of index vacuuming simpler and more linear will
also make top-down modeling in vacuumlazy.c more robust.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wzk76_P=67iUscb1UN44-gyZL-KgpsXbSxq_bdcMa7Q+wQ@mail.gmail.com
Diffstat (limited to 'src/include/access/nbtree.h')
-rw-r--r-- | src/include/access/nbtree.h | 28 |
1 files changed, 25 insertions, 3 deletions
```diff
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 88eccfcb732..a645c42e685 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -279,7 +279,8 @@ BTPageGetDeleteXid(Page page)
  * Is an existing page recyclable?
  *
  * This exists to centralize the policy on which deleted pages are now safe to
- * re-use.
+ * re-use. However, _bt_pendingfsm_finalize() duplicates some of the same
+ * logic because it doesn't work directly with pages -- keep the two in sync.
  *
  * Note: PageIsNew() pages are always safe to recycle, but we can't deal with
  * them here (caller is responsible for that case themselves). Caller might
@@ -305,6 +306,10 @@ BTPageIsRecyclable(Page page)
 		 * For that check if the deletion XID could still be visible to
 		 * anyone. If not, then no scan that's still in progress could have
 		 * seen its downlink, and we can recycle it.
+		 *
+		 * XXX: If we had the heap relation we could be more aggressive about
+		 * recycling deleted pages in non-catalog relations. For now we just
+		 * pass NULL. That is at least simple and consistent.
 		 */
 		return GlobalVisCheckRemovableFullXid(NULL, BTPageGetDeleteXid(page));
 	}
@@ -313,9 +318,15 @@ BTPageIsRecyclable(Page page)
 }
 
 /*
- * BTVacState is private nbtree.c state used during VACUUM. It is exported
- * for use by page deletion related code in nbtpage.c.
+ * BTVacState and BTPendingFSM are private nbtree.c state used during VACUUM.
+ * They are exported for use by page deletion related code in nbtpage.c.
  */
+typedef struct BTPendingFSM
+{
+	BlockNumber target;			/* Page deleted by current VACUUM */
+	FullTransactionId safexid;	/* Page's BTDeletedPageData.safexid */
+} BTPendingFSM;
+
 typedef struct BTVacState
 {
 	IndexVacuumInfo *info;
@@ -324,6 +335,14 @@ typedef struct BTVacState
 	void	   *callback_state;
 	BTCycleId	cycleid;
 	MemoryContext pagedelcontext;
+
+	/*
+	 * _bt_pendingfsm_finalize() state
+	 */
+	int			bufsize;		/* pendingpages space (in # elements) */
+	int			maxbufsize;		/* max bufsize that respects work_mem */
+	BTPendingFSM *pendingpages; /* One entry per newly deleted page */
+	int			npendingpages;	/* current # valid pendingpages */
 } BTVacState;
 
 /*
@@ -1195,6 +1214,9 @@ extern void _bt_delitems_delete_check(Relation rel, Buffer buf,
 									  Relation heapRel,
 									  TM_IndexDeleteOp *delstate);
 extern void _bt_pagedel(Relation rel, Buffer leafbuf, BTVacState *vstate);
+extern void _bt_pendingfsm_init(Relation rel, BTVacState *vstate,
+								bool cleanuponly);
+extern void _bt_pendingfsm_finalize(Relation rel, BTVacState *vstate);
 
 /*
  * prototypes for functions in nbtsearch.c
```
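The new `bufsize`, `maxbufsize`, `pendingpages` and `npendingpages` fields in BTVacState suggest a growable array whose capacity is capped so that it respects work_mem (which PostgreSQL expresses in kilobytes). The sketch below is a hedged, standalone illustration of that pattern. Its names (`PendingState`, `pending_init()`, `pending_add()`) are hypothetical, and the initial capacity of 256, the doubling policy and the drop-when-full behavior are illustrative choices; this header-only diff does not show how `_bt_pendingfsm_init()` actually sizes the array. The sketch also ignores the `cleanuponly` argument.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins for BlockNumber and FullTransactionId. */
typedef uint32_t BlockNumber;
typedef uint64_t FullXid;

typedef struct PendingFSM
{
	BlockNumber target;		/* page deleted by the current VACUUM */
	FullXid		safexid;	/* XID stamped on the page at deletion */
} PendingFSM;

typedef struct PendingState
{
	int			bufsize;		/* allocated entries */
	int			maxbufsize;		/* cap derived from the memory budget */
	PendingFSM *pendingpages;	/* one entry per newly deleted page */
	int			npendingpages;	/* entries currently in use */
} PendingState;

/* Start with 256 entries; never let the array exceed work_mem_kb kilobytes. */
int
pending_init(PendingState *s, int work_mem_kb)
{
	int64_t		maxbufsize = ((int64_t) work_mem_kb * 1024) / (int64_t) sizeof(PendingFSM);

	s->bufsize = 256;
	if (maxbufsize > INT_MAX)
		maxbufsize = INT_MAX;
	if (maxbufsize < s->bufsize)
		maxbufsize = s->bufsize;	/* stay usable with a tiny budget */
	s->maxbufsize = (int) maxbufsize;
	s->npendingpages = 0;
	s->pendingpages = malloc(sizeof(PendingFSM) * (size_t) s->bufsize);
	return s->pendingpages != NULL ? 0 : -1;
}

/* Returns 0 if the page was remembered, 1 if it was dropped because the
 * budget is exhausted, -1 if memory could not be allocated. */
int
pending_add(PendingState *s, BlockNumber target, FullXid safexid)
{
	assert(s->npendingpages <= s->bufsize && s->bufsize <= s->maxbufsize);

	if (s->npendingpages == s->maxbufsize)
		return 1;				/* leave this page for a later VACUUM */

	if (s->npendingpages == s->bufsize)
	{
		/* Double the array, but never past maxbufsize. */
		int			newsize = (s->bufsize > s->maxbufsize / 2) ? s->maxbufsize : s->bufsize * 2;
		PendingFSM *p = realloc(s->pendingpages, sizeof(PendingFSM) * (size_t) newsize);

		if (p == NULL)
			return -1;
		s->pendingpages = p;
		s->bufsize = newsize;
	}

	s->pendingpages[s->npendingpages].target = target;
	s->pendingpages[s->npendingpages].safexid = safexid;
	s->npendingpages++;
	return 0;
}
```

Forgetting a page once the budget is exhausted is safe in this design because the page remains deleted on disk; it simply waits for a later VACUUM to find it recyclable, which is the behavior this commit tries to make rarer rather than eliminate.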