diff options
author | Tom Lane <tgl@sss.pgh.pa.us> | 2012-03-11 16:29:04 -0400 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2012-03-11 16:29:59 -0400 |
commit | c6a11b89e48dfb47b305cea405924333dabc20b6 (patch) | |
tree | 1ef16196fa824d0515789c59f34e46e829a43966 /src/backend/access/spgist/README | |
parent | fc227a4e3b84f7bc243c4606780dde28aea257ee (diff) |
Teach SPGiST to store nulls and do whole-index scans.
This patch fixes the other major compatibility-breaking limitation of
SPGiST, that it didn't store anything for null values of the indexed
column, and so could not support whole-index scans or "x IS NULL"
tests. The approach is to create a wholly separate search tree for
the null entries, and use fixed "allTheSame" insertion and search
rules when processing this tree, instead of calling the index opclass
methods. This way the opclass methods do not need to worry about
dealing with nulls.
Catversion bump is for pg_am updates as well as the change in on-disk
format of SPGiST indexes; there are some tweaks in SPGiST WAL records
as well.
Heavily rewritten version of a patch by Oleg Bartunov and Teodor Sigaev.
(The original also stored nulls separately, but it reused GIN code to do
so; which required undesirable compromises in the on-disk format, and
would likely lead to bugs due to the GIN code being required to work in
two very different contexts.)
Diffstat (limited to 'src/backend/access/spgist/README')
-rw-r--r-- | src/backend/access/spgist/README | 32 |
1 files changed, 27 insertions, 5 deletions
diff --git a/src/backend/access/spgist/README b/src/backend/access/spgist/README index 4ff0e357cb4..d20ad17a4b6 100644 --- a/src/backend/access/spgist/README +++ b/src/backend/access/spgist/README @@ -11,6 +11,7 @@ should have a high fanout to minimize I/O. The challenge is to map tree nodes to disk pages in such a way that the search algorithm accesses only a few disk pages, even if it traverses many nodes. + COMMON STRUCTURE DESCRIPTION Logically, an SP-GiST tree is a set of tuples, each of which can be either @@ -71,6 +72,21 @@ Leaf tuple consists of: ItemPointer to the heap + +NULLS HANDLING + +We assume that SPGiST-indexable operators are strict (can never succeed for +null inputs). It is still desirable to index nulls, so that whole-table +indexscans are possible and so that "x IS NULL" can be implemented by an +SPGiST indexscan. However, we prefer that SPGiST index opclasses not have +to cope with nulls. Therefore, the main tree of an SPGiST index does not +include any null entries. We store null entries in a separate SPGiST tree +occupying a disjoint set of pages (in particular, its own root page). +Insertions and searches in the nulls tree do not use any of the +opclass-supplied functions, but just use hardwired logic comparable to +AllTheSame cases in the normal tree. + + INSERTION ALGORITHM Insertion algorithm is designed to keep the tree in a consistent state at @@ -181,6 +197,7 @@ described in (5). and a new tuple to another page, if the list is short enough. This improves space utilization, but doesn't change the basis of the algorithm. + CONCURRENCY While descending the tree, the insertion algorithm holds exclusive lock on @@ -218,6 +235,7 @@ scan that had already visited the parent level could possibly reach such a redirect tuple, so we can remove redirects once all active transactions have been flushed out of the system. + DEAD TUPLES Tuples on leaf pages can be in one of four states: @@ -269,6 +287,7 @@ to PLACEHOLDER status by VACUUM, and are then candidates for replacement. DEAD state is not currently possible, since VACUUM does not attempt to remove unused inner tuples. + VACUUM VACUUM (or more precisely, spgbulkdelete) performs a single sequential scan @@ -302,13 +321,16 @@ performed; otherwise, it does an spgbulkdelete scan with an empty target list, so as to clean up redirections and placeholders, update the free space map, and gather statistics. + LAST USED PAGE MANAGEMENT -List of last used pages contains four pages - a leaf page and three inner -pages, one from each "triple parity" group. This list is stored between -calls on the index meta page, but updates are never WAL-logged to decrease -WAL traffic. Incorrect data on meta page isn't critical, because we could -allocate a new page at any moment. +The list of last used pages contains four pages - a leaf page and three +inner pages, one from each "triple parity" group. (Actually, there's one +such list for the main tree and a separate one for the nulls tree.) This +list is stored between calls on the index meta page, but updates are never +WAL-logged to decrease WAL traffic. Incorrect data on meta page isn't +critical, because we could allocate a new page at any moment. + AUTHORS |