summaryrefslogtreecommitdiff
path: root/src/backend/access/spgist/README
diff options
context:
space:
mode:
authorTom Lane <tgl@sss.pgh.pa.us>2012-03-11 16:29:04 -0400
committerTom Lane <tgl@sss.pgh.pa.us>2012-03-11 16:29:59 -0400
commitc6a11b89e48dfb47b305cea405924333dabc20b6 (patch)
tree1ef16196fa824d0515789c59f34e46e829a43966 /src/backend/access/spgist/README
parentfc227a4e3b84f7bc243c4606780dde28aea257ee (diff)
Teach SPGiST to store nulls and do whole-index scans.
This patch fixes the other major compatibility-breaking limitation of SPGiST, that it didn't store anything for null values of the indexed column, and so could not support whole-index scans or "x IS NULL" tests. The approach is to create a wholly separate search tree for the null entries, and use fixed "allTheSame" insertion and search rules when processing this tree, instead of calling the index opclass methods. This way the opclass methods do not need to worry about dealing with nulls. Catversion bump is for pg_am updates as well as the change in on-disk format of SPGiST indexes; there are some tweaks in SPGiST WAL records as well. Heavily rewritten version of a patch by Oleg Bartunov and Teodor Sigaev. (The original also stored nulls separately, but it reused GIN code to do so; which required undesirable compromises in the on-disk format, and would likely lead to bugs due to the GIN code being required to work in two very different contexts.)
Diffstat (limited to 'src/backend/access/spgist/README')
-rw-r--r--src/backend/access/spgist/README32
1 files changed, 27 insertions, 5 deletions
diff --git a/src/backend/access/spgist/README b/src/backend/access/spgist/README
index 4ff0e357cb4..d20ad17a4b6 100644
--- a/src/backend/access/spgist/README
+++ b/src/backend/access/spgist/README
@@ -11,6 +11,7 @@ should have a high fanout to minimize I/O. The challenge is to map tree
nodes to disk pages in such a way that the search algorithm accesses only a
few disk pages, even if it traverses many nodes.
+
COMMON STRUCTURE DESCRIPTION
Logically, an SP-GiST tree is a set of tuples, each of which can be either
@@ -71,6 +72,21 @@ Leaf tuple consists of:
ItemPointer to the heap
+
+NULLS HANDLING
+
+We assume that SPGiST-indexable operators are strict (can never succeed for
+null inputs). It is still desirable to index nulls, so that whole-table
+indexscans are possible and so that "x IS NULL" can be implemented by an
+SPGiST indexscan. However, we prefer that SPGiST index opclasses not have
+to cope with nulls. Therefore, the main tree of an SPGiST index does not
+include any null entries. We store null entries in a separate SPGiST tree
+occupying a disjoint set of pages (in particular, its own root page).
+Insertions and searches in the nulls tree do not use any of the
+opclass-supplied functions, but just use hardwired logic comparable to
+AllTheSame cases in the normal tree.
+
+
INSERTION ALGORITHM
Insertion algorithm is designed to keep the tree in a consistent state at
@@ -181,6 +197,7 @@ described in (5).
and a new tuple to another page, if the list is short enough. This improves
space utilization, but doesn't change the basis of the algorithm.
+
CONCURRENCY
While descending the tree, the insertion algorithm holds exclusive lock on
@@ -218,6 +235,7 @@ scan that had already visited the parent level could possibly reach such a
redirect tuple, so we can remove redirects once all active transactions have
been flushed out of the system.
+
DEAD TUPLES
Tuples on leaf pages can be in one of four states:
@@ -269,6 +287,7 @@ to PLACEHOLDER status by VACUUM, and are then candidates for replacement.
DEAD state is not currently possible, since VACUUM does not attempt to
remove unused inner tuples.
+
VACUUM
VACUUM (or more precisely, spgbulkdelete) performs a single sequential scan
@@ -302,13 +321,16 @@ performed; otherwise, it does an spgbulkdelete scan with an empty target
list, so as to clean up redirections and placeholders, update the free
space map, and gather statistics.
+
LAST USED PAGE MANAGEMENT
-List of last used pages contains four pages - a leaf page and three inner
-pages, one from each "triple parity" group. This list is stored between
-calls on the index meta page, but updates are never WAL-logged to decrease
-WAL traffic. Incorrect data on meta page isn't critical, because we could
-allocate a new page at any moment.
+The list of last used pages contains four pages - a leaf page and three
+inner pages, one from each "triple parity" group. (Actually, there's one
+such list for the main tree and a separate one for the nulls tree.) This
+list is stored between calls on the index meta page, but updates are never
+WAL-logged to decrease WAL traffic. Incorrect data on meta page isn't
+critical, because we could allocate a new page at any moment.
+
AUTHORS