summaryrefslogtreecommitdiff
path: root/doc/src/sgml/gist.sgml
diff options
context:
space:
mode:
authorHeikki Linnakangas <heikki.linnakangas@iki.fi>2011-09-08 17:51:23 +0300
committerHeikki Linnakangas <heikki.linnakangas@iki.fi>2011-09-08 17:51:23 +0300
commit5edb24a8983e4a103e26153853d91141f818227c (patch)
tree9e3102de6e2149b0d3678b403c91955e97f3bdc8 /doc/src/sgml/gist.sgml
parent09b68c70af855a0a69cede14da70968ddd97ba05 (diff)
Buffering GiST index build algorithm.
When building a GiST index that doesn't fit in cache, buffers are attached to some internal nodes in the index. This speeds up the build by avoiding random I/O that would otherwise be needed to traverse all the way down the tree to the find right leaf page for tuple. Alexander Korotkov
Diffstat (limited to 'doc/src/sgml/gist.sgml')
-rw-r--r--doc/src/sgml/gist.sgml34
1 files changed, 34 insertions, 0 deletions
diff --git a/doc/src/sgml/gist.sgml b/doc/src/sgml/gist.sgml
index 78171cfa3fd..1b6fa1a8817 100644
--- a/doc/src/sgml/gist.sgml
+++ b/doc/src/sgml/gist.sgml
@@ -642,6 +642,40 @@ my_distance(PG_FUNCTION_ARGS)
</variablelist>
+ <sect2 id="gist-buffering-build">
+ <title>GiST buffering build</title>
+ <para>
+ Building large GiST indexes by simply inserting all the tuples tends to be
+ slow, because if the index tuples are scattered across the index and the
+ index is large enough to not fit in cache, the insertions need to perform
+ a lot of random I/O. PostgreSQL from version 9.2 supports a more efficient
+ method to build GiST indexes based on buffering, which can dramatically
+ reduce number of random I/O needed for non-ordered data sets. For
+ well-ordered datasets the benefit is smaller or non-existent, because
+ only a small number of pages receive new tuples at a time, and those pages
+ fit in cache even if the index as whole does not.
+ </para>
+
+ <para>
+ However, buffering index build needs to call the <function>penalty</>
+ function more often, which consumes some extra CPU resources. Also, the
+ buffers used in the buffering build need temporary disk space, up to
+ the size of the resulting index. Buffering can also infuence the quality
+ of the produced index, in both positive and negative directions. That
+ influence depends on various factors, like the distribution of the input
+ data and operator class implementation.
+ </para>
+
+ <para>
+ By default, the index build switches to the buffering method when the
+ index size reaches <xref linkend="guc-effective-cache-size">. It can
+ be manually turned on or off by the <literal>BUFFERING</literal> parameter
+ to the CREATE INDEX clause. The default behavior is good for most cases,
+ but turning buffering off might speed up the build somewhat if the input
+ data is ordered.
+ </para>
+
+ </sect2>
</sect1>
<sect1 id="gist-examples">