Diffstat (limited to 'doc/src')
-rw-r--r-- doc/src/sgml/storage.sgml 42
-rw-r--r-- doc/src/sgml/xtypes.sgml 71
2 files changed, 111 insertions, 2 deletions
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index d8c52875d82..e5b7b4b68d0 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -503,8 +503,9 @@ comparison table, in which all the HTML pages were cut down to 7 kB to fit.
<acronym>TOAST</> pointers can point to data that is not on disk, but is
elsewhere in the memory of the current server process. Such pointers
obviously cannot be long-lived, but they are nonetheless useful. There
-is currently just one sub-case:
-pointers to <firstterm>indirect</> data.
+are currently two sub-cases:
+pointers to <firstterm>indirect</> data and
+pointers to <firstterm>expanded</> data.
</para>
<para>
@@ -519,6 +520,43 @@ and there is no infrastructure to help with this.
</para>
<para>
+Expanded <acronym>TOAST</> pointers are useful for complex data types
+whose on-disk representation is not especially suited for computational
+purposes. As an example, the standard varlena representation of a
+<productname>PostgreSQL</> array includes dimensionality information, a
+nulls bitmap if there are any null elements, then the values of all the
+elements in order. When the element type itself is variable-length, the
+only way to find the <replaceable>N</>'th element is to scan through all the
+preceding elements. This representation is appropriate for on-disk storage
+because of its compactness, but for computations with the array it's much
+nicer to have an <quote>expanded</> or <quote>deconstructed</>
+representation in which all the element starting locations have been
+identified. The <acronym>TOAST</> pointer mechanism supports this need by
+allowing a pass-by-reference Datum to point to either a standard varlena
+value (the on-disk representation) or a <acronym>TOAST</> pointer that
+points to an expanded representation somewhere in memory. The details of
+this expanded representation are up to the data type, though it must have
+a standard header and meet the other API requirements given
+in <filename>src/include/utils/expandeddatum.h</>. C-level functions
+working with the data type can choose to handle either representation.
+Functions that do not know about the expanded representation, but simply
+apply <function>PG_DETOAST_DATUM</> to their inputs, will automatically
+receive the traditional varlena representation; so support for an expanded
+representation can be introduced incrementally, one function at a time.
+</para>
+
+<para>
+<acronym>TOAST</> pointers to expanded values are further broken down
+into <firstterm>read-write</> and <firstterm>read-only</> pointers.
+The pointed-to representation is the same either way, but a function that
+receives a read-write pointer is allowed to modify the referenced value
+in-place, whereas one that receives a read-only pointer must not; it must
+first create a copy if it wants to make a modified version of the value.
+This distinction and some associated conventions make it possible to avoid
+unnecessary copying of expanded values during query execution.
+</para>
+
+<para>
For all types of in-memory <acronym>TOAST</> pointer, the <acronym>TOAST</>
management code ensures that no such pointer datum can accidentally get
stored on disk. In-memory <acronym>TOAST</> pointers are automatically
diff --git a/doc/src/sgml/xtypes.sgml b/doc/src/sgml/xtypes.sgml
index 2459616281d..ac0b8a2943f 100644
--- a/doc/src/sgml/xtypes.sgml
+++ b/doc/src/sgml/xtypes.sgml
@@ -300,6 +300,77 @@ CREATE TYPE complex (
</para>
</note>
+ <para>
+ Another feature that's enabled by <acronym>TOAST</> support is the
+ possibility of having an <firstterm>expanded</> in-memory data
+ representation that is more convenient to work with than the format that
+ is stored on disk. The regular or <quote>flat</> varlena storage format
+ is ultimately just a blob of bytes; it cannot for example contain
+ pointers, since it may get copied to other locations in memory.
+ For complex data types, the flat format may be quite expensive to work
+ with, so <productname>PostgreSQL</> provides a way to <quote>expand</>
+ the flat format into a representation that is more suited to computation,
+ and then pass that format in-memory between functions of the data type.
+ </para>
+
+ <para>
+ To use expanded storage, a data type must define an expanded format that
+ follows the rules given in <filename>src/include/utils/expandeddatum.h</>,
+ and provide functions to <quote>expand</> a flat varlena value into
+ expanded format and <quote>flatten</> the expanded format back to the
+ regular varlena representation. Then ensure that all C functions for
+ the data type can accept either representation, possibly by converting
+ one into the other immediately upon receipt. This does not require fixing
+ all existing functions for the data type at once, because the standard
+ <function>PG_DETOAST_DATUM</> macro is defined to convert expanded inputs
+ into regular flat format. Therefore, existing functions that work with
+ the flat varlena format will continue to work, though slightly
+ inefficiently, with expanded inputs; they need not be converted until and
+ unless better performance is important.
+ </para>
+
+ <para>
+ C functions that know how to work with an expanded representation
+ typically fall into two categories: those that can only handle expanded
+ format, and those that can handle either expanded or flat varlena inputs.
+ The former are easier to write but may be less efficient overall, because
+ converting a flat input to expanded form for use by a single function may
+ cost more than is saved by operating on the expanded format.
+ When only expanded format need be handled, conversion of flat inputs to
+ expanded form can be hidden inside an argument-fetching macro, so that
+ the function appears no more complex than one working with traditional
+ varlena input.
+ To handle both types of input, write an argument-fetching function that
+ will detoast external, short-header, and compressed varlena inputs, but
+ not expanded inputs. Such a function can be defined as returning a
+ pointer to a union of the flat varlena format and the expanded format.
+ Callers can use the <function>VARATT_IS_EXPANDED_HEADER()</> macro to
+ determine which format they received.
+ </para>
+
+ <para>
+ The <acronym>TOAST</> infrastructure not only allows regular varlena
+ values to be distinguished from expanded values, but also
+ distinguishes <quote>read-write</> and <quote>read-only</> pointers to
+ expanded values. C functions that only need to examine an expanded
+ value, or will only change it in safe and non-semantically-visible ways,
+ need not care which type of pointer they receive. C functions that
+ produce a modified version of an input value are allowed to modify an
+ expanded input value in-place if they receive a read-write pointer, but
+ must not modify the input if they receive a read-only pointer; in that
+ case they have to copy the value first, producing a new value to modify.
+ A C function that has constructed a new expanded value should always
+ return a read-write pointer to it. Also, a C function that is modifying
+ a read-write expanded value in-place should take care to leave the value
+ in a sane state if it fails partway through.
+ </para>
+
+ <para>
+ For examples of working with expanded values, see the standard array
+ infrastructure, particularly
+ <filename>src/backend/utils/adt/array_expanded.c</>.
+ </para>
+
</sect2>
</sect1>
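
The read-write versus read-only convention described in the xtypes.sgml hunk can be illustrated with a small toy model. This is only a sketch and does not use PostgreSQL headers: `ToyExpanded`, `ToyPointer`, `toy_make`, and `toy_append` are hypothetical names invented for illustration, and the real requirements are the ones given in src/include/utils/expandeddatum.h. It shows a function that modifies its input in place when handed a read-write pointer, copies first when handed a read-only pointer, and returns a read-write pointer either way.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy "expanded" value; NOT PostgreSQL's expanded-object header. */
typedef struct ToyExpanded
{
    int  nelems;    /* number of elements in use */
    int *elems;     /* separately allocated element storage */
} ToyExpanded;

/* Hypothetical stand-in for a TOAST pointer to an expanded value. */
typedef struct ToyPointer
{
    ToyExpanded *obj;
    int          read_write;    /* nonzero: callee may modify obj in place */
} ToyPointer;

/* Build a new expanded value holding a copy of n ints. */
static ToyExpanded *
toy_make(const int *vals, int n)
{
    ToyExpanded *e = malloc(sizeof(ToyExpanded));

    if (e == NULL)
        abort();
    e->nelems = n;
    e->elems = malloc(sizeof(int) * (n > 0 ? n : 1));
    if (e->elems == NULL)
        abort();
    if (n > 0)
        memcpy(e->elems, vals, sizeof(int) * n);
    return e;
}

/* Append v; modify in place only when given a read-write pointer. */
static ToyPointer
toy_append(ToyPointer in, int v)
{
    ToyExpanded *target;
    int         *grown;
    ToyPointer   out;

    assert(in.obj != NULL);

    /* A read-only input must be copied before it is changed. */
    if (in.read_write)
        target = in.obj;
    else
        target = toy_make(in.obj->elems, in.obj->nelems);

    /*
     * Grow the storage first and update the fields only afterwards, so that
     * a failed allocation leaves the value in its previous, sane state.
     */
    grown = realloc(target->elems, sizeof(int) * (target->nelems + 1));
    if (grown == NULL)
        abort();
    target->elems = grown;
    target->elems[target->nelems] = v;
    target->nelems++;

    /* A function returning a modified or new value hands back read-write. */
    out.obj = target;
    out.read_write = 1;
    return out;
}
```

In this model, appending through a read-only pointer leaves the original value untouched and yields a fresh object, while a second append through the returned read-write pointer reuses that object without copying.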