2 files changed, 111 insertions, 2 deletions
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index d8c52875d82..e5b7b4b68d0 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -503,8 +503,9 @@ comparison table, in which all the HTML pages were cut down to 7 kB to fit.
 <acronym>TOAST</> pointers can point to data that is not on disk, but is
 elsewhere in the memory of the current server process.  Such pointers
 obviously cannot be long-lived, but they are nonetheless useful.  There
-is currently just one sub-case:
-pointers to <firstterm>indirect</> data.
+are currently two sub-cases:
+pointers to <firstterm>indirect</> data and
+pointers to <firstterm>expanded</> data.
 </para>
 
 <para>
@@ -519,6 +520,43 @@ and there is no infrastructure to help with this.
 </para>
 
 <para>
+Expanded <acronym>TOAST</> pointers are useful for complex data types
+whose on-disk representation is not especially suited for computational
+purposes.  As an example, the standard varlena representation of a
+<productname>PostgreSQL</> array includes dimensionality information, a
+nulls bitmap if there are any null elements, then the values of all the
+elements in order.  When the element type itself is variable-length, the
+only way to find the <replaceable>N</>'th element is to scan through all the
+preceding elements.  This representation is appropriate for on-disk storage
+because of its compactness, but for computations with the array it is much
+more convenient to have an <quote>expanded</> or <quote>deconstructed</>
+representation in which all the element starting locations have been
+identified.  The <acronym>TOAST</> pointer mechanism supports this need by
+allowing a pass-by-reference Datum to point to either a standard varlena
+value (the on-disk representation) or a <acronym>TOAST</> pointer that
+points to an expanded representation somewhere in memory.  The details of
+this expanded representation are up to the data type, though it must have
+a standard header and meet the other API requirements given
+in <filename>src/include/utils/expandeddatum.h</>.  C-level functions
+working with the data type can choose to handle either representation.
+Functions that do not know about the expanded representation, but simply
+apply <function>PG_DETOAST_DATUM</> to their inputs, will automatically
+receive the traditional varlena representation; so support for an expanded
+representation can be introduced incrementally, one function at a time.
+</para>
+
+<para>
+<acronym>TOAST</> pointers to expanded values are further broken down
+into <firstterm>read-write</> and <firstterm>read-only</> pointers.
+The pointed-to representation is the same either way, but a function that
+receives a read-write pointer is allowed to modify the referenced value
+in-place, whereas one that receives a read-only pointer must not; it must
+first create a copy if it wants to make a modified version of the value.
+This distinction and some associated conventions make it possible to avoid
+unnecessary copying of expanded values during query execution.
+</para>
+
+<para>
 For all types of in-memory <acronym>TOAST</> pointer, the <acronym>TOAST</>
 management code ensures that no such pointer datum can accidentally get
 stored on disk.  In-memory <acronym>TOAST</> pointers are automatically
diff --git a/doc/src/sgml/xtypes.sgml b/doc/src/sgml/xtypes.sgml
index 2459616281d..ac0b8a2943f 100644
--- a/doc/src/sgml/xtypes.sgml
+++ b/doc/src/sgml/xtypes.sgml
@@ -300,6 +300,77 @@ CREATE TYPE complex (
   </para>
  </note>
 
+ <para>
+  Another feature that's enabled by <acronym>TOAST</> support is the
+  possibility of having an <firstterm>expanded</> in-memory data
+  representation that is more convenient to work with than the format that
+  is stored on disk.  The regular or <quote>flat</> varlena storage format
+  is ultimately just a blob of bytes; it cannot, for example, contain
+  pointers, since it may get copied to other locations in memory.
+  For complex data types, the flat format may be quite expensive to work
+  with, so <productname>PostgreSQL</> provides a way to <quote>expand</>
+  the flat format into a representation that is more suited to computation,
+  and then pass that format in-memory between functions of the data type.
+ </para>
+
+ <para>
+  To use expanded storage, a data type must define an expanded format that
+  follows the rules given in <filename>src/include/utils/expandeddatum.h</>,
+  and provide functions to <quote>expand</> a flat varlena value into
+  expanded format and <quote>flatten</> the expanded format back to the
+  regular varlena representation.  Then ensure that all C functions for
+  the data type can accept either representation, possibly by converting
+  one into the other immediately upon receipt.  This does not require fixing
+  all existing functions for the data type at once, because the standard
+  <function>PG_DETOAST_DATUM</> macro is defined to convert expanded inputs
+  into regular flat format.  Therefore, existing functions that work with
+  the flat varlena format will continue to work, though slightly
+  inefficiently, with expanded inputs; they need not be converted unless and
+  until better performance is important.
+ </para>
+
+ <para>
+  C functions that know how to work with an expanded representation
+  typically fall into two categories: those that can only handle expanded
+  format, and those that can handle either expanded or flat varlena inputs.
+  The former are easier to write but may be less efficient overall, because
+  converting a flat input to expanded form for use by a single function may
+  cost more than is saved by operating on the expanded format.
+  When only the expanded format needs to be handled, conversion of flat inputs to
+  expanded form can be hidden inside an argument-fetching macro, so that
+  the function appears no more complex than one working with traditional
+  varlena input.
+  To handle both types of input, write an argument-fetching function that
+  will detoast external, short-header, and compressed varlena inputs, but
+  not expanded inputs.  Such a function can be defined as returning a
+  pointer to a union of the flat varlena format and the expanded format.
+  Callers can use the <function>VARATT_IS_EXPANDED_HEADER()</> macro to
+  determine which format they received.
+ </para>
+
+ <para>
+  The <acronym>TOAST</> infrastructure not only allows regular varlena
+  values to be distinguished from expanded values, but also
+  distinguishes <quote>read-write</> and <quote>read-only</> pointers to
+  expanded values.  C functions that only need to examine an expanded
+  value, or will only change it in ways that are safe and not semantically visible,
+  need not care which type of pointer they receive.  C functions that
+  produce a modified version of an input value are allowed to modify an
+  expanded input value in-place if they receive a read-write pointer, but
+  must not modify the input if they receive a read-only pointer; in that
+  case they have to copy the value first, producing a new value to modify.
+  A C function that has constructed a new expanded value should always
+  return a read-write pointer to it.  Also, a C function that is modifying
+  a read-write expanded value in-place should take care to leave the value
+  in a sane state if it fails partway through.
+ </para>
+
+ <para>
+  For examples of working with expanded values, see the standard array
+  infrastructure, particularly
+  <filename>src/backend/utils/adt/array_expanded.c</>.
+ </para>
+
  </sect2>
 
 </sect1>