Diffstat (limited to 'doc/src')
-rw-r--r-- doc/src/sgml/storage.sgml | 42
-rw-r--r-- doc/src/sgml/xtypes.sgml | 71
2 files changed, 111 insertions, 2 deletions
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index d8c52875d82..e5b7b4b68d0 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -503,8 +503,9 @@ comparison table, in which all the HTML pages were cut down to 7 kB to fit.
 <acronym>TOAST</> pointers can point to data that is not on disk, but is
 elsewhere in the memory of the current server process. Such pointers
 obviously cannot be long-lived, but they are nonetheless useful. There
-is currently just one sub-case:
-pointers to <firstterm>indirect</> data.
+are currently two sub-cases:
+pointers to <firstterm>indirect</> data and
+pointers to <firstterm>expanded</> data.
 </para>
 
 <para>
@@ -519,6 +520,43 @@ and there is no infrastructure to help with this.
 </para>
 
 <para>
+Expanded <acronym>TOAST</> pointers are useful for complex data types
+whose on-disk representation is not especially suited for computational
+purposes. As an example, the standard varlena representation of a
+<productname>PostgreSQL</> array includes dimensionality information, a
+nulls bitmap if there are any null elements, then the values of all the
+elements in order. When the element type itself is variable-length, the
+only way to find the <replaceable>N</>'th element is to scan through all the
+preceding elements. This representation is appropriate for on-disk storage
+because of its compactness, but for computations with the array it's much
+nicer to have an <quote>expanded</> or <quote>deconstructed</>
+representation in which all the element starting locations have been
+identified. The <acronym>TOAST</> pointer mechanism supports this need by
+allowing a pass-by-reference Datum to point to either a standard varlena
+value (the on-disk representation) or a <acronym>TOAST</> pointer that
+points to an expanded representation somewhere in memory. The details of
+this expanded representation are up to the data type, though it must have
+a standard header and meet the other API requirements given
+in <filename>src/include/utils/expandeddatum.h</>. C-level functions
+working with the data type can choose to handle either representation.
+Functions that do not know about the expanded representation, but simply
+apply <function>PG_DETOAST_DATUM</> to their inputs, will automatically
+receive the traditional varlena representation; so support for an expanded
+representation can be introduced incrementally, one function at a time.
+</para>
+
+<para>
+<acronym>TOAST</> pointers to expanded values are further broken down
+into <firstterm>read-write</> and <firstterm>read-only</> pointers.
+The pointed-to representation is the same either way, but a function that
+receives a read-write pointer is allowed to modify the referenced value
+in-place, whereas one that receives a read-only pointer must not; it must
+first create a copy if it wants to make a modified version of the value.
+This distinction and some associated conventions make it possible to avoid
+unnecessary copying of expanded values during query execution.
+</para>
+
+<para>
 For all types of in-memory <acronym>TOAST</> pointer, the <acronym>TOAST</>
 management code ensures that no such pointer datum can accidentally get
 stored on disk. In-memory <acronym>TOAST</> pointers are automatically
diff --git a/doc/src/sgml/xtypes.sgml b/doc/src/sgml/xtypes.sgml
index 2459616281d..ac0b8a2943f 100644
--- a/doc/src/sgml/xtypes.sgml
+++ b/doc/src/sgml/xtypes.sgml
@@ -300,6 +300,77 @@ CREATE TYPE complex (
  </para>
  </note>
 
+ <para>
+ Another feature that's enabled by <acronym>TOAST</> support is the
+ possibility of having an <firstterm>expanded</> in-memory data
+ representation that is more convenient to work with than the format that
+ is stored on disk. The regular or <quote>flat</> varlena storage format
+ is ultimately just a blob of bytes; it cannot for example contain
+ pointers, since it may get copied to other locations in memory.
+ For complex data types, the flat format may be quite expensive to work
+ with, so <productname>PostgreSQL</> provides a way to <quote>expand</>
+ the flat format into a representation that is more suited to computation,
+ and then pass that format in-memory between functions of the data type.
+ </para>
+
+ <para>
+ To use expanded storage, a data type must define an expanded format that
+ follows the rules given in <filename>src/include/utils/expandeddatum.h</>,
+ and provide functions to <quote>expand</> a flat varlena value into
+ expanded format and <quote>flatten</> the expanded format back to the
+ regular varlena representation. Then ensure that all C functions for
+ the data type can accept either representation, possibly by converting
+ one into the other immediately upon receipt. This does not require fixing
+ all existing functions for the data type at once, because the standard
+ <function>PG_DETOAST_DATUM</> macro is defined to convert expanded inputs
+ into regular flat format. Therefore, existing functions that work with
+ the flat varlena format will continue to work, though slightly
+ inefficiently, with expanded inputs; they need not be converted until and
+ unless better performance is important.
+ </para>
+
+ <para>
+ C functions that know how to work with an expanded representation
+ typically fall into two categories: those that can only handle expanded
+ format, and those that can handle either expanded or flat varlena inputs.
+ The former are easier to write but may be less efficient overall, because
+ converting a flat input to expanded form for use by a single function may
+ cost more than is saved by operating on the expanded format.
+ When only expanded format need be handled, conversion of flat inputs to
+ expanded form can be hidden inside an argument-fetching macro, so that
+ the function appears no more complex than one working with traditional
+ varlena input.
+ To handle both types of input, write an argument-fetching function that
+ will detoast external, short-header, and compressed varlena inputs, but
+ not expanded inputs. Such a function can be defined as returning a
+ pointer to a union of the flat varlena format and the expanded format.
+ Callers can use the <function>VARATT_IS_EXPANDED_HEADER()</> macro to
+ determine which format they received.
+ </para>
+
+ <para>
+ The <acronym>TOAST</> infrastructure not only allows regular varlena
+ values to be distinguished from expanded values, but also
+ distinguishes <quote>read-write</> and <quote>read-only</> pointers to
+ expanded values. C functions that only need to examine an expanded
+ value, or will only change it in safe and non-semantically-visible ways,
+ need not care which type of pointer they receive. C functions that
+ produce a modified version of an input value are allowed to modify an
+ expanded input value in-place if they receive a read-write pointer, but
+ must not modify the input if they receive a read-only pointer; in that
+ case they have to copy the value first, producing a new value to modify.
+ A C function that has constructed a new expanded value should always
+ return a read-write pointer to it. Also, a C function that is modifying
+ a read-write expanded value in-place should take care to leave the value
+ in a sane state if it fails partway through.
+ </para>
+
+ <para>
+ For examples of working with expanded values, see the standard array
+ infrastructure, particularly
+ <filename>src/backend/utils/adt/array_expanded.c</>.
+ </para>
+
  </sect2>
  </sect1>
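The storage.sgml hunk explains that, when elements are variable-length, finding the N'th element of a flat array means scanning every preceding element, whereas an expanded form can record each element's starting location once. The C sketch below illustrates only that trade-off. It is not PostgreSQL code: the byte layout and the names flat_build, flat_nth, ExpandedArray, expanded_init, expanded_nth and expanded_free are hypothetical, and the real array format also carries dimensionality information and a nulls bitmap.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat layout (NOT PostgreSQL's actual array format):
 *   [uint32 nelems][uint32 len_0][bytes_0][uint32 len_1][bytes_1]...
 * It is compact and pointer-free, so it can be copied anywhere. */
static unsigned char *flat_build(const char *const *items, uint32_t n)
{
    size_t total = sizeof(uint32_t);
    for (uint32_t i = 0; i < n; i++)
        total += sizeof(uint32_t) + strlen(items[i]);
    unsigned char *buf = malloc(total);
    if (buf == NULL)
        return NULL;
    unsigned char *p = buf;
    memcpy(p, &n, sizeof n);
    p += sizeof n;
    for (uint32_t i = 0; i < n; i++) {
        uint32_t len = (uint32_t) strlen(items[i]);
        memcpy(p, &len, sizeof len);
        p += sizeof len;
        memcpy(p, items[i], len);
        p += len;
    }
    return buf;
}

/* Flat access: reaching element n means stepping over elements 0..n-1. */
static const unsigned char *flat_nth(const unsigned char *flat, uint32_t n,
                                     uint32_t *len)
{
    uint32_t nelems;
    memcpy(&nelems, flat, sizeof nelems);
    if (n >= nelems)
        return NULL;
    const unsigned char *p = flat + sizeof nelems;
    for (uint32_t i = 0; i < n; i++) {
        uint32_t l;
        memcpy(&l, p, sizeof l);
        p += sizeof l + l;              /* skip length word and payload */
    }
    memcpy(len, p, sizeof *len);
    return p + sizeof *len;
}

/* Expanded form: every element's start is located once, up front.
 * In this sketch it points into the flat buffer, which must outlive it. */
typedef struct {
    uint32_t nelems;
    const unsigned char **starts;
    uint32_t *lens;
} ExpandedArray;

static int expanded_init(ExpandedArray *ea, const unsigned char *flat)
{
    memcpy(&ea->nelems, flat, sizeof ea->nelems);
    size_t cnt = ea->nelems > 0 ? ea->nelems : 1;
    ea->starts = malloc(cnt * sizeof *ea->starts);
    ea->lens = malloc(cnt * sizeof *ea->lens);
    if (ea->starts == NULL || ea->lens == NULL) {
        free(ea->starts);
        free(ea->lens);
        return -1;
    }
    const unsigned char *p = flat + sizeof ea->nelems;
    for (uint32_t i = 0; i < ea->nelems; i++) {
        memcpy(&ea->lens[i], p, sizeof ea->lens[i]);
        ea->starts[i] = p + sizeof ea->lens[i];
        p = ea->starts[i] + ea->lens[i];
    }
    return 0;
}

/* Expanded access: constant time, no scanning. */
static const unsigned char *expanded_nth(const ExpandedArray *ea, uint32_t n,
                                         uint32_t *len)
{
    if (n >= ea->nelems)
        return NULL;
    *len = ea->lens[n];
    return ea->starts[n];
}

static void expanded_free(ExpandedArray *ea)
{
    free(ea->starts);
    free(ea->lens);
}
```

With the elements "a", "bcd", "" and "efgh", flat_nth(flat, 3, &len) walks over three elements before reaching "efgh", while expanded_nth answers from its precomputed table. Pointing the expanded form into the flat buffer keeps the sketch short; a real expanded representation would more plausibly own its memory, since the text notes the flat value may be copied elsewhere.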
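The xtypes.sgml hunk describes four pieces: expanding a flat value, flattening it back, letting format-unaware functions keep working because PG_DETOAST_DATUM hands them flat input, and letting format-aware functions inspect a header to learn which format they received. The C sketch below models those pieces with hypothetical types. ValueHeader, FlatInts, ExpandedInts, flat_make, expand_ints, flatten_ints, fetch_flat, sum_flat_only and nth_any are all invented for illustration and are not the API of src/include/utils/expandeddatum.h; fetch_flat merely plays the role the text assigns to PG_DETOAST_DATUM, and checking the fmt tag stands in for VARATT_IS_EXPANDED_HEADER().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins, NOT PostgreSQL's varlena or expanded-object types.
 * Both formats begin with the same header, so a caller can check the tag to
 * see which one it received. */
typedef enum { FMT_FLAT, FMT_EXPANDED } Format;
typedef struct { Format fmt; } ValueHeader;

/* Flat: one contiguous, pointer-free block that may be copied anywhere. */
typedef struct {
    ValueHeader hdr;
    size_t n;
    int32_t vals[];                 /* flexible array member */
} FlatInts;

/* Expanded: may hold pointers, here to a separately allocated buffer. */
typedef struct {
    ValueHeader hdr;
    size_t n;
    int32_t *vals;
} ExpandedInts;

static FlatInts *flat_make(const int32_t *vals, size_t n)
{
    FlatInts *f = malloc(sizeof *f + n * sizeof f->vals[0]);
    if (f == NULL)
        return NULL;
    f->hdr.fmt = FMT_FLAT;
    f->n = n;
    if (n > 0)
        memcpy(f->vals, vals, n * sizeof f->vals[0]);
    return f;
}

/* "Expand": build the expanded form from a flat value. */
static ExpandedInts *expand_ints(const FlatInts *f)
{
    ExpandedInts *e = malloc(sizeof *e);
    int32_t *buf = malloc((f->n > 0 ? f->n : 1) * sizeof *buf);
    if (e == NULL || buf == NULL) {
        free(e);
        free(buf);
        return NULL;
    }
    if (f->n > 0)
        memcpy(buf, f->vals, f->n * sizeof *buf);
    e->hdr.fmt = FMT_EXPANDED;
    e->n = f->n;
    e->vals = buf;
    return e;
}

/* "Flatten": serialize the expanded form back into a flat value. */
static FlatInts *flatten_ints(const ExpandedInts *e)
{
    return flat_make(e->vals, e->n);
}

/* Always yields flat input. *tmp receives any temporary copy,
 * which the caller must free. */
static const FlatInts *fetch_flat(const ValueHeader *v, FlatInts **tmp)
{
    *tmp = NULL;
    if (v->fmt == FMT_FLAT)
        return (const FlatInts *) v;
    *tmp = flatten_ints((const ExpandedInts *) v);
    return *tmp;
}

/* Format-unaware function: handles both formats only via fetch_flat. */
static bool sum_flat_only(const ValueHeader *v, int64_t *out)
{
    FlatInts *tmp;
    const FlatInts *f = fetch_flat(v, &tmp);
    if (f == NULL)
        return false;               /* flattening copy could not be allocated */
    int64_t s = 0;
    for (size_t i = 0; i < f->n; i++)
        s += f->vals[i];
    free(tmp);
    *out = s;
    return true;
}

/* Format-aware function: reads either format directly, no conversion. */
static int32_t nth_any(const ValueHeader *v, size_t i)
{
    if (v->fmt == FMT_EXPANDED)
        return ((const ExpandedInts *) v)->vals[i];
    return ((const FlatInts *) v)->vals[i];
}
```

Here sum_flat_only stands for an unconverted function: it computes the same sum for either input, paying for a flattening copy when handed an expanded value, while nth_any reads both formats without converting. The sketch dispatches on a tag in a shared first member rather than returning a pointer to a union as the text suggests; either approach lets a caller tell the two formats apart.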
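The read-write versus read-only rules described in both hunks amount to copy-on-write: a function may modify an expanded input in place only when it holds a read-write pointer, must copy first otherwise, and should return a read-write pointer to any value it constructs. The C sketch below assumes a hypothetical growable-array type (ExpArray) and models the two pointer kinds as a flag in a hypothetical ExpRef struct. PostgreSQL encodes the distinction in its TOAST pointers instead, so none of these names correspond to real APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical expanded value: a growable array of ints. */
typedef struct {
    size_t n, cap;
    int32_t *vals;
} ExpArray;

/* Hypothetical stand-in for a read-write or read-only pointer to one. */
typedef struct {
    ExpArray *obj;
    bool read_write;
} ExpRef;

static ExpArray *exparray_new(void)
{
    ExpArray *a = malloc(sizeof *a);
    if (a != NULL) {
        a->n = a->cap = 0;
        a->vals = NULL;
    }
    return a;
}

static void exparray_free(ExpArray *a)
{
    if (a != NULL) {
        free(a->vals);
        free(a);
    }
}

static ExpArray *exparray_copy(const ExpArray *src)
{
    ExpArray *a = exparray_new();
    if (a == NULL || src->n == 0)
        return a;
    a->vals = malloc(src->n * sizeof *a->vals);
    if (a->vals == NULL) {
        free(a);
        return NULL;
    }
    memcpy(a->vals, src->vals, src->n * sizeof *a->vals);
    a->n = a->cap = src->n;
    return a;
}

/* Append v. Returns a read-write reference to the result, or {NULL, false}
 * if memory runs out. */
static ExpRef exparray_append(ExpRef in, int32_t v)
{
    ExpRef fail = {NULL, false};
    ExpArray *target = in.obj;

    if (!in.read_write) {
        /* Read-only input must not change: modify a private copy instead. */
        target = exparray_copy(in.obj);
        if (target == NULL)
            return fail;
    }
    if (target->n == target->cap) {
        size_t ncap = target->cap > 0 ? target->cap * 2 : 4;
        int32_t *nv = realloc(target->vals, ncap * sizeof *nv);
        if (nv == NULL) {
            /* realloc failure leaves target->vals untouched, so an in-place
             * target is still a valid, unmodified value. */
            if (target != in.obj)
                exparray_free(target);
            return fail;
        }
        target->vals = nv;
        target->cap = ncap;
    }
    target->vals[target->n++] = v;   /* commit only once allocation succeeded */
    return (ExpRef){target, true};   /* new or modified values come back read-write */
}
```

Appending through a read-write reference changes the value in place, while appending through a read-only reference leaves the original untouched and returns a fresh read-write copy. The new element is stored only after any reallocation has succeeded, so an in-place update that fails partway leaves the value unchanged and usable, in the spirit of the text's advice about failing in a sane state.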
