From 3e23b68dac006e8deb0afa327e855258df8de064 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri, 6 Apr 2007 04:21:44 +0000
Subject: Support varlena fields with single-byte headers and unaligned
 storage.

This commit breaks any code that assumes that the mere act of forming a tuple
(without writing it to disk) does not "toast" any fields.  While all available
regression tests pass, I'm not totally sure that we've fixed every nook and
cranny, especially in contrib.

Greg Stark with some help from Tom Lane
---
 doc/src/sgml/storage.sgml | 57 ++++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 26 deletions(-)

(limited to 'doc/src')
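
Reviewer note: the varlena header rules described in the first storage.sgml
hunk can be sketched as a small classifier. This is only an illustration of
the documented bit layout for the big-endian case, where the two flag bits
are the high-order bits of the first byte. The type and function names
(HeaderKind, classify_header_be, short_datum_size, long_datum_size_be) are
hypothetical and are not PostgreSQL's own macros.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not taken from the PostgreSQL tree. */
typedef enum {
    HDR_4BYTE_PLAIN,      /* 00xxxxxx: four-byte header, uncompressed   */
    HDR_4BYTE_COMPRESSED, /* 01xxxxxx: four-byte header, compressed     */
    HDR_1BYTE_INLINE,     /* 1xxxxxxx: single-byte header, inline data  */
    HDR_TOAST_POINTER     /* 10000000: pointer to out-of-line data      */
} HeaderKind;

/* Classify a datum by its first byte, assuming the big-endian layout. */
static HeaderKind classify_header_be(uint8_t first)
{
    if (first & 0x80) {
        /* Single-byte header; a zero length is the TOAST-pointer case. */
        return (first & 0x7F) == 0 ? HDR_TOAST_POINTER : HDR_1BYTE_INLINE;
    }
    /* Four-byte header; the adjacent bit marks compressed content. */
    return (first & 0x40) ? HDR_4BYTE_COMPRESSED : HDR_4BYTE_PLAIN;
}

/* Total size (header byte included) of a single-byte-header datum. */
static unsigned short_datum_size(uint8_t first)
{
    return first & 0x7F;
}

/* Total size (length word included) from a four-byte big-endian header. */
static uint32_t long_datum_size_be(const uint8_t hdr[4])
{
    uint32_t word = ((uint32_t) hdr[0] << 24) | ((uint32_t) hdr[1] << 16) |
                    ((uint32_t) hdr[2] << 8) | (uint32_t) hdr[3];

    return word & 0x3FFFFFFFu;  /* drop the two flag bits */
}
```

On a little-endian machine the same tests would presumably apply to the
low-order bits instead, as the new paragraph states. The sketch does not
cover that case.
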
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 1973a5b90c3..9c3cf7589da 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/storage.sgml,v 1.16 2007/04/03 04:14:26 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/storage.sgml,v 1.17 2007/04/06 04:21:41 tgl Exp $ -->
 
 <chapter id="storage">
 
@@ -210,18 +210,27 @@ value, but in some cases more efficient approaches are possible.)
 </para>
 
 <para>
-<acronym>TOAST</> usurps the high-order two bits of the varlena length word,
+<acronym>TOAST</> usurps two bits of the varlena length word (the high-order
+bits on big-endian machines, the low-order bits on little-endian machines),
 thereby limiting the logical size of any value of a <acronym>TOAST</>-able
 data type to 1 GB (2<superscript>30</> - 1 bytes).  When both bits are zero,
-the value is an ordinary un-<acronym>TOAST</>ed value of the data type.  One
-of these bits, if set, indicates that the value has been compressed and must
-be decompressed before use.  The other bit, if set, indicates that the value
-has been stored out-of-line.  In this case the remainder of the value is
-actually just a pointer, and the correct data has to be found elsewhere.  When
-both bits are set, the out-of-line data has been compressed too.  In each case
-the length in the low-order bits of the varlena word indicates the actual size
-of the datum, not the size of the logical value that would be extracted by
-decompression or fetching of the out-of-line data.
+the value is an ordinary un-<acronym>TOAST</>ed value of the data type, and
+the remaining bits of the length word give the total datum size (including
+length word) in bytes.  When the highest-order or lowest-order bit is set,
+the value has only a single-byte header instead of the normal four-byte
+header, and the remaining bits give the total datum size (including length
+byte) in bytes.  As a special case, if the remaining bits are all zero
+(which would be impossible for a self-inclusive length), the value is a
+pointer to out-of-line data stored in a separate TOAST table.  (The size of
+a TOAST pointer is known a priori, so it doesn't need to be represented in
+the header.)  Values with single-byte headers aren't aligned on any particular
+boundary, either.  Lastly, when the highest-order or lowest-order bit is
+clear but the adjacent bit is set, the content of the datum has been
+compressed and must be decompressed before use.  In this case the remaining
+bits of the length word give the total size of the compressed datum, not the
+original data.  Note that compression is also possible for out-of-line data
+but the varlena header does not tell whether it has occurred &mdash;
+the content of the TOAST pointer tells that, instead.
 </para>
 
 <para>
@@ -254,8 +263,8 @@ retrieval of the values.  A pointer datum representing an out-of-line
 <acronym>TOAST</> table in which to look and the OID of the specific value
 (its <structfield>chunk_id</>).  For convenience, pointer datums also store the
 logical datum size (original uncompressed data length) and actual stored size
-(different if compression was applied).  Allowing for the varlena header word,
-the total size of a <acronym>TOAST</> pointer datum is therefore 20 bytes
+(different if compression was applied).  Allowing for the varlena header byte,
+the total size of a <acronym>TOAST</> pointer datum is therefore 17 bytes
 regardless of the actual size of the represented value.
 </para>
 
@@ -280,7 +289,9 @@ The <acronym>TOAST</> code recognizes four different strategies for storing
     <listitem>
      <para>
       <literal>PLAIN</literal> prevents either compression or
-      out-of-line storage.  This is the only possible strategy for
+      out-of-line storage; furthermore it disables use of single-byte headers
+      for varlena types.
+      This is the only possible strategy for
       columns of non-<acronym>TOAST</>-able data types.
      </para>
     </listitem>
@@ -562,7 +573,7 @@ data. Empty in ordinary tables.</entry>
  <para>
 
   All table rows are structured in the same way. There is a fixed-size
-  header (occupying 27 bytes on most machines), followed by an optional null
+  header (occupying 23 bytes on most machines), followed by an optional null
   bitmap, an optional object ID field, and the user data. The header is
   detailed
   in <xref linkend="heaptupleheaderdata-table">.  The actual user data
@@ -604,12 +615,6 @@ data. Empty in ordinary tables.</entry>
    <entry>4 bytes</entry>
    <entry>insert XID stamp</entry>
   </row>
-  <row>
-   <entry>t_cmin</entry>
-   <entry>CommandId</entry>
-   <entry>4 bytes</entry>
-   <entry>insert CID stamp</entry>
-  </row>
   <row>
    <entry>t_xmax</entry>
    <entry>TransactionId</entry>
@@ -617,10 +622,10 @@ data. Empty in ordinary tables.</entry>
    <entry>delete XID stamp</entry>
   </row>
   <row>
-   <entry>t_cmax</entry>
+   <entry>t_cid</entry>
    <entry>CommandId</entry>
    <entry>4 bytes</entry>
-   <entry>delete CID stamp (overlays with t_xvac)</entry>
+   <entry>insert and/or delete CID stamp (overlays with t_xvac)</entry>
   </row>
   <row>
    <entry>t_xvac</entry>
@@ -635,10 +640,10 @@ data. Empty in ordinary tables.</entry>
    <entry>current TID of this or newer row version</entry>
   </row>
   <row>
-   <entry>t_natts</entry>
+   <entry>t_infomask2</entry>
    <entry>int16</entry>
    <entry>2 bytes</entry>
-   <entry>number of attributes</entry>
+   <entry>number of attributes, plus various flag bits</entry>
   </row>
   <row>
    <entry>t_infomask</entry>
@@ -682,7 +687,7 @@ data. Empty in ordinary tables.</entry>
   fixed width field, then all the bytes are simply placed. If it's a
   variable length field (attlen = -1) then it's a bit more complicated.
   All variable-length datatypes share the common header structure
-  <type>varattrib</type>, which includes the total length of the stored
+  <type>struct varlena</type>, which includes the total length of the stored
   value and some flag bits.  Depending on the flags, the data can be either
   inline or in a <acronym>TOAST</> table;
   it might be compressed, too (see <xref linkend="storage-toast">).
-- 
cgit v1.2.3
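
Reviewer note: the two size changes in the hunks above (TOAST pointer 20 -> 17
bytes, tuple header 27 -> 23 bytes) can be checked with simple arithmetic.
The sketch below is an illustration only. The struct and function names are
hypothetical, and the widths of t_ctid (6 bytes), t_infomask (2 bytes) and
t_hoff (1 byte) are assumed from the full header table in storage.sgml
because they do not appear in this excerpt.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload of a TOAST pointer: four 4-byte fields. */
typedef struct {
    uint32_t toast_relid;  /* OID of the TOAST table to look in       */
    uint32_t chunk_id;     /* OID of the specific value               */
    uint32_t raw_size;     /* logical (uncompressed) datum size       */
    uint32_t stored_size;  /* actual stored size                      */
} ToastPointerPayload;

/* Pointer datum size = varlena header bytes + 16 bytes of payload. */
static size_t toast_pointer_size(size_t header_bytes)
{
    return header_bytes + sizeof(ToastPointerPayload);
}

/* Fixed tuple header size, with or without a separate t_cmin field. */
static size_t tuple_header_size(int separate_cmin)
{
    size_t size = 0;

    size += 4;              /* t_xmin                                   */
    if (separate_cmin)
        size += 4;          /* t_cmin (removed by this commit)          */
    size += 4;              /* t_xmax                                   */
    size += 4;              /* t_cmax, now t_cid (overlays t_xvac)      */
    size += 6;              /* t_ctid (assumed width)                   */
    size += 2;              /* t_natts, now t_infomask2                 */
    size += 2;              /* t_infomask (assumed width)               */
    size += 1;              /* t_hoff (assumed width)                   */
    return size;
}
```

With a four-byte header word the pointer comes to 4 + 16 = 20 bytes, and with
a single header byte to 1 + 16 = 17. Folding t_cmin and t_cmax into t_cid
removes one 4-byte field, which accounts for 27 - 4 = 23.
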