From 3e23b68dac006e8deb0afa327e855258df8de064 Mon Sep 17 00:00:00 2001
From: Tom Lane
Date: Fri, 6 Apr 2007 04:21:44 +0000
Subject: Support varlena fields with single-byte headers and unaligned storage.

This commit breaks any code that assumes that the mere act of forming a tuple
(without writing it to disk) does not "toast" any fields. While all available
regression tests pass, I'm not totally sure that we've fixed every nook and
cranny, especially in contrib.

Greg Stark with some help from Tom Lane
---
 doc/src/sgml/storage.sgml | 57 ++++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 26 deletions(-)

(limited to 'doc/src')

diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 1973a5b90c3..9c3cf7589da 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -1,4 +1,4 @@
-
+
@@ -210,18 +210,27 @@ value, but in some cases more efficient approaches are possible.)
-TOAST usurps the high-order two bits of the varlena length word,
+TOAST usurps two bits of the varlena length word (the high-order
+bits on big-endian machines, the low-order bits on little-endian machines),
 thereby limiting the logical size of any value of a TOAST-able data
 type to 1 GB (2^30 - 1 bytes). When both bits are zero,
-the value is an ordinary un-TOASTed value of the data type. One
-of these bits, if set, indicates that the value has been compressed and must
-be decompressed before use. The other bit, if set, indicates that the value
-has been stored out-of-line. In this case the remainder of the value is
-actually just a pointer, and the correct data has to be found elsewhere. When
-both bits are set, the out-of-line data has been compressed too. In each case
-the length in the low-order bits of the varlena word indicates the actual size
-of the datum, not the size of the logical value that would be extracted by
-decompression or fetching of the out-of-line data.
+the value is an ordinary un-TOASTed value of the data type, and
+the remaining bits of the length word give the total datum size (including
+length word) in bytes. When the highest-order or lowest-order bit is set,
+the value has only a single-byte header instead of the normal four-byte
+header, and the remaining bits give the total datum size (including length
+byte) in bytes. As a special case, if the remaining bits are all zero
+(which would be impossible for a self-inclusive length), the value is a
+pointer to out-of-line data stored in a separate TOAST table. (The size of
+a TOAST pointer is known a priori, so it doesn't need to be represented in
+the header.) Values with single-byte headers aren't aligned on any particular
+boundary, either. Lastly, when the highest-order or lowest-order bit is
+clear but the adjacent bit is set, the content of the datum has been
+compressed and must be decompressed before use. In this case the remaining
+bits of the length word give the total size of the compressed datum, not the
+original data. Note that compression is also possible for out-of-line data
+but the varlena header does not tell whether it has occurred —
+the content of the TOAST pointer tells that, instead.
@@ -254,8 +263,8 @@ retrieval of the values. A pointer datum representing an out-of-line
 TOAST table in which to look and the OID of the specific value
 (its chunk_id). For convenience, pointer datums also store the
 logical datum size (original uncompressed data length) and actual stored size
-(different if compression was applied). Allowing for the varlena header word,
-the total size of a TOAST pointer datum is therefore 20 bytes
+(different if compression was applied). Allowing for the varlena header byte,
+the total size of a TOAST pointer datum is therefore 17 bytes
 regardless of the actual size of the represented value.
@@ -280,7 +289,9 @@ The TOAST code recognizes four different strategies for storing
 PLAIN prevents either compression or
- out-of-line storage. This is the only possible strategy for
+ out-of-line storage; furthermore it disables use of single-byte headers
+ for varlena types.
+ This is the only possible strategy for
 columns of non-TOAST-able data types.
@@ -562,7 +573,7 @@ data. Empty in ordinary tables.
 All table rows are structured in the same way. There is a fixed-size
- header (occupying 27 bytes on most machines), followed by an optional null
+ header (occupying 23 bytes on most machines), followed by an optional null
 bitmap, an optional object ID field, and the user data. The header is
 detailed in . The actual user data
@@ -604,12 +615,6 @@ data. Empty in ordinary tables.
 4 bytes
 insert XID stamp
-
- t_cmin
- CommandId
- 4 bytes
- insert CID stamp
-
 t_xmax
 TransactionId
@@ -617,10 +622,10 @@ data. Empty in ordinary tables.
 delete XID stamp
- t_cmax
+ t_cid
 CommandId
 4 bytes
- delete CID stamp (overlays with t_xvac)
+ insert and/or delete CID stamp (overlays with t_xvac)
 t_xvac
@@ -635,10 +640,10 @@ data. Empty in ordinary tables.
 current TID of this or newer row version
- t_natts
+ t_infomask2
 int16
 2 bytes
- number of attributes
+ number of attributes, plus various flag bits
 t_infomask
@@ -682,7 +687,7 @@ data. Empty in ordinary tables.
 fixed width field, then all the bytes are simply placed. If it's a
 variable length field (attlen = -1) then it's a bit more complicated.
 All variable-length datatypes share the common header structure
- varattrib, which includes the total length of the stored
+ struct varlena, which includes the total length of the stored
 value and some flag bits. Depending on the flags, the data can be either
 inline or in a TOAST table;
 it might be compressed, too (see ).
--
cgit v1.2.3
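
Reading aid (not part of the patch): the header rules added in the -210,18 +210,27 hunk can be sketched in C as below. This is a minimal sketch that assumes the big-endian layout, where the two flag bits are the high-order bits of the first byte. It reads the header byte by byte, so it does not depend on the host's byte order. The names HeaderKind, HeaderInfo, decode_header and TOAST_POINTER_SIZE are hypothetical and are not PostgreSQL's own macros or types; the value 17 is taken from the patch text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not PostgreSQL's macros. */
typedef enum {
    HDR_PLAIN_4B,       /* both flag bits zero: ordinary four-byte header */
    HDR_COMPRESSED_4B,  /* first bit clear, adjacent bit set: compressed */
    HDR_SHORT_1B,       /* first bit set: single-byte header */
    HDR_TOAST_POINTER   /* single-byte header whose length bits are zero */
} HeaderKind;

/* Total size of a TOAST pointer datum as stated in the patch text. */
#define TOAST_POINTER_SIZE 17

typedef struct {
    HeaderKind kind;
    uint32_t   total_size;  /* bytes, including the header itself */
} HeaderInfo;

/*
 * Decode a header stored in the big-endian layout (assumption): the flag
 * bits are the two high-order bits of the first byte of the datum.
 */
static HeaderInfo decode_header(const unsigned char *p)
{
    HeaderInfo info;

    assert(p != NULL);

    if (p[0] & 0x80) {
        /* Single-byte header: the remaining 7 bits hold the total size. */
        uint32_t len = p[0] & 0x7F;

        if (len == 0) {
            /* A self-inclusive length can never be zero: TOAST pointer. */
            info.kind = HDR_TOAST_POINTER;
            info.total_size = TOAST_POINTER_SIZE;
        } else {
            info.kind = HDR_SHORT_1B;
            info.total_size = len;
        }
        return info;
    }

    /* Four-byte header: assemble the length word, most significant first. */
    uint32_t word = ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
                    ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];

    info.kind = (word & 0x40000000u) ? HDR_COMPRESSED_4B : HDR_PLAIN_4B;
    info.total_size = word & 0x3FFFFFFFu;  /* 30 bits, hence the 1 GB limit */
    return info;
}
```

The single-byte case is tested first because such a datum may be unaligned and shorter than four bytes, so only its first byte is safe to read. On a little-endian machine the patch text places the flags in the low-order bits instead, so the masks and shifts would differ.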