From 854823fa334cb826eed50da751801d0693b10173 Mon Sep 17 00:00:00 2001
From: Teodor Sigaev
Date: Fri, 22 Dec 2017 13:33:16 +0300
Subject: Add optional compression method to SP-GiST

This patch allows the value stored in SP-GiST leaf tuples to have a
different type than the indexed column. The main application of the
feature is to transform a complex column type into a simpler indexed
type, or to truncate overly long values; the transformation may be
lossy. A simple example: polygons are converted to their bounding
boxes; an opclass doing this follows.

Authors: me, Heikki Linnakangas, Alexander Korotkov, Nikita Glukhov
Reviewed-By: all authors + Darafei Praliaskouski
Discussions:
https://www.postgresql.org/message-id/5447B3FF.2080406@sigaev.ru
https://www.postgresql.org/message-id/flat/54907069.1030506@sigaev.ru#54907069.1030506@sigaev.ru
---
 doc/src/sgml/spgist.sgml | 92 +++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 72 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
index 139c8ed8f74..b4a8be476e7 100644
--- a/doc/src/sgml/spgist.sgml
+++ b/doc/src/sgml/spgist.sgml
@@ -240,20 +240,22 @@
  There are five user-defined methods that an index operator class for
- SP-GiST must provide. All five follow the convention
- of accepting two internal arguments, the first of which is a
- pointer to a C struct containing input values for the support method,
- while the second argument is a pointer to a C struct where output values
- must be placed. Four of the methods just return void, since
- all their results appear in the output struct; but
+ SP-GiST must provide, and one is optional. All five
+ mandatory methods follow the convention of accepting two internal
+ arguments, the first of which is a pointer to a C struct containing input
+ values for the support method, while the second argument is a pointer to a
+ C struct where output values must be placed. Four of the mandatory methods just
+ return void, since all their results appear in the output struct; but
  leaf_consistent additionally returns a boolean result. The
  methods must not modify any fields of their input structs. In all
  cases, the output struct is initialized to zeroes before calling the
- user-defined method.
+ user-defined method. The optional sixth method compress
+ accepts a datum to be indexed as its only argument and returns a value
+ suitable for physical storage in a leaf tuple.
- The five user-defined methods are:
+ The five mandatory user-defined methods are:
@@ -283,6 +285,7 @@ typedef struct spgConfigOut
  {
      Oid         prefixType;     /* Data type of inner-tuple prefixes */
      Oid         labelType;      /* Data type of inner-tuple node labels */
+     Oid         leafType;       /* Data type of leaf-tuple values */
      bool        canReturnData;  /* Opclass can reconstruct original data */
      bool        longValuesOK;   /* Opclass can cope with values > 1 page */
  } spgConfigOut;
@@ -305,6 +308,22 @@ typedef struct spgConfigOut
  class is capable of segmenting long values by repeated suffixing
  (see ).
+
+
+ leafType is typically the same as
+ attType. For backward compatibility,
+ the config method can leave leafType
+ uninitialized; that gives the same effect as
+ setting leafType equal to attType.
+ When attType and leafType differ,
+ the optional compress method must be provided.
+ The compress method is responsible
+ for transforming datums to be indexed from attType
+ to leafType.
+ Note: both consistent functions receive
+ scan keys unchanged, without transformation
+ by compress.
+
@@ -380,10 +399,16 @@ typedef struct spgChooseOut
  } spgChooseOut;
- datum is the original datum that was to be inserted
- into the index.
- leafDatum is initially the same as
- datum, but can change at lower levels of the tree
+ datum is the original datum of
+ spgConfigIn.attType
+ type that was to be inserted into the index.
+ leafDatum is a value of
+ spgConfigOut.leafType
+ type which is initially the result of method
+ compress applied to datum
+ when method compress is provided, or the same value as
+ datum otherwise.
+ leafDatum can change at lower levels of the tree
  if the choose or picksplit methods change it. When the insertion search
  reaches a leaf page, the current value of leafDatum is what will be stored
@@ -418,7 +443,7 @@ typedef struct spgChooseOut
  Set levelAdd to the increment in level caused by descending through
  that node, or leave it as zero if the operator class does not use levels.
- Set restDatum to equal datum
+ Set restDatum to equal leafDatum
  if the operator class does not modify datums from one level to the
  next, or otherwise set it to the modified value to be used as
  leafDatum at the next level.
@@ -509,7 +534,9 @@ typedef struct spgPickSplitOut
  nTuples is the number of leaf tuples provided.
- datums is an array of their datum values.
+ datums is an array of their datum values of
+ spgConfigOut.leafType
+ type.
  level is the current level that all the leaf tuples share,
  which will become the level of the new inner tuple.
@@ -624,7 +651,8 @@ typedef struct spgInnerConsistentOut
  reconstructedValue is the value reconstructed for the
  parent tuple; it is (Datum) 0 at the root level or if the
  inner_consistent function did not provide a value at the
- parent level.
+ parent level. reconstructedValue is always of
+ spgConfigOut.leafType type.
  traversalValue is a pointer to any traverse data
  passed down from the previous call of inner_consistent
  on the parent index tuple, or NULL at the root level.
@@ -659,6 +687,7 @@ typedef struct spgInnerConsistentOut
  necessarily so, so an array is used.)
  If value reconstruction is needed, set
  reconstructedValues to an array of the values
+ of spgConfigOut.leafType type
  reconstructed for each child node to be visited; otherwise, leave
  reconstructedValues as NULL.
  If it is desired to pass down additional out-of-band information
@@ -730,7 +759,8 @@ typedef struct spgLeafConsistentOut
  reconstructedValue is the value reconstructed for the
  parent tuple; it is (Datum) 0 at the root level or if the
  inner_consistent function did not provide a value at the
- parent level.
+ parent level. reconstructedValue is always of
+ spgConfigOut.leafType type.
  traversalValue is a pointer to any traverse data
  passed down from the previous call of inner_consistent
  on the parent index tuple, or NULL at the root level.
@@ -739,16 +769,18 @@ typedef struct spgLeafConsistentOut
  returnData is true if reconstructed data is
  required for this query; this will only be so if the
  config function asserted canReturnData.
- leafDatum is the key value stored in the current
- leaf tuple.
+ leafDatum is the key value of
+ spgConfigOut.leafType type
+ stored in the current leaf tuple.
  The function must return true if the leaf tuple matches the
  query, or false if not. In the true case,
  if returnData is true then
- leafValue must be set to the value originally supplied
- to be indexed for this leaf tuple. Also,
+ leafValue must be set to the value of
+ spgConfigIn.attType type
+ originally supplied to be indexed for this leaf tuple. Also,
  recheck may be set to true if the match is uncertain
  and so the operator(s) must be re-applied to the actual heap tuple to
  verify the match.
@@ -757,6 +789,26 @@ typedef struct spgLeafConsistentOut
+
+ The optional user-defined method is:
+
+
+
+
+ Datum compress(Datum in)
+
+
+ Converts the data item into a format suitable for physical storage in
+ a leaf tuple of an index page. It accepts an
+ spgConfigIn.attType
+ value and returns an
+ spgConfigOut.leafType
+ value. The output value should not be toasted.
+
+
+
+
+
  All the SP-GiST support methods are normally called in a short-lived
  memory context; that is, CurrentMemoryContext will be reset
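
For illustration only, below is a minimal standalone C sketch of the contract this patch documents, built on the polygon-to-bounding-box example from the commit message. Everything in it is hypothetical. TypeTag, ConfigSketch, Point, Box, Polygon, effective_leaf_type, compress_polygon and leaf_consistent_sketch are not PostgreSQL APIs. A real opclass would set spgConfigOut.leafType in its config method and implement compress with the server's Datum calling convention. The sketch models three documented points:

- an unset leafType falls back to attType;
- compress maps an attType value to a possibly lossy leafType value;
- a consistent check against a lossy key sets recheck.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins, NOT PostgreSQL types: a real opclass would use
 * Oid, Datum, BOX and POLYGON from the server headers. */
typedef enum { TYPE_UNSET = 0, TYPE_POLYGON, TYPE_BOX } TypeTag;

/* Models spgConfigIn.attType and spgConfigOut.leafType side by side. */
typedef struct {
    TypeTag attType;  /* type of the indexed column */
    TypeTag leafType; /* type stored in leaf tuples; 0 if config left it unset */
} ConfigSketch;

typedef struct { double x, y; } Point;
typedef struct { Point high, low; } Box;            /* upper-right, lower-left corners */
typedef struct { int npts; Point pts[8]; } Polygon; /* fixed capacity, sketch only */

/* Documented rule: an uninitialized leafType acts as leafType == attType. */
TypeTag effective_leaf_type(const ConfigSketch *cfg)
{
    return cfg->leafType == TYPE_UNSET ? cfg->attType : cfg->leafType;
}

/* Lossy "compress" step: polygon (attType) -> bounding box (leafType). */
Box compress_polygon(const Polygon *poly)
{
    assert(poly->npts >= 1 && poly->npts <= 8);
    Box b = { poly->pts[0], poly->pts[0] };
    for (int i = 1; i < poly->npts; i++) {
        const Point *p = &poly->pts[i];
        if (p->x > b.high.x) b.high.x = p->x;
        if (p->y > b.high.y) b.high.y = p->y;
        if (p->x < b.low.x)  b.low.x = p->x;
        if (p->y < b.low.y)  b.low.y = p->y;
    }
    return b;
}

static bool box_overlaps(const Box *a, const Box *b)
{
    return a->low.x <= b->high.x && b->low.x <= a->high.x &&
           a->low.y <= b->high.y && b->low.y <= a->high.y;
}

/* Leaf check against the stored box. The query is used as given, since
 * scan keys are not passed through compress. The bounding box
 * over-approximates the polygon, so a hit is only a candidate: recheck is
 * set so that the original heap value would be re-tested. */
bool leaf_consistent_sketch(const Box *leaf, const Box *query, bool *recheck)
{
    *recheck = true;
    return box_overlaps(leaf, query);
}
```

For example, compressing the triangle (0,0), (4,0), (0,3) yields the box with corners (0,0) and (4,3). A query box spanning (3.4,2.4) to (3.6,2.6) overlaps that box, so the leaf check reports a candidate match with recheck set, even though the query lies entirely outside the triangle. That is why, in this sketch, a lossy transformation is paired with recheck.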