summaryrefslogtreecommitdiff
path: root/src/backend/access/spgist/spgtextproc.c
AgeCommit message (Collapse)Author
2025-01-01Update copyright for 2025Bruce Momjian
Backpatch-through: 13
2024-09-04Remove lc_collate_is_c().Jeff Davis
Instead just look up the collation and check collate_is_c field. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se
2024-03-13Make the order of the header file includes consistentPeter Eisentraut
Similar to commit 7e735035f20. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4-WhpCFMbXCjtJ%2BFzmjfPrp7Hw1pk4p%2BZpU95Kh3ofZ1A%40mail.gmail.com
2024-03-04Remove unused #include's from backend .c filesPeter Eisentraut
as determined by include-what-you-use (IWYU) While IWYU also suggests to *add* a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
2024-02-16Use new overflow-safe integer comparison functions.Nathan Bossart
Commit 6b80394781 introduced integer comparison functions designed to be as efficient as possible while avoiding overflow. This commit makes use of these functions in many of the in-tree qsort() comparators to help ensure transitivity. Many of these comparator functions should also see a small performance boost. Author: Mats Kindahl Reviewed-by: Andres Freund, Fabrízio de Royes Mello Discussion: https://postgr.es/m/CA%2B14426g2Wa9QuUpmakwPxXFWG_1FaY0AsApkvcTBy-YfS6uaw%40mail.gmail.com
2024-01-03Update copyright for 2024Bruce Momjian
Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
2023-01-10New header varatt.h split off from postgres.hPeter Eisentraut
This new header contains all the variable-length data types support (TOAST support) from postgres.h, which isn't needed by large parts of the backend code. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/ddcce239-0f29-6e62-4b47-1f8ca742addf%40enterprisedb.com
2023-01-02Update copyright for 2023Bruce Momjian
Backpatch-through: 11
2022-01-07Update copyright for 2022Bruce Momjian
Backpatch-through: 10
2021-01-02Update copyright for 2021Bruce Momjian
Backpatch-through: 9.5
2020-01-01Update copyrights for 2020Bruce Momjian
Backpatch-through: update all files in master, backpatch legal files through 9.4
2019-05-22Initial pgindent run for v12.Tom Lane
This is still using the 2.0 version of pg_bsd_indent. I thought it would be good to commit this separately, so as to document the differences between 2.0 and 2.1 behavior. Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
2019-03-22Collations with nondeterministic comparisonPeter Eisentraut
This adds a flag "deterministic" to collations. If that is false, such a collation disables various optimizations that assume that strings are equal only if they are byte-wise equal. That then allows use cases such as case-insensitive or accent-insensitive comparisons or handling of strings with different Unicode normal forms. This functionality is only supported with the ICU provider. At least glibc doesn't appear to have any locales that work in a nondeterministic way, so it's not worth supporting this for the libc provider. The term "deterministic comparison" in this context is from Unicode Technical Standard #10 (https://unicode.org/reports/tr10/#Deterministic_Comparison). This patch makes changes in three areas: - CREATE COLLATION DDL changes and system catalog changes to support this new flag. - Many executor nodes and auxiliary code are extended to track collations. Previously, this code would just throw away collation information, because the eventually-called user-defined functions didn't use it since they only cared about equality, which didn't need collation information. - String data type functions that do equality comparisons and hashing are changed to take the (non-)deterministic flag into account. For comparison, this just means skipping various shortcuts and tie breakers that use byte-wise comparison. For hashing, we first need to convert the input string to a canonical "sort key" using the ICU analogue of strxfrm(). Reviewed-by: Daniel Verite <daniel@manitou-mail.org> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
2019-01-02Update copyright for 2019Bruce Momjian
Backpatch-through: certain files through 9.4
2018-04-16Fix broken collation-aware searches in SP-GiST text opclass.Tom Lane
spg_text_leaf_consistent() supposed that it should compare only Min(querylen, entrylen) bytes of the two strings, and then deal with any excess bytes in one string or the other by assuming the longer string is greater if the prefixes are equal. Quite aside from the fact that that's just wrong in some locales (e.g., 'ch' is not less than 'd' in cs_CZ), it also risked passing incomplete multibyte characters to strcoll(), with ensuing bad results. Instead, just pass the full strings to varstr_cmp, and let it decide what to do about unequal-length strings. Fortunately, this error doesn't imply any index corruption, it's just that searches might return the wrong set of entries. Per report from Emre Hasegeli, though this is not his patch. Thanks to Peter Geoghegan for review and discussion. This code was born broken, so back-patch to all supported branches. In HEAD, I failed to resist the temptation to do a bit of cosmetic cleanup/pgindent'ing on 710d90da1, too. Discussion: https://postgr.es/m/CAE2gYzzb6K51VnTq5i5p52z+j9p2duEa-K1T3RrC_GQEynAKEg@mail.gmail.com
2018-04-03Add prefix operator for TEXT type.Teodor Sigaev
The prefix operator along with SP-GiST indexes can be used as an alternative for LIKE 'word%' commands and it doesn't have a limitation of string/prefix length as B-Tree has. Bump catalog version Author: Ildus Kurbangaliev with some editorization by me Review by: Arthur Zakirov, Alexander Korotkov, and me Discussion: https://www.postgresql.org/message-id/flat/20180202180327.222b04b3@wp.localdomain
2018-01-02Update copyright for 2018Bruce Momjian
Backpatch-through: certain files through 9.3
2017-03-12Use wrappers of PG_DETOAST_DATUM_PACKED() more.Noah Misch
This makes almost all core code follow the policy introduced in the previous commit. Specific decisions: - Text search support functions with char* and length arguments, such as prsstart and lexize, may receive unaligned strings. I doubt maintainers of non-core text search code will notice. - Use plain VARDATA() on values detoasted or synthesized earlier in the same function. Use VARDATA_ANY() on varlenas sourced outside the function, even if they happen to always have four-byte headers. As an exception, retain the universal practice of using VARDATA() on return values of SendFunctionCall(). - Retain PG_GETARG_BYTEA_P() in pageinspect. (Page images are too large for a one-byte header, so this misses no optimization.) Sites that do not call get_page_from_raw() typically need the four-byte alignment. - For now, do not change btree_gist. Its use of four-byte headers in memory is partly entangled with storage of 4-byte headers inside GBT_VARKEY, on disk. - For now, do not change gtrgm_consistent() or gtrgm_distance(). They incorporate the varlena header into a cache, and there are multiple credible implementation strategies to consider.
2017-01-20Move some things from builtins.h to new header filesPeter Eisentraut
This avoids that builtins.h has to include additional header files.
2017-01-03Update copyright via script for 2017Bruce Momjian
2016-08-23Improve SP-GiST opclass API to better support unlabeled nodes.Tom Lane
Previously, the spgSplitTuple action could only create a new upper tuple containing a single labeled node. This made it useless for opclasses that prefer to work with fixed sets of nodes (labeled or otherwise), which meant that restrictive prefixes could not be used with such node definitions. Change the output field set for the choose() method to allow it to specify any valid node set for the new upper tuple, and to specify which of these nodes to place the modified lower tuple in. In addition to its primary use for fixed node sets, this feature could allow existing opclasses that use variable node sets to skip a separate spgAddNode action when splitting a tuple, by setting up the node needed for the incoming value as part of the spgSplitTuple action. However, care would have to be taken to add the extra node only when it would not make the tuple bigger than before. (spgAddNode can enlarge the tuple, spgSplitTuple can't.) This is a prerequisite for an upcoming SP-GiST inet opclass, but is being committed separately to increase the visibility of the API change. In passing, improve the documentation about the traverse-values feature that was added by commit ccd6eb49a. Emre Hasegeli, with cosmetic adjustments and documentation rework by me Discussion: <CAE2gYzxtth9qatW_OAqdOjykS0bxq7AYHLuyAQLPgT7H9ZU0Cw@mail.gmail.com>
2016-01-02Fix overly-strict assertions in spgtextproc.c.Tom Lane
spg_text_inner_consistent is capable of reconstructing an empty string to pass down to the next index level; this happens if we have an empty string coming in, no prefix, and a dummy node label. (In practice, what is needed to trigger that is insertion of a whole bunch of empty-string values.) Then, we will arrive at the next level with in->level == 0 and a non-NULL (but zero length) in->reconstructedValue, which is valid but the Assert tests weren't expecting it. Per report from Andreas Seltenreich. This has no impact in non-Assert builds, so should not be a problem in production, but back-patch to all affected branches anyway. In passing, remove a couple of useless variable initializations and shorten the code by not duplicating DatumGetPointer() calls.
2016-01-02Update copyright for 2016Bruce Momjian
Backpatch certain files through 9.1
2015-01-06Update copyright for 2015Bruce Momjian
Backpatch certain files through 9.0
2014-06-09Fix infinite loop when splitting inner tuples in SPGiST text indexes.Tom Lane
Previously, the code used a node label of zero both for strings that contain no bytes beyond the inner tuple's prefix, and for cases where an "allTheSame" inner tuple has to be split to allow a string with a different next byte to be inserted into it. Failing to distinguish these cases meant that if a string ending with the current prefix needed to be inserted into an allTheSame tuple, we got into an infinite loop, because after splitting the tuple we'd descend into the child allTheSame tuple and then find we need to split again. To fix, instead use -1 and -2 as the node labels for these two cases. This requires widening the node label type from "char" to int2, but fortunately SPGiST stores all pass-by-value node label types in their Datum representation, which means that this change is transparently upward compatible so far as the on-disk representation goes. We continue to recognize zero as a dummy node label for reading purposes, but will not attempt to push new index entries down into such a label, so that the loop won't occur even when dealing with an existing index. Per report from Teodor Sigaev. Back-patch to 9.2 where the faulty code was introduced.
2014-05-06pgindent run for 9.4Bruce Momjian
This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
2014-01-07Update copyright for 2014Bruce Momjian
Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
2013-09-18Typo fix.Robert Haas
Etsuro Fujita
2013-05-29pgindent run for release 9.3Bruce Momjian
This is the first run of the Perl-based pgindent script. Also update pgindent instructions.
2013-05-08Use the term "radix tree" instead of "suffix tree" for SP-GiST text opclass.Heikki Linnakangas
What we have implemented is a radix tree (or a radix trie or a patricia trie), but the docs and code comments incorrectly called it a "suffix tree". Alexander Korotkov
2013-01-01Update copyrights for 2013Bruce Momjian
Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
2012-06-26Cope with smaller-than-normal BLCKSZ setting in SPGiST indexes on text.Tom Lane
The original coding failed miserably for BLCKSZ of 4K or less, as reported by Josh Kupershmidt. With the present design for text indexes, a given inner tuple could have up to 256 labels (requiring either 3K or 4K bytes depending on MAXALIGN), which means that we can't positively guarantee no failures for smaller blocksizes. But we can at least make it behave sanely so long as there are few enough labels to fit on a page. Considering that btree is also more prone to "index tuple too large" failures when BLCKSZ is small, it's not clear that we should expend more work than this on this case.
2012-06-10Run pgindent on 9.2 source tree in preparation for first 9.3Bruce Momjian
commit-fest.
2012-03-10Restructure SPGiST opclass interface API to support whole-index scans.Tom Lane
The original API definition was incapable of supporting whole-index scans because there was no way to invoke leaf-value reconstruction without checking any qual conditions. Also, it was inefficient for multiple-qual-condition scans because value reconstruction got done over again for each qual condition, and because other internal work in the consistent functions likewise had to be done for each qual. To fix these issues, pass the whole scankey array to the opclass consistent functions, instead of only letting them see one item at a time. (Essentially, the loop over scankey entries is now inside the consistent functions not outside them. This makes the consistent functions a bit more complicated, but not unreasonably so.) In itself this commit does nothing except save a few cycles in multiple-qual-condition index scans, since we can't support whole-index scans on SPGiST indexes until nulls are included in the index. However, I consider this a must-fix for 9.2 because once we release it will get very much harder to change the opclass API definition.
2012-01-01Update copyright notices for year 2012.Bruce Momjian
2011-12-19Teach SP-GiST to do index-only scans.Tom Lane
Operator classes can specify whether or not they support this; this preserves the flexibility to use lossy representations within an index. In passing, move constant data about a given index into the rd_amcache cache area, instead of doing fresh lookups each time we start an index operation. This is mainly to try to make sure that spgcanreturn() has insignificant cost; I still don't have any proof that it matters for actual index accesses. Also, get rid of useless copying of FmgrInfo pointers; we can perfectly well use the relcache's versions in-place.
2011-12-17Add SP-GiST (space-partitioned GiST) index access method.Tom Lane
SP-GiST is comparable to GiST in flexibility, but supports non-balanced partitioned search structures rather than balanced trees. As described at PGCon 2011, this new indexing structure can beat GiST in both index build time and query speed for search problems that it is well matched to. There are a number of areas that could still use improvement, but at this point the code seems committable. Teodor Sigaev and Oleg Bartunov, with considerable revisions by Tom Lane