summaryrefslogtreecommitdiff
path: root/contrib/pg_trgm/trgm_gist.c
AgeCommit message (Collapse)Author
2025-02-20Remove various unnecessary (char *) castsPeter Eisentraut
Remove a number of (char *) casts that are unnecessary. Or in some cases, rewrite the code to make the purpose of the cast clearer. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
2025-02-12Remove unnecessary (char *) casts [mem]Peter Eisentraut
Remove (char *) casts around memory functions such as memcmp(), memcpy(), or memset() where the cast is useless. Since these functions don't take char * arguments anyway, these casts are at best complicated casts to (void *), about which see commit 7f798aca1d5. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
2024-11-28Remove useless casts to (void *)Peter Eisentraut
Many of them just seem to have been copied around for no real reason. Their presence causes (small) risks of hiding actual type mismatches or silently discarding qualifiers Discussion: https://www.postgresql.org/message-id/flat/461ea37c-8b58-43b4-9736-52884e862820@eisentraut.org
2023-02-07Remove useless casts to (void *) in arguments of some system functionsPeter Eisentraut
The affected functions are: bsearch, memcmp, memcpy, memset, memmove, qsort, repalloc Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/fd9adf5d-b1aa-e82f-e4c7-263c30145807%40enterprisedb.com
2023-01-10New header varatt.h split off from postgres.hPeter Eisentraut
This new header contains all the variable-length data types support (TOAST support) from postgres.h, which isn't needed by large parts of the backend code. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/ddcce239-0f29-6e62-4b47-1f8ca742addf%40enterprisedb.com
2022-12-10Standardize error reports in unimplemented I/O functions.Tom Lane
We chose a specific wording of the not-implemented errors for pseudotype I/O functions and other cases where there's little value in implementing input and/or output. gtsvectorin never got that memo though, nor did most of contrib. Make these all fall in line, mostly because I'm a neatnik but also to remove unnecessary translatable strings. gbtreekey_in needs a bit of extra love since it supports multiple SQL types. Sadly, gbtreekey_out doesn't have the ability to do that, but I think it's unreachable anyway. Noted while surveying datatype input functions to see what we have left to fix.
2022-07-01Change some unnecessary MemSet callsPeter Eisentraut
MemSet() with a value other than 0 just falls back to memset(), so the indirection is unnecessary if the value is constant and not 0. Since there is some interest in getting rid of MemSet(), this gets some easy cases out of the way. (There are a few MemSet() calls that I didn't change to maintain the consistency with their surrounding code.) Discussion: https://www.postgresql.org/message-id/flat/CAEudQApCeq4JjW1BdnwU=m=-DvG5WyUik0Yfn3p6UNphiHjj+w@mail.gmail.com
2020-11-15Handle equality operator in contrib/pg_trgmAlexander Korotkov
Obviously, in order to equality operator be satisfiable, target string must contain all the trigrams of the search string. On this base, we implement equality operator in GiST/GIN indexes with recheck. Discussion: https://postgr.es/m/CAOBaU_YWwtT7tdggtROacjdOdeYHCz-tmSwuC-j-TOG-g97J0w%40mail.gmail.com Author: Julien Rouhaud Reviewed-by: Tom Lane, Alexander Korotkov, Georgios Kokolatos, Erik Rijkers
2020-11-12pg_trgm: fix crash in 2-item picksplitAndrew Gierth
Whether from size overflow in gistSplit or from secondary splits, picksplit is (rarely) called with exactly two items to split. Formerly, due to special-case handling of the last item, this would lead to access to an uninitialized cache entry; prior to PG 13 this might have been harmless or at worst led to an incorrect union datum, but in 13 onwards it can cause a backend crash from using an uninitialized pointer. Repair by removing the special case, which was deemed not to have been appropriate anyway. Backpatch all the way, because this bug has existed since pg_trgm was added. Per report on IRC from user "ftzdomino". Analysis and testing by me, patch from Alexander Korotkov. Discussion: https://postgr.es/m/87k0usfdxg.fsf@news-spur.riddles.org.uk
2020-11-12Fix name of the macro for getting signature length trgm_gist.cAlexander Korotkov
911e702077 has introduced the opclass parameters including signature length for a set of GiST opclasses. Due to copy-pasting, macro for getting the signature length in trgm_gist.c was named LTREE_GET_ASIGLEN(). Fix that by renaming this macro to just GET_SIGLEN(). Backpatch-through: 13
2020-03-30Implement operator class parametersAlexander Korotkov
PostgreSQL provides set of template index access methods, where opclasses have much freedom in the semantics of indexing. These index AMs are GiST, GIN, SP-GiST and BRIN. There opclasses define representation of keys, operations on them and supported search strategies. So, it's natural that opclasses may be faced some tradeoffs, which require user-side decision. This commit implements opclass parameters allowing users to set some values, which tell opclass how to index the particular dataset. This commit doesn't introduce new storage in system catalog. Instead it uses pg_attribute.attoptions, which is used for table column storage options but unused for index attributes. In order to evade changing signature of each opclass support function, we implement unified way to pass options to opclass support functions. Options are set to fn_expr as the constant bytea expression. It's possible due to the fact that opclass support functions are executed outside of expressions, so fn_expr is unused for them. This commit comes with some examples of opclass options usage. We parametrize signature length in GiST. That applies to multiple opclasses: tsvector_ops, gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and gist_hstore_ops. Also we parametrize maximum number of integer ranges for gist__int_ops. However, the main future usage of this feature is expected to be json, where users would be able to specify which way to index particular json parts. Catversion is bumped. Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru Author: Nikita Glukhov, revised by me Reviwed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera
2020-01-30Clean up newlines following left parenthesesAlvaro Herrera
We used to strategically place newlines after some function call left parentheses to make pgindent move the argument list a few chars to the left, so that the whole line would fit under 80 chars. However, pgindent no longer does that, so the newlines just made the code vertically longer for no reason. Remove those newlines, and reflow some of those lines for some extra naturality. Reviewed-by: Michael Paquier, Tom Lane Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql
2019-10-24Make the order of the header file includes consistent in contrib modules.Amit Kapila
The basic rule we follow here is to always first include 'postgres.h' or 'postgres_fe.h' whichever is applicable, then system header includes and then Postgres header includes.  In this, we also follow that all the Postgres header includes are in order based on their ASCII value.  We generally follow these rules, but the code has deviated in many places. This commit makes it consistent just for contrib modules. The later commits will enforce similar rules in other parts of code. Author: Vignesh C Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
2019-02-15Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT.Tom Lane
Test for the compiler builtins __builtin_clz, __builtin_ctz, and __builtin_popcount, and make use of these in preference to handwritten C code if they're available. Create src/port infrastructure for "leftmost one", "rightmost one", and "popcount" so as to centralize these decisions. On x86_64, __builtin_popcount generally won't make use of the POPCNT opcode because that's not universally supported yet. Provide code that checks CPUID and then calls POPCNT via asm() if available. This requires indirecting through a function pointer, which is an annoying amount of overhead for a one-instruction operation, but it's probably not worth working harder than this for our current use-cases. I'm not sure we've found all the existing places that could profit from this new infrastructure; but we at least touched all the ones that used copied-and-pasted versions of the bitmapset.c code, and got rid of multiple copies of the associated constant arrays. While at it, replace c-compiler.m4's one-per-builtin-function macros with a single one that can handle all the cases we need to worry about so far. Also, because I'm paranoid, make those checks into AC_LINK checks rather than just AC_COMPILE; the former coding failed to verify that libgcc has support for the builtin, in cases where it's not inline code. David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
2018-04-26Post-feature-freeze pgindent run.Tom Lane
Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us
2018-03-21Add strict_word_similarity to pg_trgm moduleTeodor Sigaev
strict_word_similarity is similar to existing word_similarity function but it takes into account word boundaries to compute similarity. Author: Alexander Korotkov Review by: David Steele, Liudmila Mantrova, me Discussion: https://www.postgresql.org/message-id/flat/CY4PR17MB13207ED8310F847CF117EED0D85A0@CY4PR17MB1320.namprd17.prod.outlook.com
2017-11-08Change TRUE/FALSE to true/falsePeter Eisentraut
The lower case spellings are C and C++ standard and are used in most parts of the PostgreSQL sources. The upper case spellings are only used in some files/modules. So standardize on the standard spellings. The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so those are left as is when using those APIs. In code comments, we use the lower-case spelling for the C concepts and keep the upper-case spelling for the SQL concepts. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-03-12Use wrappers of PG_DETOAST_DATUM_PACKED() more.Noah Misch
This makes almost all core code follow the policy introduced in the previous commit. Specific decisions: - Text search support functions with char* and length arguments, such as prsstart and lexize, may receive unaligned strings. I doubt maintainers of non-core text search code will notice. - Use plain VARDATA() on values detoasted or synthesized earlier in the same function. Use VARDATA_ANY() on varlenas sourced outside the function, even if they happen to always have four-byte headers. As an exception, retain the universal practice of using VARDATA() on return values of SendFunctionCall(). - Retain PG_GETARG_BYTEA_P() in pageinspect. (Page images are too large for a one-byte header, so this misses no optimization.) Sites that do not call get_page_from_raw() typically need the four-byte alignment. - For now, do not change btree_gist. Its use of four-byte headers in memory is partly entangled with storage of 4-byte headers inside GBT_VARKEY, on disk. - For now, do not change gtrgm_consistent() or gtrgm_distance(). They incorporate the varlena header into a cache, and there are multiple credible implementation strategies to consider.
2016-06-20Fix comparison of similarity to threshold in GIST trigram searches.Tom Lane
There was some very strange code here, dating to commit b525bf77, that purported to work around an ancient gcc bug by forcing a float4 comparison to be done as int instead. Commit 5871b8848 broke that when it changed one side of the comparison to "double" but left the comparison code alone. Commit f576b17cd doubled down on the weirdness by introducing a "volatile" marker, which had nothing to do with the actual problem. Guess that the gcc bug, even if it's still present in the wild, was triggered by comparison of float4's and can be avoided if we store the result of cnt_sml() into a double before comparing to the double "nlimit". This will at least work correctly on non-broken compilers, and it's way more readable. Per bug #14202 from Greg Navis. Add a regression test based on his example. Report: <20160620115321.5792.10766@wrigleys.postgresql.org>
2016-06-09pgindent run for 9.6Robert Haas
2016-03-16Add word_similarity to pg_trgm contrib module.Teodor Sigaev
Patch introduces a concept of similarity over string and just a word from another string. Version of extension is not changed because 1.2 was already introduced in 9.6 release cycle, so, there wasn't a public version. Author: Alexander Korotkov, Artur Zakirov
2016-03-16GUC variable pg_trgm.similarity_threshold insead of set_limit()Teodor Sigaev
Use GUC variable pg_trgm.similarity_threshold insead of set_limit()/show_limit() which was introduced when defining GUC varuables by modules was absent. Author: Artur Zakirov
2015-05-15Move strategy numbers to include/access/stratnum.hAlvaro Herrera
For upcoming BRIN opclasses, it's convenient to have strategy numbers defined in a single place. Since there's nothing appropriate, create it. The StrategyNumber typedef now lives there, as well as existing strategy numbers for B-trees (from skey.h) and R-tree-and-friends (from gist.h). skey.h is forced to include stratnum.h because of the StrategyNumber typedef, but gist.h is not; extensions that currently rely on gist.h for rtree strategy numbers might need to add a new A few .c files can stop including skey.h and/or gist.h, which is a nice side benefit. Per discussion: https://www.postgresql.org/message-id/20150514232132.GZ2523@alvh.no-ip.org Authored by Emre Hasegeli and Álvaro. (It's not clear to me why bootscanner.l has any #include lines at all.)
2014-05-06pgindent run for 9.4Bruce Momjian
This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
2014-04-18Create function prototype as part of PG_FUNCTION_INFO_V1 macroPeter Eisentraut
Because of gcc -Wmissing-prototypes, all functions in dynamically loadable modules must have a separate prototype declaration. This is meant to detect global functions that are not declared in header files, but in cases where the function is called via dfmgr, this is redundant. Besides filling up space with boilerplate, this is a frequent source of compiler warnings in extension modules. We can fix that by creating the function prototype as part of the PG_FUNCTION_INFO_V1 macro, which such modules have to use anyway. That makes the code of modules cleaner, because there is one less place where the entry points have to be listed, and creates an additional check that functions have the right prototype. Remove now redundant prototypes from contrib and other modules.
2013-04-15Improve GiST index search performance for trigram regex queries.Tom Lane
The initial coding just descended the index if any of the target trigrams were possibly present at the next level down. But actually we can apply trigramsMatchGraph() so as to take advantage of AND requirements when there are some. The input data might contain false positive matches, but that can only result in a false positive result, not false negative, so it's safe to do it this way. Alexander Korotkov
2013-04-10Make contrib/pg_trgm also support regex searches with GiST indexes.Tom Lane
This wasn't addressed in the original patch, but it doesn't take very much additional code to cover the case, so let's get it done. Since pg_trgm 1.1 hasn't been released yet, I just changed the definition of what's in it, rather than inventing a 1.2.
2012-06-25Replace int2/int4 in C code with int16/int32Peter Eisentraut
The latter was already the dominant use, and it's preferable because in C the convention is that intXX means XX bits. Therefore, allowing mixed use of int2, int4, int8, int16, int32 is obviously confusing. Remove the typedefs for int2 and int4 for now. They don't seem to be widely used outside of the PostgreSQL source tree, and the few uses can probably be cleaned up by the time this ships.
2012-06-10Run pgindent on 9.2 source tree in preparation for first 9.3Bruce Momjian
commit-fest.
2011-09-30Cache the result of makesign() across calls of gtrgm_penalty().Tom Lane
Since gtrgm_penalty() is usually called many times in a row with the same "newval" (to determine which item on an index page newval fits into best), the makesign() calculation is repetitious. It's expensive enough to make it worth caching the result, so do so. On my machine this is good for more than a 40% savings in the time needed to build a trigram index on /usr/share/dict/words. This is all per a suggestion of Heikki's. In passing, make some mostly-cosmetic improvements in the caching logic in the other functions in this file that rely on caching info in fn_extra.
2011-09-11Remove many -Wcast-qual warningsPeter Eisentraut
This addresses only those cases that are easy to fix by adding or moving a const qualifier or removing an unnecessary cast. There are many more complicated cases remaining.
2011-09-01Remove unnecessary #include references, per pgrminclude script.Bruce Momjian
2011-04-10pgindent run before PG 9.1 beta 1.Bruce Momjian
2011-01-31Support LIKE and ILIKE index searches via contrib/pg_trgm indexes.Tom Lane
Unlike Btree-based LIKE optimization, this works for non-left-anchored search patterns. The effectiveness of the search depends on how many trigrams can be extracted from the pattern. (The worst case, with no trigrams, degrades to a full-table scan, so this isn't a panacea. But it can be very useful.) Alexander Korotkov, reviewed by Jan Urbanski
2010-12-04Add KNNGIST support to contrib/pg_trgm.Tom Lane
Teodor Sigaev, with some revision by Tom
2010-09-20Remove cvs keywords from all files.Magnus Hagander
2009-06-118.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian
provided by Andrew.
2008-07-11Add caching of query to GIN/GiST consistent function.Teodor Sigaev
Per performance gripe from nomao.com
2008-05-17Add $PostgreSQL$ markers to a lot of files that were missing them.Andrew Dunstan
This particular batch was just for *.c and *.h file. The changes were made with the following 2 commands: find . \( \( -name 'libstemmer' -o -name 'expected' -o -name 'ppport.h' \) -prune \) -o \( -name '*.[ch]' \) \( -exec grep -q '\$PostgreSQL' {} \; -o -print \) | while read file ; do head -n 1 < $file | grep -q '^/\*' && echo $file; done | xargs -l sed -i -e '1s/^\// /' -e '1i/*\n * $PostgreSQL:$ \n *' find . \( \( -name 'libstemmer' -o -name 'expected' -o -name 'ppport.h' \) -prune \) -o \( -name '*.[ch]' \) \( -exec grep -q '\$PostgreSQL' {} \; -o -print \) | xargs -l sed -i -e '1i/*\n * $PostgreSQL:$ \n */'
2008-04-14Push index operator lossiness determination down to GIST/GIN opclassTom Lane
"consistent" functions, and remove pg_amop.opreqcheck, as per recent discussion. The main immediate benefit of this is that we no longer need 8.3's ugly hack of requiring @@@ rather than @@ to test weight-using tsquery searches on GIN indexes. In future it should be possible to optimize some other queries better than is done now, by detecting at runtime whether the index match is exact or not. Tom Lane, after an idea of Heikki's, and with some help from Teodor.
2007-11-16Run pgindent on remaining files now that LOOPBYTE is a usable macro.Bruce Momjian
2007-11-16Modify LOOPBYTE/LOOPBIT macros to be more logical; rather than have theBruce Momjian
for() body passed as a parameter, make the macros act as simple headers to code blocks. This allows pgindent to be run on these files.
2007-04-06Support varlena fields with single-byte headers and unaligned storage.Tom Lane
This commit breaks any code that assumes that the mere act of forming a tuple (without writing it to disk) does not "toast" any fields. While all available regression tests pass, I'm not totally sure that we've fixed every nook and cranny, especially in contrib. Greg Stark with some help from Tom Lane
2007-02-28Fix up several contrib modules that were using varlena datatypes in ↵Tom Lane
not-so-obvious ways. I'm not totally sure that I caught everything, but at least now they pass their regression tests with VARSIZE/SET_VARSIZE defined to reverse byte order.
2006-10-04pgindent run for 8.2.Bruce Momjian
2006-06-28ChangesTeodor Sigaev
* new split algorithm (as proposed in http://archives.postgresql.org/pgsql-hackers/2006-06/msg00254.php) * possible call pickSplit() for second and below columns * add spl_(l|r)datum_exists to GIST_SPLITVEC - pickSplit should check its values to use already defined spl_(l|r)datum for splitting. pickSplit should set spl_(l|r)datum_exists to 'false' (if they was 'true') to signal to caller about using spl_(l|r)datum. * support for old pickSplit(): not very optimal but correct split * remove 'bytes' field from GISTENTRY: in any case size of value is defined by it's type. * split GIST_SPLITVEC to two structures: one for using in picksplit and second - for internal use. * some code refactoring * support of subsplit to rtree opclasses TODO: add support of subsplit to contrib modules
2006-03-01This patch makes the error message strings throughout the backendNeil Conway
more compliant with the error message style guide. In particular, errdetail should begin with a capital letter and end with a period, whereas errmsg should not. I also fixed a few related issues in passing, such as fixing the repeated misspelling of "lexeme" in contrib/tsearch2 (per Tom's suggestion).
2006-01-20Replace bitwise looping with bytewise looping in hemdistsign andTom Lane
sizebitvec of tsearch2, as well as identical code in several other contrib modules. This provided about a 20X speedup in building a large tsearch2 index ... didn't try to measure its effects for other operations. Thanks to Stephan Vollmer for providing a test case.
2005-11-07R-tree is dead ... long live GiST.Tom Lane
2005-05-21Cleanup of GiST extensions in contrib/: now that we always invoke GiSTNeil Conway
methods in a short-lived memory context, there is no need for GiST methods to do their own manual (and error-prone) memory management.