summaryrefslogtreecommitdiff
path: root/src/backend/utils/adt/tsgistidx.c
AgeCommit message (Collapse)Author
2022-09-05Fix incorrect uses of Datum conversion macrosPeter Eisentraut
Since these macros just cast whatever you give them to the designated output type, and many normal uses also cast the output type further, a number of incorrect uses go undiscovered. The fixes in this patch have been discovered by changing these macros to inline functions, which is the subject of a future patch. Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://www.postgresql.org/message-id/flat/8528fb7e-0aa2-6b54-85fb-0c0886dbd6ed%40enterprisedb.com
2022-07-01Change some unnecessary MemSet callsPeter Eisentraut
MemSet() with a value other than 0 just falls back to memset(), so the indirection is unnecessary if the value is constant and not 0. Since there is some interest in getting rid of MemSet(), this gets some easy cases out of the way. (There are a few MemSet() calls that I didn't change to maintain the consistency with their surrounding code.) Discussion: https://www.postgresql.org/message-id/flat/CAEudQApCeq4JjW1BdnwU=m=-DvG5WyUik0Yfn3p6UNphiHjj+w@mail.gmail.com
2022-01-07Update copyright for 2022Bruce Momjian
Backpatch-through: 10
2021-01-02Update copyright for 2021Bruce Momjian
Backpatch-through: 9.5
2020-07-24Replace TS_execute's TS_EXEC_CALC_NOT flag with TS_EXEC_SKIP_NOT.Tom Lane
It's fairly silly that ignoring NOT subexpressions is TS_execute's default behavior. It's wrong on its face and it encourages errors of omission. Moreover, the only two remaining callers that aren't specifying CALC_NOT are in ts_headline calculations, and it's very arguable that those are bugs: if you've specified "!foo" in your query, why would you want to get a headline that includes "foo"? Hence, rip that out and change the default behavior to be to calculate NOT accurately. As a concession to the slim chance that there is still somebody somewhere who needs the incorrect behavior, provide a new SKIP_NOT flag to explicitly request that. Back-patch into v13, mainly because it seems better to change this at the same time as the previous commit's rejiggering of TS_execute related APIs. Any outside callers affected by this change are probably also affected by that one. Discussion: https://postgr.es/m/CALT9ZEE-aLotzBg-pOp2GFTesGWVYzXA3=mZKzRDa_OKnLF7Mg@mail.gmail.com
2020-07-24Fix assorted bugs by changing TS_execute's callback API to ternary logic.Tom Lane
Text search sometimes failed to find valid matches, for instance '!crew:A'::tsquery might fail to locate 'crew:1B'::tsvector during an index search. The root of the issue is that TS_execute's callback functions were not changed to use ternary (yes/no/maybe) reporting when we made the search logic itself do so. It's somewhat annoying to break that API, but on the other hand we now see that any code using plain boolean logic is almost certainly broken since the addition of phrase search. There seem to be very few outside callers of this code anyway, so we'll just break them intentionally to get them to adapt. This allows removal of tsginidx.c's private re-implementation of TS_execute, since that's now entirely duplicative. It's also no longer necessary to avoid use of CALC_NOT in tsgistidx.c, since the underlying callbacks can now do something reasonable. Back-patch into v13. We can't change this in stable branches, but it seems not quite too late to fix it in v13. Tom Lane and Pavel Borisov Discussion: https://postgr.es/m/CALT9ZEE-aLotzBg-pOp2GFTesGWVYzXA3=mZKzRDa_OKnLF7Mg@mail.gmail.com
2020-05-14Initial pgindent and pgperltidy run for v13.Tom Lane
Includes some manual cleanup of places that pgindent messed up, most of which weren't per project style anyway. Notably, it seems some people didn't absorb the style rules of commit c9d297751, because there were a bunch of new occurrences of function calls with a newline just after the left paren, all with faulty expectations about how the rest of the call would get indented.
2020-03-30Implement operator class parametersAlexander Korotkov
PostgreSQL provides set of template index access methods, where opclasses have much freedom in the semantics of indexing. These index AMs are GiST, GIN, SP-GiST and BRIN. There opclasses define representation of keys, operations on them and supported search strategies. So, it's natural that opclasses may be faced some tradeoffs, which require user-side decision. This commit implements opclass parameters allowing users to set some values, which tell opclass how to index the particular dataset. This commit doesn't introduce new storage in system catalog. Instead it uses pg_attribute.attoptions, which is used for table column storage options but unused for index attributes. In order to evade changing signature of each opclass support function, we implement unified way to pass options to opclass support functions. Options are set to fn_expr as the constant bytea expression. It's possible due to the fact that opclass support functions are executed outside of expressions, so fn_expr is unused for them. This commit comes with some examples of opclass options usage. We parametrize signature length in GiST. That applies to multiple opclasses: tsvector_ops, gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and gist_hstore_ops. Also we parametrize maximum number of integer ranges for gist__int_ops. However, the main future usage of this feature is expected to be json, where users would be able to specify which way to index particular json parts. Catversion is bumped. Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru Author: Nikita Glukhov, revised by me Reviwed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera
2020-01-30Clean up newlines following left parenthesesAlvaro Herrera
We used to strategically place newlines after some function call left parentheses to make pgindent move the argument list a few chars to the left, so that the whole line would fit under 80 chars. However, pgindent no longer does that, so the newlines just made the code vertically longer for no reason. Remove those newlines, and reflow some of those lines for some extra naturality. Reviewed-by: Michael Paquier, Tom Lane Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql
2020-01-01Update copyrights for 2020Bruce Momjian
Backpatch-through: update all files in master, backpatch legal files through 9.4
2019-11-07Add reusable routine for making arrays unique.Thomas Munro
Introduce qunique() and qunique_arg(), which can be used after qsort() and qsort_arg() respectively to remove duplicate values. Use it where appropriate. Author: Thomas Munro Reviewed-by: Tom Lane (in an earlier version) Discussion: https://postgr.es/m/CAEepm%3D2vmFTNpAmwbGGD2WaryM6T3hSDVKQPfUwjdD_5XY6vAA%40mail.gmail.com
2019-09-05Split tuptoaster.c into three separate files.Robert Haas
detoast.c/h contain functions required to detoast a datum, partially or completely, plus a few other utility functions for examining the size of toasted datums. toast_internals.c/h contain functions that are used internally to the TOAST subsystem but which (mostly) do not need to be accessed from outside. heaptoast.c/h contains code that is intrinsically specific to the heap AM, either because it operates on HeapTuples or is based on the layout of a heap page. detoast.c and toast_internals.c are placed in src/backend/access/common rather than src/backend/access/heap. At present, both files still have dependencies on the heap, but that will be improved in a future commit. Patch by me, reviewed and tested by Prabhat Sabu, Thomas Munro, Andres Freund, and Álvaro Herrera. Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com
2019-02-15Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT.Tom Lane
Test for the compiler builtins __builtin_clz, __builtin_ctz, and __builtin_popcount, and make use of these in preference to handwritten C code if they're available. Create src/port infrastructure for "leftmost one", "rightmost one", and "popcount" so as to centralize these decisions. On x86_64, __builtin_popcount generally won't make use of the POPCNT opcode because that's not universally supported yet. Provide code that checks CPUID and then calls POPCNT via asm() if available. This requires indirecting through a function pointer, which is an annoying amount of overhead for a one-instruction operation, but it's probably not worth working harder than this for our current use-cases. I'm not sure we've found all the existing places that could profit from this new infrastructure; but we at least touched all the ones that used copied-and-pasted versions of the bitmapset.c code, and got rid of multiple copies of the associated constant arrays. While at it, replace c-compiler.m4's one-per-builtin-function macros with a single one that can handle all the cases we need to worry about so far. Also, because I'm paranoid, make those checks into AC_LINK checks rather than just AC_COMPILE; the former coding failed to verify that libgcc has support for the builtin, in cases where it's not inline code. David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
2019-01-02Update copyright for 2019Bruce Momjian
Backpatch-through: certain files through 9.4
2018-01-02Update copyright for 2018Bruce Momjian
Backpatch-through: certain files through 9.3
2017-11-08Change TRUE/FALSE to true/falsePeter Eisentraut
The lower case spellings are C and C++ standard and are used in most parts of the PostgreSQL sources. The upper case spellings are only used in some files/modules. So standardize on the standard spellings. The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so those are left as is when using those APIs. In code comments, we use the lower-case spelling for the C concepts and keep the upper-case spelling for the SQL concepts. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-09-19Remove no-op GiST support functions in the core GiST opclasses.Tom Lane
The preceding patch allowed us to remove useless GiST support functions. This patch actually does that for all the no-op cases in the core GiST code. This buys us whatever performance gain is to be had, and more importantly exercises the preceding patch. There remain no-op functions in the contrib GiST opclasses, but those will take more work to remove. Discussion: https://postgr.es/m/CAJEAwVELVx9gYscpE=Be6iJxvdW5unZ_LkcAaVNSeOwvdwtD=A@mail.gmail.com
2017-09-18Make DatumGetFoo/PG_GETARG_FOO/PG_RETURN_FOO macro names more consistent.Tom Lane
By project convention, these names should include "P" when dealing with a pointer type; that is, if the result of a GETARG macro is of type FOO *, it should be called PG_GETARG_FOO_P not just PG_GETARG_FOO. Some newer types such as JSONB and ranges had not followed the convention, and a number of contrib modules hadn't gotten that memo either. Rename the offending macros to improve consistency. In passing, fix a few places that thought PG_DETOAST_DATUM() returns a Datum; it does not, it returns "struct varlena *". Applying DatumGetPointer to that happens not to cause any bad effects today, but it's formally wrong. Also, adjust an ltree macro that was designed without any thought for what pgindent would do with it. This is all cosmetic and shouldn't have any impact on generated code. Mark Dilger, some further tweaks by me Discussion: https://postgr.es/m/EA5676F4-766F-4F38-8348-ECC7DB427C6A@gmail.com
2017-09-05Remove unnecessary parentheses in return statementsPeter Eisentraut
The parenthesized style has only been used in a few modules. Change that to use the style that is predominant across the whole tree. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Ryan Murphy <ryanfmurphy@gmail.com>
2017-01-17Generate fmgr prototypes automaticallyPeter Eisentraut
Gen_fmgrtab.pl creates a new file fmgrprotos.h, which contains prototypes for all functions registered in pg_proc.h. This avoids having to manually maintain these prototypes across a random variety of header files. It also automatically enforces a correct function signature, and since there are warnings about missing prototypes, it will detect functions that are defined but not registered in pg_proc.h (or otherwise used). Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
2017-01-03Update copyright via script for 2017Bruce Momjian
2016-12-21Fix strange behavior (and possible crashes) in full text phrase search.Tom Lane
In an attempt to simplify the tsquery matching engine, the original phrase search patch invented rewrite rules that would rearrange a tsquery so that no AND/OR/NOT operator appeared below a PHRASE operator. But this approach had numerous problems. The rearrangement step was missed by ts_rewrite (and perhaps other places), allowing tsqueries to be created that would cause Assert failures or perhaps crashes at execution, as reported by Andreas Seltenreich. The rewrite rules effectively defined semantics for operators underneath PHRASE that were buggy, or at least unintuitive. And because rewriting was done in tsqueryin() rather than at execution, the rearrangement was user-visible, which is not very desirable --- for example, it might cause unexpected matches or failures to match in ts_rewrite. As a somewhat independent problem, the behavior of nested PHRASE operators was only sane for left-deep trees; queries like "x <-> (y <-> z)" did not behave intuitively at all. To fix, get rid of the rewrite logic altogether, and instead teach the tsquery execution engine to manage AND/OR/NOT below a PHRASE operator by explicitly computing the match location(s) and match widths for these operators. This requires introducing some additional fields into the publicly visible ExecPhraseData struct; but since there's no way for third-party code to pass such a struct to TS_phrase_execute, it shouldn't create an ABI problem as long as we don't move the offsets of the existing fields. Another related problem was that index searches supposed that "!x <-> y" could be lossily approximated as "!x & y", which isn't correct because the latter will reject, say, "x q y" which the query itself accepts. This required some tweaking in TS_execute_ternary along with the main tsquery engine. Back-patch to 9.6 where phrase operators were introduced. While this could be argued to change behavior more than we'd like in a stable branch, we have to do something about the crash hazards and index-vs-seqscan inconsistency, and it doesn't seem desirable to let the unintuitive behaviors induced by the rewriting implementation stand as precedent. Discussion: https://postgr.es/m/28215.1481999808@sss.pgh.pa.us Discussion: https://postgr.es/m/26706.1482087250@sss.pgh.pa.us
2016-06-27Do not fallback to AND for FTS phrase operator.Teodor Sigaev
If there is no positional information of lexemes then phrase operator will not fallback to AND operator. This change makes needing to modify TS_execute() interface, because somewhere (in indexes, for example) positional information is unaccesible and in this cases we need to force fallback to AND. Per discussion c19fcfec308e6ccd952cdde9e648b505@mail.gmail.com
2016-04-07Phrase full text search.Teodor Sigaev
Patch introduces new text search operator (<-> or <DISTANCE>) into tsquery. On-disk and binary in/out format of tsquery are backward compatible. It has two side effect: - change order for tsquery, so, users, who has a btree index over tsquery, should reindex it - less number of parenthesis in tsquery output, and tsquery becomes more readable Authors: Teodor Sigaev, Oleg Bartunov, Dmitry Ivanov Reviewers: Alexander Korotkov, Artur Zakirov
2016-03-02Create stub functions to support pg_upgrade of old contrib/tsearch2.Tom Lane
Commits 9ff60273e35cad6e and dbe2328959e12701 adjusted the declarations of some core functions referenced by contrib/tsearch2's install script, forgetting that in a pg_upgrade situation, we'll be trying to restore operator class definitions that reference the old signatures. We've hit this problem before; solve it in the same way as before, namely by installing stub functions that have the expected signature and just invoke the correct function. Per report from Jeff Janes. (Someday we ought to stop supporting contrib/tsearch2, but I'm not sure today is that day.)
2016-01-02Update copyright for 2016Bruce Momjian
Backpatch certain files through 9.1
2015-04-14Reorganize our CRC source files again.Heikki Linnakangas
Now that we use CRC-32C in WAL and the control file, the "traditional" and "legacy" CRC-32 variants are not used in any frontend programs anymore. Move the code for those back from src/common to src/backend/utils/hash. Also move the slicing-by-8 implementation (back) to src/port. This is in preparation for next patch that will add another implementation that uses Intel SSE 4.2 instructions to calculate CRC-32C, where available.
2015-02-21Use FLEXIBLE_ARRAY_MEMBER in a number of other places.Tom Lane
I think we're about done with this...
2015-01-06Update copyright for 2015Bruce Momjian
Backpatch certain files through 9.0
2014-11-04Switch to CRC-32C in WAL and other places.Heikki Linnakangas
The old algorithm was found to not be the usual CRC-32 algorithm, used by Ethernet et al. We were using a non-reflected lookup table with code meant for a reflected lookup table. That's a strange combination that AFAICS does not correspond to any bit-wise CRC calculation, which makes it difficult to reason about its properties. Although it has worked well in practice, seems safer to use a well-known algorithm. Since we're changing the algorithm anyway, we might as well choose a different polynomial. The Castagnoli polynomial has better error-correcting properties than the traditional CRC-32 polynomial, even if we had implemented it correctly. Another reason for picking that is that some new CPUs have hardware support for calculating CRC-32C, but not CRC-32, let alone our strange variant of it. This patch doesn't add any support for such hardware, but a future patch could now do that. The old algorithm is kept around for tsquery and pg_trgm, which use the values in indexes that need to remain compatible so that pg_upgrade works. While we're at it, share the old lookup table for CRC-32 calculation between hstore, ltree and core. They all use the same table, so might as well.
2014-10-20Fix typos.Robert Haas
Etsuro Fujita
2014-01-07Update copyright for 2014Bruce Momjian
Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
2013-01-01Update copyrights for 2013Bruce Momjian
Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
2012-06-25Replace int2/int4 in C code with int16/int32Peter Eisentraut
The latter was already the dominant use, and it's preferable because in C the convention is that intXX means XX bits. Therefore, allowing mixed use of int2, int4, int8, int16, int32 is obviously confusing. Remove the typedefs for int2 and int4 for now. They don't seem to be widely used outside of the PostgreSQL source tree, and the few uses can probably be cleaned up by the time this ships.
2012-06-10Run pgindent on 9.2 source tree in preparation for first 9.3Bruce Momjian
commit-fest.
2012-01-01Update copyright notices for year 2012.Bruce Momjian
2011-09-11Remove many -Wcast-qual warningsPeter Eisentraut
This addresses only those cases that are easy to fix by adding or moving a const qualifier or removing an unnecessary cast. There are many more complicated cases remaining.
2011-09-01Remove unnecessary #include references, per pgrminclude script.Bruce Momjian
2011-01-01Stamp copyrights for year 2011.Bruce Momjian
2010-09-20Remove cvs keywords from all files.Magnus Hagander
2010-01-02Update copyright for the year 2010.Bruce Momjian
2009-06-118.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian
provided by Andrew.
2009-01-01Update copyright for 2009.Bruce Momjian
2008-05-16Extend GIN to support partial-match searches, and extend tsquery to supportTom Lane
prefix matching using this facility. Teodor Sigaev and Oleg Bartunov
2008-04-14Push index operator lossiness determination down to GIST/GIN opclassTom Lane
"consistent" functions, and remove pg_amop.opreqcheck, as per recent discussion. The main immediate benefit of this is that we no longer need 8.3's ugly hack of requiring @@@ rather than @@ to test weight-using tsquery searches on GIN indexes. In future it should be possible to optimize some other queries better than is done now, by detecting at runtime whether the index match is exact or not. Tom Lane, after an idea of Heikki's, and with some help from Teodor.
2008-01-01Update copyrights in source tree to 2008.Bruce Momjian
2007-11-16Run pgindent on remaining files now that LOOPBYTE is a usable macro.Bruce Momjian
2007-11-16Modify LOOPBYTE/LOOPBIT macros to be more logical; rather than have theBruce Momjian
for() body passed as a parameter, make the macros act as simple headers to code blocks. This allows pgindent to be run on these files.
2007-09-11Refactor from Heikki Linnakangas <heikki@enterprisedb.com>:Teodor Sigaev
* Defined new struct WordEntryPosVector that holds a uint16 length and a variable size array of WordEntries. This replaces the previous convention of a variable size uint16 array, with the first element implying the length. WordEntryPosVector has the same layout in memory, but is more readable in source code. The POSDATAPTR and POSDATALEN macros are still used, though it would now be more readable to access the fields in WordEntryPosVector directly. * Removed needfree field from DocRepresentation. It was always set to false. * Miscellaneous other commenting and refactoring
2007-09-07Refactoring by Heikki Linnakangas <heikki@enterprisedb.com> withTeodor Sigaev
small editorization by me - Brake the QueryItem struct into QueryOperator and QueryOperand. Type was really the only common field between them. QueryItem still exists, and is used in the TSQuery struct as before, but it's now a union of the two. Many other changes fell from that, like separation of pushval_asis function into pushValue, pushOperator and pushStop. - Moved some structs that were for internal use only from header files to the right .c-files. - Moved tsvector parser to a new tsvector_parser.c file. Parser code was about half of the size of tsvector.c, it's also used from tsquery.c, and it has some data structures of its own, so it seems better to separate it. Cleaned up the API so that TSVectorParserState is not accessed from outside tsvector_parser.c. - Separated enumerations (#defines, really) used for QueryItem.type field and as return codes from gettoken_query. It was just accidental code sharing. - Removed ParseQueryNode struct used internally by makepol and friends. push*-functions now construct QueryItems directly. - Changed int4 variables to just ints for variables like "i" or "array size", where the storage-size was not significant.