From 0d779d22a290a89b6c892137a37280b9588ad0cc Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Tue, 16 Feb 2021 12:07:14 -0500 Subject: Convert tsginidx.c's GIN indexing logic to fully ternary operation. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commit 2f2007fbb did this partially, but there were two remaining warts. checkcondition_gin handled some uncertain cases by setting the out-of-band recheck flag, some by returning TS_MAYBE, and some by doing both. Meanwhile, TS_execute arbitrarily converted a TS_MAYBE result to TS_YES. Thus, if checkcondition_gin chose to only return TS_MAYBE, the outcome would be TS_YES with no recheck flag, potentially resulting in wrong query outputs. The case where this'd happen is if there were GIN_MAYBE entries in the indexscan results passed to gin_tsquery_[tri]consistent, which so far as I can see would only happen if the tidbitmap used to accumulate indexscan results grew large enough to become lossy. I initially thought of fixing this by ensuring we always set the recheck flag as well as returning TS_MAYBE in uncertain cases. But that errs in the other direction, potentially forcing rechecks of rows that provably match the query (since the recheck flag remains set even if TS_execute later finds that the answer must be TS_YES). Instead, let's get rid of the out-of-band recheck flag altogether and rely on returning TS_MAYBE. This requires exporting a version of TS_execute that will actually return the full ternary result of the evaluation ... but we likely should have done that to start with. Unfortunately it doesn't seem practical to add a regression test case that covers this: the amount of data needed to cause the GIN bitmap to become lossy results in a longer runtime than I think we want to have in the tests. (I'm wondering about allowing smaller work_mem settings to ameliorate that, but it'd be a matter for a separate patch.) Per bug #16865 from Dimitri NĂ¼scheler. Back-patch to v13 where the faulty commit came in. Discussion: https://postgr.es/m/16865-4ffdc3e682e6d75b@postgresql.org --- src/backend/utils/adt/tsvector_op.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'src/backend/utils/adt/tsvector_op.c') diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c index 756a48a167a..69fdd2fe36f 100644 --- a/src/backend/utils/adt/tsvector_op.c +++ b/src/backend/utils/adt/tsvector_op.c @@ -1854,6 +1854,18 @@ TS_execute(QueryItem *curitem, void *arg, uint32 flags, return TS_execute_recurse(curitem, arg, flags, chkcond) != TS_NO; } +/* + * Evaluate tsquery boolean expression. + * + * This is the same as TS_execute except that TS_MAYBE is returned as-is. + */ +TSTernaryValue +TS_execute_ternary(QueryItem *curitem, void *arg, uint32 flags, + TSExecuteCallback chkcond) +{ + return TS_execute_recurse(curitem, arg, flags, chkcond); +} + /* * TS_execute recursion for operators above any phrase operator. Here we do * not need to worry about lexeme positions. As soon as we hit an OP_PHRASE -- cgit v1.2.3