summaryrefslogtreecommitdiff
path: root/src/backend/utils/adt/pgstatfuncs.c
diff options
context:
space:
mode:
authorTomas Vondra <tomas.vondra@postgresql.org>2023-05-19 00:00:22 +0200
committerTomas Vondra <tomas.vondra@postgresql.org>2023-05-19 00:15:00 +0200
commit40d465cafce0825539a02edeb72966555766fc18 (patch)
tree75b5e3ee08e83dc3d96f6b7a06ba8f46ba639808 /src/backend/utils/adt/pgstatfuncs.c
parent3f1356e5d60a310b9ceb73497ec49af1b69bafa8 (diff)
Fix handling of empty ranges and NULLs in BRIN
BRIN indexes did not properly distinguish between summaries for empty (no rows) and all-NULL ranges, treating them as essentially the same thing. Summaries were initialized with allnulls=true, and opclasses simply reset allnulls to false when processing the first non-NULL value. This however produces incorrect results if the range starts with a NULL value (or a sequence of NULL values), in which case we forget the range contains NULL values when adding the first non-NULL value. This happens because the allnulls flag is used for two separate purposes - to mark empty ranges (not representing any rows yet) and ranges containing only NULL values. Opclasses don't know which of these cases it is, and so don't know whether to set hasnulls=true. Setting the flag in both cases would make it correct, but it would also make BRIN indexes useless for queries with IS NULL clauses. All ranges start empty (and thus allnulls=true), so all ranges would end up with either allnulls=true or hasnulls=true. The severity of the issue is somewhat reduced by the fact that it only happens when adding values to an existing summary with allnulls=true. This can happen e.g. for small tables (because a summary for the first range exists for all BRIN indexes), or for tables with large fraction of NULL values in the indexed columns. Bulk summarization (e.g. during CREATE INDEX or automatic summarization) that processes all values at once is not affected by this issue. In this case the flags were updated in a slightly different way, not forgetting the NULL values. To identify empty ranges we use a new flag, stored in an unused bit in the BRIN tuple header so the on-disk format remains the same. A matching flag is added to BrinMemTuple, into a 3B gap after bt_placeholder. That means there's no risk of ABI breakage, although we don't actually pass the BrinMemTuple to any public API. We could also skip storing index tuples for empty summaries, but then we'd have to always process such ranges - even if there are no rows in large parts of the table (e.g. after a bulk DELETE), it would still require reading the pages etc. So we store them, but ignore them when building the bitmap. Backpatch to 11. The issue exists since BRIN indexes were introduced in 9.5, but older releases are already EOL. Backpatch-through: 11 Reviewed-by: Justin Pryzby, Matthias van de Meent, Alvaro Herrera Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80afff3b@enterprisedb.com
Diffstat (limited to 'src/backend/utils/adt/pgstatfuncs.c')
0 files changed, 0 insertions, 0 deletions