Age | Commit message (Collapse) | Author |
|
The CHECK_FOR_INTERRUPTS call in gingetbitmap turns out to be
inadequate to prevent a long uninterruptible loop, because
we now know a case where looping occurs within scanGetItem.
While the next patch will fix the bug that caused that, it
seems foolish to assume that no similar patterns are possible.
Let's do the CFI within scanGetItem's retry loop, instead.
This demonstrably allows canceling out of the loop exhibited
in bug #19031.
Bug: #19031
Reported-by: Tim Wood <washwithcare@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19031-0638148643d25548@postgresql.org
Backpatch-through: 13
|
|
Remove the TBMIterateResult member from the TBMPrivateIterator and
TBMSharedIterator and make tbm_[shared|private_]iterate() take a
TBMIterateResult as a parameter.
This allows tidbitmap API users to manage multiple TBMIterateResults per
scan. This is required for bitmap heap scan to use the read stream API,
with which there may be multiple I/Os in flight at once, each one with a
TBMIterateResult.
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/d4bb26c9-fe07-439e-ac53-c0e244387e01%40vondra.me
|
|
If a GIN index search had a lot of search keys (for example,
"jsonbcol ?| array[]" with tens of thousands of array elements),
both ginFillScanKey() and startScanKey() took O(N^2) time.
Worse, those loops were uncancelable for lack of CHECK_FOR_INTERRUPTS.
The problem in ginFillScanKey() is the brute-force search key
de-duplication done in ginFillScanEntry(). The most expedient
solution seems to be to just stop trying to de-duplicate once
there are "too many" search keys. We could imagine working harder,
say by using a sort-and-unique algorithm instead of brute force
compare-all-the-keys. But it seems unlikely to be worth the trouble.
There is no correctness issue here, since the code already allowed
duplicate keys if any extra_data is present.
The problem in startScanKey() is the loop that attempts to identify
the first non-required search key. In the submitted test case, that
vainly tests all the key positions, and each iteration takes O(N)
time. One part of that is that it's reinitializing the entryRes[]
array from scratch each time, which is entirely unnecessary given
that the triConsistentFn isn't supposed to scribble on its input.
We can easily adjust the array contents incrementally instead.
The other part of it is that the triConsistentFn may itself take
O(N) time (and does in this test case). This is all extremely
brute force: in simple cases with AND or OR semantics, we could
know without any looping whatever that all or none of the keys
are required. But GIN opclasses don't have any API for exposing
that knowledge, so at least in the short run there is little to
be done about that. Put in a CHECK_FOR_INTERRUPTS so that at
least the loop is cancelable.
These two changes together resolve the primary complaint that
the test query doesn't respond promptly to cancel interrupts.
Also, while they don't completely eliminate the O(N^2) behavior,
they do provide quite a nice speedup for mid-sized examples.
Bug: #18831
Reported-by: Niek <niek.brasa@hitachienergy.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18831-e845ac44ebc5dd36@postgresql.org
Backpatch-through: 13
|
|
Pages from the bitmap created by the TIDBitmap API can be exact or
lossy. The TIDBitmap API extracts the tuple offsets from exact pages
into an array for the convenience of the caller.
This was done in tbm_private|shared_iterate() right after advancing the
iterator. However, as long as tbm_private|shared_iterate() set a
reference to the PagetableEntry in the TBMIterateResult, the offset
extraction can be done later.
Waiting to extract the tuple offsets has a few benefits. For the shared
iterator case, it allows us to extract the offsets after dropping the
shared iterator state lock, reducing time spent holding a contended
lock.
Separating the iteration step and extracting the offsets later also
allows us to avoid extracting the offsets for prefetched blocks. Those
offsets were never used, so the overhead of extracting and storing them
was wasted.
The real motivation for this change, however, is that future commits
will make bitmap heap scan use the read stream API. This requires a
TBMIterateResult per issued block. By removing the array of tuple
offsets from the TBMIterateResult and only extracting the offsets when
they are used, we reduce the memory required for per buffer data
substantially.
Suggested-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGLHbKP3jwJ6_%2BhnGi37Pw3BD5j2amjV3oSk7j-KyCnY7Q%40mail.gmail.com
|
|
TBMIterateResult->ntuples is -1 when the page in the bitmap is lossy.
Add an explicit lossy indicator so that we can move ntuples out of the
TBMIterateResult in a future commit.
Discussion: https://postgr.es/m/CA%2BhUKGLHbKP3jwJ6_%2BhnGi37Pw3BD5j2amjV3oSk7j-KyCnY7Q%40mail.gmail.com
|
|
Consistently use "Size" (or size_t, or in some places int64 or double)
as the type for variables holding memory allocation sizes. In most
places variables' data types were fine already, but we had an ancient
habit of computing bytes from kilobytes-units GUCs with code like
"work_mem * 1024L". That risks overflow on Win64 where they did not
make "long" as wide as "size_t". We worked around that by restricting
such GUCs' ranges, so you couldn't set work_mem et al higher than 2GB
on Win64. This patch removes that restriction, after replacing such
calculations with "work_mem * (Size) 1024" or variants of that.
It should be noted that this patch was constructed by searching
outwards from the GUCs that have MAX_KILOBYTES as upper limit.
So I can't positively guarantee there are no other places doing
memory-size arithmetic in int or long variables. I do however feel
pretty confident that increasing MAX_KILOBYTES on Win64 is safe now.
Also, nothing in our code should be dealing in multiple-gigabyte
allocations without authorization from a relevant GUC, so it seems
pretty likely that this search caught everything that could be at
risk of overflow.
Author: Vladlen Popolitov <v.popolitov@postgrespro.ru>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1a01f0-66ec2d80-3b-68487680@27595217
|
|
Backpatch-through: 13
|
|
The new compact_attrs array stores a few select fields from
FormData_pg_attribute in a more compact way, using only 16 bytes per
column instead of the 104 bytes that FormData_pg_attribute uses. Using
CompactAttribute allows performance-critical operations such as tuple
deformation to be performed without looking at the FormData_pg_attribute
element in TupleDesc which means fewer cacheline accesses.
For some workloads, tuple deformation can be the most CPU intensive part
of processing the query. Some testing with 16 columns on a table
where the first column is variable length showed around a 10% increase in
transactions per second for an OLAP type query performing aggregation on
the 16th column. However, in certain cases, the increases were much
higher, up to ~25% on one AMD Zen4 machine.
This also makes pg_attribute.attcacheoff redundant. A follow-on commit
will remove it, thus shrinking the FormData_pg_attribute struct by 4
bytes.
Author: David Rowley
Reviewed-by: Andres Freund, Victor Yegorov
Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com
|
|
Add and use TBMPrivateIterator, which replaces the current TBMIterator
for serial use cases, and repurpose TBMIterator to be a unified
interface for both the serial ("private") and parallel ("shared") TID
Bitmap iterator interfaces. This encapsulation simplifies call sites for
callers supporting both parallel and serial TID Bitmap access.
TBMIterator is not yet used in this commit.
Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Heikki Linnakangas
Discussion: https://postgr.es/m/063e4eb4-32d9-439e-a0b1-75565a9835a8%40iki.fi
|
|
Reported-by: Michael Paquier
Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz
Backpatch-through: 12
|
|
Commit f691f5b8 removed the logic, but left behind some now-useless
Snapshot arguments to various AM-internal functions, and missed a couple
of comments.
Reported-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wznj9qSNXZ1P1uWTUD_FeaTezbUazb416EPwi4Qr_jR_6A%40mail.gmail.com
|
|
Remove the old_snapshot_threshold setting and mechanism for producing
the error "snapshot too old", originally added by commit 848ef42b.
Unfortunately it had a number of known problems in terms of correctness
and performance, mostly reported by Andres in the course of his work on
snapshot scalability. We agreed to remove it, after a long period
without an active plan to fix it.
This is certainly a desirable feature, and someone might propose a new
or improved implementation in the future.
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CACG%3DezYV%2BEvO135fLRdVn-ZusfVsTY6cH1OZqWtezuEYH6ciQA%40mail.gmail.com
Discussion: https://postgr.es/m/20200401064008.qob7bfnnbu4w5cw4%40alap3.anarazel.de
Discussion: https://postgr.es/m/CA%2BTgmoY%3Daqf0zjTD%2B3dUWYkgMiNDegDLFjo%2B6ze%3DWtpik%2B3XqA%40mail.gmail.com
|
|
The ginfast.c code previously checked for conflicts in before locking
the relevant buffer, leaving a window where a RW conflict could be
missed. Re-order.
There was also a place where buffer ID and block number were confused
while trying to predicate-lock a page, noted by visual inspection.
Back-patch to all supported releases. Fixes one more problem discovered
with the reproducer from bug #17949, in this case when Dmitry tried
other index types.
Reported-by: Artem Anisimov <artem.anisimov.255@gmail.com>
Reported-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/17949-a0f17035294a55e2%40postgresql.org
|
|
Backpatch-through: 11
|
|
In a similar effort to f01592f91, here we mostly rename shadowed local
variables to remove the warnings produced when compiling with
-Wshadow=compatible-local.
This fixes 63 warnings and leaves just 5.
Author: Justin Pryzby, David Rowley
Reviewed-by: Justin Pryzby
Discussion https://postgr.es/m/20220817145434.GC26426%40telsasoft.com
|
|
This fixes a set of issues that have accumulated over the past months
(or years) in various code areas. Most fixes are related to some recent
additions, as of the development of v15.
Author: Justin Pryzby
Discussion: https://postgr.es/m/20220124030001.GQ23027@telsasoft.com
|
|
Backpatch-through: 10
|
|
Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating
a bunch of platform dependencies as well as fundamentally-obsolete PRNG
code. In addition, this API replacement will ease replacing the
algorithm again in future, should that become necessary.
xoroshiro128** is a few percent slower than the drand48 family,
but it can produce full-width 64-bit random values not only 48-bit,
and it should be much more trustworthy. It's likely to be noticeably
faster than the platform's random(), depending on which platform you
are thinking about; and we can have non-global state vectors easily,
unlike with random(). It is not cryptographically strong, but neither
are the functions it replaces.
Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself
Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
|
|
Backpatch-through: 9.5
|
|
collectMatchBitmap() needs to re-find the index tuple it was previously
looking at, after transiently dropping lock on the index page it's on.
The tuple should still exist and be at its prior position or somewhere
to the right of that, since ginvacuum never removes tuples but
concurrent insertions could add one. However, there was a thinko in
that logic, to the effect of expecting any inserted tuples to have the
same index "attnum" as what we'd been scanning. Since there's no
physical separation of tuples with different attnums, it's not terribly
hard to devise scenarios where this fails, leading to transient "lost
saved point in index" errors. (While I've duplicated this with manual
testing, it seems impossible to make a reproducible test case with our
available testing technology.)
Fix by just continuing the scan when the attnum doesn't match.
While here, improve the error message used if we do fail, so that it
matches the wording used in btree for a similar case.
collectMatchBitmap()'s posting-tree code path was previously not
exercised at all by our regression tests. While I can't make
a regression test that exhibits the bug, I can at least improve
the code coverage here, so do that. The test case I made for this
is an extension of one added by 4b754d6c1, so it only works in
HEAD and v13; didn't seem worth trying hard to back-patch it.
Per bug #16595 from Jesse Kinkead. This has been broken since
multicolumn capability was added to GIN (commit 27cb66fdf),
so back-patch to all supported branches.
Discussion: https://postgr.es/m/16595-633118be8eef9ce2@postgresql.org
|
|
entryGetItem()'s three code paths each contained bugs associated
with filtering the entries for gin_fuzzy_search_limit.
The posting-tree path failed to advance "advancePast" after having
decided to filter an item. If we ran out of items on the current
page and needed to advance to the next, what would actually happen
is that entryLoadMoreItems() would re-load the same page. Eventually,
the random dropItem() test would accept one of the same items it'd
previously rejected, and we'd move on --- but it could take awhile
with small gin_fuzzy_search_limit. To add insult to injury, this
case would inevitably cause entryLoadMoreItems() to decide it needed
to re-descend from the root, making things even slower.
The posting-list path failed to implement gin_fuzzy_search_limit
filtering at all, so that all entries in the posting list would
be returned.
The bitmap-result path used a "gotitem" variable that it failed to
update in the one place where it'd actually make a difference, ie
at the one "continue" statement. I think this was unreachable in
practice, because if we'd looped around then it shouldn't be the
case that the entries on the new page are before advancePast.
Still, the "gotitem" variable was contributing nothing to either
clarity or correctness, so get rid of it.
Refactor all three loops so that the termination conditions are
more alike and less unreadable.
The code coverage report showed that we had no coverage at all for
the re-descend-from-root code path in entryLoadMoreItems(), which
seems like a very bad thing, so add a test case that exercises it.
We also had exactly no coverage for gin_fuzzy_search_limit, so add a
simplistic test case that at least hits those code paths a little bit.
Back-patch to all supported branches.
Adé Heyward and Tom Lane
Discussion: https://postgr.es/m/CAEknJCdS-dE1Heddptm7ay2xTbSeADbkaQ8bU2AXRCVC2LdtKQ@mail.gmail.com
|
|
The strategy of GIN index scan is driven by opclass-specific extract_query
method. This method that needed search mode is GIN_SEARCH_MODE_ALL. This
mode means that matching tuple may contain none of extracted entries. Simple
example is '!term' tsquery, which doesn't need any term to exist in matching
tsvector.
In order to handle such scan key GIN calculates virtual entry, which contains
all TIDs of all entries of attribute. In fact this is full scan of index
attribute. And typically this is very slow, but allows to handle some queries
correctly in GIN. However, current algorithm calculate such virtual entry for
each GIN_SEARCH_MODE_ALL scan key even if they are multiple for the same
attribute. This is clearly not optimal.
This commit improves the situation by introduction of "exclude only" scan keys.
Such scan keys are not capable to return set of matching TIDs. Instead, they
are capable only to filter TIDs produced by normal scan keys. Therefore,
each attribute should contain at least one normal scan key, while rest of them
may be "exclude only" if search mode is GIN_SEARCH_MODE_ALL.
The same optimization might be applied to the whole scan, not per-attribute.
But that leads to NULL values elimination problem. There is trade-off between
multiple possible ways to do this. We probably want to do this later using
some cost-based decision algorithm.
Discussion: https://postgr.es/m/CAOBaU_YGP5-BEt5Cc0%3DzMve92vocPzD%2BXiZgiZs1kjY0cj%3DXBg%40mail.gmail.com
Author: Nikita Glukhov, Alexander Korotkov, Tom Lane, Julien Rouhaud
Reviewed-by: Julien Rouhaud, Tomas Vondra, Tom Lane
|
|
Backpatch-through: update all files in master, backpatch legal files through 9.4
|
|
Spotted mostly by Fabien Coelho.
Discussion: https://www.postgresql.org/message-id/alpine.DEB.2.21.1901230947050.16643@lancre
|
|
Backpatch-through: certain files through 9.4
|
|
According to README we acquire predicate locks on entry tree leafs and posting
tree roots. However, when ginFindLeafPage() is going to lock leaf in exclusive
mode, then it checks root for conflicts regardless whether it's a entry or
posting tree. Assuming that we never place predicate lock on entry tree root
(excluding corner case when root is leaf), this check is redundant. This
commit removes this check. Now, root conflict checking is controlled by
separate argument of ginFindLeafPage().
Discussion: https://postgr.es/m/CAPpHfdv7rrDyy%3DMgsaK-L9kk0AH7az0B-mdC3w3p0FSb9uoyEg%40mail.gmail.com
Author: Alexander Korotkov
Backpatch-through: 11
|
|
|
|
The principle behind the locking was not very well thought-out, and not
documented. Add a section in the README to explain how it's supposed to
work, and change the code so that it actually works that way.
This fixes two bugs:
1. If fast update was turned on concurrently, subsequent inserts to the
pending list would not conflict with predicate locks that were acquired
earlier, on entry pages. The included 'predicate-gin-fastupdate' test
demonstrates that. To fix, make all scans acquire a predicate lock on
the metapage. That lock represents a scan of the pending list, whether
or not there is a pending list at the moment. Forget about the
optimization to skip locking/checking for locks, when fastupdate=off.
2. If a scan finds no match, it still needs to lock the entry page. The
point of predicate locks is to lock the gabs between values, whether
or not there is a match. The included 'predicate-gin-nomatch' test
tests that case.
In addition to those two bug fixes, this removes some unnecessary locking,
following the principle laid out in the README. Because all items in
a posting tree have the same key value, a lock on the posting tree root is
enough to cover all the items. (With a very large posting tree, it would
possibly be better to lock the posting tree leaf pages instead, so that a
"skip scan" with a query like "A & B", you could avoid unnecessary conflict
if a new tuple is inserted with A but !B. But let's keep this simple.)
Also, some spelling fixes.
Author: Heikki Linnakangas with some editorization by me
Review: Andrey Borodin, Alexander Korotkov
Discussion: https://www.postgresql.org/message-id/0b3ad2c2-2692-62a9-3a04-5724f2af9114@iki.fi
|
|
Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us
|
|
Predicate locks are used on per page basis only if fastupdate = off, in
opposite case predicate lock on pending list will effectively lock whole index,
to reduce locking overhead, just lock a relation. Entry and posting trees are
essentially B-tree, so locks are acquired on leaf pages only.
Author: Shubham Barai with some editorization by me and Dmitry Ivanov
Review by: Alexander Korotkov, Dmitry Ivanov, Fedor Sigaev
Discussion: https://www.postgresql.org/message-id/flat/CALxAEPt5sWW+EwTaKUGFL5_XFcZ0MuGBcyJ70oqbWqr42YKR8Q@mail.gmail.com
|
|
Backpatch-through: certain files through 9.3
|
|
The lower case spellings are C and C++ standard and are used in most
parts of the PostgreSQL sources. The upper case spellings are only used
in some files/modules. So standardize on the standard spellings.
The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so
those are left as is when using those APIs.
In code comments, we use the lower-case spelling for the C concepts and
keep the upper-case spelling for the SQL concepts.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
|
|
This is a mechanical change in preparation for a later commit that
will change the layout of TupleDesc. Introducing a macro to abstract
the details of where attributes are stored will allow us to change
that in separate step and revise it in future.
Author: Thomas Munro, editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
|
|
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
|
|
There are no functional changes here; this simply encapsulates knowledge
of the ItemPointerData struct so that a future patch can change things
without more breakage.
All direct users of ip_blkid and ip_posid are changed to use existing
macros ItemPointerGetBlockNumber and ItemPointerGetOffsetNumber
respectively. For callers where that's inappropriate (because they
Assert that the itempointer is is valid-looking), add
ItemPointerGetBlockNumberNoCheck and ItemPointerGetOffsetNumberNoCheck,
which lack the assertion but are otherwise identical.
Author: Pavan Deolasee
Discussion: https://postgr.es/m/CABOikdNnFon4cJiL=h1mZH3bgUeU+sWHuU4Yr8AB=j3A2p1GiA@mail.gmail.com
|
|
When a shared iterator is used, each call to tbm_shared_iterate()
returns a result that has not yet been returned to any process
attached to the shared iterator. In other words, each cooperating
processes gets a disjoint subset of the full result set, but all
results are returned exactly once.
This is infrastructure for parallel bitmap heap scan.
Dilip Kumar. The larger patch set of which this is a part has been
reviewed and tested by (at least) Andres Freund, Amit Khandekar,
Tushar Ahuja, Rafia Sabih, Haribabu Kommi, and Thomas Munro.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
|
|
|
|
The reverted changes were intended to force a choice of whether any
newly-added BufferGetPage() calls needed to be accompanied by a
test of the snapshot age, to support the "snapshot too old"
feature. Such an accompanying test is needed in about 7% of the
cases, where the page is being used as part of a scan rather than
positioning for other purposes (such as DML or vacuuming). The
additional effort required for back-patching, and the doubt whether
the intended benefit would really be there, have indicated it is
best just to rely on developers to do the right thing based on
comments and existing usage, as we do with many other conventions.
This change should have little or no effect on generated executable
code.
Motivated by the back-patching pain of Tom Lane and Robert Haas
|
|
The code had a query-lifespan memory leak when encountering GIN entries
that have posting lists (rather than posting trees, ie, there are a
relatively small number of heap tuples containing this index key value).
With a suitable data distribution this could add up to a lot of leakage.
Problem seems to have been introduced by commit 36a35c550, so back-patch
to 9.4.
Julien Rouhaud
|
|
This feature is controlled by a new old_snapshot_threshold GUC. A
value of -1 disables the feature, and that is the default. The
value of 0 is just intended for testing. Above that it is the
number of minutes a snapshot can reach before pruning and vacuum
are allowed to remove dead tuples which the snapshot would
otherwise protect. The xmin associated with a transaction ID does
still protect dead tuples. A connection which is using an "old"
snapshot does not get an error unless it accesses a page modified
recently enough that it might not be able to produce accurate
results.
This is similar to the Oracle feature, and we use the same SQLSTATE
and error message for compatibility.
|
|
This patch is a no-op patch which is intended to reduce the chances
of failures of omission once the functional part of the "snapshot
too old" patch goes in. It adds parameters for snapshot, relation,
and an enum to specify whether the snapshot age check needs to be
done for the page at this point. This initial patch passes NULL
for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the
third. The follow-on patch will change the places where the test
needs to be made.
|
|
Commit d88976cfa1302e8d removed this code from ginFreeScanKeys():
- if (entry->list)
- pfree(entry->list);
evidently in the belief that that ItemPointer array is allocated in the
keyCtx and so would be reclaimed by the following MemoryContextReset.
Unfortunately, it isn't and it won't. It'd likely be a good idea for
that to become so, but as a simple and back-patchable fix in the
meantime, restore this code to ginFreeScanKeys().
Also, add a similar pfree to where startScanEntry() is about to zero out
entry->list. I am not sure if there are any code paths where this
change prevents a leak today, but it seems like cheap future-proofing.
In passing, make the initial allocation of so->entries[] use palloc
not palloc0. The code doesn't depend on unused entries being zero;
if it did, the array-enlargement code in ginFillScanEntry() would be
wrong. So using palloc0 initially can only serve to confuse readers
about what the invariant is.
Per report from Felipe de Jesús Molina Bravo, via Jaime Casanova in
<CAJGNTeMR1ndMU2Thpr8GPDUfiHTV7idELJRFusA5UXUGY1y-eA@mail.gmail.com>
|
|
This patch reduces pg_am to just two columns, a name and a handler
function. All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function. This is similar to
the designs we've adopted for FDWs and tablesample methods. There
are multiple advantages. For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.
A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL. We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.
Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.
|
|
Backpatch certain files through 9.1
|
|
|
|
It was getting tedious to track and release all the different things that
form a scan key. We were leaking at least the queryCategories array, and
possibly more, on a rescan. That was visible if a GIN index was used in a
nested loop join. This also protects from leaks in extractQuery method.
No backpatching, given the lack of complaints from the field. Maybe later,
after this has received more field testing.
|
|
The requiredEntries / additionalEntries arrays were not freed in
freeScanKeys() like other per-key stuff.
It's not obvious, but startScanKey() was only ever called after the keys
have been initialized with ginNewScanKey(). That's why it doesn't need to
worry about freeing existing arrays. The ginIsNewKey() test in gingetbitmap
was never true, because ginrescan free's the existing keys, and it's not OK
to call gingetbitmap twice in a row without calling ginrescan in between.
To make that clear, remove the unnecessary ginIsNewKey(). And just to be
extra sure that nothing funny happens if there is an existing key after all,
call freeScanKeys() to free it if it exists. This makes the code more
straightforward.
(I'm seeing other similar leaks in testing a query that rescans an GIN index
scan, but that's a different issue. This just fixes the obvious leak with
those two arrays.)
Backpatch to 9.4, where GIN fast scan was added.
|
|
When gin_fuzzy_search_limit was used, we could jump out of startScan()
without calling startScanKey(). That was harmless in 9.3 and below, because
startScanKey()() didn't do anything interesting, but in 9.4 it initializes
information needed for skipping entries (aka GIN fast scans), and you
readily get a segfault if it's not done. Nevertheless, it was clearly wrong
all along, so backpatch all the way to 9.1 where the early return was
introduced.
(AFAICS startScanKey() did nothing useful in 9.3 and below, because the
fields it initialized were already initialized in ginFillScanKey(), but I
don't dare to change that in a minor release. ginFillScanKey() is always
called in gingetbitmap() even though there's a check there to see if the
scan keys have already been initialized, because they never are; ginrescan()
free's them.)
In the passing, remove unnecessary if-check from the second inner loop in
startScan(). We already check in the first loop that the condition is true
for all entries.
Reported by Olaf Gawenda, bug #12694, Backpatch to 9.1 and above, although
AFAICS it causes a live bug only in 9.4.
|
|
Backpatch certain files through 9.0
|
|
When returning rows from a bitmap, as done with partial match queries, we
would get stuck in an infinite loop if the bitmap contained a lossy page
reference.
This bug is new in master, it was introduced by the patch to allow skipping
items refuted by other entries in GIN scans.
Report and fix by Alexander Korotkov
|