summaryrefslogtreecommitdiff
path: root/src/backend/access/gin
AgeCommit message (Collapse)Author
2019-03-24Fix WAL format incompatibility introduced by backpatching of 52ac6cd2d0Alexander Korotkov
52ac6cd2d0 added new field to ginxlogDeletePage and was backpatched to 9.4. That led to problems when patched postgres instance applies WAL records generated by non-patched one. WAL records generated by non-patched instance don't contain new field, which patched one is expecting to see. Thankfully, we can distinguish patched and non-patched WAL records by their data size. If we see that WAL record is generated by non-patched instance, we skip processing of new field. This commit comes with some assertions. In particular, if it appears that on some platform struct data size didn't change then static assertion will trigger. Reported-by: Simon Riggs Discussion: https://postgr.es/m/CANP8%2Bj%2BK4whxf7ET7%2BgO%2BG-baC3-WxqqH%3DnV4X2CgfEPA3Yu3g%40mail.gmail.com Author: Alexander Korotkov Reviewed-by: Simon Riggs, Alvaro Herrera Backpatch-through: 9.4
2018-12-13Prevent GIN deleted pages from being reclaimed too earlyAlexander Korotkov
When GIN vacuum deletes a posting tree page, it assumes that no concurrent searchers can access it, thanks to ginStepRight() locking two pages at once. However, since 9.4 searches can skip parts of posting trees descending from the root. That leads to the risk that page is deleted and reclaimed before concurrent search can access it. This commit prevents the risk of above by waiting for every transaction, which might wait to reference this page, to finish. Due to binary compatibility we can't change GinPageOpaqueData to store corresponding transaction id. Instead we reuse page header pd_prune_xid field, which is unused in index pages. Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com Author: Andrey Borodin, Alexander Korotkov Reviewed-by: Alexander Korotkov Backpatch-through: 9.4
2018-12-13Prevent deadlock in ginRedoDeletePage()Alexander Korotkov
On standby ginRedoDeletePage() can work concurrently with read-only queries. Those queries can traverse posting tree in two ways. 1) Using rightlinks by ginStepRight(), which locks the next page before unlocking its left sibling. 2) Using downlinks by ginFindLeafPage(), which locks at most one page at time. Original lock order was: page, parent, left sibling. That lock order can deadlock with ginStepRight(). In order to prevent deadlock this commit changes lock order to: left sibling, page, parent. Note, that position of parent in locking order seems insignificant, because we only lock one page at time while traversing downlinks. Reported-by: Chen Huajun Diagnosed-by: Chen Huajun, Peter Geoghegan, Andrey Borodin Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com Author: Alexander Korotkov Backpatch-through: 9.4
2018-09-09Fix past pd_upper write in ginRedoRecompress()Alexander Korotkov
ginRedoRecompress() replays actions over compressed segments of posting list in-place. However, it might lead to write past pg_upper, because intermediate state during playing the changes can take more space than both original state and final state. This commit fixes that by refuse from in-place modification. Instead page tail is copied once modification is started, and then it's used as the source of original segments. Backpatch to 9.4 where posting list compression was introduced. Reported-by: Sivasubramanian Ramasubramanian Discussion: https://postgr.es/m/1536091151804.6588%40amazon.com Author: Alexander Korotkov based on patch from and ideas by Sivasubramanian Ramasubramanian Review: Sivasubramanian Ramasubramanian Backpatch-through: 9.4
2018-09-01Avoid using potentially-under-aligned page buffers.Tom Lane
There's a project policy against using plain "char buf[BLCKSZ]" local or static variables as page buffers; preferred style is to palloc or malloc each buffer to ensure it is MAXALIGN'd. However, that policy's been ignored in an increasing number of places. We've apparently got away with it so far, probably because (a) relatively few people use platforms on which misalignment causes core dumps and/or (b) the variables chance to be sufficiently aligned anyway. But this is not something to rely on. Moreover, even if we don't get a core dump, we might be paying a lot of cycles for misaligned accesses. To fix, invent new union types PGAlignedBlock and PGAlignedXLogBlock that the compiler must allocate with sufficient alignment, and use those in place of plain char arrays. I used these types even for variables where there's no risk of a misaligned access, since ensuring proper alignment should make kernel data transfers faster. I also changed some places where we had been palloc'ing short-lived buffers, for coding style uniformity and to save palloc/pfree overhead. Since this seems to be a live portability hazard (despite the lack of field reports), back-patch to all supported versions. Patch by me; thanks to Michael Paquier for review. Discussion: https://postgr.es/m/1535618100.1286.3.camel@credativ.de
2018-07-19Fix handling of empty uncompressed posting list pages in GINAlexander Korotkov
PostgreSQL 9.4 introduces posting list compression in GIN. This feature supports online upgrade, so that after pg_upgrade uncompressed posting lists are compressed on-the-fly. Underlying code appears to always expect at least one item on uncompressed posting list page. But there could be completely empty pages, because VACUUM never deletes leftmost and rightmost pages from posting trees. This commit fixes that. Reported-by: Sivasubramanian Ramasubramanian Discussion: https://postgr.es/m/1531867212836.63354%40amazon.com Author: Sivasubramanian Ramasubramanian, Alexander Korotkov Backpatch-through: 9.4
2016-09-03Fix corrupt GIN_SEGMENT_ADDITEMS WAL records on big-endian hardware.Tom Lane
computeLeafRecompressWALData() tried to produce a uint16 WAL log field by memcpy'ing the first two bytes of an int-sized variable. That accidentally works on little-endian hardware, but not at all on big-endian. Replay then thinks it's looking at an ADDITEMS action with zero entries, and reads the first two bytes of the first TID therein as the next segno/action, typically leading to "unexpected GIN leaf action" errors during replay. Even if replay failed to crash, the resulting GIN index page would surely be incorrect. To fix, just declare the variable as uint16 instead. Per bug #14295 from Spencer Thomason (much thanks to Spencer for turning his problem into a self-contained test case). This likely also explains a previous report of the same symptom from Bernd Helmle. Back-patch to 9.4 where the problem was introduced (by commit 14d02f0bb). Discussion: <20160826072658.15676.7628@wrigleys.postgresql.org> Possible-Report: <2DA7350F7296B2A142272901@eje.land.credativ.lan>
2016-04-20Fix memory leak and other bugs in ginPlaceToPage() & subroutines.Tom Lane
Commit 36a35c550ac114ca turned the interface between ginPlaceToPage and its subroutines in gindatapage.c and ginentrypage.c into a royal mess: page-update critical sections were started in one place and finished in another place not even in the same file, and the very same subroutine might return having started a critical section or not. Subsequent patches band-aided over some of the problems with this design by making things even messier. One user-visible resulting problem is memory leaks caused by the need for the subroutines to allocate storage that would survive until ginPlaceToPage calls XLogInsert (as reported by Julien Rouhaud). This would not typically be noticeable during retail index updates. It could be visible in a GIN index build, in the form of memory consumption swelling to several times the commanded maintenance_work_mem. Another rather nasty problem is that in the internal-page-splitting code path, we would clear the child page's GIN_INCOMPLETE_SPLIT flag well before entering the critical section that it's supposed to be cleared in; a failure in between would leave the index in a corrupt state. There were also assorted coding-rule violations with little immediate consequence but possible long-term hazards, such as beginning an XLogInsert sequence before entering a critical section, or calling elog(DEBUG) inside a critical section. To fix, redefine the API between ginPlaceToPage() and its subroutines by splitting the subroutines into two parts. The "beginPlaceToPage" subroutine does what can be done outside a critical section, including full computation of the result pages into temporary storage when we're going to split the target page. The "execPlaceToPage" subroutine is called within a critical section established by ginPlaceToPage(), and it handles the actual page update in the non-split code path. The critical section, as well as the XLOG insertion call sequence, are both now always started and finished in ginPlaceToPage(). Also, make ginPlaceToPage() create and work in a short-lived memory context to eliminate the leakage problem. (Since a short-lived memory context had been getting created in the most common code path in the subroutines, this shouldn't cause any noticeable performance penalty; we're just moving the overhead up one call level.) In passing, fix a bunch of comments that had gone unmaintained throughout all this klugery. Report: <571276DD.5050303@dalibo.com>
2016-04-15Fix memory leak in GIN index scans.Tom Lane
The code had a query-lifespan memory leak when encountering GIN entries that have posting lists (rather than posting trees, ie, there are a relatively small number of heap tuples containing this index key value). With a suitable data distribution this could add up to a lot of leakage. Problem seems to have been introduced by commit 36a35c550, so back-patch to 9.4. Julien Rouhaud
2016-03-13Fix memory leak in repeated GIN index searches.Tom Lane
Commit d88976cfa1302e8d removed this code from ginFreeScanKeys(): - if (entry->list) - pfree(entry->list); evidently in the belief that that ItemPointer array is allocated in the keyCtx and so would be reclaimed by the following MemoryContextReset. Unfortunately, it isn't and it won't. It'd likely be a good idea for that to become so, but as a simple and back-patchable fix in the meantime, restore this code to ginFreeScanKeys(). Also, add a similar pfree to where startScanEntry() is about to zero out entry->list. I am not sure if there are any code paths where this change prevents a leak today, but it seems like cheap future-proofing. In passing, make the initial allocation of so->entries[] use palloc not palloc0. The code doesn't depend on unused entries being zero; if it did, the array-enlargement code in ginFillScanEntry() would be wrong. So using palloc0 initially can only serve to confuse readers about what the invariant is. Per report from Felipe de Jesús Molina Bravo, via Jaime Casanova in <CAJGNTeMR1ndMU2Thpr8GPDUfiHTV7idELJRFusA5UXUGY1y-eA@mail.gmail.com>
2015-09-07Make GIN's cleanup pending list process interruptableTeodor Sigaev
Cleanup process could be called by ordinary insert/update and could take a lot of time. Add vacuum_delay_point() to make this process interruptable. Under vacuum this call will also throttle a vacuum process to decrease system load, called from insert/update it will not throttle, and that reduces a latency. Backpatch for all supported branches. Jeff Janes <jeff.janes@gmail.com>
2015-09-05Fix misc typos.Heikki Linnakangas
Oskari Saarenmaa. Backpatch to stable branches where applicable.
2015-07-27Reuse all-zero pages in GIN.Heikki Linnakangas
In GIN, an all-zeros page would be leaked forever, and never reused. Just add them to the FSM in vacuum, and they will be reinitialized when grabbed from the FSM. On master and 9.5, attempting to access the page's opaque struct also caused an assertion failure, although that was otherwise harmless. Reported by Jeff Janes. Backpatch to all supported versions.
2015-06-30Initialize GIN metapage correctly when replaying metapage-update WAL record.Heikki Linnakangas
I broke this with my WAL format refactoring patch. Before that, the metapage was read from disk, and modified in-place regardless of the LSN. That was always a bit silly, as there's no need to read the old page version from disk disk when we're overwriting it anyway. So that was changed in 9.5, but I failed to add a GinInitPage call to initialize the page-headers correctly. Usually you wouldn't notice, because the metapage is already in the page cache and is not zeroed. One way to reproduce this is to perform a VACUUM on an already vacuumed table (so that the vacuum has no real work to do), immediately after a checkpoint, and then perform an immediate shutdown. After recovery, the page headers of the metapage will be incorrectly all-zeroes. Reported by Jeff Janes
2015-06-28Fix double-XLogBeginInsert call in GIN page splits.Heikki Linnakangas
If data checksums or wal_log_hints is on, and a GIN page is split, the code to find a new, empty, block was called after having already called XLogBeginInsert(). That causes an assertion failure or PANIC, if finding the new block involves updating a FSM page that had not been modified since last checkpoint, because that update is WAL-logged, which calls XLogBeginInsert again. Nested XLogBeginInsert calls are not supported. To fix, rearrange GIN code so that XLogBeginInsert is called later, after finding the victim buffers. Reported by Jeff Janes.
2015-05-23pgindent run for 9.5Bruce Momjian
2015-05-20Collection of typo fixes.Heikki Linnakangas
Use "a" and "an" correctly, mostly in comments. Two error messages were also fixed (they were just elogs, so no translation work required). Two function comments in pg_proc.h were also fixed. Etsuro Fujita reported one of these, but I found a lot more with grep. Also fix a few other typos spotted while grepping for the a/an typos. For example, "consists out of ..." -> "consists of ...". Plus a "though"/ "through" mixup reported by Euler Taveira. Many of these typos were in old code, which would be nice to backpatch to make future backpatching easier. But much of the code was new, and I didn't feel like crafting separate patches for each branch. So no backpatching.
2015-05-17Fix typos in commentsMagnus Hagander
Dmitriy Olshevskiy
2015-05-15Move strategy numbers to include/access/stratnum.hAlvaro Herrera
For upcoming BRIN opclasses, it's convenient to have strategy numbers defined in a single place. Since there's nothing appropriate, create it. The StrategyNumber typedef now lives there, as well as existing strategy numbers for B-trees (from skey.h) and R-tree-and-friends (from gist.h). skey.h is forced to include stratnum.h because of the StrategyNumber typedef, but gist.h is not; extensions that currently rely on gist.h for rtree strategy numbers might need to add a new A few .c files can stop including skey.h and/or gist.h, which is a nice side benefit. Per discussion: https://www.postgresql.org/message-id/20150514232132.GZ2523@alvh.no-ip.org Authored by Emre Hasegeli and Álvaro. (It's not clear to me why bootscanner.l has any #include lines at all.)
2015-03-29Make ginbuild's funcCtx be independent of its tmpCtx.Tom Lane
Previously the funcCtx was a child of the tmpCtx, but that was broken by commit eaa5808e8ec4e82ce1a87103a6b6f687666e4e4c, which made MemoryContextReset() delete, not reset, child contexts. The behavior of having a tmpCtx reset also clear the other context seems rather dubious anyway, so let's just disentangle them. Per report from Erik Rijkers. In passing, fix badly-inaccurate comments about these contexts.
2015-03-12Fix memory leaks in GIN index vacuum.Heikki Linnakangas
Per bug #12850 by Walter Nordmann. Backpatch to 9.4 where the leak was introduced.
2015-02-04Use a separate memory context for GIN scan keys.Heikki Linnakangas
It was getting tedious to track and release all the different things that form a scan key. We were leaking at least the queryCategories array, and possibly more, on a rescan. That was visible if a GIN index was used in a nested loop join. This also protects from leaks in extractQuery method. No backpatching, given the lack of complaints from the field. Maybe later, after this has received more field testing.
2015-01-30Fix query-duration memory leak with GIN rescans.Heikki Linnakangas
The requiredEntries / additionalEntries arrays were not freed in freeScanKeys() like other per-key stuff. It's not obvious, but startScanKey() was only ever called after the keys have been initialized with ginNewScanKey(). That's why it doesn't need to worry about freeing existing arrays. The ginIsNewKey() test in gingetbitmap was never true, because ginrescan free's the existing keys, and it's not OK to call gingetbitmap twice in a row without calling ginrescan in between. To make that clear, remove the unnecessary ginIsNewKey(). And just to be extra sure that nothing funny happens if there is an existing key after all, call freeScanKeys() to free it if it exists. This makes the code more straightforward. (I'm seeing other similar leaks in testing a query that rescans an GIN index scan, but that's a different issue. This just fixes the obvious leak with those two arrays.) Backpatch to 9.4, where GIN fast scan was added.
2015-01-29Fix bug where GIN scan keys were not initialized with gin_fuzzy_search_limit.Heikki Linnakangas
When gin_fuzzy_search_limit was used, we could jump out of startScan() without calling startScanKey(). That was harmless in 9.3 and below, because startScanKey()() didn't do anything interesting, but in 9.4 it initializes information needed for skipping entries (aka GIN fast scans), and you readily get a segfault if it's not done. Nevertheless, it was clearly wrong all along, so backpatch all the way to 9.1 where the early return was introduced. (AFAICS startScanKey() did nothing useful in 9.3 and below, because the fields it initialized were already initialized in ginFillScanKey(), but I don't dare to change that in a minor release. ginFillScanKey() is always called in gingetbitmap() even though there's a check there to see if the scan keys have already been initialized, because they never are; ginrescan() free's them.) In the passing, remove unnecessary if-check from the second inner loop in startScan(). We already check in the first loop that the condition is true for all entries. Reported by Olaf Gawenda, bug #12694, Backpatch to 9.1 and above, although AFAICS it causes a live bug only in 9.4.
2015-01-06Update copyright for 2015Bruce Momjian
Backpatch certain files through 9.0
2014-11-21No need to call XLogEnsureRecordSpace when the relation is unlogged.Heikki Linnakangas
Amit Kapila
2014-11-20Revamp the WAL record format.Heikki Linnakangas
Each WAL record now carries information about the modified relation and block(s) in a standardized format. That makes it easier to write tools that need that information, like pg_rewind, prefetching the blocks to speed up recovery, etc. There's a whole new API for building WAL records, replacing the XLogRecData chains used previously. The new API consists of XLogRegister* functions, which are called for each buffer and chunk of data that is added to the record. The new API also gives more control over when a full-page image is written, by passing flags to the XLogRegisterBuffer function. This also simplifies the XLogReadBufferForRedo() calls. The function can dig the relation and block number from the WAL record, so they no longer need to be passed as arguments. For the convenience of redo routines, XLogReader now disects each WAL record after reading it, copying the main data part and the per-block data into MAXALIGNed buffers. The data chunks are not aligned within the WAL record, but the redo routines can assume that the pointers returned by XLogRecGet* functions are. Redo routines are now passed the XLogReaderState, which contains the record in the already-disected format, instead of the plain XLogRecord. The new record format also makes the fixed size XLogRecord header smaller, by removing the xl_len field. The length of the "main data" portion is now stored at the end of the WAL record, and there's a separate header after XLogRecord for it. The alignment padding at the end of XLogRecord is also removed. This compansates for the fact that the new format would otherwise be more bulky than the old format. Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera, Fujii Masao.
2014-11-13Rename pending_list_cleanup_size to gin_pending_list_limit.Fujii Masao
Since this parameter is only for GIN index, it's better to add "gin" to the parameter name for easier understanding.
2014-11-11Add GUC and storage parameter to set the maximum size of GIN pending list.Fujii Masao
Previously the maximum size of GIN pending list was controlled only by work_mem. But the reasonable value of work_mem and the reasonable size of the list are basically not the same, so it was not appropriate to control both of them by only one GUC, i.e., work_mem. This commit separates new GUC, pending_list_cleanup_size, from work_mem to allow users to control only the size of the list. Also this commit adds pending_list_cleanup_size as new storage parameter to allow users to specify the size of the list per index. This is useful, for example, when users want to increase the size of the list only for the GIN index which can be updated heavily, and decrease it otherwise. Reviewed by Etsuro Fujita.
2014-11-06Move the backup-block logic from XLogInsert to a new file, xloginsert.c.Heikki Linnakangas
xlog.c is huge, this makes it a little bit smaller, which is nice. Functions related to putting together the WAL record are in xloginsert.c, and the lower level stuff for managing WAL buffers and such are in xlog.c. Also move the definition of XLogRecord to a separate header file. This causes churn in the #includes of all the files that write WAL records, and redo routines, but it avoids pulling in xlog.h into most places. Reviewed by Michael Paquier, Alvaro Herrera, Andres Freund and Amit Kapila.
2014-09-12Fix GIN data page split ratio calculation.Heikki Linnakangas
The code that tried to split a page at 75/25 ratio, when appending to the end of an index, was buggy in two ways. First, there was a silly typo that caused it to just fill the left page as full as possible. But the logic as it was intended wasn't correct either, and would actually have given a ratio closer to 60/40 than 75/25. Gaetano Mendola spotted the typo. Backpatch to 9.4, where this code was added.
2014-09-02Refactor per-page logic common to all redo routines to a new function.Heikki Linnakangas
Every redo routine uses the same idiom to determine what to do to a page: check if there's a backup block for it, and if not read, the buffer if the block exists, and check its LSN. Refactor that into a common function, XLogReadBufferForRedo, making all the redo routines shorter and more readable. This has no user-visible effect, and makes no changes to the WAL format. Reviewed by Andres Freund, Alvaro Herrera, Michael Paquier.
2014-08-29Fix bug in compressed GIN data leaf page splitting code.Heikki Linnakangas
The list of posting lists it's dealing with can contain placeholders for deleted posting lists. The placeholders are kept around so that they can be WAL-logged, but we must be careful to not try to access them. This fixes bug #11280, reported by Mårten Svantesson. Backpatch to 9.4, where the compressed data leaf page code was added.
2014-07-31Move log_newpage and log_newpage_buffer to xlog.c.Heikki Linnakangas
log_newpage is used by many indexams, in addition to heap, but for historical reasons it's always been part of the heapam rmgr. Starting with 9.3, we have another WAL record type for logging an image of a page, XLOG_FPI. Simplify things by moving log_newpage and log_newpage_buffer to xlog.c, and switch to using the XLOG_FPI record type. Bump the WAL version number because the code to replay the old HEAP_NEWPAGE records is removed.
2014-06-20Don't allow to disable backend assertions via the debug_assertions GUC.Andres Freund
The existance of the assert_enabled variable (backing the debug_assertions GUC) reduced the amount of knowledge some static code checkers (like coverity and various compilers) could infer from the existance of the assertion. That could have been solved by optionally removing the assertion_enabled variable from the Assert() et al macros at compile time when some special macro is defined, but the resulting complication doesn't seem to be worth the gain from having debug_assertions. Recompiling is fast enough. The debug_assertions GUC is still available, but readonly, as it's useful when diagnosing problems. The commandline/client startup option -A, which previously also allowed to enable/disable assertions, has been removed as it doesn't serve a purpose anymore. While at it, reduce code duplication in bufmgr.c and localbuf.c assertions checking for spurious buffer pins. That code had to be reindented anyway to cope with the assert_enabled removal.
2014-05-10Fix bug in lossy-page handling in GINHeikki Linnakangas
When returning rows from a bitmap, as done with partial match queries, we would get stuck in an infinite loop if the bitmap contained a lossy page reference. This bug is new in master, it was introduced by the patch to allow skipping items refuted by other entries in GIN scans. Report and fix by Alexander Korotkov
2014-05-08Protect against torn pages when deleting GIN list pages.Heikki Linnakangas
To-be-deleted list pages contain no useful information, as they are being deleted, but we must still protect the writes from being torn by a crash after a partial write. To do that, re-initialize the pages on WAL replay. Jeff Janes caught this with a test program to test partial writes. Backpatch to all supported versions.
2014-05-06pgindent run for 9.4Bruce Momjian
This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
2014-04-28Fix two bugs in WAL-logging of GIN pending-list pages.Heikki Linnakangas
In writeListPage, never take a full-page image of the page, because we have all the information required to re-initialize in the WAL record anyway. Before this fix, a full-page image was always generated, unless full_page_writes=off, because when the page is initialized its LSN is always 0. In stable-branches, keep the code to restore the backup blocks if they exist, in case that the WAL is generated with an older minor version, but in master Assert that there are no full-page images. In the redo routine, add missing "off++". Otherwise the tuples are added to the page in reverse order. That happens to be harmless because we always scan and remove all the tuples together, but it was clearly wrong. Also, it was masked by the first bug unless full_page_writes=off, because the page was always restored from a full-page image. Backpatch to all supported versions.
2014-04-22Fix Gin README.Heikki Linnakangas
The README incorrectly claimed that GIN posting tree pages contain an array of uncompressed items in addition to compressed posting lists. Earlier versions of the GIN posting list compression patch worked that way, but not the one that was committed.
2014-04-20Fix typo.Robert Haas
Etsuro Fujita
2014-04-14Set pd_lower on internal GIN posting tree pages.Heikki Linnakangas
This allows squeezing out the unused space in full-page writes. And more importantly, it can be a useful debugging aid. In hindsight we should've done this back when GIN was added - we wouldn't need the 'maxoff' field in the page opaque struct if we had used pd_lower and pd_upper like on normal pages. But as long as there can be pages in the index that have been binary-upgraded from pre-9.4 versions, we can't rely on that, and have to continue using 'maxoff'. Most of the code churn comes from renaming some macros, now that they're used on internal pages, too. This change is completely backwards-compatible, no effect on pg_upgrade.
2014-04-14Remove dead checks for invalid left page in ginDeletePage.Heikki Linnakangas
In some places, the function assumes the left page is valid, and in others, it checks if it is valid. Remove all the checks.
2014-04-14GIN entry pages follow the standard page layout - tell XLogInsert.Heikki Linnakangas
The entry B-tree pages all follow the standard page layout. The 9.3 code has this right. I inadvertently changed this at some point during the big refactorings in git master.
2014-04-10Fix bugs in GIN "fast scan" with partial match.Heikki Linnakangas
There were a couple of bugs here. First, if the fuzzy limit was exceeded, the loop in entryGetItem might drop out too soon if a whole block needs to be skipped because it's < advancePast ("continue" in a while-loop checks the loop condition too). Secondly, the loop checked when stepping to a new page that there is at least one offset on the page < advancePast, but we cannot rely on that on subsequent calls of entryGetItem, because advancePast might change in between. That caused the skipping loop to read bogus items in the TbmIterateResult's offset array. First item and fix by Alexander Korotkov, second bug pointed out by Fabrízio de Royes Mello, by a small variation of Alexander's test query.
2014-04-07Zero padding byte at end of GIN posting list.Heikki Linnakangas
This isn't strictly necessary, but helps debugging.
2014-04-07Fix WAL replay bug in the new GIN incomplete-split code.Heikki Linnakangas
Forgot to set the incomplete-split flag on the left page half, in redo of a page split. Spotted this by comparing the page contents on master and standby, after inserting/applying each WAL record.
2014-04-05Fix another palloc in critical section.Heikki Linnakangas
Also add a regression test for a GIN index with enough items with the same key, so that a GIN posting tree gets created. Apparently none of the existing GIN tests were large enough for that. This code is new, no backpatching required.
2014-04-01Fix bug in the new GIN incomplete-split code.Heikki Linnakangas
Inserting a downlink to an internal page clears the incomplete-split flag of the child's left sibling, so the left sibling's LSN also needs to be updated and it needs to be marked dirty. The codepath for an insertion got this right, but the case where the internal node is split because of inserting the new downlink missed that.
2014-04-01Remove dead check for backup block, replace with Assert.Heikki Linnakangas
We don't use backup blocks with GIN vacuum records anymore, the page is always recreated from scratch.