summaryrefslogtreecommitdiff
path: root/src/backend/access/gin
AgeCommit message (Collapse)Author
2014-01-24Reset unused fields in GIN data leaf page footer.Heikki Linnakangas
The maxoff field is not used in the new, compressed page format. Let's reset it when converting an old-format page to the new format. The code won't care either way, but this makes it possible to use the field for something else in the future.
2014-01-24Fix off-by-one in newly-introdcued GIN assertion.Heikki Linnakangas
Spotted by Alexander Korotkov
2014-01-24In GIN recompression code, use mmemove rather than memcpy, for vacuum.Heikki Linnakangas
When vacuuming a data leaf page, any compressed posting lists that are not modified, are copied back to the buffer from a later location in the same buffer rather than from a palloc'd copy. IOW, they are just moved downwards in the same buffer. Because the source and destination addresses can overlap, we must use memmove rather than memcpy. Report and fix by Alexander Korotkov.
2014-01-23Allow use of "z" flag in our printf calls, and use it where appropriate.Tom Lane
Since C99, it's been standard for printf and friends to accept a "z" size modifier, meaning "whatever size size_t has". Up to now we've generally dealt with printing size_t values by explicitly casting them to unsigned long and using the "l" modifier; but this is really the wrong thing on platforms where pointers are wider than longs (such as Win64). So let's start using "z" instead. To ensure we can do that on all platforms, teach src/port/snprintf.c to understand "z", and add a configure test to force use of that implementation when the platform's version doesn't handle "z". Having done that, modify a bunch of places that were using the unsigned-long hack to use "z" instead. This patch doesn't pretend to have gotten everyplace that could benefit, but it catches many of them. I made an effort in particular to ensure that all uses of the same error message text were updated together, so as not to increase the number of translatable strings. It's possible that this change will result in format-string warnings from pre-C99 compilers. We might have to reconsider if there are any popular compilers that will warn about this; but let's start by seeing what the buildfarm thinks. Andres Freund, with a little additional work by me
2014-01-23Fix alignment of GIN in-line posting lists stored in entry tuples.Heikki Linnakangas
The Sparc machines in the buildfarm are crashing because of misaligned access to posting lists stored in entry tuples. I accidentally removed a critical SHORTALIGN() from ginFormTuple, as part of the packed posting lists patch. Perhaps I thought it was unnecessary, because the index_form_tuple() call above the SHORTALIGN already aligned the size, missing the fact that the null-category byte makes it misaligned again (I think the SHORTALIGN is indeed unnecessary if there's no null- category byte, but let's just play it safe...)
2014-01-23Silence compiler warning.Heikki Linnakangas
Not all compilers understand that elog(ERROR, ...) never returns.
2014-01-22Fix declaration of GinVacuumState.Heikki Linnakangas
gcc 4.8 was happy with having a duplicate typedef, but most compilers seem not to be, per buildfarm.
2014-01-22Compress GIN posting lists, for smaller index size.Heikki Linnakangas
GIN posting lists are now encoded using varbyte-encoding, which allows them to fit in much smaller space than the straight ItemPointer array format used before. The new encoding is used for both the lists stored in-line in entry tree items, and in posting tree leaf pages. To maintain backwards-compatibility and keep pg_upgrade working, the code can still read old-style pages and tuples. Posting tree leaf pages in the new format are flagged with GIN_COMPRESSED flag, to distinguish old and new format pages. Likewise, entry tree tuples in the new format have a GIN_ITUP_COMPRESSED flag set in a bit that was previously unused. This patch bumps GIN_CURRENT_VERSION from 1 to 2. New indexes created with version 9.4 will therefore have version number 2 in the metapage, while old pg_upgraded indexes will have version 1. The code treats them the same, but it might be come handy in the future, if we want to drop support for the uncompressed format. Alexander Korotkov and me. Reviewed by Tomas Vondra and Amit Langote.
2014-01-07Update copyright for 2014Bruce Momjian
Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
2013-12-16Mark variables 'static' where possible. Move GinFuzzySearchLimit to ginget.cHeikki Linnakangas
Per "clang -Wmissing-variable-declarations" output, posted by Andres Freund. I didn't silence all those warnings, though, only the most obvious cases.
2013-12-04Don't include unused space in LOG_NEWPAGE records.Heikki Linnakangas
This is the same trick we use when taking a full page image of a buffer passed to XLogInsert.
2013-12-03Fix full-page writes of internal GIN pages.Heikki Linnakangas
Insertion to a non-leaf GIN page didn't make a full-page image of the page, which is wrong. The code used to do it correctly, but was changed (commit 853d1c3103fa961ae6219f0281885b345593d101) because the redo-routine didn't track incomplete splits correctly when the page was restored from a full page image. Of course, that was not right way to fix it, the redo routine should've been fixed instead. The redo-routine was surreptitiously fixed in 2010 (commit 4016bdef8aded77b4903c457050622a5a1815c16), so all we need to do now is revert the code that creates the record to its original form. This doesn't change the format of the WAL record. Backpatch to all supported versions.
2013-11-27Get rid of the post-recovery cleanup step of GIN page splits.Heikki Linnakangas
Replace it with an approach similar to what GiST uses: when a page is split, the left sibling is marked with a flag indicating that the parent hasn't been updated yet. When the parent is updated, the flag is cleared. If an insertion steps on a page with the flag set, it will finish split before proceeding with the insertion. The post-recovery cleanup mechanism was never totally reliable, as insertion to the parent could fail e.g because of running out of memory or disk space, leaving the tree in an inconsistent state. This also divides the responsibility of WAL-logging more clearly between the generic ginbtree.c code, and the parts specific to entry and posting trees. There is now a common WAL record format for insertions and deletions, which is written by ginbtree.c, followed by tree-specific payload, which is returned by the placetopage- and split- callbacks.
2013-11-27More GIN refactoring.Heikki Linnakangas
Separate the insertion payload from the more static portions of GinBtree. GinBtree now only contains information related to searching the tree, and the information of what to insert is passed separately. Add root block number to GinBtree, instead of passing it around all the functions as argument. Split off ginFinishSplit() from ginInsertValue(). ginFinishSplit is responsible for finding the parent and inserting the downlink to it.
2013-11-20More GIN refactoring.Heikki Linnakangas
Split off the portion of ginInsertValue that inserts the tuple to current level into a separate function, ginPlaceToPage. ginInsertValue's charter is now to recurse up the tree to insert the downlink, when a page split is required. This is in preparation for a patch to change the way incomplete splits are handled, which will need to do these operations separately. And IMHO makes the code more readable anyway.
2013-11-20Refactor the internal GIN B-tree interface for forming a downlink.Heikki Linnakangas
This creates a new gin-btree callback function for creating a downlink for a page. Previously, ginxlog.c duplicated the logic used during normal operation.
2013-11-20Further GIN refactoring.Heikki Linnakangas
Merge some functions that were always called together. Makes the code little bit more readable.
2013-11-13Fix bug in GIN posting tree root creation.Heikki Linnakangas
The root page is filled with as many items as fit, and the rest are inserted using normal insertions. However, I fumbled the variable names, and the code actually memcpy'd all the items on the page, overflowing the buffer. While at it, rename the variable to make the distinction more clear. Reported by Teodor Sigaev. This bug was introduced by my recent refactorings, so no backpatching required.
2013-11-08Fix race condition in GIN posting tree page deletion.Heikki Linnakangas
If a page is deleted, and reused for something else, just as a search is following a rightlink to it from its left sibling, the search would continue scanning whatever the new contents of the page are. That could lead to incorrect query results, or even something more curious if the page is reused for a different kind of a page. To fix, modify the search algorithm to lock the next page before releasing the previous one, and refrain from deleting pages from the leftmost branch of the tree. Add a new Concurrency section to the README, explaining why this works. There is a lot more one could say about concurrency in GIN, but that's for another patch. Backpatch to all supported versions.
2013-11-07Fix setting of right bound at GIN page split.Heikki Linnakangas
Broken by my refactoring.
2013-11-06Fix missing argument and function prototypes.Heikki Linnakangas
Not sure how I missed these in previous commit.
2013-11-06Misc GIN refactoring.Heikki Linnakangas
Merge the isEnoughSpace and placeToPage functions in the b-tree interface into one function that tries to put a tuple on page, and returns false if it doesn't fit. Move createPostingTree function to gindatapage.c, and change its contract so that it can be passed more items than fit on the root page. It's in a better position than the callers to know how many items fit. Move ginMergeItemPointers out of gindatapage.c, into a separate file. These changes make no difference now, but reduce the footprint of Alexander Korotkov's upcoming patch to pack item pointers more tightly.
2013-10-03Minor GIN code refactoring.Heikki Linnakangas
It makes for cleaner code to have separate Get/Add functions for PostingItems and ItemPointers. A few callsites that have to deal with both types need to be duplicated because of this, but all the callers have to know which one they're dealing with anyway. Overall, this reduces the amount of casting required. Extracted from Alexander Korotkov's larger patch to change the data page format.
2013-06-29Inline ginCompareItemPointers function for speed.Heikki Linnakangas
ginCompareItemPointers function is called heavily in gin index scans - inlining it speeds up some kind of queries a lot.
2013-06-26Initialize pad bytes in GinFormTuple().Noah Misch
Every other core buffer page consumer initializes the bytes it furnishes to PageAddItem(). For consistency, do the same here. No back-patch; regardless, we couldn't count on the fix so long as binary upgrade can carry forward affected index builds.
2013-05-29pgindent run for release 9.3Bruce Momjian
This is the first run of the Perl-based pgindent script. Also update pgindent instructions.
2013-03-18Remove PageSetTLI and rename pd_tli to pd_checksumSimon Riggs
Remove use of PageSetTLI() from all page manipulation functions and adjust README to indicate change in the way we make changes to pages. Repurpose those bytes into the pd_checksum field and explain how that works in comments about page header. Refactoring ahead of actual feature patch which would make use of the checksum field, arriving later. Jeff Davis, with comments and doc changes by Simon Riggs Direction suggested by Robert Haas; many others providing review comments.
2013-01-01Update copyrights for 2013Bruce Momjian
Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
2012-12-28Remove obsolete XLogRecPtr macrosAlvaro Herrera
This gets rid of XLByteLT, XLByteLE, XLByteEQ and XLByteAdvance. These were useful for brevity when XLogRecPtrs were split in xlogid/xrecoff; but now that they are simple uint64's, they are just clutter. The only downside to making this change would be ease of backporting patches, but that has been negated by other substantive changes to the involved code anyway. The clarity of simpler expressions makes the change worthwhile. Most of the changes are mechanical, but in a couple of places, the patch author chose to invert the operator sense, making the code flow more logical (and more in line with preceding comments). Author: Andres Freund Eyeballed by Dimitri Fontaine and Alvaro Herrera
2012-11-28Split out rmgr rm_desc functions into their own filesAlvaro Herrera
This is necessary (but not sufficient) to have them compilable outside of a backend environment.
2012-11-12Fix multiple problems in WAL replay.Tom Lane
Most of the replay functions for WAL record types that modify more than one page failed to ensure that those pages were locked correctly to ensure that concurrent queries could not see inconsistent page states. This is a hangover from coding decisions made long before Hot Standby was added, when it was hardly necessary to acquire buffer locks during WAL replay at all, let alone hold them for carefully-chosen periods. The key problem was that RestoreBkpBlocks was written to hold lock on each page restored from a full-page image for only as long as it took to update that page. This was guaranteed to break any WAL replay function in which there was any update-ordering constraint between pages, because even if the nominal order of the pages is the right one, any mixture of full-page and non-full-page updates in the same record would result in out-of-order updates. Moreover, it wouldn't work for situations where there's a requirement to maintain lock on one page while updating another. Failure to honor an update ordering constraint in this way is thought to be the cause of bug #7648 from Daniel Farina: what seems to have happened there is that a btree page being split was rewritten from a full-page image before the new right sibling page was written, and because lock on the original page was not maintained it was possible for hot standby queries to try to traverse the page's right-link to the not-yet-existing sibling page. To fix, get rid of RestoreBkpBlocks as such, and instead create a new function RestoreBackupBlock that restores just one full-page image at a time. This function can be invoked by WAL replay functions at the points where they would otherwise perform non-full-page updates; in this way, the physical order of page updates remains the same no matter which pages are replaced by full-page images. We can then further adjust the logic in individual replay functions if it is necessary to hold buffer locks for overlapping periods. A side benefit is that we can simplify the handling of concurrency conflict resolution by moving that code into the record-type-specfic functions; there's no more need to contort the code layout to keep conflict resolution in front of the RestoreBkpBlocks call. In connection with that, standardize on zero-based numbering rather than one-based numbering for referencing the full-page images. In HEAD, I removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4. They are still there in the header files in previous branches, but are no longer used by the code. In addition, fix some other bugs identified in the course of making these changes: spgRedoAddNode could fail to update the parent downlink at all, if the parent tuple is in the same page as either the old or new split tuple and we're not doing a full-page image: it would get fooled by the LSN having been advanced already. This would result in permanent index corruption, not just transient failure of concurrent queries. Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old tail page as a candidate for a full-page image; in the worst case this could result in torn-page corruption. heap_xlog_freeze() was inconsistent about using a cleanup lock or plain exclusive lock: it did the former in the normal path but the latter for a full-page image. A plain exclusive lock seems sufficient, so change to that. Also, remove gistRedoPageDeleteRecord(), which has been dead code since VACUUM FULL was rewritten. Back-patch to 9.0, where hot standby was introduced. Note however that 9.0 had a significantly different WAL-logging scheme for GIST index updates, and it doesn't appear possible to make that scheme safe for concurrent hot standby queries, because it can leave inconsistent states in the index even between WAL records. Given the lack of complaints from the field, we won't work too hard on fixing that branch.
2012-08-28Split heapam_xlog.h from heapam.hAlvaro Herrera
The heapam XLog functions are used by other modules, not all of which are interested in the rest of the heapam API. With this, we let them get just the XLog stuff in which they are interested and not pollute them with unrelated includes. Also, since heapam.h no longer requires xlog.h, many files that do include heapam.h no longer get xlog.h automatically, including a few headers. This is useful because heapam.h is getting pulled in by execnodes.h, which is in turn included by a lot of files.
2012-07-16Remove unreachable codePeter Eisentraut
The Solaris Studio compiler warns about these instances, unlike more mainstream compilers such as gcc. But manual inspection showed that the code is clearly not reachable, and we hope no worthy compiler will complain about removing this code.
2012-07-13Cosmetic cleanup of ginInsertValue().Tom Lane
Make it clearer that the passed stack mustn't be empty, and that we are not supposed to fall off the end of the stack in the main loop. Tighten the loop that extracts the root block number, too. Markus Wanner and Tom Lane
2012-06-14Add new function log_newpage_buffer.Robert Haas
When I implemented the ginbuildempty() function as part of implementing unlogged tables, I falsified the note in the header comment for log_newpage. Although we could fix that up by changing the comment, it seems cleaner to add a new function which is specifically intended to handle this case. So do that.
2012-04-23Lots of doc corrections.Robert Haas
Josh Kupershmidt
2012-04-06Fix misleading output from gin_desc().Tom Lane
XLOG_GIN_UPDATE_META_PAGE and XLOG_GIN_DELETE_LISTPAGE records were printed with a list link field labeled as "blkno", which was confusing, especially when the link was empty (InvalidBlockNumber). Print the metapage block number instead, since that's what's actually being updated. We could include the link values too as a separate field, but not clear it's worth the trouble. Back-patch to 8.4 where the dubious code was added.
2012-02-26Fix some more bugs in GIN's WAL replay logic.Tom Lane
In commit 4016bdef8aded77b4903c457050622a5a1815c16 I fixed a bunch of ginxlog.c bugs having to do with not handling XLogReadBuffer failures correctly. However, in ginRedoUpdateMetapage and ginRedoDeleteListPages, I unaccountably thought that failure to read the metapage would be impossible and just put in an elog(PANIC) call. This is of course wrong: failure is exactly what will happen if the index got dropped (or rebuilt) between creation of the WAL record and the crash we're trying to recover from. I believe this explains Nicholas Wilson's recent report of these errors getting reached. Also, fix memory leak in forgetIncompleteSplit. This wasn't of much concern when the code was written, but in a long-running standby server page split records could be expected to accumulate indefinitely. Back-patch to 8.4 --- before that, GIN didn't have a metapage.
2012-02-04Improve comment.Tom Lane
2012-01-01Update copyright notices for year 2012.Bruce Momjian
2011-11-25Fix erroneous replay of GIN_UPDATE_META_PAGE WAL records.Tom Lane
A simple thinko in ginRedoUpdateMetapage, namely failing to increment a loop counter, led to inserting records into the last pending-list page in the wrong order (the opposite of that intended). So far as I can tell, this would not upset the code that eventually flushes pending items into the main part of the GIN index. But it did break the code that searched the pending list for matches, resulting in transient failure to find matching entries during index lookups, as illustrated in bug #6307 from Maksym Boguk. Back-patch to 8.4 where the incorrect code was introduced.
2011-09-04Clean up the #include mess a little.Tom Lane
walsender.h should depend on xlog.h, not vice versa. (Actually, the inclusion was circular until a couple hours ago, which was even sillier; but Bruce broke it in the expedient rather than logically correct direction.) Because of that poor decision, plus blind application of pgrminclude, we had a situation where half the system was depending on xlog.h to include such unrelated stuff as array.h and guc.h. Clean up the header inclusion, and manually revert a lot of what pgrminclude had done so things build again. This episode reinforces my feeling that pgrminclude should not be run without adult supervision. Inclusion changes in header files in particular need to be reviewed with great care. More generally, it'd be good if we had a clearer notion of module layering to dictate which headers can sanely include which others ... but that's a big task for another day.
2011-09-01Remove unnecessary #include references, per pgrminclude script.Bruce Momjian
2011-07-04Move Trigger and TriggerDesc structs out of rel.h into a new reltrigger.hAlvaro Herrera
This lets us stop including rel.h into execnodes.h, which is a widely used header.
2011-06-09Pgindent run before 9.1 beta2.Bruce Momjian
2011-04-22Make GIN and GIST pass the index collation to all their support functions.Tom Lane
Experimentation with contrib/btree_gist shows that the majority of the GIST support functions potentially need collation information. Safest policy seems to be to pass it to all of them, instead of making assumptions about which ones could possibly need it.
2011-04-12Pass collations to functions in FunctionCallInfoData, not FmgrInfo.Tom Lane
Since collation is effectively an argument, not a property of the function, FmgrInfo is really the wrong place for it; and this becomes critical in cases where a cached FmgrInfo is used for varying purposes that might need different collation settings. Fix by passing it in FunctionCallInfoData instead. In particular this allows a clean fix for bug #5970 (record_cmp not working). This requires touching a bit more code than the original method, but nobody ever thought that collations would not be an invasive patch...
2011-04-10pgindent run before PG 9.1 beta 1.Bruce Momjian
2011-04-08Tweak collation setup for GIN index comparison functions.Tom Lane
Honor index column's collation spec if there is one, don't go to the expense of calling get_typcollation when we can reasonably assume that all GIN storage types will use default collation, and be sure to set a collation for the comparePartialFn too.
2011-03-26Clean up cruft around collation initialization for tupdescs and scankeys.Tom Lane
I found actual bugs in GiST and plpgsql; the rest of this is cosmetic but meant to decrease the odds of future bugs of omission.