summaryrefslogtreecommitdiff
path: root/src/backend/access/gist
AgeCommit message (Collapse)Author
2013-02-10Further cleanup of gistsplit.c.Tom Lane
After further reflection I was unconvinced that the existing coding is guaranteed to return valid union datums in every code path for multi-column indexes. Fix that by forcing a gistunionsubkey() call at the end of the recursion. Having done that, we can remove some clearly-redundant calls elsewhere. This should be a little faster for multi-column indexes (since the previous coding would uselessly do such a call for each column while unwinding the recursion), as well as much harder to break. Also, simplify the handling of cases where one side or the other of a primary split contains only don't-care tuples. The previous coding used a very ugly hack in removeDontCares() that essentially forced one random tuple to be treated as non-don't-care, providing a random initial choice of seed datum for the secondary split. It seems unlikely that that method will give better-than-random splits. Instead, treat such a split as degenerate and just let the next column determine the split, the same way that we handle fully degenerate cases where the two sides produce identical union datums.
2013-02-10Remove useless picksplit-doesn't-support-secondary-split log spam.Tom Lane
This LOG message was put in over five years ago with the evident expectation that we'd make all GiST opclasses support secondary split directly. However, no such thing ever happened, and indeed the number of opclasses supporting it decreased to zero in 9.2. The reason is that improving on the default implementation isn't that easy --- the opclass-specific code that did exist, before 9.2, doesn't appear to have been any improvement over the default. Hence, remove the message altogether. There's certainly no point in nagging users about this in released branches, but I doubt that we'll ever implement complete opclass-specific support anyway.
2013-02-10Document and clean up gistsplit.c.Tom Lane
Improve comments, rename some variables and functions, slightly simplify a couple of APIs, in an attempt to make this code readable by people other than its original author. Even though this is essentially just cosmetic, back-patch to all active branches, because otherwise it's going to make back-patching future fixes in this file very painful.
2013-02-08Fix gist_box_same and gist_point_consistent to handle fuzziness correctly.Tom Lane
While there's considerable doubt that we want fuzzy behavior in the geometric operators at all (let alone as currently implemented), nobody is stepping forward to redesign that stuff. In the meantime it behooves us to make sure that index searches agree with the behavior of the underlying operators. This patch fixes two problems in this area. First, gist_box_same was using fuzzy equality, but it really needs to use exact equality to prevent not-quite-identical upper index keys from being treated as identical, which for example would prevent an existing upper key from being extended by an amount less than epsilon. This would result in inconsistent indexes. (The next release notes will need to recommend that users reindex GiST indexes on boxes, polygons, circles, and points, since all four opclasses use gist_box_same.) Second, gist_point_consistent used exact comparisons for upper-page comparisons in ~= searches, when it needs to use fuzzy comparisons to ensure it finds all matches; and it used fuzzy comparisons for point <@ box searches, when it needs to use exact comparisons because that's what the <@ operator (rather inconsistently) does. The added regression test cases illustrate all three misbehaviors. Back-patch to all active branches. (8.4 did not have GiST point_ops, but it still seems prudent to apply the gist_box_same patch to it.) Alexander Korotkov, reviewed by Noah Misch
2013-02-07Repair bugs in GiST page splitting code for multi-column indexes.Tom Lane
When considering a non-last column in a multi-column GiST index, gistsplit.c tries to improve on the split chosen by the opclass-specific pickSplit function by considering penalties for the next column. However, there were two bugs in this code: it failed to recompute the union keys for the leftmost index columns, even though these might well change after reassigning tuples; and it included the old union keys in the recomputation for the columns it did recompute, so that those keys couldn't get smaller even if they should. The first problem could result in an invalid index in which searches wouldn't find index entries that are in fact present; the second would make the index less efficient to search. Both of these errors were caused by misuse of gistMakeUnionItVec, whose API was designed in a way that just begged such errors to be made. There is no situation in which it's safe or useful to compute the union keys for a subset of the index columns, and there is no caller that wants any previous union keys to be included in the computation; so the undocumented choice to treat the union keys as in/out rather than pure output parameters is a waste of code as well as being dangerous. Hence, rather than just making a minimal patch, I've changed the API of gistMakeUnionItVec to remove the "startkey" parameter (it now always processes all index columns) and treat the attr/isnull arrays as purely output parameters. In passing, also get rid of a couple of unnecessary and dangerous uses of static variables in gistutil.c. It's remarkable that the one in gistMakeUnionKey hasn't given us portability troubles before now, because in addition to posing a re-entrancy hazard, it was unsafely assuming that a static char[] array would have at least Datum alignment. Per investigation of a trouble report from Tomas Vondra. (There are also some bugs in contrib/btree_gist to be fixed, but that seems like material for a separate patch.) Back-patch to all supported branches.
2012-11-12Fix multiple problems in WAL replay.Tom Lane
Most of the replay functions for WAL record types that modify more than one page failed to ensure that those pages were locked correctly to ensure that concurrent queries could not see inconsistent page states. This is a hangover from coding decisions made long before Hot Standby was added, when it was hardly necessary to acquire buffer locks during WAL replay at all, let alone hold them for carefully-chosen periods. The key problem was that RestoreBkpBlocks was written to hold lock on each page restored from a full-page image for only as long as it took to update that page. This was guaranteed to break any WAL replay function in which there was any update-ordering constraint between pages, because even if the nominal order of the pages is the right one, any mixture of full-page and non-full-page updates in the same record would result in out-of-order updates. Moreover, it wouldn't work for situations where there's a requirement to maintain lock on one page while updating another. Failure to honor an update ordering constraint in this way is thought to be the cause of bug #7648 from Daniel Farina: what seems to have happened there is that a btree page being split was rewritten from a full-page image before the new right sibling page was written, and because lock on the original page was not maintained it was possible for hot standby queries to try to traverse the page's right-link to the not-yet-existing sibling page. To fix, get rid of RestoreBkpBlocks as such, and instead create a new function RestoreBackupBlock that restores just one full-page image at a time. This function can be invoked by WAL replay functions at the points where they would otherwise perform non-full-page updates; in this way, the physical order of page updates remains the same no matter which pages are replaced by full-page images. We can then further adjust the logic in individual replay functions if it is necessary to hold buffer locks for overlapping periods. A side benefit is that we can simplify the handling of concurrency conflict resolution by moving that code into the record-type-specfic functions; there's no more need to contort the code layout to keep conflict resolution in front of the RestoreBkpBlocks call. In connection with that, standardize on zero-based numbering rather than one-based numbering for referencing the full-page images. In HEAD, I removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4. They are still there in the header files in previous branches, but are no longer used by the code. In addition, fix some other bugs identified in the course of making these changes: spgRedoAddNode could fail to update the parent downlink at all, if the parent tuple is in the same page as either the old or new split tuple and we're not doing a full-page image: it would get fooled by the LSN having been advanced already. This would result in permanent index corruption, not just transient failure of concurrent queries. Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old tail page as a candidate for a full-page image; in the worst case this could result in torn-page corruption. heap_xlog_freeze() was inconsistent about using a cleanup lock or plain exclusive lock: it did the former in the normal path but the latter for a full-page image. A plain exclusive lock seems sufficient, so change to that. Also, remove gistRedoPageDeleteRecord(), which has been dead code since VACUUM FULL was rewritten. Back-patch to 9.0, where hot standby was introduced. Note however that 9.0 had a significantly different WAL-logging scheme for GIST index updates, and it doesn't appear possible to make that scheme safe for concurrent hot standby queries, because it can leave inconsistent states in the index even between WAL records. Given the lack of complaints from the field, we won't work too hard on fixing that branch.
2012-08-30Back-patch recent fixes for gistchoose and gistRelocateBuildBuffersOnSplit.Tom Lane
This back-ports commits c8ba697a4bdb934f0c51424c654e8db6133ea255 and e5db11c5582b469c04a11f217a0f32c827da5dd7, which fix one definite and one speculative bug in gistchoose, and make the code a lot more intelligible as well. In 9.2 only, this also affects the largely-copied-and-pasted logic in gistRelocateBuildBuffersOnSplit. The impact of the bugs was that the functions might make poor decisions as to which index tree branch to push a new entry down into, resulting in GiST index bloat and poor performance. The fixes rectify these decisions for future insertions, but a REINDEX would be needed to clean up any existing index bloat. Alexander Korotkov, Robert Haas, Tom Lane
2011-09-16gistendscan() forgot to free so->giststate.Tom Lane
This oversight led to a massive memory leak --- upwards of 10KB per tuple --- during creation-time verification of an exclusion constraint based on a GIST index. In most other scenarios it'd just be a leak of 10KB that would be recovered at end of query, so not too significant; though perhaps the leak would be noticeable in a situation where a GIST index was being used in a nestloop inner indexscan. In any case, it's a real leak of long standing, so patch all supported branches. Per report from Harald Fuchs.
2011-07-15Fix two ancient bugs in GiST code to re-find a parent after page split:Heikki Linnakangas
First, when following a right-link, we incorrectly marked the current page as the parent of the right sibling. In reality, the parent of the right page is the same as the parent of the current page (or some page to the right of it, gistFindCorrectParent() will sort that out). Secondly, when we follow a right-link, we must prepend, not append, the right page to our list of pages to visit. That's because we assume that once we hit a leaf page in the list, all the rest are leaf pages too, and give up. To hit these bugs, you need concurrent actions and several unlucky accidents. Another backend must split the root page, while you're in process of splitting a lower-level page. Furthermore, while you scan the internal nodes to re-find the parent, another backend needs to again split some more internal pages. Even then, the bugs don't necessarily manifest as user-visible errors or index corruption. While we're at it, make the error reporting a bit better if gistFindPath() fails to re-find the parent. It used to be an assertion, but an elog() seems more appropriate. Backpatch to all supported branches.
2011-05-31Protect GIST logic that assumes penalty values can't be negative.Tom Lane
Apparently sane-looking penalty code might return small negative values, for example because of roundoff error. This will confuse places like gistchoose(). Prevent problems by clamping negative penalty values to zero. (Just to be really sure, I also made it force NaNs to zero.) Back-patch to all supported branches. Alexander Korotkov
2010-11-16The GiST scan algorithm uses LSNs to detect concurrent pages splits, butHeikki Linnakangas
temporary indexes are not WAL-logged. We used a constant LSN for temporary indexes, on the assumption that we don't need to worry about concurrent page splits in temporary indexes because they're only visible to the current session. But that assumption is wrong, it's possible to insert rows and split pages in the same session, while a scan is in progress. For example, by opening a cursor and fetching some rows, and INSERTing new rows before fetching some more. Fix by generating fake increasing LSNs, used in place of real LSNs in temporary GiST indexes.
2010-04-14Typo fix. Kevin Grittner.Robert Haas
2010-02-26pgindent run for 9.0Bruce Momjian
2010-02-08Remove some more dead VACUUM-FULL-only code.Tom Lane
2010-02-08Remove old-style VACUUM FULL (which was known for a little while asTom Lane
VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity. Per discussion, the use case for this method of vacuuming is no longer large enough to justify maintaining it; not to mention that we don't wish to invest the work that would be needed to make it play nicely with Hot Standby. Aside from the code directly related to old-style VACUUM FULL, this commit removes support for certain WAL record types that could only be generated within VACUUM FULL, redirect-pointer removal in heap_page_prune, and nontransactional generation of cache invalidation sinval messages (the last being the sticking point for Hot Standby). We still have to retain all code that copes with finding HEAP_MOVED_OFF and HEAP_MOVED_IN flag bits on existing tuples. This can't be removed as long as we want to support in-place update from pre-9.0 databases.
2010-01-14Add point_ops opclass for GiST.Teodor Sigaev
2010-01-02Update copyright for the year 2010.Bruce Momjian
2010-01-01Support "x IS NOT NULL" clauses as indexscan conditions. This turns outTom Lane
to be just a minor extension of the previous patch that made "x IS NULL" indexable, because we can treat the IS NOT NULL condition as if it were "x < NULL" or "x > NULL" (depending on the index's NULLS FIRST/LAST option), just like IS NULL is treated like "x = NULL". Aside from any possible usefulness in its own right, this is an important improvement for index-optimized MAX/MIN aggregates: it is now reliably possible to get a column's min or max value cheaply, even when there are a lot of nulls cluttering the interesting end of the index.
2009-12-24Fix wrong WAL info value generated when gistContinueInsert() performs anTom Lane
index page split. This would result in index corruption, or even more likely an error during WAL replay, if we were unlucky enough to crash during end-of-recovery cleanup after having completed an incomplete GIST insertion. Yoichi Hirai
2009-12-19Allow read only connections during recovery, known as Hot Standby.Simon Riggs
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-10-08Remove very ancient tuple-counting infrastructure (IncrRetrieved() andTom Lane
friends). This code has all been ifdef'd out for many years, and doesn't seem to have any prospect of becoming any more useful in the future. EXPLAIN ANALYZE is what people use in practice, and I think if we did want process-wide counters we'd be more likely to put in dtrace events for that than try to resurrect this code. Get rid of it so as to have one less detail to worry about while refactoring execMain.c.
2009-09-18Fix incorrect arguments for gist_box_penalty call. The bug could be observedTeodor Sigaev
only for secondary page split (i.e. for non-first columns of index) Patch by Paul Ramsey <pramsey@opengeo.org>
2009-07-29Support deferrable uniqueness constraints.Tom Lane
The current implementation fires an AFTER ROW trigger for each tuple that looks like it might be non-unique according to the index contents at the time of insertion. This works well as long as there aren't many conflicts, but won't scale to massive unique-key reassignments. Improving that case is a TODO item. Dean Rasheed
2009-06-24Correct grammar in picksplit debug messagesPeter Eisentraut
2009-06-118.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian
provided by Andrew.
2009-06-10Improve capitalization and punctuation in recently added GiST message.Peter Eisentraut
2009-06-06Improve the IndexVacuumInfo/IndexBulkDeleteResult API to allow somewhat saneTom Lane
behavior in cases where we don't know the heap tuple count accurately; in particular partial vacuum, but this also makes the API a bit more useful for ANALYZE. This patch adds "estimated_count" flags to both structs so that an approximate count can be flagged as such, and adjusts the logic so that approximate counts are not used for updating pg_class.reltuples. This fixes my previous complaint that VACUUM was putting ridiculous values into pg_class.reltuples for indexes. The actual impact of that bug is limited, because the planner only pays attention to reltuples for an index if the index is partial; which probably explains why beta testers hadn't noticed a degradation in plan quality from it. But it needs to be fixed. The whole thing is a bit messy and should be redesigned in future, because reltuples now has the potential to drift quite far away from reality when a long period elapses with no non-partial vacuums. But this is as good as it's going to get for 8.4.
2009-04-06Fix 'all at one page bug' in picksplit method of R-tree emulation. Add defenseTeodor Sigaev
from buggy user-defined picksplit to GiST.
2009-03-24Implement "fastupdate" support for GIN indexes, in which we try to accumulateTom Lane
multiple index entries in a holding area before adding them to the main index structure. This helps because bulk insert is (usually) significantly faster than retail insert for GIN. This patch also removes GIN support for amgettuple-style index scans. The API defined for amgettuple is difficult to support with fastupdate, and the previously committed partial-match feature didn't really work with it either. We might eventually figure a way to put back amgettuple support, but it won't happen for 8.4. catversion bumped because of change in GIN's pg_am entry, and because the format of GIN indexes changed on-disk (there's a metapage now, and possibly a pending list). Teodor Sigaev
2009-01-20Add a new option to RestoreBkpBlocks() to indicate if a cleanup lock shouldHeikki Linnakangas
be used instead of the normal exclusive lock, and make WAL redo functions responsible for calling RestoreBkpBlocks(). They know better what kind of a lock they need. At the moment, this just moves things around with no functional change, but makes the hot standby patch that's under review cleaner.
2009-01-05Change the reloptions machinery to use a table-based parser, and provideAlvaro Herrera
a more complete framework for writing custom option processing routines by user-defined access methods. Catalog version bumped due to the general API changes, which are going to affect user-defined "amoptions" routines.
2009-01-01Update copyright for 2009.Bruce Momjian
2008-12-04Initialize GISTScanOpaque->qual_ok even if there is no conditions.Teodor Sigaev
2008-11-19Rethink the way FSM truncation works. Instead of WAL-logging FSMHeikki Linnakangas
truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To make that cleaner from modularity point of view, move the WAL-logging one level up to RelationTruncate, and move RelationTruncate and all the related WAL-logging to new src/backend/catalog/storage.c file. Introduce new RelationCreateStorage and RelationDropStorage functions that are used instead of calling smgrcreate/smgrscheduleunlink directly. Move the pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new functions. This leaves smgr.c as a thin wrapper around md.c; all the transactional stuff is now in storage.c. This will make it easier to add new forks with similar truncation logic, like the visibility map.
2008-11-13Prevent synchronous scan during GIN index build, because GIN is optimizedTom Lane
for inserting tuples in increasing TID order. It's not clear whether this fully explains Ivan Sergio Borgonovo's complaint, but simple testing confirms that a scan that doesn't start at block 0 can slow GIN build by a factor of three or four. Backpatch to 8.3. Sync scan didn't exist before that.
2008-11-03Clean up the messy semantics (not to mention inefficiency) of PageGetTempPageTom Lane
by splitting it into three functions with better-defined behaviors. Zdenek Kotala
2008-10-31Unite ReadBufferWithFork, ReadBufferWithStrategy, and ZeroOrReadBufferHeikki Linnakangas
functions into one ReadBufferExtended function, that takes the strategy and mode as argument. There's three modes, RBM_NORMAL which is the default used by plain ReadBuffer(), RBM_ZERO, which replaces ZeroOrReadBuffer, and a new mode RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages without throwing an error. The FSM needs the new mode to recover from corrupt pages, which could happend if we crash after extending an FSM file, and the new page is "torn". Add fork number to some error messages in bufmgr.c, that still lacked it.
2008-10-22Fix GiST's killing tuple: GISTScanOpaque->curpos wasn'tTeodor Sigaev
correctly set. As result, killtuple() marks as dead wrong tuple on page. Bug was introduced by me while fixing possible duplicates during GiST index scan.
2008-10-20Remove support of backward scan in GiST. Per discussionTeodor Sigaev
http://archives.postgresql.org/pgsql-hackers/2008-10/msg00857.php
2008-10-20Remove mark/restore support in GIN and GiST indexes.Teodor Sigaev
Per Tom's comment. Also revome useless GISTScanOpaque->flags field.
2008-10-17During repeated rescan of GiST index it's possible that scan keyTeodor Sigaev
is NULL but SK_SEARCHNULL is not set. Add checking IS NULL of keys to set during key initialization. If key is NULL and SK_SEARCHNULL is not set then nothnig can be satisfied. With assert-enabled compilation that causes coredump. Bug was introduced in 8.3 by support of IS NULL index scan.
2008-10-06Index FSMs needs to be vacuumed as well. Report by Jeff Davis.Heikki Linnakangas
2008-09-30Rewrite the FSM. Instead of relying on a fixed-size shared memory segment, theHeikki Linnakangas
free space information is stored in a dedicated FSM relation fork, with each relation (except for hash indexes; they don't use FSM). This eliminates the max_fsm_relations and max_fsm_pages GUC options; remove any trace of them from the backend, initdb, and documentation. Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also introduce a new variant of the get_raw_page(regclass, int4, int4) function in contrib/pageinspect that let's you to return pages from any relation fork, and a new fsm_page_contents() function to inspect the new FSM pages.
2008-08-23Fix possible duplicate tuples while GiST scan. Now page is processedTeodor Sigaev
at once and ItemPointers are collected in memory. Remove tuple's killing by killtuple() if tuple was moved to another page - it could produce unaceptable overhead. Backpatch up to 8.1 because the bug was introduced by GiST's concurrency support.
2008-07-13Clean up the use of some page-header-access macros: principally, useTom Lane
SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that makes the code clearer, and avoid casting between Page and PageHeader where possible. Zdenek Kotala, with some additional cleanup by Heikki Linnakangas. I did not apply the parts of the proposed patch that would have resulted in slightly changing the on-disk format of hash indexes; it seems to me that's not a win as long as there's any chance of having in-place upgrade for 8.4.
2008-06-19Improve our #include situation by moving pointer types away from theAlvaro Herrera
corresponding struct definitions. This allows other headers to avoid including certain highly-loaded headers such as rel.h and relscan.h, instead using just relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less unnecessary dependencies.
2008-06-15Fix 64-bit problem in recent patch.Tom Lane
2008-06-12Refactor XLogOpenRelation() and XLogReadBuffer() in preparation for relationHeikki Linnakangas
forks. XLogOpenRelation() and the associated light-weight relation cache in xlogutils.c is gone, and XLogReadBuffer() now takes a RelFileNode as argument, instead of Relation. For functions that still need a Relation struct during WAL replay, there's a new function called CreateFakeRelcacheEntry() that returns a fake entry like XLogOpenRelation() used to.
2008-05-12Restructure some header files a bit, in particular heapam.h, by removing someAlvaro Herrera
unnecessary #include lines in it. Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.
2008-04-14Push index operator lossiness determination down to GIST/GIN opclassTom Lane
"consistent" functions, and remove pg_amop.opreqcheck, as per recent discussion. The main immediate benefit of this is that we no longer need 8.3's ugly hack of requiring @@@ rather than @@ to test weight-using tsquery searches on GIN indexes. In future it should be possible to optimize some other queries better than is done now, by detecting at runtime whether the index match is exact or not. Tom Lane, after an idea of Heikki's, and with some help from Teodor.