summaryrefslogtreecommitdiff
path: root/src/backend/access/gin/ginxlog.c
AgeCommit message (Collapse)Author
2019-03-24Fix WAL format incompatibility introduced by backpatching of 52ac6cd2d0Alexander Korotkov
52ac6cd2d0 added new field to ginxlogDeletePage and was backpatched to 9.4. That led to problems when patched postgres instance applies WAL records generated by non-patched one. WAL records generated by non-patched instance don't contain new field, which patched one is expecting to see. Thankfully, we can distinguish patched and non-patched WAL records by their data size. If we see that WAL record is generated by non-patched instance, we skip processing of new field. This commit comes with some assertions. In particular, if it appears that on some platform struct data size didn't change then static assertion will trigger. Reported-by: Simon Riggs Discussion: https://postgr.es/m/CANP8%2Bj%2BK4whxf7ET7%2BgO%2BG-baC3-WxqqH%3DnV4X2CgfEPA3Yu3g%40mail.gmail.com Author: Alexander Korotkov Reviewed-by: Simon Riggs, Alvaro Herrera Backpatch-through: 9.4
2018-12-13Prevent GIN deleted pages from being reclaimed too earlyAlexander Korotkov
When GIN vacuum deletes a posting tree page, it assumes that no concurrent searchers can access it, thanks to ginStepRight() locking two pages at once. However, since 9.4 searches can skip parts of posting trees descending from the root. That leads to the risk that page is deleted and reclaimed before concurrent search can access it. This commit prevents the risk of above by waiting for every transaction, which might wait to reference this page, to finish. Due to binary compatibility we can't change GinPageOpaqueData to store corresponding transaction id. Instead we reuse page header pd_prune_xid field, which is unused in index pages. Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com Author: Andrey Borodin, Alexander Korotkov Reviewed-by: Alexander Korotkov Backpatch-through: 9.4
2018-12-13Prevent deadlock in ginRedoDeletePage()Alexander Korotkov
On standby ginRedoDeletePage() can work concurrently with read-only queries. Those queries can traverse posting tree in two ways. 1) Using rightlinks by ginStepRight(), which locks the next page before unlocking its left sibling. 2) Using downlinks by ginFindLeafPage(), which locks at most one page at time. Original lock order was: page, parent, left sibling. That lock order can deadlock with ginStepRight(). In order to prevent deadlock this commit changes lock order to: left sibling, page, parent. Note, that position of parent in locking order seems insignificant, because we only lock one page at time while traversing downlinks. Reported-by: Chen Huajun Diagnosed-by: Chen Huajun, Peter Geoghegan, Andrey Borodin Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com Author: Alexander Korotkov Backpatch-through: 9.4
2018-09-09Fix past pd_upper write in ginRedoRecompress()Alexander Korotkov
ginRedoRecompress() replays actions over compressed segments of posting list in-place. However, it might lead to write past pg_upper, because intermediate state during playing the changes can take more space than both original state and final state. This commit fixes that by refuse from in-place modification. Instead page tail is copied once modification is started, and then it's used as the source of original segments. Backpatch to 9.4 where posting list compression was introduced. Reported-by: Sivasubramanian Ramasubramanian Discussion: https://postgr.es/m/1536091151804.6588%40amazon.com Author: Alexander Korotkov based on patch from and ideas by Sivasubramanian Ramasubramanian Review: Sivasubramanian Ramasubramanian Backpatch-through: 9.4
2018-07-19Fix handling of empty uncompressed posting list pages in GINAlexander Korotkov
PostgreSQL 9.4 introduces posting list compression in GIN. This feature supports online upgrade, so that after pg_upgrade uncompressed posting lists are compressed on-the-fly. Underlying code appears to always expect at least one item on uncompressed posting list page. But there could be completely empty pages, because VACUUM never deletes leftmost and rightmost pages from posting trees. This commit fixes that. Reported-by: Sivasubramanian Ramasubramanian Discussion: https://postgr.es/m/1531867212836.63354%40amazon.com Author: Sivasubramanian Ramasubramanian, Alexander Korotkov Backpatch-through: 9.4
2015-06-30Initialize GIN metapage correctly when replaying metapage-update WAL record.Heikki Linnakangas
I broke this with my WAL format refactoring patch. Before that, the metapage was read from disk, and modified in-place regardless of the LSN. That was always a bit silly, as there's no need to read the old page version from disk disk when we're overwriting it anyway. So that was changed in 9.5, but I failed to add a GinInitPage call to initialize the page-headers correctly. Usually you wouldn't notice, because the metapage is already in the page cache and is not zeroed. One way to reproduce this is to perform a VACUUM on an already vacuumed table (so that the vacuum has no real work to do), immediately after a checkpoint, and then perform an immediate shutdown. After recovery, the page headers of the metapage will be incorrectly all-zeroes. Reported by Jeff Janes
2015-05-20Collection of typo fixes.Heikki Linnakangas
Use "a" and "an" correctly, mostly in comments. Two error messages were also fixed (they were just elogs, so no translation work required). Two function comments in pg_proc.h were also fixed. Etsuro Fujita reported one of these, but I found a lot more with grep. Also fix a few other typos spotted while grepping for the a/an typos. For example, "consists out of ..." -> "consists of ...". Plus a "though"/ "through" mixup reported by Euler Taveira. Many of these typos were in old code, which would be nice to backpatch to make future backpatching easier. But much of the code was new, and I didn't feel like crafting separate patches for each branch. So no backpatching.
2015-01-06Update copyright for 2015Bruce Momjian
Backpatch certain files through 9.0
2014-11-20Revamp the WAL record format.Heikki Linnakangas
Each WAL record now carries information about the modified relation and block(s) in a standardized format. That makes it easier to write tools that need that information, like pg_rewind, prefetching the blocks to speed up recovery, etc. There's a whole new API for building WAL records, replacing the XLogRecData chains used previously. The new API consists of XLogRegister* functions, which are called for each buffer and chunk of data that is added to the record. The new API also gives more control over when a full-page image is written, by passing flags to the XLogRegisterBuffer function. This also simplifies the XLogReadBufferForRedo() calls. The function can dig the relation and block number from the WAL record, so they no longer need to be passed as arguments. For the convenience of redo routines, XLogReader now disects each WAL record after reading it, copying the main data part and the per-block data into MAXALIGNed buffers. The data chunks are not aligned within the WAL record, but the redo routines can assume that the pointers returned by XLogRecGet* functions are. Redo routines are now passed the XLogReaderState, which contains the record in the already-disected format, instead of the plain XLogRecord. The new record format also makes the fixed size XLogRecord header smaller, by removing the xl_len field. The length of the "main data" portion is now stored at the end of the WAL record, and there's a separate header after XLogRecord for it. The alignment padding at the end of XLogRecord is also removed. This compansates for the fact that the new format would otherwise be more bulky than the old format. Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera, Fujii Masao.
2014-09-02Refactor per-page logic common to all redo routines to a new function.Heikki Linnakangas
Every redo routine uses the same idiom to determine what to do to a page: check if there's a backup block for it, and if not read, the buffer if the block exists, and check its LSN. Refactor that into a common function, XLogReadBufferForRedo, making all the redo routines shorter and more readable. This has no user-visible effect, and makes no changes to the WAL format. Reviewed by Andres Freund, Alvaro Herrera, Michael Paquier.
2014-07-31Move log_newpage and log_newpage_buffer to xlog.c.Heikki Linnakangas
log_newpage is used by many indexams, in addition to heap, but for historical reasons it's always been part of the heapam rmgr. Starting with 9.3, we have another WAL record type for logging an image of a page, XLOG_FPI. Simplify things by moving log_newpage and log_newpage_buffer to xlog.c, and switch to using the XLOG_FPI record type. Bump the WAL version number because the code to replay the old HEAP_NEWPAGE records is removed.
2014-05-08Protect against torn pages when deleting GIN list pages.Heikki Linnakangas
To-be-deleted list pages contain no useful information, as they are being deleted, but we must still protect the writes from being torn by a crash after a partial write. To do that, re-initialize the pages on WAL replay. Jeff Janes caught this with a test program to test partial writes. Backpatch to all supported versions.
2014-05-06pgindent run for 9.4Bruce Momjian
This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
2014-04-28Fix two bugs in WAL-logging of GIN pending-list pages.Heikki Linnakangas
In writeListPage, never take a full-page image of the page, because we have all the information required to re-initialize in the WAL record anyway. Before this fix, a full-page image was always generated, unless full_page_writes=off, because when the page is initialized its LSN is always 0. In stable-branches, keep the code to restore the backup blocks if they exist, in case that the WAL is generated with an older minor version, but in master Assert that there are no full-page images. In the redo routine, add missing "off++". Otherwise the tuples are added to the page in reverse order. That happens to be harmless because we always scan and remove all the tuples together, but it was clearly wrong. Also, it was masked by the first bug unless full_page_writes=off, because the page was always restored from a full-page image. Backpatch to all supported versions.
2014-04-14Set pd_lower on internal GIN posting tree pages.Heikki Linnakangas
This allows squeezing out the unused space in full-page writes. And more importantly, it can be a useful debugging aid. In hindsight we should've done this back when GIN was added - we wouldn't need the 'maxoff' field in the page opaque struct if we had used pd_lower and pd_upper like on normal pages. But as long as there can be pages in the index that have been binary-upgraded from pre-9.4 versions, we can't rely on that, and have to continue using 'maxoff'. Most of the code churn comes from renaming some macros, now that they're used on internal pages, too. This change is completely backwards-compatible, no effect on pg_upgrade.
2014-04-07Fix WAL replay bug in the new GIN incomplete-split code.Heikki Linnakangas
Forgot to set the incomplete-split flag on the left page half, in redo of a page split. Spotted this by comparing the page contents on master and standby, after inserting/applying each WAL record.
2014-04-01Remove dead check for backup block, replace with Assert.Heikki Linnakangas
We don't use backup blocks with GIN vacuum records anymore, the page is always recreated from scratch.
2014-03-31Rewrite the way GIN posting lists are packed on a page, to reduce WAL volume.Heikki Linnakangas
Inserting (in retail) into the new 9.4 format GIN posting tree created much larger WAL records than in 9.3. The previous strategy to WAL logging was basically to log the whole page on each change, with the exception of completely unmodified segments up to the first modified one. That was not too bad when appending to the end of the page, as only the last segment had to be WAL-logged, but per Fujii Masao's testing, even that produced 2x the WAL volume that 9.3 did. The new strategy is to keep track of changes to the posting lists in a more fine-grained fashion, and also make the repacking" code smarter to avoid decoding and re-encoding segments unnecessarily.
2014-03-12In WAL replay, restore GIN metapage unconditionally to avoid torn page.Heikki Linnakangas
We don't take a full-page image of the GIN metapage; instead, the WAL record contains all the information required to reconstruct it from scratch. But to avoid torn page hazards, we must re-initialize it from the WAL record every time, even if it already has a greater LSN, similar to how normal full page images are restored. This was highly unlikely to cause any problems in practice, because the GIN metapage is small. We rely on an update smaller than a 512 byte disk sector to be atomic elsewhere, at least in pg_control. But better safe than sorry, and this would be easy to overlook if more fields are added to the metapage so that it's no longer small. Reported by Noah Misch. Backpatch to all supported versions.
2014-01-24Reset unused fields in GIN data leaf page footer.Heikki Linnakangas
The maxoff field is not used in the new, compressed page format. Let's reset it when converting an old-format page to the new format. The code won't care either way, but this makes it possible to use the field for something else in the future.
2014-01-22Compress GIN posting lists, for smaller index size.Heikki Linnakangas
GIN posting lists are now encoded using varbyte-encoding, which allows them to fit in much smaller space than the straight ItemPointer array format used before. The new encoding is used for both the lists stored in-line in entry tree items, and in posting tree leaf pages. To maintain backwards-compatibility and keep pg_upgrade working, the code can still read old-style pages and tuples. Posting tree leaf pages in the new format are flagged with GIN_COMPRESSED flag, to distinguish old and new format pages. Likewise, entry tree tuples in the new format have a GIN_ITUP_COMPRESSED flag set in a bit that was previously unused. This patch bumps GIN_CURRENT_VERSION from 1 to 2. New indexes created with version 9.4 will therefore have version number 2 in the metapage, while old pg_upgraded indexes will have version 1. The code treats them the same, but it might be come handy in the future, if we want to drop support for the uncompressed format. Alexander Korotkov and me. Reviewed by Tomas Vondra and Amit Langote.
2014-01-07Update copyright for 2014Bruce Momjian
Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
2013-11-27Get rid of the post-recovery cleanup step of GIN page splits.Heikki Linnakangas
Replace it with an approach similar to what GiST uses: when a page is split, the left sibling is marked with a flag indicating that the parent hasn't been updated yet. When the parent is updated, the flag is cleared. If an insertion steps on a page with the flag set, it will finish split before proceeding with the insertion. The post-recovery cleanup mechanism was never totally reliable, as insertion to the parent could fail e.g because of running out of memory or disk space, leaving the tree in an inconsistent state. This also divides the responsibility of WAL-logging more clearly between the generic ginbtree.c code, and the parts specific to entry and posting trees. There is now a common WAL record format for insertions and deletions, which is written by ginbtree.c, followed by tree-specific payload, which is returned by the placetopage- and split- callbacks.
2013-11-27More GIN refactoring.Heikki Linnakangas
Separate the insertion payload from the more static portions of GinBtree. GinBtree now only contains information related to searching the tree, and the information of what to insert is passed separately. Add root block number to GinBtree, instead of passing it around all the functions as argument. Split off ginFinishSplit() from ginInsertValue(). ginFinishSplit is responsible for finding the parent and inserting the downlink to it.
2013-11-20Refactor the internal GIN B-tree interface for forming a downlink.Heikki Linnakangas
This creates a new gin-btree callback function for creating a downlink for a page. Previously, ginxlog.c duplicated the logic used during normal operation.
2013-10-03Minor GIN code refactoring.Heikki Linnakangas
It makes for cleaner code to have separate Get/Add functions for PostingItems and ItemPointers. A few callsites that have to deal with both types need to be duplicated because of this, but all the callers have to know which one they're dealing with anyway. Overall, this reduces the amount of casting required. Extracted from Alexander Korotkov's larger patch to change the data page format.
2013-03-18Remove PageSetTLI and rename pd_tli to pd_checksumSimon Riggs
Remove use of PageSetTLI() from all page manipulation functions and adjust README to indicate change in the way we make changes to pages. Repurpose those bytes into the pd_checksum field and explain how that works in comments about page header. Refactoring ahead of actual feature patch which would make use of the checksum field, arriving later. Jeff Davis, with comments and doc changes by Simon Riggs Direction suggested by Robert Haas; many others providing review comments.
2013-01-01Update copyrights for 2013Bruce Momjian
Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
2012-12-28Remove obsolete XLogRecPtr macrosAlvaro Herrera
This gets rid of XLByteLT, XLByteLE, XLByteEQ and XLByteAdvance. These were useful for brevity when XLogRecPtrs were split in xlogid/xrecoff; but now that they are simple uint64's, they are just clutter. The only downside to making this change would be ease of backporting patches, but that has been negated by other substantive changes to the involved code anyway. The clarity of simpler expressions makes the change worthwhile. Most of the changes are mechanical, but in a couple of places, the patch author chose to invert the operator sense, making the code flow more logical (and more in line with preceding comments). Author: Andres Freund Eyeballed by Dimitri Fontaine and Alvaro Herrera
2012-11-28Split out rmgr rm_desc functions into their own filesAlvaro Herrera
This is necessary (but not sufficient) to have them compilable outside of a backend environment.
2012-11-12Fix multiple problems in WAL replay.Tom Lane
Most of the replay functions for WAL record types that modify more than one page failed to ensure that those pages were locked correctly to ensure that concurrent queries could not see inconsistent page states. This is a hangover from coding decisions made long before Hot Standby was added, when it was hardly necessary to acquire buffer locks during WAL replay at all, let alone hold them for carefully-chosen periods. The key problem was that RestoreBkpBlocks was written to hold lock on each page restored from a full-page image for only as long as it took to update that page. This was guaranteed to break any WAL replay function in which there was any update-ordering constraint between pages, because even if the nominal order of the pages is the right one, any mixture of full-page and non-full-page updates in the same record would result in out-of-order updates. Moreover, it wouldn't work for situations where there's a requirement to maintain lock on one page while updating another. Failure to honor an update ordering constraint in this way is thought to be the cause of bug #7648 from Daniel Farina: what seems to have happened there is that a btree page being split was rewritten from a full-page image before the new right sibling page was written, and because lock on the original page was not maintained it was possible for hot standby queries to try to traverse the page's right-link to the not-yet-existing sibling page. To fix, get rid of RestoreBkpBlocks as such, and instead create a new function RestoreBackupBlock that restores just one full-page image at a time. This function can be invoked by WAL replay functions at the points where they would otherwise perform non-full-page updates; in this way, the physical order of page updates remains the same no matter which pages are replaced by full-page images. We can then further adjust the logic in individual replay functions if it is necessary to hold buffer locks for overlapping periods. A side benefit is that we can simplify the handling of concurrency conflict resolution by moving that code into the record-type-specfic functions; there's no more need to contort the code layout to keep conflict resolution in front of the RestoreBkpBlocks call. In connection with that, standardize on zero-based numbering rather than one-based numbering for referencing the full-page images. In HEAD, I removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4. They are still there in the header files in previous branches, but are no longer used by the code. In addition, fix some other bugs identified in the course of making these changes: spgRedoAddNode could fail to update the parent downlink at all, if the parent tuple is in the same page as either the old or new split tuple and we're not doing a full-page image: it would get fooled by the LSN having been advanced already. This would result in permanent index corruption, not just transient failure of concurrent queries. Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old tail page as a candidate for a full-page image; in the worst case this could result in torn-page corruption. heap_xlog_freeze() was inconsistent about using a cleanup lock or plain exclusive lock: it did the former in the normal path but the latter for a full-page image. A plain exclusive lock seems sufficient, so change to that. Also, remove gistRedoPageDeleteRecord(), which has been dead code since VACUUM FULL was rewritten. Back-patch to 9.0, where hot standby was introduced. Note however that 9.0 had a significantly different WAL-logging scheme for GIST index updates, and it doesn't appear possible to make that scheme safe for concurrent hot standby queries, because it can leave inconsistent states in the index even between WAL records. Given the lack of complaints from the field, we won't work too hard on fixing that branch.
2012-04-06Fix misleading output from gin_desc().Tom Lane
XLOG_GIN_UPDATE_META_PAGE and XLOG_GIN_DELETE_LISTPAGE records were printed with a list link field labeled as "blkno", which was confusing, especially when the link was empty (InvalidBlockNumber). Print the metapage block number instead, since that's what's actually being updated. We could include the link values too as a separate field, but not clear it's worth the trouble. Back-patch to 8.4 where the dubious code was added.
2012-02-26Fix some more bugs in GIN's WAL replay logic.Tom Lane
In commit 4016bdef8aded77b4903c457050622a5a1815c16 I fixed a bunch of ginxlog.c bugs having to do with not handling XLogReadBuffer failures correctly. However, in ginRedoUpdateMetapage and ginRedoDeleteListPages, I unaccountably thought that failure to read the metapage would be impossible and just put in an elog(PANIC) call. This is of course wrong: failure is exactly what will happen if the index got dropped (or rebuilt) between creation of the WAL record and the crash we're trying to recover from. I believe this explains Nicholas Wilson's recent report of these errors getting reached. Also, fix memory leak in forgetIncompleteSplit. This wasn't of much concern when the code was written, but in a long-running standby server page split records could be expected to accumulate indefinitely. Back-patch to 8.4 --- before that, GIN didn't have a metapage.
2012-02-04Improve comment.Tom Lane
2012-01-01Update copyright notices for year 2012.Bruce Momjian
2011-11-25Fix erroneous replay of GIN_UPDATE_META_PAGE WAL records.Tom Lane
A simple thinko in ginRedoUpdateMetapage, namely failing to increment a loop counter, led to inserting records into the last pending-list page in the wrong order (the opposite of that intended). So far as I can tell, this would not upset the code that eventually flushes pending items into the main part of the GIN index. But it did break the code that searched the pending list for matches, resulting in transient failure to find matching entries during index lookups, as illustrated in bug #6307 from Maksym Boguk. Back-patch to 8.4 where the incorrect code was introduced.
2011-09-01Remove unnecessary #include references, per pgrminclude script.Bruce Momjian
2011-04-10pgindent run before PG 9.1 beta 1.Bruce Momjian
2011-01-07Fix GIN to support null keys, empty and null items, and full index scans.Tom Lane
Per my recent proposal(s). Null key datums can now be returned by extractValue and extractQuery functions, and will be stored in the index. Also, placeholder entries are made for indexable items that are NULL or contain no keys according to extractValue. This means that the index is now always complete, having at least one entry for every indexed heap TID, and so we can get rid of the prohibition on full-index scans. A full-index scan is implemented much the same way as partial-match scans were already: we build a bitmap representing all the TIDs found in the index, and then drive the results off that. Also, introduce a concept of a "search mode" that can be requested by extractQuery when the operator requires matching to empty items (this is just as cheap as matching to a single key) or requires a full index scan (which is not so cheap, but it sure beats failing or giving wrong answers). The behavior remains backward compatible for opclasses that don't return any null keys or request a non-default search mode. Using these features, we can now make the GIN index opclass for anyarray behave in a way that matches the actual anyarray operators for &&, <@, @>, and = ... which it failed to do before in assorted corner cases. This commit fixes the core GIN code and ginarrayprocs.c, updates the documentation, and adds some simple regression test cases for the new behaviors using the array operators. The tsearch and contrib GIN opclass support functions still need to be looked over and probably fixed. Another thing I intend to fix separately is that this is pretty inefficient for cases where more than one scan condition needs a full-index search: we'll run duplicate GinScanEntrys, each one of which builds a large bitmap. There is some existing logic to merge duplicate GinScanEntrys but it needs refactoring to make it work for entries belonging to different scan keys. Note that most of gin.h has been split out into a new file gin_private.h, so that gin.h doesn't export anything that's not supposed to be used by GIN opclasses or the rest of the backend. I did quite a bit of other code beautification work as well, mostly fixing comments and choosing more appropriate names for things.
2011-01-01Stamp copyrights for year 2011.Bruce Momjian
2010-10-17Fix a passel of inappropriately-named global functions in GIN.Tom Lane
The GIN code has absolutely no business exporting GIN-specific functions with names as generic as compareItemPointers() or newScanKey(); that's just trouble waiting to happen. I got annoyed about this again just now and decided to fix it. This commit ensures that all global symbols defined in access/gin/ have names including "gin" or "Gin". There were a couple of cases, like names involving "PostingItem", where arguably the names were already sufficiently nongeneric; but I figured as long as I was risking creating merge problems for unapplied GIN patches I might as well impose a uniform policy. I didn't touch any static symbol names. There might be some places where it'd be appropriate to rename some static functions to match siblings that are exported, but I'll leave that for another time.
2010-10-17Improve GIN indexscan cost estimation.Tom Lane
The better estimate requires more statistics than we previously stored: in particular, counts of "entry" versus "data" pages within the index, as well as knowledge of the number of distinct key values. We collect this information during initial index build and update it during VACUUM, storing the info in new fields on the index metapage. No initdb is required because these fields will read as zeroes in a pre-existing index, and the new gincostestimate code is coded to behave (reasonably) sanely if they are zeroes. Teodor Sigaev, reviewed by Jan Urbanski, Tom Lane, and Itagaki Takahiro.
2010-10-11Fix assorted bugs in GIN's WAL replay logic.Tom Lane
The original coding was quite sloppy about handling the case where XLogReadBuffer fails (because the page has since been deleted). This would result in either "bad buffer id: 0" or an Assert failure during replay, if indeed the page were no longer there. In a couple of places it also neglected to check whether the change had already been applied, which would probably result in corrupted index contents. I believe that bug #5703 is an instance of the first problem. These issues could show up without replication, but only if you were unfortunate enough to crash between modification of a GIN index and the next checkpoint. Back-patch to 8.2, which is as far back as GIN has WAL support.
2010-09-20Remove cvs keywords from all files.Magnus Hagander
2010-02-09Fix bug in GIN WAL redo cleanup function: don't free fake relcache entryHeikki Linnakangas
while it's still being used. Backpatch to 8.4, where the fake relcache method was introduced.
2010-01-02Update copyright for the year 2010.Bruce Momjian
2009-12-19Allow read only connections during recovery, known as Hot Standby.Simon Riggs
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-06-118.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian
provided by Andrew.
2009-03-24Implement "fastupdate" support for GIN indexes, in which we try to accumulateTom Lane
multiple index entries in a holding area before adding them to the main index structure. This helps because bulk insert is (usually) significantly faster than retail insert for GIN. This patch also removes GIN support for amgettuple-style index scans. The API defined for amgettuple is difficult to support with fastupdate, and the previously committed partial-match feature didn't really work with it either. We might eventually figure a way to put back amgettuple support, but it won't happen for 8.4. catversion bumped because of change in GIN's pg_am entry, and because the format of GIN indexes changed on-disk (there's a metapage now, and possibly a pending list). Teodor Sigaev
2009-01-20Add a new option to RestoreBkpBlocks() to indicate if a cleanup lock shouldHeikki Linnakangas
be used instead of the normal exclusive lock, and make WAL redo functions responsible for calling RestoreBkpBlocks(). They know better what kind of a lock they need. At the moment, this just moves things around with no functional change, but makes the hot standby patch that's under review cleaner.