path: root/src/include/access
Age  Author  Commit message
2008-10-22  Teodor Sigaev
Fix GiST's killing tuple: GISTScanOpaque->curpos wasn't
correctly set. As a result, killtuple() marked the wrong tuple on the page as dead. The bug was introduced by me while fixing possible duplicates during GiST index scans.
2008-08-23  Teodor Sigaev
Fix possible duplicate tuples during a GiST scan. Now a page is processed
at once and ItemPointers are collected in memory. Remove killing of tuples by killtuple() if the tuple was moved to another page, since it could produce unacceptable overhead. Backpatch up to 8.1 because the bug was introduced by GiST's concurrency support.
2008-04-22  Teodor Sigaev
Fix using too many LWLocks bug, reported by Craig Ringer
<craig@postnewspapers.com.au>. It was my mistake: I missed the limit on the number of held locks. Now GIN doesn't use continuous locks, but still holds buffers pinned to prevent interference with vacuum's deletion algorithm.
2008-04-17  Tom Lane
Repair two places where SIGTERM exit could leave shared memory state
corrupted. (Neither is very important if SIGTERM is used to shut down the whole database cluster together, but there's a problem if someone tries to SIGTERM individual backends.) To do this, introduce new infrastructure macros PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care of transiently pushing an on_shmem_exit cleanup hook. Also use this method for createdb cleanup --- that wasn't a shared-memory-corruption problem, but SIGTERM abort of createdb could leave orphaned files lying around. Backpatch as far as 8.2. The shmem corruption cases don't exist in 8.1, and the createdb usage doesn't seem important enough to risk backpatching further.
2008-04-05  Tom Lane
Defend against JOINs having more than 32K columns altogether. We cannot
currently support this because we must be able to build Vars referencing join columns, and varattno is only 16 bits wide. Perhaps this should be improved in future, but considering that it never came up before, I'm not sure the problem is worth much effort. Per bug #4070 from Marcello Ceschia. The problem seems largely academic in 8.0 and 7.4, because they have (different) O(N^2) performance issues with such wide joins, but back-patch all the way anyway.
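The 16-bit limit described above can be sketched as follows. PostgreSQL declares AttrNumber as a signed 16-bit integer; the helper name `join_columns_fit` is hypothetical and is not taken from the actual patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* PostgreSQL's AttrNumber is a signed 16-bit integer; redeclared here
 * only so that the sketch is standalone. */
typedef int16_t AttrNumber;

/* Hypothetical check: can every output column of a join be addressed
 * by a 16-bit varattno? */
static bool
join_columns_fit(int ncolumns)
{
    return ncolumns >= 0 && ncolumns <= INT16_MAX;
}
```

With this check, 32767 columns are accepted and 32768 are rejected, which is the "32K" boundary the message refers to.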
2008-03-04  Tom Lane
Fix PREPARE TRANSACTION to reject the case where the transaction has dropped a
temporary table; we can't support that because there's no way to clean up the source backend's internal state if the eventual COMMIT PREPARED is done by another backend. This was checked correctly in 8.1 but I broke it in 8.2 :-(. Patch by Heikki Linnakangas, original trouble report by John Smith.
2007-11-16  Tom Lane
GIN index build's allocatedMemory counter needs to be long, not uint32.
Else, in a 64-bit machine with maintenance_work_mem set to above 4Gb, the counter overflows and we never recognize having reached the maintenance_work_mem limit. I believe this explains out-of-memory failure recently reported by Sean Davis. This is a bug, so backpatch to 8.2.
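A minimal sketch of the overflow described above, assuming allocations are summed in fixed-size chunks. The function names are hypothetical, and `uint64_t` stands in for the `long` used by the fix (64 bits wide on typical 64-bit Unix platforms).

```c
#include <stdbool.h>
#include <stdint.h>

/* Sum nchunks allocations of chunk bytes in a 32-bit counter.
 * Unsigned arithmetic wraps silently modulo 2^32. */
static uint32_t
accumulate_u32(uint32_t chunk, uint32_t nchunks)
{
    uint32_t total = 0;

    for (uint32_t i = 0; i < nchunks; i++)
        total += chunk;
    return total;
}

/* The same sum in a 64-bit counter, which does not wrap here. */
static uint64_t
accumulate_u64(uint32_t chunk, uint32_t nchunks)
{
    uint64_t total = 0;

    for (uint32_t i = 0; i < nchunks; i++)
        total += chunk;
    return total;
}

/* Hypothetical limit test, in the spirit of comparing the counter
 * against maintenance_work_mem. */
static bool
limit_reached(uint64_t allocated, uint64_t limit)
{
    return allocated >= limit;
}
```

Summing 5120 chunks of 1 MiB (5 GiB in total) leaves the 32-bit counter at 1 GiB, so a limit of 4.5 GiB is never seen as reached, while the 64-bit counter crosses it.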
2007-06-01  Tom Lane
Fix performance problems in multi-batch hash joins by ensuring that we select
a well-randomized batch number even when given a poorly-randomized hash value. This is a bit inefficient but seems the only practical solution given the constraint that we can't change the hash functions in released branches. Per report from Joseph Shraibman. Applied to 8.1 and 8.2 only --- HEAD is getting a cleaner fix, and 8.0 and before use different coding that seems less vulnerable.
2007-04-19  Tom Lane
Repair PANIC condition in hash indexes when a previous index extension attempt
failed (due to lock conflicts or out-of-space). We might have already extended the index's filesystem EOF before failing, causing the EOF to be beyond what the metapage says is the last used page. Hence the invariant maintained by the code needs to be "EOF is at or beyond last used page", not "EOF is exactly the last used page". Problem was created by my patch of 2006-11-19 that attempted to repair bug #2737. Since that was back-patched to 7.4, this needs to be as well. Per report and test case from Vlastimil Krejcir.
2007-02-04  Tom Lane
Don't MAXALIGN in the checks to decide whether a tuple is over TOAST's
threshold for tuple length. On 4-byte-MAXALIGN machines, the toast code creates tuples that have t_len exactly TOAST_TUPLE_THRESHOLD ... but this number is not itself maxaligned, so if heap_insert maxaligns t_len before comparing to TOAST_TUPLE_THRESHOLD, it'll uselessly recurse back to tuptoaster.c, wasting cycles. (It turns out that this does not happen on 8-byte-MAXALIGN machines, because for them the outer MAXALIGN in the TOAST_MAX_CHUNK_SIZE macro reduces TOAST_MAX_CHUNK_SIZE so that toast tuples will be less than TOAST_TUPLE_THRESHOLD in size. That MAXALIGN is really incorrect, but we can't remove it now, see below.) There isn't any particular value in maxaligning before comparing to the thresholds, so just don't do that, which saves a small number of cycles in itself. These numbers should be rejiggered to minimize wasted space on toast-relation pages, but we can't do that in the back branches because changing TOAST_MAX_CHUNK_SIZE would force an initdb (by changing the contents of toast tables). We can move the toast decision thresholds a bit, though, which is what this patch effectively does. Thanks to Pavan Deolasee for discovering the unintended recursion. Back-patch into 8.2, but not further, pending more testing. (HEAD is about to get a further patch modifying the thresholds, so it won't help much for testing this form of the patch.)
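The effect of aligning before comparing can be sketched as follows. `align_up` rounds a length up to a power-of-two boundary in the spirit of MAXALIGN; the two check functions are hypothetical names, and the threshold value 2034 used in the note below is an illustrative assumption, not the real TOAST_TUPLE_THRESHOLD.

```c
#include <stdbool.h>
#include <stddef.h>

/* Round len up to the next multiple of alignval, which must be a
 * power of two. */
static size_t
align_up(size_t len, size_t alignval)
{
    return (len + (alignval - 1)) & ~(alignval - 1);
}

/* Old-style check: align the length first, then compare. */
static bool
over_threshold_aligned(size_t t_len, size_t threshold, size_t alignval)
{
    return align_up(t_len, alignval) > threshold;
}

/* New-style check: compare the raw length. */
static bool
over_threshold_raw(size_t t_len, size_t threshold)
{
    return t_len > threshold;
}
```

With 4-byte alignment and a threshold of 2034, a tuple whose length is exactly 2034 aligns to 2036 and is wrongly treated as over the threshold by the aligned check; the raw comparison accepts it.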
2006-11-17  Tom Lane
Repair two related errors in heap_lock_tuple: it was failing to recognize
cases where we already hold the desired lock "indirectly", either via membership in a MultiXact or because the lock was originally taken by a different subtransaction of the current transaction. These cases must be accounted for to avoid needless deadlocks and/or inappropriate replacement of an exclusive lock with a shared lock. Per report from Clarence Gardner and subsequent investigation.
2006-11-05  Tom Lane
Fix recently-understood problems with handling of XID freezing, particularly
in PITR scenarios. We now WAL-log the replacement of old XIDs with FrozenTransactionId, so that such replacement is guaranteed to propagate to PITR slave databases. Also, rather than relying on hint-bit updates to be preserved, pg_clog is not truncated until all instances of an XID are known to have been replaced by FrozenTransactionId. Add new GUC variables and pg_autovacuum columns to allow management of the freezing policy, so that users can trade off the size of pg_clog against the amount of freezing work done. Revise the already-existing code that forces autovacuum of tables approaching the wraparound point to make it more bulletproof; also, revise the autovacuum logic so that anti-wraparound vacuuming is done per-table rather than per-database. initdb forced because of changes in pg_class, pg_database, and pg_autovacuum catalogs. Heikki Linnakangas, Simon Riggs, and Tom Lane.
2006-11-01  Tom Lane
Fix "failed to re-find parent key" btree VACUUM failure by revising page
deletion code to avoid the case where an upper-level btree page remains "half dead" for a significant period of time, and to block insertions into a key range that is in process of being re-assigned to the right sibling of the deleted page's parent. This prevents the scenario reported by Ed L. wherein index keys could become out-of-order in the grandparent index level. Since this is a moderately invasive fix, I'm applying it only to HEAD. The bug exists back to 7.4, but the back branches will get a different patch.
2006-10-05  Tom Lane
Make use of qsort_arg in several places that were formerly using klugy
static variables. This avoids any risk of potential non-reentrancy, and in particular offers a much cleaner workaround for the Intel compiler bug that was affecting ginutil.c.
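The idea of passing context through an extra comparator argument, rather than through a static variable, can be sketched as below. `sort_with_arg` is a hypothetical stand-in (a simple insertion sort), not PostgreSQL's qsort_arg; only the shape of the comparator, with its extra `void *arg`, is meant to illustrate the technique.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Comparator that receives caller-supplied context as a third argument. */
typedef int (*cmp_with_arg) (const void *a, const void *b, void *arg);

/* Stand-in insertion sort that forwards arg to every comparison.
 * This sketch assumes elements of at most 64 bytes. */
static void
sort_with_arg(void *base, size_t n, size_t size, cmp_with_arg cmp, void *arg)
{
    unsigned char *b = base;
    unsigned char tmp[64];

    assert(size <= sizeof(tmp));
    for (size_t i = 1; i < n; i++)
    {
        size_t j = i;

        memcpy(tmp, b + i * size, size);
        while (j > 0 && cmp(b + (j - 1) * size, tmp, arg) > 0)
        {
            memcpy(b + j * size, b + (j - 1) * size, size);
            j--;
        }
        memcpy(b + j * size, tmp, size);
    }
}

/* Compare two ints; *arg is +1 for ascending or -1 for descending.
 * No static variable is needed to carry the direction. */
static int
cmp_int_dir(const void *a, const void *b, void *arg)
{
    int dir = *(const int *) arg;
    int x = *(const int *) a;
    int y = *(const int *) b;

    return dir * ((x > y) - (x < y));
}
```

Because the direction travels with the call, two sorts with different directions can run without sharing any global state, which is the reentrancy benefit the message mentions.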
2006-10-04  Bruce Momjian
pgindent run for 8.2.
2006-09-10  Tom Lane
If we're going to advertise the array overlap/containment operators,
we probably should make them work reliably for all arrays. Fix code to handle NULLs and multidimensional arrays, move it into arrayfuncs.c. GIN is still restricted to indexing arrays with no null elements, however.
2006-09-10  Tom Lane
Rename contains/contained-by operators to @> and <@, per discussion that
agreed these symbols are less easily confused. I made new pg_operator entries (with new OIDs) for the old names, so as to provide backward compatibility while making it pretty easy to remove the old names in some future release cycle. This commit only touches the core datatypes, contrib will be fixed separately.
2006-08-24  Tom Lane
Optimize the case where a btree indexscan has current and mark positions
on the same index page; we can avoid data copying as well as buffer refcount manipulations in this common case. Makes for a small but noticeable improvement in mergejoin speed. Heikki Linnakangas
2006-08-21  Tom Lane
Make the server track an 'XID epoch', that is, maintain higher-order bits
of the transaction ID counter. Nothing is done with the epoch except to store it in checkpoint records, but this provides a foundation with which add-on code can pretend that XIDs never wrap around. This is a severely trimmed and rewritten version of the xxid patch submitted by Marko Kreen. Per discussion, the epoch counter seems the only part of xxid that really needs to be in the core server.
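How add-on code might combine the epoch with a 32-bit XID can be sketched as follows. The function name `xid_with_epoch` is hypothetical; the message itself only says that the epoch holds the higher-order bits.

```c
#include <stdint.h>

/* Combine a 32-bit epoch (higher-order bits) with a 32-bit XID into a
 * single 64-bit value that keeps increasing across XID wraparound. */
static uint64_t
xid_with_epoch(uint32_t epoch, uint32_t xid)
{
    return ((uint64_t) epoch << 32) | xid;
}
```

Under this scheme, XID 3 in epoch 1 compares greater than XID 4000000000 in epoch 0, even though the raw 32-bit value is smaller.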
2006-08-18  Tom Lane
Now that we've rearranged relation open to get a lock before touching
the rel, it's easy to get rid of the narrow race-condition window that used to exist in VACUUM and CLUSTER. Did some minor code-beautification work in the same area, too.
2006-08-17  Tom Lane
Implement archive_timeout feature to force xlog file switches to occur no more
than N seconds apart. This allows a simple, if not very high performance, means of guaranteeing that a PITR archive is no more than N seconds behind real time. Also make pg_current_xlog_location return the WAL Write pointer, add pg_current_xlog_insert_location to return the Insert pointer, and fix pg_xlogfile_name_offset to return its results as a two-element record instead of a smashed-together string, as per recent discussion. Simon Riggs
2006-08-07  Tom Lane
Make recovery from WAL be restartable, by executing a checkpoint-like
operation every so often. This improves the usefulness of PITR log shipping for hot standby: formerly, if the standby server crashed, it was necessary to restart it from the last base backup and replay all the WAL since then. Now it will only need to reread about the same amount of WAL as the master server would. The behavior might also come in handy during a long PITR replay sequence. Simon Riggs, with some editorialization by Tom Lane.
2006-08-06  Tom Lane
Add support for forcing a switch to a new xlog file; cause such a switch
to happen automatically during pg_stop_backup(). Add some functions for interrogating the current xlog insertion point and for easily extracting WAL filenames from the hex WAL locations displayed by pg_stop_backup and friends. Simon Riggs with some editorialization by Tom Lane.
2006-07-31  Tom Lane
Change the relation_open protocol so that we obtain lock on a relation
(table or index) before trying to open its relcache entry. This fixes race conditions in which someone else commits a change to the relation's catalog entries while we are in process of doing relcache load. Problems of that ilk have been reported sporadically for years, but it was not really practical to fix until recently --- for instance, the recent addition of WAL-log support for in-place updates helped. Along the way, remove pg_am.amconcurrent: all AMs are now expected to support concurrent update.
2006-07-25  Tom Lane
Modify btree to delete known-dead index entries without an actual VACUUM.
When we are about to split an index page to do an insertion, first look to see if any entries marked LP_DELETE exist on the page, and if so remove them to try to make enough space for the desired insert. This should reduce index bloat in heavily-updated tables, although of course you still need VACUUM eventually to clean up the heap. Junji Teramoto
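The "clean up before splitting" step can be sketched on a toy page model. Everything here is hypothetical: the `Page` and `Item` types, the fixed `PAGE_CAPACITY`, and the function names are illustrative only and do not reflect the real btree page layout; the `dead` flag stands in for LP_DELETE.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_CAPACITY 8         /* hypothetical maximum items per page */

typedef struct
{
    int  key;
    bool dead;                  /* models an entry marked LP_DELETE */
} Item;

typedef struct
{
    Item   items[PAGE_CAPACITY];
    size_t nitems;
} Page;

/* Remove items marked dead, preserving the order of the survivors.
 * Returns the number of items removed. */
static size_t
remove_dead_items(Page *page)
{
    size_t kept = 0;
    size_t removed;

    for (size_t i = 0; i < page->nitems; i++)
    {
        if (!page->items[i].dead)
            page->items[kept++] = page->items[i];
    }
    removed = page->nitems - kept;
    page->nitems = kept;
    return removed;
}

/* Decide whether one more insert still requires a split. A full page
 * is first purged of dead items; only if it is still full do we split. */
static bool
needs_split(Page *page)
{
    if (page->nitems < PAGE_CAPACITY)
        return false;
    remove_dead_items(page);
    return page->nitems >= PAGE_CAPACITY;
}
```

A full page with two dead entries ends up with six live ones and no split; a full page with no dead entries still has to be split.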
2006-07-13  Bruce Momjian
More include file adjustments.
2006-07-13  Bruce Momjian
More include file adjustments.
2006-07-13  Bruce Momjian
Allow include files to compile on their own.
Strip unused include files out of include files, and add needed includes to C files. The next step is to remove unused include files in C files.
2006-07-11  Tom Lane
Tweak fillfactor code as per my recent proposal. Fix nbtsort.c so that
it can handle small fillfactors for ordinary-sized index entries without failing on large ones; fix nbtinsert.c to distinguish leaf and nonleaf pages; change the minimum fillfactor to 10% for all index types.
2006-07-11  Bruce Momjian
Alphabetically order reference to include files, "S"-"Z".
2006-07-11  Bruce Momjian
Alphabetically order reference to include files, "G" - "M".
2006-07-11  Teodor Sigaev
GIN improvements
- Replace the sorted array of entries in maintenance_work_mem with a binary tree; this should improve create performance.
- Calculate allocated memory more precisely; eliminate leaks with user-defined extractValue().
- Improve wording in tsearch2.
2006-07-11  Bruce Momjian
Allow each C include file to compile on its own by including any needed
header files.
2006-07-10  Alvaro Herrera
Improve vacuum code to track minimum Xids per table instead of per database.
To this end, add a couple of columns to pg_class, relminxid and relvacuumxid, based on which we calculate the pg_database columns after each vacuum. We now force all databases to be vacuumed, even template ones. A backend noticing too old a database (meaning pg_database.datminxid is in danger of falling behind Xid wraparound) will signal the postmaster, which in turn will start an autovacuum iteration to process the offending database. In principle this is only there to cope with frozen (non-connectable) databases without forcing users to set them to connectable, but it could force regular user database to go through a database-wide vacuum at any time. Maybe we should warn users about this somehow. Of course the real solution will be to use autovacuum all the time ;-) There are some additional improvements we could have in this area: for example the vacuum code could be smarter about not updating pg_database for each table when called by autovacuum, and do it only once the whole autovacuum iteration is done. I updated the system catalogs documentation, but I didn't modify the maintenance section. Also having some regression tests for this would be nice but it's not really a very straightforward thing to do. Catalog version bumped due to system catalog changes.
2006-07-03  Tom Lane
Code review for FILLFACTOR patch. Change WITH grammar as per earlier
discussion (including making def_arg allow reserved words), add missed opt_definition for UNIQUE case. Put the reloptions support code in a less random place (I chose to make a new file access/common/reloptions.c). Eliminate header inclusion creep. Make the index options functions safely user-callable (seems like client apps might like to be able to test validity of options before trying to make an index). Reduce overhead for normal case with no options by allowing rd_options to be NULL. Fix some unmaintainably klugy code, including getting rid of Natts_pg_class_fixed at long last. Some stylistic cleanup too, and pay attention to keeping comments in sync with code. Documentation still needs work, though I did fix the omissions in catalogs.sgml and indexam.sgml.
2006-07-02  Bruce Momjian
Add FILLFACTOR to CREATE INDEX.
ITAGAKI Takahiro
2006-06-28  Teodor Sigaev
Changes
* new split algorithm (as proposed in http://archives.postgresql.org/pgsql-hackers/2006-06/msg00254.php)
* possible call of pickSplit() for second and below columns
* add spl_(l|r)datum_exists to GIST_SPLITVEC - pickSplit should check its values to use already defined spl_(l|r)datum for splitting. pickSplit should set spl_(l|r)datum_exists to 'false' (if they were 'true') to signal to the caller that it used spl_(l|r)datum.
* support for old pickSplit(): not very optimal but correct split
* remove 'bytes' field from GISTENTRY: in any case the size of a value is defined by its type.
* split GIST_SPLITVEC into two structures: one for use in picksplit and a second for internal use.
* some code refactoring
* support of subsplit in rtree opclasses
TODO: add support of subsplit to contrib modules
2006-06-27  Tom Lane
Create infrastructure for 'MinimalTuple' representation of in-memory
tuples with less header overhead than a regular HeapTuple, per my recent proposal. Teach TupleTableSlot code how to deal with these. As proof of concept, change tuplestore.c to store MinimalTuples instead of HeapTuples. Future patches will expand the concept to other places where it is useful.
2006-06-25  Bruce Momjian
Fix GEVHDRSZ for Win32.
Magnus Hagander
2006-06-16  Tom Lane
Fix problems with cached tuple descriptors disappearing while still in use
by creating a reference-count mechanism, similar to what we did a long time ago for catcache entries. The back branches have an ugly solution involving lots of extra copies, but this way is more efficient. Reference counting is only applied to tupdescs that are actually in caches --- there seems no need to use it for tupdescs that are generated in the executor, since they'll go away during plan shutdown by virtue of being in the per-query memory context. Neil Conway and Tom Lane
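A reference-count mechanism of the kind described above can be sketched as follows. The `CachedDesc` type and both functions are hypothetical. Using -1 to mean "not reference-counted" is an assumption of this sketch, standing in for descriptors that live in a per-query memory context, and the `freed` flag stands in for actually releasing memory.

```c
#include <stdbool.h>

typedef struct
{
    int  refcount;              /* -1: not reference-counted (sketch convention) */
    bool freed;                 /* stands in for actually releasing the memory */
} CachedDesc;

/* Take a reference, but only on descriptors that are counted at all. */
static void
desc_incr_refcount(CachedDesc *desc)
{
    if (desc->refcount >= 0)
        desc->refcount++;
}

/* Drop a reference; release the descriptor when the last one goes away. */
static void
desc_decr_refcount(CachedDesc *desc)
{
    if (desc->refcount < 0)
        return;                 /* lifetime is managed elsewhere */
    if (--desc->refcount == 0)
        desc->freed = true;
}
```

A descriptor held by two users survives the first release and is freed only on the second, so it cannot disappear while still in use.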
2006-05-29  Teodor Sigaev
Some improvement of page split in multicolumn GiST index.
If the user-defined picksplit on the n-th column generates equal left and right unions, picksplit is then called on the (n+1)-th column.
2006-05-24  Teodor Sigaev
* Add support for NULL to GiST.
* Some refactoring and simplification of code in gistutil.c and gist.c.
* Now in some cases the user-defined picksplit method can be called for a non-first column in the index, but there is room to do more here.
* Small fix of docs related to NULL support.
2006-05-19  Teodor Sigaev
Simplify gistSplit() and some refactoring related code.
2006-05-17  Teodor Sigaev
Reduce size of critical section during vacuum full; critical
sections are no longer nested. All user-defined functions are now called outside critical sections. Small improvements in WAL protocol. TODO: improve XLOG replay
2006-05-10  Tom Lane
Clean up code associated with updating pg_class statistics columns
(relpages/reltuples). To do this, create formal support in heapam.c for "overwrite" tuple updates (including xlog replay capability) and use that instead of the ad-hoc overwrites we'd been using in VACUUM and CREATE INDEX. Take the responsibility for updating stats during CREATE INDEX out of the individual index AMs, and do it where it belongs, in catalog/index.c. Aside from being more modular, this avoids having to update the same tuple twice in some paths through CREATE INDEX. It's probably not measurably faster, but for sure it's a lot cleaner than before.
2006-05-10  Teodor Sigaev
Reduce size of critical section and remove calls of user-defined functions in
insertion and deletion; modify gistSplit() so that it does not use buffers. TODO: gistvacuumcleanup and XLOG
2006-05-08  Tom Lane
Rewrite btree vacuuming to fold the former bulkdelete and cleanup operations
into a single mostly-physical-order scan of the index. This requires some ticklish interlocking considerations, but should create no material performance impact on normal index operations (at least given the already-committed changes to make scans work a page at a time). VACUUM itself should get significantly faster in any index that's degenerated to a very nonlinear page order. Also, we save one pass over the index entirely, except in the case where there were no deletions to do and so only one pass happened anyway. Original patch by Heikki Linnakangas, rework by Tom Lane.
2006-05-07  Tom Lane
Rewrite btree index scans to work a page at a time in all cases (both
btgettuple and btgetmulti). This eliminates the problem of "re-finding" the exact stopping point, since the stopping point is effectively always a page boundary, and index items are never moved across pre-existing page boundaries. A small penalty is that the keys_are_unique optimization is effectively disabled (and, therefore, is removed in this patch), causing us to apply _bt_checkkeys() to at least one more tuple than necessary when looking up a unique key. However, the advantages for non-unique cases seem great enough to accept this tradeoff. Aside from simplifying and (sometimes) speeding up the indexscan code, this will allow us to reimplement btbulkdelete as a largely sequential scan instead of index-order traversal, thereby significantly reducing the cost of VACUUM. Those changes will come in a separate patch. Original patch by Heikki Linnakangas, rework by Tom Lane.
2006-05-02  Tom Lane
Clean up API for ambulkdelete/amvacuumcleanup as per today's discussion.
This formulation requires every AM to provide amvacuumcleanup, unlike before, but it's surely a whole lot cleaner. Also, add an 'amstorage' column to pg_am so that we can get rid of hardwired knowledge in DefineOpClass().
2006-05-02  Teodor Sigaev
GIN: Generalized Inverted iNdex.
text[], int4[], Tsearch2 support for GIN.