summaryrefslogtreecommitdiff
path: root/src/backend/access/transam
AgeCommit message (Collapse)Author
2008-05-19Remove arbitrary 10MB limit on two-phase state file size. It's not that hardHeikki Linnakangas
to go beoynd 10MB, as demonstrated by Gavin Sharry's example of dropping a schema with ~25000 objects. The really bogus thing about the limit was that it was enforced when a state file file was read in, not when it was written, so you would end up with a prepared transaction that you can't commit or abort, and the only recourse was to shut down the server and remove the file by hand. Raise the limit to MaxAllocSize, and enforce it also when a state file is written. We could've removed the limit altogether, but reading in a file larger than MaxAllocSize would fail anyway because we read it into a palloc'd buffer. Backpatch down to 8.1, where 2PC and this issue was introduced.
2008-05-13Don't try to close negative file descriptors, since this can causeMagnus Hagander
crashes on certain platforms. In particular, the MSVC runtime is known to do this. Fixes bug #4162, reported and diagnosed by Javier Pimas
2008-04-17Repair two places where SIGTERM exit could leave shared memory stateTom Lane
corrupted. (Neither is very important if SIGTERM is used to shut down the whole database cluster together, but there's a problem if someone tries to SIGTERM individual backends.) To do this, introduce new infrastructure macros PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care of transiently pushing an on_shmem_exit cleanup hook. Also use this method for createdb cleanup --- that wasn't a shared-memory-corruption problem, but SIGTERM abort of createdb could leave orphaned files lying around. Backpatch as far as 8.2. The shmem corruption cases don't exist in 8.1, and the createdb usage doesn't seem important enough to risk backpatching further.
2008-03-04Fix PREPARE TRANSACTION to reject the case where the transaction has dropped aTom Lane
temporary table; we can't support that because there's no way to clean up the source backend's internal state if the eventual COMMIT PREPARED is done by another backend. This was checked correctly in 8.1 but I broke it in 8.2 :-(. Patch by Heikki Linnakangas, original trouble report by John Smith.
2008-01-03Make standard maintenance operations (including VACUUM, ANALYZE, REINDEX,Tom Lane
and CLUSTER) execute as the table owner rather than the calling user, using the same privilege-switching mechanism already used for SECURITY DEFINER functions. The purpose of this change is to ensure that user-defined functions used in index definitions cannot acquire the privileges of a superuser account that is performing routine maintenance. While a function used in an index is supposed to be IMMUTABLE and thus not able to do anything very interesting, there are several easy ways around that restriction; and even if we could plug them all, there would remain a risk of reading sensitive information and broadcasting it through a covert channel such as CPU usage. To prevent bypassing this security measure, execution of SET SESSION AUTHORIZATION and SET ROLE is now forbidden within a SECURITY DEFINER context. Thanks to Itagaki Takahiro for reporting this vulnerability. Security: CVE-2007-6600
2007-09-29Make archive recovery always start a new timeline, rather than only when aTom Lane
recovery stop time was used. This avoids a corner-case risk of trying to overwrite an existing archived copy of the last WAL segment, and seems simpler and cleaner all around than the original definition. Per example from Jon Colverson and subsequent analysis by Simon.
2007-08-04Suppress time zone name (%Z) when logging timestamps in xlog.c startupTom Lane
on Windows. This is yet another manifestation of the problem that Windows returns time zone names that may be in a different encoding than we are using. I've put a better solution in HEAD, but the back branches need a simple patch. Per report from Hiroshi Saito.
2007-05-30Fix overly-strict sanity check in BeginInternalSubTransaction that made itTom Lane
fail when used in a deferred trigger. Bug goes back to 8.0; no doubt the reason it hadn't been noticed is that we've been discouraging use of user-defined constraint triggers. Per report from Frank van Vugt.
2007-04-26Fix dynahash.c to suppress hash bucket splits while a hash_seq_search() scanTom Lane
is in progress on the same hashtable. This seems the least invasive way to fix the recently-recognized problem that a split could cause the scan to visit entries twice or (with much lower probability) miss them entirely. The only field-reported problem caused by this is the "failed to re-find shared lock object" PANIC in COMMIT PREPARED reported by Michel Dorochevsky, which was caused by multiply visited entries. However, it seems certain that mdsync() is vulnerable to missing required fsync's due to missed entries, and I am fearful that RelationCacheInitializePhase2() might be at risk as well. Because of that and the generalized hazard presented by this bug, back-patch all the supported branches. Along the way, fix pg_prepared_statement() and pg_cursor() to not assume that the hashtables they are examining will stay static between calls. This is risky regardless of the newly noted dynahash problem, because hash_seq_search() has never promised to cope with deletion of table entries other than the just-returned one. There may be no bug here because the only supported way to call these functions is via ExecMakeTableFunctionResult() which will cycle them to completion before doing anything very interesting, but it seems best to get rid of the assumption. This affects 8.2 and HEAD only, since those functions weren't there earlier.
2007-02-13Disallow committing a prepared transaction unless we are in the same databaseTom Lane
it was executed in. Someday it might be nice to allow cross-DB commits, but work would be needed in NOTIFY and perhaps other places. Per Heikki.
2006-11-30Minor adjustments to make failures in startup/shutdown behave more cleanly.Tom Lane
StartupXLOG and ShutdownXLOG no longer need to be critical sections, because in all contexts where they are invoked, elog(ERROR) would be translated to elog(FATAL) anyway. (One change in bgwriter.c is needed to make this true: set ExitOnAnyError before trying to exit. This is a good fix anyway since the existing code would have gone into an infinite loop on elog(ERROR) during shutdown.) That avoids a misleading report of PANIC during semi-orderly failures. Modify the postmaster to include the startup process in the set of processes that get SIGTERM when a fast shutdown is requested, and also fix it to not try to restart the bgwriter if the bgwriter fails while trying to write the shutdown checkpoint. Net result is that "pg_ctl stop -m fast" does something reasonable for a system in warm standby mode, and so should Unix system shutdown (ie, universal SIGTERM). Per gripe from Stephen Harris and some corner-case testing of my own.
2006-11-23Several changes to reduce the probability of running out of memory duringTom Lane
AbortTransaction, which would lead to recursion and eventual PANIC exit as illustrated in recent report from Jeff Davis. First, in xact.c create a special dedicated memory context for AbortTransaction to run in. This solves the problem as long as AbortTransaction doesn't need more than 32K (or whatever other size we create the context with). But in corner cases it might. Second, in trigger.c arrange to keep pending after-trigger event records in separate contexts that can be freed near the beginning of AbortTransaction, rather than having them persist until CleanupTransaction as before. Third, in portalmem.c arrange to free executor state data earlier as well. These two changes should result in backing off the out-of-memory condition before AbortTransaction needs any significant amount of memory, at least in typical cases such as memory overrun due to too many trigger events or too big an executor hash table. And all the same for subtransaction abort too, of course.
2006-11-21On systems that have setsid(2) (which should be just about everything exceptTom Lane
Windows), arrange for each postmaster child process to be its own process group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole process group not only the direct child process. This provides saner behavior for archive and recovery scripts; in particular, it's possible to shut down a warm-standby recovery server using "pg_ctl stop -m immediate", since delivery of SIGQUIT to the startup subprocess will result in killing the waiting recovery_command. Also, this makes Query Cancel and statement_timeout apply to scripts being run from backends via system(). (There is no support in the core backend for that, but it's widely done using untrusted PLs.) Per gripe from Stephen Harris and subsequent discussion.
2006-11-17Repair two related errors in heap_lock_tuple: it was failing to recognizeTom Lane
cases where we already hold the desired lock "indirectly", either via membership in a MultiXact or because the lock was originally taken by a different subtransaction of the current transaction. These cases must be accounted for to avoid needless deadlocks and/or inappropriate replacement of an exclusive lock with a shared lock. Per report from Clarence Gardner and subsequent investigation.
2006-11-16String fixPeter Eisentraut
2006-11-10Clean up some misleading references to %p being a full path, per Simon.Tom Lane
2006-11-08Change Windows rename and unlink substitutes so that they time out afterTom Lane
30 seconds instead of retrying forever. Also modify xlog.c so that if it fails to rename an old xlog segment up to a future slot, it will unlink the segment instead. Per discussion of bug #2712, in which it became apparent that Windows can handle unlinking a file that's being held open, but not renaming it.
2006-11-05Fix recently-understood problems with handling of XID freezing, particularlyTom Lane
in PITR scenarios. We now WAL-log the replacement of old XIDs with FrozenTransactionId, so that such replacement is guaranteed to propagate to PITR slave databases. Also, rather than relying on hint-bit updates to be preserved, pg_clog is not truncated until all instances of an XID are known to have been replaced by FrozenTransactionId. Add new GUC variables and pg_autovacuum columns to allow management of the freezing policy, so that users can trade off the size of pg_clog against the amount of freezing work done. Revise the already-existing code that forces autovacuum of tables approaching the wraparound point to make it more bulletproof; also, revise the autovacuum logic so that anti-wraparound vacuuming is done per-table rather than per-database. initdb forced because of changes in pg_class, pg_database, and pg_autovacuum catalogs. Heikki Linnakangas, Simon Riggs, and Tom Lane.
2006-10-18Add some code to CREATE DATABASE to check for pre-existing subdirectoriesTom Lane
that conflict with the OID that we want to use for the new database. This avoids the risk of trying to remove files that maybe we shouldn't remove. Per gripe from Jon Lapham and subsequent discussion of 27-Sep.
2006-10-06Message style improvementsPeter Eisentraut
2006-10-04pgindent run for 8.2.Bruce Momjian
2006-10-03Make some sentences consistent with similar ones.Bruce Momjian
Euler Taveira de Oliveira
2006-09-26Degrade the transaction-id wraparound point message from LOG to DEBUG1, perAlvaro Herrera
discussion. Patch from Simon Riggs.
2006-09-03Arrange for GetSnapshotData to copy live-subtransaction XIDs from theTom Lane
PGPROC array into snapshots, and use this information to avoid visits to pg_subtrans in HeapTupleSatisfiesSnapshot. This appears to solve the pg_subtrans-related context swap storm problem that's been reported by several people for 8.1. While at it, modify GetSnapshotData to not take an exclusive lock on ProcArrayLock, as closer analysis shows that shared lock is always sufficient. Itagaki Takahiro and Tom Lane
2006-08-27Move xact.c's partial support for Lists of TransactionIds into pg_list.h.Tom Lane
Needed because lock.c is now going to use the same type of list.
2006-08-21Make the server track an 'XID epoch', that is, maintain higher-order bitsTom Lane
of the transaction ID counter. Nothing is done with the epoch except to store it in checkpoint records, but this provides a foundation with which add-on code can pretend that XIDs never wrap around. This is a severely trimmed and rewritten version of the xxid patch submitted by Marko Kreen. Per discussion, the epoch counter seems the only part of xxid that really needs to be in the core server.
2006-08-17Implement archive_timeout feature to force xlog file switches to occur no moreTom Lane
than N seconds apart. This allows a simple, if not very high performance, means of guaranteeing that a PITR archive is no more than N seconds behind real time. Also make pg_current_xlog_location return the WAL Write pointer, add pg_current_xlog_insert_location to return the Insert pointer, and fix pg_xlogfile_name_offset to return its results as a two-element record instead of a smashed-together string, as per recent discussion. Simon Riggs
2006-08-07Make recovery from WAL be restartable, by executing a checkpoint-likeTom Lane
operation every so often. This improves the usefulness of PITR log shipping for hot standby: formerly, if the standby server crashed, it was necessary to restart it from the last base backup and replay all the WAL since then. Now it will only need to reread about the same amount of WAL as the master server would. The behavior might also come in handy during a long PITR replay sequence. Simon Riggs, with some editorialization by Tom Lane.
2006-08-06Add support for forcing a switch to a new xlog file; cause such a switchTom Lane
to happen automatically during pg_stop_backup(). Add some functions for interrogating the current xlog insertion point and for easily extracting WAL filenames from the hex WAL locations displayed by pg_stop_backup and friends. Simon Riggs with some editorialization by Tom Lane.
2006-07-30Modify snapshot definition so that lazy vacuums are ignored by otherAlvaro Herrera
vacuums. This allows a OLTP-like system with big tables to continue regular vacuuming on small-but-frequently-updated tables while the big tables are being vacuumed. Original patch from Hannu Krossing, rewritten by Tom Lane and updated by me.
2006-07-24DTrace support, with a small initial set of probesPeter Eisentraut
by Robert Lor
2006-07-20Don't try to truncate multixact SLRU files in checkpoints done during xlogTom Lane
recovery. In the first place, it doesn't work because slru's latest_page_number isn't set up yet (this is why we've been hearing reports of strange "apparent wraparound" log messages during crash recovery, but only from people who'd managed to advance their next-mxact counters some considerable distance from 0). In the second place, it seems a bit unwise to be throwing away data during crash recovery anwyway. This latter consideration convinces me to just disable truncation during recovery, rather than computing latest_page_number and pushing ahead.
2006-07-14Remove 576 references of include files that were not needed.Bruce Momjian
2006-07-13Allow include files to compile own their own.Bruce Momjian
Strip unused include files out unused include files, and add needed includes to C files. The next step is to remove unused include files in C files.
2006-07-11Alphabetically order reference to include files, "N" - "S".Bruce Momjian
2006-07-11Alphabetically order reference to include files, "G" - "M".Bruce Momjian
2006-07-10Improve vacuum code to track minimum Xids per table instead of per database.Alvaro Herrera
To this end, add a couple of columns to pg_class, relminxid and relvacuumxid, based on which we calculate the pg_database columns after each vacuum. We now force all databases to be vacuumed, even template ones. A backend noticing too old a database (meaning pg_database.datminxid is in danger of falling behind Xid wraparound) will signal the postmaster, which in turn will start an autovacuum iteration to process the offending database. In principle this is only there to cope with frozen (non-connectable) databases without forcing users to set them to connectable, but it could force regular user database to go through a database-wide vacuum at any time. Maybe we should warn users about this somehow. Of course the real solution will be to use autovacuum all the time ;-) There are some additional improvements we could have in this area: for example the vacuum code could be smarter about not updating pg_database for each table when called by autovacuum, and do it only once the whole autovacuum iteration is done. I updated the system catalogs documentation, but I didn't modify the maintenance section. Also having some regression tests for this would be nice but it's not really a very straightforward thing to do. Catalog version bumped due to system catalog changes.
2006-07-03Code review for FILLFACTOR patch. Change WITH grammar as per earlierTom Lane
discussion (including making def_arg allow reserved words), add missed opt_definition for UNIQUE case. Put the reloptions support code in a less random place (I chose to make a new file access/common/reloptions.c). Eliminate header inclusion creep. Make the index options functions safely user-callable (seems like client apps might like to be able to test validity of options before trying to make an index). Reduce overhead for normal case with no options by allowing rd_options to be NULL. Fix some unmaintainably klugy code, including getting rid of Natts_pg_class_fixed at long last. Some stylistic cleanup too, and pay attention to keeping comments in sync with code. Documentation still needs work, though I did fix the omissions in catalogs.sgml and indexam.sgml.
2006-07-02Add FILLFACTOR to CREATE INDEX.Bruce Momjian
ITAGAKI Takahiro
2006-06-27Put #ifdef NOT_USED around posix_fadvise call. We may want to resurrectTom Lane
this someday, but right now it seems that posix_fadvise is immature to the point of being broken on many platforms ... and we don't have any benchmark evidence proving it's worth spending time on.
2006-06-22pg_stop_backup was calling XLogArchiveNotify() twice for the newly createdTom Lane
backup history file. Bug introduced by the 8.1 change to make pg_stop_backup delete older history files. Per report from Masao Fujii.
2006-06-20Remove redundant gettimeofday() calls to the extent practical withoutTom Lane
changing semantics too much. statement_timestamp is now set immediately upon receipt of a client command message, and the various places that used to do their own gettimeofday() calls to mark command startup are referenced to that instead. I have also made stats_command_string use that same value for pg_stat_activity.query_start for both the command itself and its eventual replacement by <IDLE> or <idle in transaction>. There was some debate about that, but no argument that seemed convincing enough to justify an extra gettimeofday() call.
2006-06-18Don't try to call posix_fadvise() unless <fcntl.h> supplies a declarationTom Lane
for it. Hopefully will fix core dump evidenced by some buildfarm members since fadvise patch went in. The actual definition of the function is not ABI-compatible with compiler's default assumption in the absence of any declaration, so it's clearly unsafe to try to call it without seeing a declaration.
2006-06-16Test for POSIX_FADV_DONTNEED to use posix_fadvise().Bruce Momjian
2006-06-15Use posix_fadvise() to avoid kernel caching of WAL contents on WAL fileBruce Momjian
close. ITAGAKI Takahiro
2006-05-02GIN: Generalized Inverted iNdex.Teodor Sigaev
text[], int4[], Tsearch2 support for GIN.
2006-04-25Add statement_timestamp(), clock_timestamp(), andBruce Momjian
transaction_timestamp() (just like now()). Also update statement_timeout() to mention it is statement arrival time that is measured. Catalog version updated.
2006-04-20Ensure that we validate the page header of the first page of a WAL fileTom Lane
whenever we start to read within that file. The first page carries extra identification information that really ought to be checked, but as the code stood, this was only checked when we switched sequentially into a new WAL file, or if by chance the starting checkpoint record was within the first page. This patch ensures that we will detect bogus 'long header' information before we start replaying the WAL sequence.
2006-04-17Fix the torn-page hazard for PITR base backups by forcing full page writesTom Lane
to occur between pg_start_backup() and pg_stop_backup(), even if the GUC setting full_page_writes is OFF. Per discussion, doing this in combination with the already-existing checkpoint during pg_start_backup() should ensure safety against partial page updates being included in the backup. We do not have to force full page writes to occur during normal PITR operation, as I had first feared.
2006-04-14Make the world safe for full_page_writes. Allow XLOG records that try toTom Lane
update no-longer-existing pages to fall through as no-ops, but make a note of each page number referenced by such records. If we don't see a later XLOG entry dropping the table or truncating away the page, complain at the end of XLOG replay. Since this fixes the known failure mode for full_page_writes = off, revert my previous band-aid patch that disabled that GUC variable.