user/sven/postgresql.git

Age	Commit message (Collapse)	Author
2007-09-11	Rename recently-added pg_stat_activity column from txn_start to xact_start,	Tom Lane
	for consistency with other column names such as in pg_stat_database.
2007-09-08	Replace the former method of determining snapshot xmax --- to wit, calling	Tom Lane
	ReadNewTransactionId from GetSnapshotData --- with a "latestCompletedXid" variable that is updated during transaction commit or abort. Since latestCompletedXid is written only in places that had to lock ProcArrayLock exclusively anyway, and is read only in places that had to lock ProcArrayLock shared anyway, it adds no new locking requirements to the system despite being cluster-wide. Moreover, removing ReadNewTransactionId from snapshot acquisition eliminates the need to take both XidGenLock and ProcArrayLock at the same time. Since XidGenLock is sometimes held across I/O this can be a significant win. Some preliminary benchmarking suggested that this patch has no effect on average throughput but can significantly improve the worst-case transaction times seen in pgbench. Concept by Florian Pflug, implementation by Tom Lane.
2007-09-07	Don't take ProcArrayLock while exiting a transaction that has no XID; there is	Tom Lane
	no need for serialization against snapshot-taking because the xact doesn't affect anyone else's snapshot anyway. Per discussion. Also, move various info about the interlocking of transactions and snapshots out of code comments and into a hopefully-more-cohesive discussion in access/transam/README. Also, remove a couple of now-obsolete comments about having to force some WAL to be written to persuade RecordTransactionCommit to do its thing.
2007-09-05	Quick hack to make the VXID of a prepared transaction be -1/XID,	Tom Lane
	so that different prepared xacts can be told apart in the pg_locks view. Per suggestion from Florian.
2007-09-05	Implement lazy XID allocation: transactions that do not modify any database	Tom Lane
	rows will normally never obtain an XID at all. We already did things this way for subtransactions, but this patch extends the concept to top-level transactions. In applications where there are lots of short read-only transactions, this should improve performance noticeably; not so much from removal of the actual XID-assignments, as from reduction of overhead that's driven by the rate of XID consumption. We add a concept of a "virtual transaction ID" so that active transactions can be uniquely identified even if they don't have a regular XID. This is a much lighter-weight concept: uniqueness of VXIDs is only guaranteed over the short term, and no on-disk record is made about them. Florian Pflug, with some editorialization by Tom.
2007-09-03	Implement function-local GUC parameter settings, as per recent discussion.	Tom Lane
	There are still some loose ends: I didn't do anything about the SET FROM CURRENT idea yet, and it's not real clear whether we are happy with the interaction of SET LOCAL with function-local settings. The documentation is a bit spartan, too.
2007-08-28	Add a debug logging message when a resource manager rejects an attempted	Tom Lane
	restart point. Per suggestion from Simon Riggs.
2007-08-13	Fix two bugs induced in VACUUM FULL by async-commit patch.	Tom Lane
	First, we cannot assume that XLogAsyncCommitFlush guarantees hint bits will be settable, because clog.c's inexact LSN bookkeeping results in windows where a previously flushed transaction is considered unhintable because it shares an LSN slot with a later unflushed transaction. But repair_frag requires XMIN_COMMITTED to be correct so that it can distinguish tuples moved by the current vacuum. Since not being able to set the bit is an uncommon corner case, the most practical way of dealing with it seems to be to abandon shrinking (ie, don't invoke repair_frag) when we find a non-dead tuple whose XMIN_COMMITTED bit couldn't be set. Second, it is possible for the same reason that a RECENTLY_DEAD tuple does not get its XMAX_COMMITTED bit set during scan_heap. But by the time repair_frag examines the tuple it might be possible to set the bit. We therefore must take buffer content lock when calling HeapTupleSatisfiesVacuum a second time, else we can get an Assert failure in SetBufferCommitInfoNeedsSave. This latter bug is latent in existing releases, but I think it cannot actually occur without async commit, since the first HeapTupleSatisfiesVacuum call should always have set the bit. So I'm not going to back-patch it. In passing, reduce the existing "cannot shrink relation" messages from NOTICE to LOG level. The new message must be no higher than LOG if we don't want unpredictable regression test failures, and consistency seems like a good idea. Also arrange that only one such message is reported per VACUUM FULL; in typical scenarios you could get spammed with many such messages, which seems a bit useless.
2007-08-04	Switch over to using the src/timezone functions for formatting timestamps	Tom Lane
	displayed in the postmaster log. This avoids Windows-specific problems with localized time zone names that are in the wrong encoding, and generally seems like a good idea to forestall other potential platform-dependent issues. To preserve the existing behavior that all backends will log in the same time zone, create a new GUC variable log_timezone that can only be changed on a system-wide basis, and reference log-related calculations to that zone instead of the TimeZone variable. This fixes the issue reported by Hiroshi Saito that timestamps printed by xlog.c startup could be improperly localized on Windows. We still need a simpler patch for that problem in the back branches, however.
2007-08-01	Support an optional asynchronous commit mode, in which we don't flush WAL	Tom Lane
	before reporting a transaction committed. Data consistency is still guaranteed (unlike setting fsync = off), but a crash may lose the effects of the last few transactions. Patch by Simon, some editorialization by Tom.
2007-07-24	Create a new dedicated Postgres process, "wal writer", which exists to write	Tom Lane
	and fsync WAL at convenient intervals. For the moment it just tries to offload this work from backends, but soon it will be responsible for guaranteeing a maximum delay before asynchronously-committed transactions will be flushed to disk. This is a portion of Simon Riggs' async-commit patch, committed to CVS separately because a background WAL writer seems like it might be a good idea independently of the async-commit feature. I rebased walwriter.c on bgwriter.c because it seemed like a more appropriate way of handling signals; while the startup/shutdown logic in postmaster.c is more like autovac because we want walwriter to quit before we start the shutdown checkpoint.
2007-06-30	Improve logging of checkpoints. Patch by Greg Smith, worked over	Tom Lane
	by Heikki and a little bit by me.
2007-06-28	Implement "distributed" checkpoints in which the checkpoint I/O is spread	Tom Lane
	over a fairly long period of time, rather than being spat out in a burst. This happens only for background checkpoints carried out by the bgwriter; other cases, such as a shutdown checkpoint, are still done at full speed. Remove the "all buffers" scan in the bgwriter, and associated stats infrastructure, since this seems no longer very useful when the checkpoint itself is properly throttled. Original patch by Itagaki Takahiro, reworked by Heikki Linnakangas, and some minor API editorialization by me.
2007-06-07	Redefine IsTransactionState() to only return true for TRANS_INPROGRESS state,	Tom Lane
	which is the only state in which it's safe to initiate database queries. It turns out that all but two of the callers thought that's what it meant; and the other two were using it as a proxy for "will GetTopTransactionId() return a nonzero XID"? Since it was in fact an unreliable guide to that, make those two just invoke GetTopTransactionId() always, then deal with a zero result if they get one.
2007-05-31	Make some messages more consistent	Peter Eisentraut

2007-05-31	Downgrade some low-level startup messages to DEBUG1.	Peter Eisentraut

2007-05-30	Fix overly-strict sanity check in BeginInternalSubTransaction that made it	Tom Lane
	fail when used in a deferred trigger. Bug goes back to 8.0; no doubt the reason it hadn't been noticed is that we've been discouraging use of user-defined constraint triggers. Per report from Frank van Vugt.
2007-05-30	Make large sequential scans and VACUUMs work in a limited-size "ring" of	Tom Lane
	buffers, rather than blowing out the whole shared-buffer arena. Aside from avoiding cache spoliation, this fixes the problem that VACUUM formerly tended to cause a WAL flush for every page it modified, because we had it hacked to use only a single buffer. Those flushes will now occur only once per ring-ful. The exact ring size, and the threshold for seqscans to switch into the ring usage pattern, remain under debate; but the infrastructure seems done. The key bit of infrastructure is a new optional BufferAccessStrategy object that can be passed to ReadBuffer operations; this replaces the former StrategyHintVacuum API. This patch also changes the buffer usage-count methodology a bit: we now advance usage_count when first pinning a buffer, rather than when last unpinning it. To preserve the behavior that a buffer's lifetime starts to decrease when it's released, the clock sweep code is modified to not decrement usage_count of pinned buffers. Work not done in this commit: teach GiST and GIN indexes to use the vacuum BufferAccessStrategy for vacuum-driven fetches. Original patch by Simon, reworked by Heikki and again by Tom.
2007-05-27	Fix up pgstats counting of live and dead tuples to recognize that committed	Tom Lane
	and aborted transactions have different effects; also teach it not to assume that prepared transactions are always committed. Along the way, simplify the pgstats API by tying counting directly to Relations; I cannot detect any redeeming social value in having stats pointers in HeapScanDesc and IndexScanDesc structures. And fix a few corner cases in which counts might be missed because the relation's pgstat_info pointer hadn't been set.
2007-05-20	To support external compression of archived WAL data, add a flag bit to	Tom Lane
	WAL records that shows whether it is safe to remove full-page images (ie, whether or not an on-line backup was in progress when the WAL entry was made). Also make provision for an XLOG_NOOP record type that can be used to fill in the extra space when decompressing the data for restore. This is the portion of Koichi Suzuki's "full page writes" patch that has to go into the core database. The remainder of that work is two external compression and decompression programs, which for the time being will undergo separate development on pgfoundry. Per discussion. Also, twiddle the handling of BTREE_SPLIT records to ensure it'll be possible to compress them (the previous coding caused essential info to be omitted). The other commonly-used record types seem OK already, with the possible exception of GIN and GIST WAL records, which I don't understand well enough to opine on.
2007-05-02	During WAL recovery, when reading a page that we intend to overwrite completely	Tom Lane
	from the WAL data, don't bother to physically read it; just have bufmgr.c return a zeroed-out buffer instead. This speeds recovery significantly, and also avoids unnecessary failures when a page-to-be-overwritten has corrupt page headers on disk. This replaces a former kluge that accomplished the latter by pretending zero_damaged_pages was always ON during WAL recovery; which was OK when the kluge was put in, but is unsafe when restoring a WAL log that was written with full_page_writes off. Heikki Linnakangas
2007-04-30	Change the timestamps recorded in transaction commit/abort xlog records	Tom Lane
	from time_t to TimestampTz representation. This provides full gettimeofday() resolution of the timestamps, which might be useful when attempting to do point-in-time recovery --- previously it was not possible to specify the stop point with sub-second resolution. But mostly this is to get rid of TimestampTz-to-time_t conversion overhead during commit. Per my proposal of a day or two back.
2007-04-30	Implement rate-limiting logic on how often backends will attempt to send	Tom Lane
	messages to the stats collector. This avoids the problem that enabling stats_row_level for autovacuum has a significant overhead for short read-only transactions, as noted by Arjen van der Meijden. We can avoid an extra gettimeofday call by piggybacking on the one done for WAL-logging xact commit or abort (although that doesn't help read-only transactions, since they don't WAL-log anything). In my proposal for this, I noted that we could change the WAL log entries for commit/abort to record full TimestampTz precision, instead of only time_t as at present. That's not done in this patch, but will be committed separately.
2007-04-26	Fix dynahash.c to suppress hash bucket splits while a hash_seq_search() scan	Tom Lane
	is in progress on the same hashtable. This seems the least invasive way to fix the recently-recognized problem that a split could cause the scan to visit entries twice or (with much lower probability) miss them entirely. The only field-reported problem caused by this is the "failed to re-find shared lock object" PANIC in COMMIT PREPARED reported by Michel Dorochevsky, which was caused by multiply visited entries. However, it seems certain that mdsync() is vulnerable to missing required fsync's due to missed entries, and I am fearful that RelationCacheInitializePhase2() might be at risk as well. Because of that and the generalized hazard presented by this bug, back-patch all the supported branches. Along the way, fix pg_prepared_statement() and pg_cursor() to not assume that the hashtables they are examining will stay static between calls. This is risky regardless of the newly noted dynahash problem, because hash_seq_search() has never promised to cope with deletion of table entries other than the just-returned one. There may be no bug here because the only supported way to call these functions is via ExecMakeTableFunctionResult() which will cycle them to completion before doing anything very interesting, but it seems best to get rid of the assumption. This affects 8.2 and HEAD only, since those functions weren't there earlier.
2007-04-03	Remove the CheckpointStartLock in favor of having backends show whether they	Tom Lane
	are in their commit critical sections via flags in the ProcArray. Checkpoint can watch the ProcArray to determine when it's safe to proceed. This is a considerably better solution to the original problem of race conditions between checkpoint and transaction commit: it speeds up commit, since there's one less lock to fool with, and it prevents the problem of checkpoint being delayed indefinitely when there's a constant flow of commits. Heikki, with some kibitzing from Tom.
2007-04-03	Decouple the values of TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE.	Tom Lane
	Add the latter to the values checked in pg_control, since it can't be changed without invalidating toast table content. This commit in itself shouldn't change any behavior, but it lays some necessary groundwork for experimentation with these toast-control numbers. Note: while TOAST_TUPLE_THRESHOLD can now be changed without initdb, some thought still needs to be given to needs_toast_table() in toasting.c before unleashing random changes.
2007-03-22	Arrange for PreventTransactionChain to reject commands submitted as part	Tom Lane
	of a multi-statement simple-Query message. This bug goes all the way back, but unfortunately is not nearly so easy to fix in existing releases; it is only the recent ProcessUtility API change that makes it fixable in HEAD. Per report from William Garrison.
2007-03-13	Reverted waiting for further fixes:	Peter Eisentraut
	Make configuration parameters fall back to their default values when they are removed from the configuration file. Joachim Wieland
2007-03-13	First phase of plan-invalidation project: create a plan cache management	Tom Lane
	module and teach PREPARE and protocol-level prepared statements to use it. In service of this, rearrange utility-statement processing so that parse analysis does not assume table schemas can't change before execution for utility statements (necessary because we don't attempt to re-acquire locks for utility statements when reusing a stored plan). This requires some refactoring of the ProcessUtility API, but it ends up cleaner anyway, for instance we can get rid of the QueryContext global. Still to do: fix up SPI and related code to use the plan cache; I'm tempted to try to make SQL functions use it too. Also, there are at least some aspects of system state that we want to ensure remain the same during a replan as in the original processing; search_path certainly ought to behave that way for instance, and perhaps there are others.
2007-03-12	Make configuration parameters fall back to their default values when they	Peter Eisentraut
	are removed from the configuration file. Joachim Wieland
2007-03-03	Remove undo information from pg_controldata --- never used.	Bruce Momjian
	Florian G. Pflug
2007-02-15	Restructure autovacuum in two processes: a dummy process, which runs	Alvaro Herrera
	continuously, and requests vacuum runs of "autovacuum workers" to postmaster. The workers do the actual vacuum work. This allows for future improvements, like allowing multiple autovacuum jobs running in parallel. For now, the code keeps the original behavior of having a single autovac process at any time by sleeping until the previous worker has finished.
2007-02-14	Move fsync method macro defines into /include/access/xlogdefs.h so they	Bruce Momjian
	can be used by src/tools/fsync/test_fsync.c.
2007-02-13	Disallow committing a prepared transaction unless we are in the same database	Tom Lane
	it was executed in. Someday it might be nice to allow cross-DB commits, but work would be needed in NOTIFY and perhaps other places. Per Heikki.
2007-02-09	Combine cmin and cmax fields of HeapTupleHeaders into a single field, by	Tom Lane
	keeping private state in each backend that has inserted and deleted the same tuple during its current top-level transaction. This is sufficient since there is no need to be able to determine the cmin/cmax from any other transaction. This gets us back down to 23-byte headers, removing a penalty paid in 8.0 to support subtransactions. Patch by Heikki Linnakangas, with minor revisions by moi, following a design hashed out awhile back on the pghackers list.
2007-02-08	Normalize fgets() calls to use sizeof() for calculating the buffer size	Peter Eisentraut
	where possible, and fix some sites that apparently thought that fgets() will overwrite the buffer by one byte. Also add some strlcpy() to eliminate some weird memory handling.
2007-02-07	Add a function pg_stat_clear_snapshot() that discards any statistics snapshot	Tom Lane
	already collected in the current transaction; this allows plpgsql functions to watch for stats updates even though they are confined to a single transaction. Use this instead of the previous kluge involving pg_stat_file() to wait for the stats collector to update in the stats regression test. Internally, decouple storage of stats snapshots from transaction boundaries; they'll now stick around until someone calls pgstat_clear_snapshot --- which xact.c still does at transaction end, to maintain the previous behavior. This makes the logic a lot cleaner, at the price of a couple dozen cycles per transaction exit.
2007-02-07	Remove the xlog-centric "database system is ready" message and replace it with	Tom Lane
	"database system is ready to accept connections", which is issued by the postmaster when it really is ready to accept connections. Per proposal from Markus Schiltknecht and subsequent discussion.
2007-02-01	Wording cleanup for error messages. Also change can't -> cannot.	Bruce Momjian
	Standard English uses "may", "can", and "might" in different ways: may - permission, "You may borrow my rake." can - ability, "I can lift that log." might - possibility, "It might rain today." Unfortunately, in conversational English, their use is often mixed, as in, "You may use this variable to do X", when in fact, "can" is a better choice. Similarly, "It may crash" is better stated, "It might crash".
2007-01-16	Arrange for autovacuum to be killed when another operation wants to be alone	Alvaro Herrera
	accessing it, like DROP DATABASE. This allows the regression tests to pass with autovacuum enabled, which open the gates for finally enabling autovacuum by default.
2007-01-05	Update CVS HEAD for 2007 copyright. Back branches are typically not	Bruce Momjian
	back-stamped for this.
2006-12-08	Remove the logId/logSeg fields from pg_control, because they are not needed	Tom Lane
	in normal operation, and we can avoid rewriting pg_control at every log segment switch if we don't insist that these values be valid. Reducing the number of pg_control updates is a good idea for both performance and reliability. It does make pg_resetxlog's life a bit harder, but that seems a good tradeoff; and anyway the change to pg_resetxlog amounts to automating something people formerly needed to do by hand, namely look at the existing pg_xlog files to make sure the new WAL start point was past them. In passing, change the wording of xlog.c's "database system was interrupted" messages: describe the pg_control timestamp as "last known up at" rather than implying it is the exact time of service interruption. With this change the timestamp will generally be the time of the last checkpoint, which could be many minutes before the failure; and we've already seen indications that people tend to misinterpret the old wording. initdb forced due to change in pg_control layout. Simon Riggs and Tom Lane
2006-12-06	Add a txn_start column to pg_stat_activity. This makes it easier to	Neil Conway
	identify long-running transactions. Since we already need to record the transaction-start time (e.g. for now()), we don't need any additional system calls to report this information. Catversion bumped, initdb required.
2006-11-30	Minor adjustments to make failures in startup/shutdown behave more cleanly.	Tom Lane
	StartupXLOG and ShutdownXLOG no longer need to be critical sections, because in all contexts where they are invoked, elog(ERROR) would be translated to elog(FATAL) anyway. (One change in bgwriter.c is needed to make this true: set ExitOnAnyError before trying to exit. This is a good fix anyway since the existing code would have gone into an infinite loop on elog(ERROR) during shutdown.) That avoids a misleading report of PANIC during semi-orderly failures. Modify the postmaster to include the startup process in the set of processes that get SIGTERM when a fast shutdown is requested, and also fix it to not try to restart the bgwriter if the bgwriter fails while trying to write the shutdown checkpoint. Net result is that "pg_ctl stop -m fast" does something reasonable for a system in warm standby mode, and so should Unix system shutdown (ie, universal SIGTERM). Per gripe from Stephen Harris and some corner-case testing of my own.
2006-11-23	Several changes to reduce the probability of running out of memory during	Tom Lane
	AbortTransaction, which would lead to recursion and eventual PANIC exit as illustrated in recent report from Jeff Davis. First, in xact.c create a special dedicated memory context for AbortTransaction to run in. This solves the problem as long as AbortTransaction doesn't need more than 32K (or whatever other size we create the context with). But in corner cases it might. Second, in trigger.c arrange to keep pending after-trigger event records in separate contexts that can be freed near the beginning of AbortTransaction, rather than having them persist until CleanupTransaction as before. Third, in portalmem.c arrange to free executor state data earlier as well. These two changes should result in backing off the out-of-memory condition before AbortTransaction needs any significant amount of memory, at least in typical cases such as memory overrun due to too many trigger events or too big an executor hash table. And all the same for subtransaction abort too, of course.
2006-11-21	On systems that have setsid(2) (which should be just about everything except	Tom Lane
	Windows), arrange for each postmaster child process to be its own process group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole process group not only the direct child process. This provides saner behavior for archive and recovery scripts; in particular, it's possible to shut down a warm-standby recovery server using "pg_ctl stop -m immediate", since delivery of SIGQUIT to the startup subprocess will result in killing the waiting recovery_command. Also, this makes Query Cancel and statement_timeout apply to scripts being run from backends via system(). (There is no support in the core backend for that, but it's widely done using untrusted PLs.) Per gripe from Stephen Harris and subsequent discussion.
2006-11-17	Repair two related errors in heap_lock_tuple: it was failing to recognize	Tom Lane
	cases where we already hold the desired lock "indirectly", either via membership in a MultiXact or because the lock was originally taken by a different subtransaction of the current transaction. These cases must be accounted for to avoid needless deadlocks and/or inappropriate replacement of an exclusive lock with a shared lock. Per report from Clarence Gardner and subsequent investigation.
2006-11-16	String fix	Peter Eisentraut

2006-11-10	Clean up some misleading references to %p being a full path, per Simon.	Tom Lane

2006-11-08	Change Windows rename and unlink substitutes so that they time out after	Tom Lane
	30 seconds instead of retrying forever. Also modify xlog.c so that if it fails to rename an old xlog segment up to a future slot, it will unlink the segment instead. Per discussion of bug #2712, in which it became apparent that Windows can handle unlinking a file that's being held open, but not renaming it.