user/sven/postgresql.git

Age	Commit message	Author
2011-01-09	Split pg_start_backup() and pg_stop_backup() into two pieces	Magnus Hagander
	Move the actual functionality into a separate function that's easier to call internally, and change the SQL-callable function to be a wrapper calling this. Also create a pg_abort_backup() function, only callable internally, that does only the most vital parts of pg_stop_backup(), making it safe(r) to call from error handlers.
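The split described above follows a familiar C pattern. The real logic lives in a plain internal function. A thin SQL-facing wrapper calls it. A minimal abort path does only the steps that must never be skipped. The sketch below is a hypothetical in-process model. The names `do_start_backup`, `do_stop_backup`, `sql_start_backup`, `sql_stop_backup` and `abort_backup` are illustrative only. The server itself keeps this state in shared memory and unpacks fmgr arguments in its wrappers.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the shared "backup in progress" state. */
static bool backup_in_progress = false;
/* Counts how often the full (non-vital) stop work ran. */
static int full_stop_count = 0;

/* Internal workers: easy to call from C, no SQL argument handling. */
static bool do_start_backup(void)
{
    if (backup_in_progress)
        return false;           /* a backup is already running */
    backup_in_progress = true;
    return true;
}

static bool do_stop_backup(void)
{
    if (!backup_in_progress)
        return false;           /* nothing to stop */
    backup_in_progress = false; /* the vital part */
    full_stop_count++;          /* stands in for WAL switch, history file, etc. */
    return true;
}

/* SQL-callable wrappers: a real server would unpack and return Datums here. */
bool sql_start_backup(void) { return do_start_backup(); }
bool sql_stop_backup(void) { return do_stop_backup(); }

/* Error-handler path: only the vital step, nothing that can itself fail. */
void abort_backup(void)
{
    backup_in_progress = false;
}
```

Keeping `abort_backup` this small is the point. An error handler cannot safely run code that might raise another error.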
2011-01-07	Improve recovery.conf.sample comments.	Robert Haas
	Jehan-Guillaume de Rorthais, with some additional wordsmithing by me.
2011-01-03	Update comments in RecordTransactionCommit() to mention unlogged tables.	Robert Haas

2011-01-01	Stamp copyrights for year 2011.	Bruce Momjian

2010-12-30	Avoid unnecessary public struct declaration in slru.h	Alvaro Herrera
	Instead, declare for external callers a public wrapper around the sole function that uses it, so that they don't always have to pass a NULL argument. Author: Kevin Grittner
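The wrapper technique can be sketched as follows. Every name here is hypothetical, not taken from slru.h. The struct type is needed only by the internal function, so it can stay private to the .c file. External callers see just the wrapper, which supplies the NULL argument for them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical struct that used to appear in the public header only because
 * callers had to name it (as NULL) in the function signature. Now private. */
typedef struct ScanCallbackState
{
    int segments_seen;
} ScanCallbackState;

/* Internal function: the optional state pointer stays file-local. */
static bool scan_segments_internal(int cutoff, ScanCallbackState *state)
{
    if (state != NULL)
        state->segments_seen++;
    return cutoff > 0;          /* stand-in for "found something to truncate" */
}

/* Public wrapper: external callers no longer pass a NULL argument. */
bool scan_segments(int cutoff)
{
    return scan_segments_internal(cutoff, NULL);
}
```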
2010-12-29	Support unlogged tables.	Robert Haas
	The contents of an unlogged table are not WAL-logged; thus, they are not available on standby servers and are truncated whenever the database system enters recovery. Indexes on unlogged tables are also unlogged. Unlogged GiST indexes are not currently supported.
2010-12-29	Add REPLICATION privilege for ROLEs	Magnus Hagander
	This privilege is required to do Streaming Replication, instead of superuser, making it possible to set up an SR slave that doesn't have write permissions on the master. Superuser privileges do NOT override this check, so in order to use the default superuser account for replication it must be explicitly granted the REPLICATION privilege. This is a backwards-incompatible change, in the interest of higher default security.
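The key rule is that superuser status does not imply the right to replicate. A minimal sketch of that check follows. The field names mirror `pg_authid` columns, but the struct and function are illustrative, not the server's own code.

```c
#include <stdbool.h>

/* Simplified view of a role's attributes; illustrative only. */
typedef struct
{
    bool rolsuper;
    bool rolreplication;
} RoleAttrs;

/* Superuser status deliberately does NOT short-circuit this check. */
bool may_connect_for_replication(const RoleAttrs *role)
{
    return role->rolreplication;
}
```

Under this rule, replicating as the default superuser requires granting the attribute first. Assuming that account is named `postgres`, the grant is `ALTER ROLE postgres REPLICATION;`.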
2010-12-24	Remove quotes from boolean recovery.conf.sample parameters, now that the	Bruce Momjian
	quotes are not required. This now matches postgresql.conf's specification of booleans.
2010-12-23	Rewrite the GiST insertion logic so that we don't need the post-recovery	Heikki Linnakangas
	cleanup stage to finish incomplete inserts or splits anymore. There were two reasons for the cleanup step: 1. When a new tuple was inserted into a leaf page, the downlink in the parent needed to be updated to contain (i.e., to be consistent with) the new key. Updating the parent in turn might require recursively updating the parent of the parent. We now handle that by updating the parent while traversing down the tree, so that when we insert the leaf tuple, all the parents are already consistent with the new key, and the tree is consistent at every step. 2. When a page is split, we need to insert the downlink for the new right page(s), and update the downlink for the original page to not include keys that moved to the right page(s). We now handle that by setting a new flag, F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is set, scans always follow the rightlink, regardless of the NSN mechanism used to detect concurrent page splits. That way the tree is consistent right after a split, even though the downlink is still missing. This is very similar to the way B-tree splits are handled. When the downlink is inserted in the parent, the flag is cleared. To keep the insertion algorithm simple, when an insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it finishes the split before doing anything else. These changes allow removing the whole "invalid tuple" mechanism, but I retained the scan code to still follow invalid tuples correctly. While we don't create any such tuples anymore, we want to handle them gracefully in case you pg_upgrade a GiST index that has them. If we encounter any on an insert, though, we just throw an error saying that you need to REINDEX.
The issue that got me into doing this is that if you did a checkpoint while an insert or split was in progress, and the checkpoint finished quickly so that there was no WAL record related to the insert between RedoRecPtr and the checkpoint record, recovery from that checkpoint would not know to finish the incomplete insert. In other words, we have the same issue we solved with the rm_safe_restartpoint mechanism during normal operation too. It's highly unlikely to happen in practice, and this fix is far too large to backpatch, so we're just going to live with it in previous versions, but this refactoring fixes it going forward. With this patch, you don't get the annoying 'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices anymore if you crash at an unfortunate moment.
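The two sides of the F_FOLLOW_RIGHT protocol can be sketched in a few lines. Scans follow the rightlink when the flag is set, or when the page's NSN is newer than the parent LSN the scan saw. Inserters finish an incomplete split before doing anything else. Only the flag name comes from the text. The bit value, the struct and the function names are assumptions, and NSN/LSN values are simplified to plain integers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bit for "this page was split but its right sibling has no downlink
 * in the parent yet". The bit value here is an assumption. */
#define F_FOLLOW_RIGHT ((uint16_t) 0x0008)

typedef uint64_t NSN;       /* node sequence number, simplified to an integer */

typedef struct
{
    uint16_t flags;
    NSN      nsn;           /* stamped on the page when it is split */
    int32_t  rightlink;     /* right sibling's block number, -1 if none */
} GistPageSketch;

/* Scan side: must we also visit the right sibling? parent_lsn is the LSN
 * the parent page had when we read the downlink that led here. */
bool scan_follows_right(const GistPageSketch *page, NSN parent_lsn)
{
    if (page->rightlink < 0)
        return false;
    if (page->flags & F_FOLLOW_RIGHT)
        return true;                    /* incomplete split: always follow */
    return page->nsn > parent_lsn;      /* split happened after we read the parent */
}

/* Insert side: finish an incomplete split before doing anything else.
 * Inserting the missing downlink into the parent is elided; we just count it. */
void finish_split_if_needed(GistPageSketch *page, int *downlinks_inserted)
{
    if (page->flags & F_FOLLOW_RIGHT)
    {
        (*downlinks_inserted)++;
        page->flags &= (uint16_t) ~F_FOLLOW_RIGHT;
    }
}
```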
2010-12-20	Allow transactions that don't write WAL to commit asynchronously.	Robert Haas
	This case can arise if a transaction has written data, but only to temporary tables. Loss of the commit record in case of a crash won't matter, because the temporary tables will be lost anyway. Reviewed by Heikki Linnakangas and Simon Riggs.
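The decision described above fits in a small predicate. The helper name is hypothetical. The sketch models only the two inputs the text mentions: whether any WAL was written, and the `synchronous_commit` setting.

```c
#include <stdbool.h>

/* Hypothetical decision helper: may this commit return before its commit
 * record is flushed to disk? */
bool commit_may_be_async(bool synchronous_commit, bool wrote_wal)
{
    /* Only temporary tables were touched (no WAL written): losing the commit
     * record in a crash is harmless because the temp tables are lost anyway. */
    if (!wrote_wal)
        return true;
    /* Otherwise follow the synchronous_commit setting. */
    return !synchronous_commit;
}
```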
2010-12-14	Instrument checkpoint sync calls.	Robert Haas
	Greg Smith, reviewed by Jeff Janes
2010-12-10	Use symbolic names not octal constants for file permission flags.	Tom Lane
	Purely cosmetic patch to make our coding standards more consistent --- we were using symbolic names in some places and octal constants in others. This patch fixes all C-coded uses of mkdir, chmod, and umask. There might be some other calls I missed. Inconsistency noted while researching a tablespace directory permissions issue.
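For reference, here is how the usual octal modes map onto the POSIX symbolic names. The macro names and the helper below are hypothetical. The `S_I*` constants are standard `<sys/stat.h>` names.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical names for the symbolic equivalents of common octal modes. */
#define DIR_MODE_PRIVATE   (S_IRWXU)             /* 0700: rwx------ */
#define FILE_MODE_PRIVATE  (S_IRUSR | S_IWUSR)   /* 0600: rw------- */
#define UMASK_PRIVATE      (S_IRWXG | S_IRWXO)   /* 0077: clear group/other bits */

/* Create an owner-only directory; returns 0 on success, -1 on error. */
int make_private_dir(const char *path)
{
    return mkdir(path, DIR_MODE_PRIVATE);       /* instead of mkdir(path, 0700) */
}
```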
2010-12-08	Optimize commit_siblings in two ways to improve group commit.	Simon Riggs
	First, avoid scanning the whole ProcArray once we know there are at least commit_siblings active; second, skip the check altogether if commit_siblings = 0. Greg Smith
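Both optimizations are visible in a simple counting loop. With a threshold of zero the scan is skipped. Otherwise the loop stops as soon as enough active siblings have been seen. The function and its boolean-array input are hypothetical simplifications of a ProcArray scan.

```c
#include <stdbool.h>

/* Hypothetical helper: are at least min_active other backends inside a
 * transaction? in_xact[i] says whether backend i is. */
bool at_least_n_active(const bool *in_xact, int nprocs, int min_active)
{
    if (min_active <= 0)
        return true;                /* commit_siblings = 0: no scan at all */

    int count = 0;
    for (int i = 0; i < nprocs; i++)
    {
        if (in_xact[i] && ++count >= min_active)
            return true;            /* stop as soon as the threshold is met */
    }
    return false;
}
```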
2010-12-07	Fix bugs in the hot standby known-assigned-xids tracking logic. If there's	Heikki Linnakangas
	an old transaction running in the master, and a lot of transactions have started and finished since, and a WAL record is written in the gap between creating the running-xacts snapshot and WAL-logging it, recovery will fail with a "too many KnownAssignedXids" error. This bug was reported by Joachim Wieland on Nov 19th. In the same scenario, when fewer transactions have started so that all the xids fit in KnownAssignedXids despite the first bug, a more serious bug arises. We incorrectly initialize the clog code with the oldest still running transaction, and when we see the WAL record belonging to a transaction with an XID larger than one that committed already before the checkpoint we're recovering from, we zero the clog page containing the already committed transaction, leading to data loss. In hindsight, trying to track xids in the known-assigned-xids array before seeing the running-xacts record was too complicated. To fix that, hold XidGenLock while the running-xacts snapshot is taken and WAL-logged. That ensures that no transaction can begin or end in that gap, so that in recovery we know that the snapshot contains all transactions running at that point in WAL.
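The fix amounts to widening a critical section. The snapshot and the WAL insert both happen while the same lock is held, so no XID can be handed out in between. The sketch below is a simplified model. A pthread mutex stands in for XidGenLock and an array stands in for the WAL. Only XID assignment is modeled, not transaction end.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Stand-ins for XidGenLock, the XID counter, and the WAL stream. */
static pthread_mutex_t xid_gen_lock = PTHREAD_MUTEX_INITIALIZER;
static TransactionId next_xid = 100;

typedef struct { TransactionId next_xid; } RunningXactsRecord;
static RunningXactsRecord wal[16];
static int wal_len = 0;

/* Starting a transaction needs the lock to obtain an XID. */
TransactionId assign_xid(void)
{
    pthread_mutex_lock(&xid_gen_lock);
    TransactionId xid = next_xid++;
    pthread_mutex_unlock(&xid_gen_lock);
    return xid;
}

/* Take the running-xacts snapshot and WAL-log it under the same lock, so
 * no XID can be assigned between the two steps. */
bool log_running_xacts(void)
{
    bool ok = false;

    pthread_mutex_lock(&xid_gen_lock);
    if (wal_len < 16)
    {
        RunningXactsRecord rec = { next_xid };  /* the snapshot */
        wal[wal_len++] = rec;                   /* the WAL insert */
        ok = true;
    }
    pthread_mutex_unlock(&xid_gen_lock);
    return ok;
}
```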
2010-12-06	Fix two typos, by Fujii Masao.	Heikki Linnakangas

2010-12-03	Remove now-outdated mention of quotes being required in recovery.conf.	Robert Haas
	Noted by Itagaki Takahiro.
2010-12-03	Use GUC lexer for recovery.conf parsing.	Robert Haas
	This eliminates some crufty, special-purpose code and, as a non-trivial side benefit, allows recovery.conf parameters to be unquoted. Dimitri Fontaine, with review and cleanup by Alvaro Herrera, Itagaki Takahiro, and me.
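The practical effect is that `standby_mode = 'on'` and `standby_mode = on` are now equivalent. The hand-written parser below shows that behavior. It is a hypothetical sketch, not the actual flex-based GUC lexer. For brevity it skips features such as escaped quotes.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser for one "name = value" line in which the value may be
 * 'quoted' or bare. Escaped quotes ('') are not handled in this sketch. */
bool parse_conf_line(const char *p, char *name, size_t nsz, char *value, size_t vsz)
{
    size_t n = 0;

    while (isspace((unsigned char) *p))
        p++;
    while (isalnum((unsigned char) *p) || *p == '_')
    {
        if (n + 1 >= nsz)
            return false;
        name[n++] = *p++;
    }
    name[n] = '\0';
    if (n == 0)
        return false;

    while (isspace((unsigned char) *p))
        p++;
    if (*p == '=')                      /* the '=' is optional in GUC files */
        p++;
    while (isspace((unsigned char) *p))
        p++;

    n = 0;
    if (*p == '\'')
    {
        for (p++; *p != '\''; p++)
        {
            if (*p == '\0' || n + 1 >= vsz)
                return false;           /* unterminated quote or overflow */
            value[n++] = *p;
        }
        value[n] = '\0';
        return true;                    /* '' is a valid empty value */
    }
    while (*p != '\0' && !isspace((unsigned char) *p) && *p != '#')
    {
        if (n + 1 >= vsz)
            return false;
        value[n++] = *p++;
    }
    value[n] = '\0';
    return n > 0;                       /* a bare value must be non-empty */
}
```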
2010-11-23	Remove useless whitespace at end of lines	Peter Eisentraut

2010-11-11	Fix bug introduced by the recent patch to check that the checkpoint redo	Heikki Linnakangas
	location read from backup label file can be found: wasShutdown was set incorrectly when a backup label file was found. Jeff Davis, with a little tweaking by me.
2010-11-09	Add monitoring function pg_last_xact_replay_timestamp.	Robert Haas
	Fujii Masao, with a little wordsmithing by me.
2010-11-02	Bootstrap WAL to begin at segment logid=0 logseg=1 (000000010000000000000001)	Heikki Linnakangas
	rather than 0/0, so that we can safely use 0/0 as an invalid value. This is a more future-proof fix for the corner-case bug in streaming replication that was fixed yesterday. We had a similar corner-case bug with log/seg 0/0 back in February as well. Avoiding 0/0 as a valid value should prevent bugs like that in the future. Per Tom Lane's idea. Back-patch to 9.0. Since this only affects bootstrapping, it makes no difference to existing installations. We don't need to worry about the bug in existing installations, because if you've managed to get past the initial base backup already, you won't hit the bug in the future either.
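The file name in the subject line is made of three 8-digit hex fields: timeline 1, log 0 and segment 1. A minimal sketch of building such a name follows, together with the now-reserved 0/0 check. Both helper names are hypothetical.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a WAL segment file name: timeline, log id, and segment number as
 * three 8-digit uppercase hex fields (24 characters in total). */
void wal_file_name(char *buf, size_t len, uint32_t tli, uint32_t log, uint32_t seg)
{
    snprintf(buf, len, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32, tli, log, seg);
}

/* True if (log, seg) is the 0/0 position, which bootstrap no longer uses,
 * so it can safely serve as an "invalid" marker. */
bool is_invalid_position(uint32_t log, uint32_t seg)
{
    return log == 0 && seg == 0;
}
```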
2010-11-01	Fix corner-case bug in tracking of latest removed WAL segment during	Heikki Linnakangas
	streaming replication. We used log/seg 0/0 to indicate that no WAL segments have been removed since startup, but 0/0 is a valid value for the very first WAL segment after initdb. To disambiguate, store (latest removed WAL segment + 1) in the global variable. Per report from Matt Chesler, also reproduced by Greg Smith.
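Storing "latest removed + 1" frees zero to mean "nothing removed", even though segment 0 is real. The sketch below simplifies the log/seg pair to a single 64-bit segment counter, which is an assumption. The variable and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* (latest removed segment number + 1); 0 means "nothing removed since
 * startup", which cannot be confused with segment 0 itself. */
static uint64_t removed_segno_plus_one = 0;

void note_segment_removed(uint64_t segno)
{
    if (segno + 1 > removed_segno_plus_one)
        removed_segno_plus_one = segno + 1;
}

/* Has this segment already been removed (so it can no longer be sent)? */
bool segment_was_removed(uint64_t segno)
{
    return segno < removed_segno_plus_one;
}
```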
2010-10-26	Before removing backup_label and irrevocably changing pg_control file, check	Heikki Linnakangas
	that the WAL file containing the checkpoint redo-location can be found. This avoids making the cluster irrecoverable if the redo location is in an earlier WAL file than the checkpoint record. Report, analysis and patch by Jeff Davis, with small changes by me.
2010-10-20	Don't try to fetch database name when SetTransactionIdLimit() is executed	Tom Lane
	outside a transaction. This repairs brain fade in my patch of 2009-08-30: the reason we had been storing oldest-database name, not OID, in ShmemVariableCache was of course to avoid having to do a catalog lookup at times when it might be unsafe. This error explains why Aleksandr Dushein is having trouble getting out of an XID wraparound state in bug #5718, though not how he got into that state in the first place. I suspect pg_upgrade is at fault there.
2010-10-20	Remove AtStart_Cache() call in CommandCounterIncrement().	Alvaro Herrera
	This call was present in the aboriginal code from Berkeley, and has never been touched; it may very well be that it was there to mask effects of bugs in other places and it may no longer be necessary. The removal has been foreseen in a code comment since 2007; this seems to be a good time to test this hypothesis.
2010-10-14	Make startup process respond to signals to cancel waiting on latch.	Simon Riggs
	A tidy up for recently committed changes to startup latch. Fujii Masao
2010-10-14	Fix bug in comment of timeline history file.	Simon Riggs
	Fujii Masao
2010-09-20	Remove cvs keywords from all files.	Magnus Hagander

2010-09-17	Add some documentation about how we WAL-log filesystem actions.	Tom Lane
	Per a question from Robert Haas.
2010-09-15	Fix two typos in comments, spotted by Fujii Masao and Thom Brown	Heikki Linnakangas

2010-09-15	Use a latch to make startup process wake up and replay immediately when	Heikki Linnakangas
	new WAL arrives via streaming replication. This reduces the latency, and also allows us to use a longer polling interval, which is good for energy efficiency. We still need to poll to check for the appearance of a trigger file, but the interval is now 5 seconds (instead of 100ms), like when waiting for a new WAL segment to appear in WAL archive.
2010-09-11	Introduce latches. A latch is a boolean variable, with the capability to	Heikki Linnakangas
	wait until it is set. Latches can be used to reliably wait until a signal arrives, which is hard otherwise because signals don't interrupt select() on some platforms, and even when they do, there are race conditions. On Unix, latches use the so-called self-pipe trick under the covers to implement the sleep until the latch is set, without race conditions. On Windows, Windows events are used. Use the new latch abstraction to sleep in walsender, so that as soon as a transaction finishes, walsender is woken up to immediately send the WAL to the standby. This reduces the latency between master and standby, which is good. Preliminary work by Fujii Masao. The latch implementation is by me, with helpful comments from many people.
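The self-pipe trick can be shown in a single process. Setting the latch sets a flag and writes a byte to a non-blocking pipe. Waiting selects on the pipe's read end. A set that lands between the flag check and `select()` therefore still wakes the sleeper. This is a simplified POSIX-only sketch with hypothetical names. It omits the shared-memory latches and Windows support the text mentions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

typedef struct
{
    volatile sig_atomic_t is_set;
    int read_fd;
    int write_fd;
} LatchSketch;

bool latch_init(LatchSketch *l)
{
    int fds[2];

    if (pipe(fds) != 0)
        return false;
    for (int i = 0; i < 2; i++)
    {
        int flags = fcntl(fds[i], F_GETFL);

        if (flags < 0 || fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return false;
    }
    l->read_fd = fds[0];
    l->write_fd = fds[1];
    l->is_set = 0;
    return true;
}

/* Safe to call from a signal handler: sets a flag and writes one byte. */
void latch_set(LatchSketch *l)
{
    int saved_errno = errno;
    ssize_t rc;

    l->is_set = 1;
    rc = write(l->write_fd, "x", 1);    /* wakes a concurrent select() */
    (void) rc;                          /* a full pipe already means "awake" */
    errno = saved_errno;
}

static void drain(LatchSketch *l)
{
    char buf[16];

    while (read(l->read_fd, buf, sizeof buf) > 0)
        ;
}

void latch_reset(LatchSketch *l)
{
    l->is_set = 0;
    drain(l);
}

/* Sleep until the latch is set or timeout_ms passes; true if set. In this
 * simplification the timeout restarts after each spurious wakeup. */
bool latch_wait(LatchSketch *l, long timeout_ms)
{
    while (!l->is_set)
    {
        fd_set rfds;
        struct timeval tv = { .tv_sec = timeout_ms / 1000,
                              .tv_usec = (timeout_ms % 1000) * 1000 };
        int rc;

        FD_ZERO(&rfds);
        FD_SET(l->read_fd, &rfds);
        rc = select(l->read_fd + 1, &rfds, NULL, NULL, &tv);
        if (rc == 0)
            return false;               /* timed out */
        if (rc < 0 && errno != EINTR)
            return false;               /* unexpected error */
        if (rc > 0)
            drain(l);                   /* consume wakeups, then re-check flag */
    }
    return true;
}
```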
2010-08-30	Fix oversight in RelFileNodeBackend patch: CreateFakeRelcacheEntry needs to	Tom Lane
	initialize the rd_backend field of a fake Relation entry correctly. Fortunately, that is easy, since only non-temp relations should ever be mentioned in the WAL stream.
2010-08-30	Fix misleading DEBUG2 issued during RemoveOldXlogFiles()	Simon Riggs

2010-08-30	Truncate subtrans after each restartpoint.	Simon Riggs
	Issue reported by Harald Kolb, patch by Fujii Masao, review by me.
2010-08-26	Remove duplicate translatable phrase	Alvaro Herrera

2010-08-13	Include the backend ID in the relpath of temporary relations.	Robert Haas
	This allows us to reliably remove all leftover temporary relation files on cluster startup without reference to system catalogs or WAL; therefore, we no longer include temporary relations in XLOG_XACT_COMMIT and XLOG_XACT_ABORT WAL records. Since these changes require including a backend ID in each SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id field has been reduced from two bytes to one, and the maximum number of connections has been reduced from INT_MAX / 4 to 2^23-1. It would be possible to remove these restrictions by increasing the size of SharedInvalidationMessage by 4 bytes, but right now that doesn't seem like a good trade-off. Review by Jaime Casanova and Tom Lane.
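Embedding the backend ID in the file name lets startup recognize leftover temporary files from the name alone. The sketch assumes a layout of `base/<db>/t<backend>_<relfilenode>` for temporary relations and `base/<db>/<relfilenode>` for permanent ones. That layout, and every name in the block, is illustrative. The last constant checks the 2^23 - 1 arithmetic from the commit message.

```c
#include <ctype.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 2^23 - 1, the connection ceiling given in the commit message. */
#define MAX_BACKENDS_SKETCH ((1 << 23) - 1)

/* Hypothetical relpath builder: backend < 0 means a permanent relation;
 * otherwise the owning backend's ID is embedded as "t<backend>_". */
int rel_path(char *buf, size_t len, uint32_t db_oid, int backend, uint32_t relfilenode)
{
    if (backend < 0)
        return snprintf(buf, len, "base/%" PRIu32 "/%" PRIu32, db_oid, relfilenode);
    return snprintf(buf, len, "base/%" PRIu32 "/t%d_%" PRIu32,
                    db_oid, backend, relfilenode);
}

/* At startup, any file named like "t<digits>_..." is a leftover temp file,
 * recognizable without consulting catalogs or WAL. */
bool is_temp_rel_file(const char *fname)
{
    return fname[0] == 't' && isdigit((unsigned char) fname[1]) &&
           strchr(fname, '_') != NULL;
}
```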
2010-08-13	Make RecordTransactionCommit() respect wal_level.	Robert Haas
	Since the only purpose of WAL-logging SharedInvalidationMessages is to support Hot Standby operation, they needn't be included when wal_level < hot_standby. Back-patch to 9.0. Review by Heikki Linnakangas and Fujii Masao.
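The change reduces to an ordered comparison on `wal_level`. The enum below mirrors the three settings of that era (minimal, archive, hot_standby). The type and helper names are illustrative.

```c
/* Illustrative ordering of the wal_level settings. */
typedef enum
{
    WAL_LEVEL_MINIMAL,
    WAL_LEVEL_ARCHIVE,
    WAL_LEVEL_HOT_STANDBY
} WalLevelSketch;

/* How many shared-invalidation messages to attach to a commit record.
 * Only a Hot Standby server replays them, so below hot_standby none are logged. */
int invals_to_log(WalLevelSketch level, int n_invals)
{
    return level >= WAL_LEVEL_HOT_STANDBY ? n_invals : 0;
}
```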
2010-08-12	Correct sundry errors in Hot Standby-related comments.	Robert Haas
	Fujii Masao
2010-07-29	Rename asyncCommitLSN to asyncXactLSN to reflect changed role in 9.0.	Simon Riggs
	Transaction aborts now record their LSN to avoid corner case behaviour in SR/HS, hence change of name of variables and functions. As pointed out by Fujii Masao. Cosmetic changes only.
2010-07-23	Avoid deep recursion when assigning XIDs to multiple levels of subxacts.	Robert Haas
	Backpatch to 8.0. Andres Freund, with cleanup and adjustment for older branches by me.
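One general way to avoid one C stack frame per nesting level is to collect the ancestors that still need an XID into a heap array. XIDs are then assigned from the outermost level inward. This is a hypothetical sketch of that technique, not the committed code. It relies on the invariant that once a level has an XID, all of its parents do too.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical subtransaction node: xid == 0 means "no XID assigned yet". */
typedef struct SubXact
{
    struct SubXact *parent;
    uint32_t        xid;
} SubXact;

static uint32_t next_xid = 1000;

/* Assign XIDs to `s` and to every ancestor that lacks one, outermost first,
 * using a heap-allocated list instead of recursion. */
bool assign_xids_iteratively(SubXact *s)
{
    size_t n = 0;

    /* Ancestors of a node with an XID always have one, so stop there. */
    for (SubXact *p = s; p != NULL && p->xid == 0; p = p->parent)
        n++;
    if (n == 0)
        return true;

    SubXact **pending = malloc(n * sizeof *pending);
    if (pending == NULL)
        return false;

    size_t i = 0;
    for (SubXact *p = s; p != NULL && p->xid == 0; p = p->parent)
        pending[i++] = p;

    /* pending[n-1] is the outermost level; parents get XIDs before children. */
    while (i > 0)
        pending[--i]->xid = next_xid++;

    free(pending);
    return true;
}
```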
2010-07-08	Update obsolete comment. Noted by Josh Tolley.	Tom Lane

2010-07-06	pgindent run for 9.0, second run	Bruce Momjian

2010-07-03	Don't set recoveryLastXTime when replaying a checkpoint --- that was a bogus	Tom Lane
	idea from the start since the variable is only meant to track commit/abort events. This patch reverts the logic around the variable to what it was in 8.4, except that the value is now kept in shared memory rather than a static variable, so that it can be reported correctly by CreateRestartPoint (which is executed in the bgwriter).
2010-07-03	Replace max_standby_delay with two parameters, max_standby_archive_delay and	Tom Lane
	max_standby_streaming_delay, and revise the implementation to avoid assuming that timestamps found in WAL records can meaningfully be compared to clock time on the standby server. Instead, the delay limits are compared to the elapsed time since we last obtained a new WAL segment from archive or since we were last "caught up" to WAL data arriving via streaming replication. This avoids problems with clock skew between primary and standby, as well as other corner cases that the original coding would misbehave in, such as the primary server having significant idle time between transactions. Per my complaint some time ago and considerable ensuing discussion. Do some desultory editing on the hot standby documentation, too.
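The revised rule compares the configured delay against elapsed time on the standby's own clock, so no primary-side timestamps are involved. A minimal sketch follows, assuming millisecond timestamps and treating a negative delay as "wait forever", as -1 does for these settings. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* last_receipt_ms: when WAL was last obtained from archive, or when we were
 * last caught up with streaming. All times are standby-local milliseconds. */
bool should_cancel_conflicting_queries(int64_t now_ms, int64_t last_receipt_ms,
                                       int64_t delay_ms)
{
    if (delay_ms < 0)
        return false;                   /* -1: wait forever */
    return now_ms - last_receipt_ms > delay_ms;
}
```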
2010-06-29	Add C comment about why synchronous_commit=off behavior can lose	Bruce Momjian
	committed transactions in a postmaster crash.
2010-06-28	emode_for_corrupt_record shouldn't reduce LOG messages to WARNING.	Robert Haas
	In non-interactive sessions, WARNING sorts below LOG.
2010-06-17	Make RemoveOldXlogFiles's debug printout match style used elsewhere:	Tom Lane
	log and seg aren't an XLogRecPtr and shouldn't be printed like one. Fujii Masao
2010-06-17	Don't allow walsender to send WAL data until it's been safely fsync'd on the	Tom Lane
	master. Otherwise a subsequent crash could cause the master to lose WAL that has already been applied on the slave, resulting in the slave being out of sync and soon corrupt. Per recent discussion and an example from Robert Haas. Fujii Masao
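The invariant is that the walsender never sends past the locally fsync'd position. A sketch of how the end of the next chunk could be chosen under that rule is shown below. WAL positions are simplified to 64-bit byte offsets, whereas 9.0 used a two-field struct. The chunk limit and all names are assumptions.

```c
#include <stdint.h>

/* WAL positions simplified to 64-bit byte offsets. */
typedef uint64_t WalPos;

/* End of the next chunk the walsender may send: never past the locally
 * fsync'd position, and at most max_chunk bytes at a time. */
WalPos next_send_end(WalPos sent_upto, WalPos flush_ptr, uint64_t max_chunk)
{
    if (flush_ptr <= sent_upto)
        return sent_upto;                   /* nothing new is durable yet */
    uint64_t avail = flush_ptr - sent_upto;
    return sent_upto + (avail < max_chunk ? avail : max_chunk);
}
```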
2010-06-14	If a corrupt WAL record is received by streaming replication, disconnect	Heikki Linnakangas
	and retry. If the record is genuinely corrupt in the master database, there's little hope of recovering, but it's better than simply retrying to apply the corrupt WAL record in a tight loop without even trying to retransmit it, which is what we used to do.