path: root/src/backend/access/transam
Age | Author | Commit message
2011-01-09 | Magnus Hagander | Split pg_start_backup() and pg_stop_backup() into two pieces
Move the actual functionality into a separate function that's easier to call internally, and change the SQL-callable function to be a wrapper calling this. Also create a pg_abort_backup() function, only callable internally, that does only the most vital parts of pg_stop_backup(), making it safe(r) to call from error handlers.
2011-01-07 | Robert Haas | Improve recovery.conf.sample comments.
Jehan-Guillaume de Rorthais, with some additional wordsmithing by me.
2011-01-03 | Robert Haas | Update comments in RecordTransactionCommit() to mention unlogged tables.
2011-01-01 | Bruce Momjian | Stamp copyrights for year 2011.
2010-12-30 | Alvaro Herrera | Avoid unnecessary public struct declaration in slru.h
Instead, declare a public wrapper of the sole function using it for external callers, so that they don't have to always pass a NULL argument. Author: Kevin Grittner
2010-12-29 | Robert Haas | Support unlogged tables.
The contents of an unlogged table are not WAL-logged; thus, they are not available on standby servers and are truncated whenever the database system enters recovery. Indexes on unlogged tables are also unlogged. Unlogged GiST indexes are not currently supported.
2010-12-29 | Magnus Hagander | Add REPLICATION privilege for ROLEs
This privilege, rather than superuser, is now required to do Streaming Replication, making it possible to set up an SR slave that doesn't have write permissions on the master. Superuser privileges do NOT override this check, so in order to use the default superuser account for replication it must be explicitly granted the REPLICATION privilege. This is a backwards-incompatible change, made in the interest of higher default security.
2010-12-24 | Bruce Momjian | Remove quotes from boolean recovery.conf.sample parameters, now that the
quotes are not required. This now matches postgresql.conf's specification of booleans.
2010-12-23 | Heikki Linnakangas | Rewrite the GiST insertion logic so that we don't need the post-recovery
cleanup stage to finish incomplete inserts or splits anymore. There were two reasons for the cleanup step: 1. When a new tuple was inserted into a leaf page, the downlink in the parent needed to be updated to contain (i.e., to be consistent with) the new key. Updating the parent in turn might require recursively updating the parent of the parent. We now handle that by updating the parent while traversing down the tree, so that when we insert the leaf tuple, all the parents are already consistent with the new key, and the tree is consistent at every step. 2. When a page is split, we need to insert the downlink for the new right page(s), and update the downlink for the original page to not include keys that moved to the right page(s). We now handle that by setting a new flag, F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is set, scans always follow the rightlink, regardless of the NSN mechanism used to detect concurrent page splits. That way the tree is consistent right after a split, even though the downlink is still missing. This is very similar to the way B-tree splits are handled. When the downlink is inserted in the parent, the flag is cleared. To keep the insertion algorithm simple, when an insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it finishes the split before doing anything else. These changes allow removing the whole "invalid tuple" mechanism, but I retained the scan code to still follow invalid tuples correctly. While we don't create any such tuples anymore, we want to handle them gracefully in case you pg_upgrade a GiST index that has them. If we encounter any on an insert, though, we just throw an error saying that you need to REINDEX.
The issue that got me into doing this is that if you did a checkpoint while an insert or split was in progress, and the checkpoint finished quickly so that there was no WAL record related to the insert between RedoRecPtr and the checkpoint record, recovery from that checkpoint would not know to finish the incomplete insert. IOW, we have the same issue we solved with the rm_safe_restartpoint mechanism during normal operation too. It's highly unlikely to happen in practice, and this fix is far too large to backpatch, so we're just going to live with it in previous versions, but this refactoring fixes it going forward. With this patch, you don't get the annoying 'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices anymore if you crash at an unfortunate moment.
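The scan-side rule described for GiST can be sketched as a single predicate: follow the right sibling if the page carries the follow-right flag (split done, downlink not yet in the parent) or if the page's NSN is newer than the parent LSN seen when the downlink was read (a concurrent split). This is a simplified model, not GiST's code: the struct, the flag value, and the helper name are hypothetical, and LSNs are plain 64-bit integers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for this sketch; the real definitions live in the GiST headers. */
#define SKETCH_F_FOLLOW_RIGHT (1u << 3)
#define SKETCH_INVALID_BLOCK  UINT32_MAX

typedef struct
{
    uint32_t flags;     /* page flags, including the follow-right bit */
    uint64_t nsn;       /* LSN stamped on the page by its most recent split */
    uint32_t rightlink; /* block number of the right sibling */
} SketchGistPage;

/*
 * Must a scan that reached `page` via a downlink read when the parent's LSN
 * was `parent_lsn` also visit the page's right sibling?
 */
static bool
sketch_must_follow_right(const SketchGistPage *page, uint64_t parent_lsn)
{
    if (page->rightlink == SKETCH_INVALID_BLOCK)
        return false;                   /* rightmost page: nothing to follow */
    if (page->flags & SKETCH_F_FOLLOW_RIGHT)
        return true;                    /* incomplete split: downlink still missing */
    return page->nsn > parent_lsn;      /* page was split after we read the downlink */
}
```

Because the flag check short-circuits the NSN comparison, a half-finished split is always followed even when the NSN test alone would not catch it.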
2010-12-20 | Robert Haas | Allow transactions that don't write WAL to commit asynchronously.
This case can arise if a transaction has written data, but only to temporary tables. Loss of the commit record in case of a crash won't matter, because the temporary tables will be lost anyway. Reviewed by Heikki Linnakangas and Simon Riggs.
2010-12-14 | Robert Haas | Instrument checkpoint sync calls.
Greg Smith, reviewed by Jeff Janes
2010-12-10 | Tom Lane | Use symbolic names not octal constants for file permission flags.
Purely cosmetic patch to make our coding standards more consistent --- we were doing symbolic some places and octal other places. This patch fixes all C-coded uses of mkdir, chmod, and umask. There might be some other calls I missed. Inconsistency noted while researching tablespace directory permissions issue.
2010-12-08 | Simon Riggs | Optimize commit_siblings in two ways to improve group commit.
First, avoid scanning the whole ProcArray once we know there are at least commit_siblings active; second, skip the check altogether if commit_siblings = 0. Greg Smith
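Both optimizations amount to an early exit in a counting loop. Here is a minimal sketch over a hypothetical, simplified process array (one flag per slot). It assumes, as the name suggests, that a threshold of zero counts as trivially satisfied. The function name and the `scanned` counter are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Are at least min_siblings slots active? active[i] says whether slot i of a
 * simplified process array is running a transaction; *scanned reports how
 * many slots were examined, to make the early exit visible.
 */
static bool
sketch_minimum_active_backends(const bool *active, size_t n,
                               int min_siblings, size_t *scanned)
{
    int    count = 0;
    size_t i;

    *scanned = 0;
    if (min_siblings == 0)
        return true;            /* commit_siblings = 0: skip the scan entirely */

    for (i = 0; i < n; i++)
    {
        (*scanned)++;
        if (active[i] && ++count >= min_siblings)
            return true;        /* enough siblings found: stop scanning */
    }
    return false;
}
```

With a large ProcArray and a small commit_siblings, the loop usually stops after a handful of slots instead of visiting every entry.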
2010-12-07 | Heikki Linnakangas | Fix bugs in the hot standby known-assigned-xids tracking logic. If there's
an old transaction running in the master, and a lot of transactions have started and finished since, and a WAL-record is written in the gap between creating the running-xacts snapshot and WAL-logging it, recovery will fail with a "too many KnownAssignedXids" error. This bug was reported by Joachim Wieland on Nov 19th. In the same scenario, when fewer transactions have started so that all the xids fit in KnownAssignedXids despite the first bug, a more serious bug arises. We incorrectly initialize the clog code with the oldest still running transaction, and when we see the WAL record belonging to a transaction with an XID larger than one that committed already before the checkpoint we're recovering from, we zero the clog page containing the already committed transaction, leading to data loss. In hindsight, trying to track xids in the known-assigned-xids array before seeing the running-xacts record was too complicated. To fix that, hold XidGenLock while the running-xacts snapshot is taken and WAL-logged. That ensures that no transaction can begin or end in that gap, so that in recovery we know that the snapshot contains all transactions running at that point in WAL.
2010-12-06 | Heikki Linnakangas | Fix two typos, by Fujii Masao.
2010-12-03 | Robert Haas | Remove now-outdated mention of quotes being required in recovery.conf.
Noted by Itagaki Takahiro.
2010-12-03 | Robert Haas | Use GUC lexer for recovery.conf parsing.
This eliminates some crufty, special-purpose code and, as a non-trivial side benefit, allows recovery.conf parameters to be unquoted. Dimitri Fontaine, with review and cleanup by Alvaro Herrera, Itagaki Takahiro, and me.
2010-11-23 | Peter Eisentraut | Remove useless whitespace at end of lines
2010-11-11 | Heikki Linnakangas | Fix bug introduced by the recent patch to check that the checkpoint redo
location read from backup label file can be found: wasShutdown was set incorrectly when a backup label file was found. Jeff Davis, with a little tweaking by me.
2010-11-09 | Robert Haas | Add monitoring function pg_last_xact_replay_timestamp.
Fujii Masao, with a little wordsmithing by me.
2010-11-02 | Heikki Linnakangas | Bootstrap WAL to begin at segment logid=0 logseg=1 (000000010000000000000001)
rather than 0/0, so that we can safely use 0/0 as an invalid value. This is a more future-proof fix for the corner-case bug in streaming replication that was fixed yesterday. We had a similar corner-case bug with log/seg 0/0 back in February as well. Avoiding 0/0 as a valid value should prevent bugs like that in the future. Per Tom Lane's idea. Back-patch to 9.0. Since this only affects bootstrapping, it makes no difference to existing installations. We don't need to worry about the bug in existing installations, because if you've managed to get past the initial base backup already, you won't hit the bug in the future either.
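The segment file name in the subject decodes as three 8-digit uppercase hex fields: timeline 1, log 0, segment 1. Because bootstrap now starts in segment 1, no record can ever sit at position 0/0, which is what frees 0/0 to act as an invalid marker. A minimal sketch of the naming, using a hypothetical helper rather than PostgreSQL's own macro:

```c
#include <stdint.h>
#include <stdio.h>

/* Format timeline, log id and segment number as 24 hex digits (8 per field). */
static void
sketch_xlog_file_name(char *buf, size_t len,
                      uint32_t tli, uint32_t log, uint32_t seg)
{
    snprintf(buf, len, "%08X%08X%08X",
             (unsigned) tli, (unsigned) log, (unsigned) seg);
}
```

Calling it with timeline 1, log 0, segment 1 yields the name shown in the subject line.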
2010-11-01 | Heikki Linnakangas | Fix corner-case bug in tracking of latest removed WAL segment during
streaming replication. We used log/seg 0/0 to indicate that no WAL segments have been removed since startup, but 0/0 is a valid value for the very first WAL segment after initdb. To make that unambiguous, store (latest removed WAL segment + 1) in the global variable. Per report from Matt Chesler, also reproduced by Greg Smith.
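The "+1" encoding keeps zero free to mean "nothing removed". A minimal sketch, simplifying the log/seg pair to a single hypothetical 64-bit segment number and assuming segments are removed oldest-first:

```c
#include <stdbool.h>
#include <stdint.h>

/* (latest removed segment + 1); 0 means no segment removed since startup. */
static uint64_t last_removed_plus_one = 0;

static void
sketch_note_segment_removed(uint64_t segno)
{
    if (segno + 1 > last_removed_plus_one)
        last_removed_plus_one = segno + 1;
}

/* Segment 0 is no longer ambiguous: it reads as removed only after removal. */
static bool
sketch_segment_was_removed(uint64_t segno)
{
    return segno < last_removed_plus_one;
}
```

Without the offset, the initial value and "segment 0 was removed" would share the same stored value, which is the ambiguity described in the commit.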
2010-10-26 | Heikki Linnakangas | Before removing backup_label and irrevocably changing pg_control file, check
that the WAL file containing the checkpoint redo-location can be found. This avoids making the cluster irrecoverable if the redo location is in an earlier WAL file than the checkpoint record. Report, analysis and patch by Jeff Davis, with small changes by me.
2010-10-20 | Tom Lane | Don't try to fetch database name when SetTransactionIdLimit() is executed
outside a transaction. This repairs brain fade in my patch of 2009-08-30: the reason we had been storing oldest-database name, not OID, in ShmemVariableCache was of course to avoid having to do a catalog lookup at times when it might be unsafe. This error explains why Aleksandr Dushein is having trouble getting out of an XID wraparound state in bug #5718, though not how he got into that state in the first place. I suspect pg_upgrade is at fault there.
2010-10-20 | Alvaro Herrera | Remove AtStart_Cache() call in CommandCounterIncrement().
This call was present in the aboriginal code from Berkeley, and has never been touched; it may very well be that it was there to mask effects of bugs in other places and it may no longer be necessary. The removal has been foreseen in a code comment since 2007; this seems to be a good time to test this hypothesis.
2010-10-14 | Simon Riggs | Make startup process respond to signals to cancel waiting on latch.
A tidy up for recently committed changes to startup latch. Fujii Masao
2010-10-14 | Simon Riggs | Fix bug in comment of timeline history file.
Fujii Masao
2010-09-20 | Magnus Hagander | Remove cvs keywords from all files.
2010-09-17 | Tom Lane | Add some documentation about how we WAL-log filesystem actions.
Per a question from Robert Haas.
2010-09-15 | Heikki Linnakangas | Fix two typos in comments, spotted by Fujii Masao and Thom Brown
2010-09-15 | Heikki Linnakangas | Use a latch to make startup process wake up and replay immediately when
new WAL arrives via streaming replication. This reduces the latency, and also allows us to use a longer polling interval, which is good for energy efficiency. We still need to poll to check for the appearance of a trigger file, but the interval is now 5 seconds (instead of 100ms), like when waiting for a new WAL segment to appear in WAL archive.
2010-09-11 | Heikki Linnakangas | Introduce latches. A latch is a boolean variable, with the capability to
wait until it is set. Latches can be used to reliably wait until a signal arrives, which is hard otherwise because signals don't interrupt select() on some platforms, and even when they do, there are race conditions. On Unix, latches use the so-called self-pipe trick under the covers to implement the sleep until the latch is set, without race conditions. On Windows, Windows events are used. Use the new latch abstraction to sleep in walsender, so that as soon as a transaction finishes, walsender is woken up to immediately send the WAL to the standby. This reduces the latency between master and standby, which is good. Preliminary work by Fujii Masao. The latch implementation is by me, with helpful comments from many people.
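The self-pipe trick can be sketched with POSIX `pipe()`, `fcntl()` and `poll()`. The `SketchLatch` type and its functions are hypothetical and much simpler than PostgreSQL's latch API. For example, this version writes a wakeup byte on every set, rather than only when someone is waiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <unistd.h>

typedef struct
{
    volatile sig_atomic_t is_set;
    int read_fd;    /* the waiter polls this end */
    int write_fd;   /* the setter (possibly a signal handler) writes here */
} SketchLatch;

static int
sketch_init_latch(SketchLatch *latch)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;
    for (int i = 0; i < 2; i++)
    {
        int flags = fcntl(fds[i], F_GETFL);

        /* Non-blocking, so a full pipe never blocks the setter or the drain. */
        if (flags < 0 || fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return -1;
    }
    latch->is_set = 0;
    latch->read_fd = fds[0];
    latch->write_fd = fds[1];
    return 0;
}

/* Async-signal-safe: only sets a flag and calls write(). */
static void
sketch_set_latch(SketchLatch *latch)
{
    if (latch->is_set)
        return;
    int saved_errno = errno;

    latch->is_set = 1;                  /* set the flag BEFORE writing the byte */
    ssize_t rc = write(latch->write_fd, "x", 1);
    (void) rc;                          /* EAGAIN on a full pipe: a wakeup is already queued */
    errno = saved_errno;
}

static void
sketch_reset_latch(SketchLatch *latch)
{
    latch->is_set = 0;
}

/* Returns true once the latch is set, false if timeout_ms elapses (or poll fails). */
static bool
sketch_wait_latch(SketchLatch *latch, int timeout_ms)
{
    for (;;)
    {
        char buf[64];
        struct pollfd pfd = { .fd = latch->read_fd, .events = POLLIN };
        int rc;

        /* Drain stale wakeup bytes first, THEN check the flag. */
        while (read(latch->read_fd, buf, sizeof buf) > 0)
            ;
        if (latch->is_set)
            return true;

        /* A set racing with the check above leaves a byte, so poll() returns at once. */
        rc = poll(&pfd, 1, timeout_ms);
        if (rc == 0)
            return false;
        if (rc < 0 && errno != EINTR)
            return false;
        /* Readable or interrupted: loop and re-check (the timeout restarts in this sketch). */
    }
}
```

The ordering is what removes the race. The setter sets the flag before writing the byte, and the waiter drains the pipe before checking the flag. So a set that lands between the check and `poll()` still leaves a byte in the pipe, and `poll()` returns immediately instead of sleeping for the full timeout.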
2010-08-30 | Tom Lane | Fix oversight in RelFileNodeBackend patch: CreateFakeRelcacheEntry needs to
initialize the rd_backend field of a fake Relation entry correctly. Fortunately, that is easy, since only non-temp relations should ever be mentioned in the WAL stream.
2010-08-30 | Simon Riggs | Fix misleading DEBUG2 issued during RemoveOldXlogFiles()
2010-08-30 | Simon Riggs | Truncate subtrans after each restartpoint.
Issue reported by Harald Kolb, patch by Fujii Masao, review by me.
2010-08-26 | Alvaro Herrera | Remove duplicate translatable phrase
2010-08-13 | Robert Haas | Include the backend ID in the relpath of temporary relations.
This allows us to reliably remove all leftover temporary relation files on cluster startup without reference to system catalogs or WAL; therefore, we no longer include temporary relations in XLOG_XACT_COMMIT and XLOG_XACT_ABORT WAL records. Since these changes require including a backend ID in each SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id field has been reduced from two bytes to one, and the maximum number of connections has been reduced from INT_MAX / 4 to 2^23-1. It would be possible to remove these restrictions by increasing the size of SharedInvalidationMessage by 4 bytes, but right now that doesn't seem like a good trade-off. Review by Jaime Casanova and Tom Lane.
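Catalog-free cleanup works because startup only has to recognize a file-name pattern. The sketch below assumes a layout of `base/<database OID>/t<backend ID>_<relfilenode>` for temporary relations; the commit message itself does not spell out the format. The helper names are hypothetical, and fork and segment suffixes are ignored.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* backend_id < 0 means "not a temporary relation" in this sketch. */
static void
sketch_relpath(char *buf, size_t len, unsigned db_oid,
               unsigned relfilenode, int backend_id)
{
    if (backend_id < 0)
        snprintf(buf, len, "base/%u/%u", db_oid, relfilenode);
    else
        snprintf(buf, len, "base/%u/t%d_%u", db_oid, backend_id, relfilenode);
}

/* True if a file name is exactly "t<digits>_<digits>" (no fork/segment suffix). */
static bool
sketch_looks_like_temp_relfile(const char *name)
{
    size_t backend_digits, node_digits;

    if (name[0] != 't')
        return false;
    backend_digits = strspn(name + 1, "0123456789");
    if (backend_digits == 0 || name[1 + backend_digits] != '_')
        return false;
    node_digits = strspn(name + 2 + backend_digits, "0123456789");
    return node_digits > 0 && name[2 + backend_digits + node_digits] == '\0';
}
```

A startup pass can then unlink every matching file in each database directory without consulting pg_class or replaying WAL.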
2010-08-13 | Robert Haas | Make RecordTransactionCommit() respect wal_level.
Since the only purpose of WAL-logging SharedInvalidationMessages is to support Hot Standby operation, they needn't be included when wal_level < hot_standby. Back-patch to 9.0. Review by Heikki Linnakangas and Fujii Masao.
2010-08-12 | Robert Haas | Correct sundry errors in Hot Standby-related comments.
Fujii Masao
2010-07-29 | Simon Riggs | Rename asyncCommitLSN to asyncXactLSN to reflect changed role in 9.0.
Transaction aborts now record their LSN to avoid corner case behaviour in SR/HS, hence change of name of variables and functions. As pointed out by Fujii Masao. Cosmetic changes only.
2010-07-23 | Robert Haas | Avoid deep recursion when assigning XIDs to multiple levels of subxacts.
Backpatch to 8.0. Andres Freund, with cleanup and adjustment for older branches by me.
2010-07-08 | Tom Lane | Update obsolete comment. Noted by Josh Tolley.
2010-07-06 | Bruce Momjian | pgindent run for 9.0, second run
2010-07-03 | Tom Lane | Don't set recoveryLastXTime when replaying a checkpoint --- that was a bogus
idea from the start since the variable is only meant to track commit/abort events. This patch reverts the logic around the variable to what it was in 8.4, except that the value is now kept in shared memory rather than a static variable, so that it can be reported correctly by CreateRestartPoint (which is executed in the bgwriter).
2010-07-03 | Tom Lane | Replace max_standby_delay with two parameters, max_standby_archive_delay and
max_standby_streaming_delay, and revise the implementation to avoid assuming that timestamps found in WAL records can meaningfully be compared to clock time on the standby server. Instead, the delay limits are compared to the elapsed time since we last obtained a new WAL segment from archive or since we were last "caught up" to WAL data arriving via streaming replication. This avoids problems with clock skew between primary and standby, as well as other corner cases that the original coding would misbehave in, such as the primary server having significant idle time between transactions. Per my complaint some time ago and considerable ensuing discussion. Do some desultory editing on the hot standby documentation, too.
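The revised rule compares two timestamps from the standby's own clock, so skew between the primary and standby clocks cannot affect the result. In the sketch below, the function name is hypothetical, times are milliseconds, and `last_wal_receipt_ms` stands for the moment a segment was last restored from archive or streaming last caught up. A negative delay is taken to mean "wait forever", matching the -1 convention of the max_standby_*_delay settings.

```c
#include <stdbool.h>
#include <stdint.h>

/* Has a conflicting standby query been allowed to delay replay long enough? */
static bool
sketch_standby_delay_exceeded(int64_t now_ms, int64_t last_wal_receipt_ms,
                              int max_delay_ms)
{
    if (max_delay_ms < 0)
        return false;                   /* -1: never cancel conflicting queries */
    return now_ms - last_wal_receipt_ms >= max_delay_ms;
}
```

The WAL record timestamps written on the primary play no part in this check, which is exactly what makes it robust to an idle primary and to clock skew.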
2010-06-29 | Bruce Momjian | Add C comment about why synchronous_commit=off behavior can lose
committed transactions in a postmaster crash.
2010-06-28 | Robert Haas | emode_for_corrupt_record shouldn't reduce LOG messages to WARNING.
In non-interactive sessions, WARNING sorts below LOG.
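For the server log, the documented order of log_min_messages levels is DEBUG5 through DEBUG1, INFO, NOTICE, WARNING, ERROR, LOG, FATAL, PANIC. LOG ranks above ERROR in that order, so rewriting a LOG message as WARNING makes it more likely to be filtered out. The enum below is a hypothetical model of that ordering, not PostgreSQL's elog.h constants:

```c
#include <stdbool.h>

/* Server-log severity order, least to most severe, as listed for log_min_messages. */
typedef enum
{
    SK_DEBUG5, SK_DEBUG4, SK_DEBUG3, SK_DEBUG2, SK_DEBUG1,
    SK_INFO, SK_NOTICE, SK_WARNING, SK_ERROR, SK_LOG, SK_FATAL, SK_PANIC
} SketchServerLogLevel;

/* Would a message at `level` reach the server log when log_min_messages = `threshold`? */
static bool
sketch_reaches_server_log(SketchServerLogLevel level, SketchServerLogLevel threshold)
{
    return level >= threshold;
}
```

With the threshold at ERROR, a LOG message still gets through but the same message demoted to WARNING would not.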
2010-06-17 | Tom Lane | Make RemoveOldXlogFiles's debug printout match style used elsewhere:
log and seg aren't an XLogRecPtr and shouldn't be printed like one. Fujii Masao
2010-06-17 | Tom Lane | Don't allow walsender to send WAL data until it's been safely fsync'd on the
master. Otherwise a subsequent crash could cause the master to lose WAL that has already been applied on the slave, resulting in the slave being out of sync and soon corrupt. Per recent discussion and an example from Robert Haas. Fujii Masao
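The invariant is that the sender's upper bound is the primary's flush position, not its write position. WAL that has been written but not yet fsync'd may vanish in a crash, so it must not reach the standby. A minimal sketch with hypothetical names, treating WAL positions as 64-bit byte offsets:

```c
#include <stdint.h>

/* How many bytes may be sent now, given what was sent and what is durably flushed? */
static uint64_t
sketch_bytes_to_send(uint64_t sent_ptr, uint64_t flush_ptr, uint64_t max_chunk)
{
    uint64_t available;

    if (sent_ptr >= flush_ptr)
        return 0;                   /* nothing durable left to send */
    available = flush_ptr - sent_ptr;
    return available < max_chunk ? available : max_chunk;
}
```

The write position never appears in the calculation, which is the whole point of the change.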
2010-06-14 | Heikki Linnakangas | If a corrupt WAL record is received by streaming replication, disconnect
and retry. If the record is genuinely corrupt in the master database, there's little hope of recovering, but it's better than simply retrying to apply the corrupt WAL record in a tight loop without even trying to retransmit it, which is what we used to do.