summaryrefslogtreecommitdiff
path: root/src/backend/access/transam
AgeCommit message (Collapse)Author
2009-12-09Prevent indirect security attacks via changing session-local state withinTom Lane
an allegedly immutable index function. It was previously recognized that we had to prevent such a function from executing SET/RESET ROLE/SESSION AUTHORIZATION, or it could trivially obtain the privileges of the session user. However, since there is in general no privilege checking for changes of session-local state, it is also possible for such a function to change settings in a way that might subvert later operations in the same session. Examples include changing search_path to cause an unexpected function to be called, or replacing an existing prepared statement with another one that will execute a function of the attacker's choosing. The present patch secures VACUUM, ANALYZE, and CREATE INDEX/REINDEX against these threats, which are the same places previously deemed to need protection against the SET ROLE issue. GUC changes are still allowed, since there are many useful cases for that, but we prevent security problems by forcing a rollback of any GUC change after completing the operation. Other cases are handled by throwing an error if any change is attempted; these include temp table creation, closing a cursor, and creating or deleting a prepared statement. (In 7.4, the infrastructure to roll back GUC changes doesn't exist, so we settle for rejecting changes of "search_path" in these contexts.) Original report and patch by Gurjeet Singh, additional analysis by Tom Lane. Security: CVE-2009-4136
2009-11-23Fix an old bug in multixact and two-phase commit. Prepared transactions canHeikki Linnakangas
be part of multixacts, so allocate a slot for each prepared transaction in the "oldest member" array in multixact.c. On PREPARE TRANSACTION, transfer the oldest member value from the current backends slot to the prepared xact slot. Also save and recover the value from the 2pc state file. The symptom of the bug was that after a transaction prepared, a shared lock still held by the prepared transaction was sometimes ignored by other transactions. Fix back to 8.1, where both 2PC and multixact were introduced.
2009-09-13Don't error out if recycling or removing an old WAL segment fails at the endHeikki Linnakangas
of checkpoint. Although the checkpoint has been written to WAL at that point already, so that all data is safe, and we'll retry removing the WAL segment at the next checkpoint, if such a failure persists we won't be able to remove any other old WAL segments either and will eventually run out of disk space. It's better to treat the failure as non-fatal, and move on to clean any other WAL segment and continue with any other end-of-checkpoint cleanup. We don't normally expect any such failures, but on Windows it can happen with some anti-virus or backup software that lock files without FILE_SHARE_DELETE flag. Also, the loop in pgrename() to retry when the file is locked was broken. If a file is locked on Windows, you get ERROR_SHARE_VIOLATION, not ERROR_ACCESS_DENIED, at least on modern versions. Fix that, although I left the check for ERROR_ACCESS_DENIED in there as well (presumably it was correct in some environment), and added ERROR_LOCK_VIOLATION to be consistent with similar checks in pgwin32_open(). Reduce the timeout on the loop from 30s to 10s, on the grounds that since it's been broken, we've effectively had a timeout of 0s and no-one has complained, so a smaller timeout is actually closer to the old behavior. A longer timeout would mean that if recycling a WAL file fails because it's locked for some reason, InstallXLogFileSegment() will hold ControlFileLock for longer, potentially blocking other backends, so a long timeout isn't totally harmless. While we're at it, set errno correctly in pgrename(). Backpatch to 8.2, which is the oldest version supported on Windows. The xlog.c changes would make sense on other platforms and thus on older versions as well, but since there's no such locking issues on other platforms, it's not worth it.
2009-09-10On Windows, when a file is deleted and another process still has an openHeikki Linnakangas
file handle on it, the file goes into "pending deletion" state where it still shows up in directory listing, but isn't accessible otherwise. That confuses RemoveOldXLogFiles(), making it think that the file hasn't been archived yet, while it actually was, and it was deleted along with the .done file. Fix that by renaming the file with ".deleted" extension before deleting it. Also check the return value of rename() and unlink(), so that if the removal fails for any reason (e.g another process is holding the file locked), we don't delete the .done file until the WAL file is really gone. Backpatch to 8.2, which is the oldest version supported on Windows.
2009-09-01Force VACUUM to recalculate oldestXmin even when we haven't changed ourTom Lane
own database's datfrozenxid, if the current value is old enough to be forcing autovacuums or warning messages. This ensures that a bogus value is replaced as soon as possible. Per a comment from Heikki.
2009-09-01Actually, we need to bump the format identifier on twophase filesTom Lane
because of readjustment of 2PC rmgr IDs for flatfile removal.
2009-09-01Remove flatfiles.c, which is now obsolete.Alvaro Herrera
Recent commits have removed the various uses it was supporting. It was a performance bottleneck, according to bug report #4919 by Lauris Ulmanis; seems it slowed down user creation after a billion users.
2009-08-31Track the current XID wrap limit (or more accurately, the oldest unfrozenTom Lane
XID) in checkpoint records. This eliminates the need to recompute the value from scratch during database startup, which is one of the two remaining reasons for the flatfile code to exist. It should also simplify life for hot-standby operation. To avoid bloating the checkpoint records unreasonably, I switched from tracking the oldest database by name to tracking it by OID. This turns out to save cycles in general (everywhere but the warning-generating paths, which we hardly care about) and also helps us deal with the case that the oldest database got dropped instead of being vacuumed. The prior coding might go for a long time without updating the wrap limit in that case, which is bad because it might result in a lot of useless autovacuum activity.
2009-08-27In the checkpoint written at the end of archive recovery, the WAL page headerHeikki Linnakangas
was incorrectly initialized with timeline ID 0. That rendered the WAL page unrecoverable, making a subsequent archive recovery stop at that point. ThisTimeLineID needs to be initialized before calling AdvanceXLInsertBuffer(). This fixes bug #5011 reported by James Bardin. Backpatch to 8.4, as the bug was introduced by the changes to use of bgwriter for writing the end-of-archive-recovery checkpoint. Patch by Tom Lane.
2009-08-12Allow backends to start up without use of the flat-file copy of pg_database.Tom Lane
To make this work in the base case, pg_database now has a nailed-in-cache relation descriptor that is initialized using hardwired knowledge in relcache.c. This means pg_database is added to the set of relations that need to have a Schema_pg_xxx macro maintained in pg_attribute.h. When this path is taken, we'll have to do a seqscan of pg_database to find the row we need. In the normal case, we are able to do an indexscan to find the database's row by name. This is made possible by storing a global relcache init file that describes only the shared catalogs and their indexes (and therefore is usable by all backends in any database). A new backend loads this cache file, finds its database OID after an indexscan on pg_database, and then loads the local relcache init file for that database. This change should effectively eliminate number of databases as a factor in backend startup time, even with large numbers of databases. However, the real reason for doing it is as a first step towards getting rid of the flat files altogether. There are still several other sub-projects to be tackled before that can happen.
2009-08-08Document that LocalSetXLogInsertAllowed can be re-executed.Tom Lane
Per comment from Simon.
2009-08-07rm_cleanup functions need to be allowed to write WAL entries. This oversightTom Lane
appears to explain the recent reports of "PANIC: cannot make new WAL entries during recovery".
2009-06-26Cleanup and code review for the patch that made bgwriter active duringTom Lane
archive recovery. Invent a separate state variable and inquiry function for XLogInsertAllowed() to clarify some tests and make the management of writing the end-of-recovery checkpoint less klugy. Fix several places that were incorrectly testing InRecovery when they should be looking at RecoveryInProgress or XLogInsertAllowed (because they will now be executed in the bgwriter not startup process). Clarify handling of bad LSNs passed to XLogFlush during recovery. Use a spinlock for setting/testing SharedRecoveryInProgress. Improve quite a lot of comments. Heikki and Tom
2009-06-25Fix some serious bugs in archive recovery, now that bgwriter is activeHeikki Linnakangas
during it: When bgwriter is active, the startup process can't perform mdsync() correctly because it won't see the fsync requests accumulated in bgwriter's private pendingOpsTable. Therefore make bgwriter responsible for the end-of-recovery checkpoint as well, when it's active. When bgwriter is active (= archive recovery), the startup process must not accumulate fsync requests to its own pendingOpsTable, since bgwriter won't see them there when it performs restartpoints. Make startup process drop its pendingOpsTable when bgwriter is launched to avoid that. Update minimum recovery point one last time when leaving archive recovery. It won't be updated by the end-of-recovery checkpoint because XLogFlush() sees us as out of recovery already. This fixes bug #4879 reported by Fujii Masao.
2009-06-25The code to unlink dropped relations in FinishPreparedTransaction() wasHeikki Linnakangas
acting like runs inside WAL recovery, but it doesn't. I must've copy-pasted this from a redo-function in the relation forks patch. Noticed by Tom Lane while he was looking through callers of smgrdounlink().
2009-06-118.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian
provided by Andrew.
2009-06-02Only recycle normal files in pg_xlog as WAL segments. pg_standby createsHeikki Linnakangas
symbolic links with the -l option, and as Fujii Masao pointed out we ended up overwriting files in the archive directory before this patch. Patch by Aidan Van Dyk, Fujii Masao and me. Backpatch to 8.3, where pg_standby was introduced.
2009-05-28When archiving is enabled, rotate the last WAL segment at shutdown so thatHeikki Linnakangas
all transactions are archived. Original patch by Guillaume Smet.
2009-05-15Fix all the server-side SIGQUIT handlers (grumble ... why so many identicalTom Lane
copies?) to ensure they really don't run proc_exit/shmem_exit callbacks, as was intended. I broke this behavior recently by installing atexit callbacks without thinking about the one case where we truly don't want to run those callback functions. Noted in an example from Dave Page.
2009-05-14Include recovery_end_command in recovery.conf.sample.Tom Lane
Per suggestion of Jaime Casanova.
2009-05-14Improve a couple of comments.Tom Lane
2009-05-14Add recovery_end_command option to recovery.conf. recovery_end_commandHeikki Linnakangas
is run at the end of archive recovery, providing a chance to do external cleanup. Modify pg_standby so that it no longer removes the trigger file, that is to be done using the recovery_end_command now. Provide a "smart" failover mode in pg_standby, where we don't fail over immediately, but only after recovering all unapplied WAL from the archive. That gives you zero data loss assuming all WAL was archived before failover, which is what most users of pg_standby actually want. recovery_end_command by Simon Riggs, pg_standby changes by Fujii Masao and myself.
2009-05-13Rewrite xml.c's memory management (yet again). Give up on the idea ofTom Lane
redirecting libxml's allocations into a Postgres context. Instead, just let it use malloc directly, and add PG_TRY blocks as needed to be sure we release libxml data structures in error recovery code paths. This is ugly but seems much more likely to play nicely with third-party uses of libxml, as seen in recent trouble reports about using Perl XML facilities in pl/perl and bug #4774 about contrib/xml2. I left the code for allocation redirection in place, but it's only built/used if you #define USE_LIBXMLCONTEXT. This is because I found it useful to corral libxml's allocations in a palloc context when hunting for libxml memory leaks, and we're surely going to have more of those in the future with this type of approach. But we don't want it turned on in a normal build because it breaks exactly what we need to fix. I have not re-indented most of the code sections that are now wrapped by PG_TRY(); that's for ease of review. pg_indent will fix it. This is a pre-existing bug in 8.3, but I don't dare back-patch this change until it's gotten a reasonable amount of field testing.
2009-05-07Request XLOG switch before writing checkpoint in pg_start_backup(). OtherwiseHeikki Linnakangas
you can end up with an unrecoverable backup if you start a new base backup right after finishing archive recovery. In that scenario, the redo pointer of the checkpoint that pg_start_backup() writes points to the XLOG segment where the timeline-changing end-of-archive-recovery checkpoint is. The beginning of that segment contains pages with the old timeline ID, and we don't accept that in recovery unless we find a history file covering the old timeline ID. If you omit pg_xlog from the base backup and clear the archive directory before starting the backup, there will be no such history file available. The bug is present in all versions since PITR was introduced in 8.0, but I'm back-patching only back to 8.2. Earlier versions didn't have XLOG switch records, making this fix unfeasible. Given the lack of reports until now, it doesn't seem worthwhile to spend more effort to fix 8.0 and 8.1. Per report and suggestion by Mikael Krantz
2009-04-23Change the default value of max_prepared_transactions to zero, and addTom Lane
documentation warnings against setting it nonzero unless active use of prepared transactions is intended and a suitable transaction manager has been installed. This should help to prevent the type of scenario we've seen several times now where a prepared transaction is forgotten and eventually causes severe maintenance problems (or even anti-wraparound shutdown). The only real reason we had the default be nonzero in the first place was to support regression testing of the feature. To still be able to do that, tweak pg_regress to force a nonzero value during "make check". Since we cannot force a nonzero value in "make installcheck", add a variant regression test "expected" file that shows the results that will be obtained when max_prepared_transactions is zero. Also, extend the HINT messages for transaction wraparound warnings to mention the possibility that old prepared transactions are causing the problem. All per today's discussion.
2009-04-22After archive recovery, mark the last WAL segment from the parent timelineHeikki Linnakangas
ready for archival. It was marked at the next checkpoint anyway, but waiting for the next checkpoint is an unnecessary delay. Fujii Masao
2009-04-07Add an optional parameter to pg_start_backup() that specifies whether to doTom Lane
the checkpoint in immediate or lazy mode. This is to address complaints that pg_start_backup() takes a long time even when there's no need to minimize its I/O consumption.
2009-04-02Revert DTrace patch from Robert LorBruce Momjian
2009-04-02Add support for additional DTrace probes.Bruce Momjian
Robert Lor
2009-03-11Code review for dtrace probes added (so far) to 8.4. Adjust placement ofTom Lane
some bufmgr probes, take out redundant and memory-leak-inducing path arguments to smgr__md__read__done and smgr__md__write__done, fix bogus attempt to recalculate space used in sort__done, clean up formatting in places where I'm not sure pgindent will do a nice job by itself.
2009-03-04Reload config file in startup process on SIGHUP.Heikki Linnakangas
Fujii Masao
2009-02-23Change the signaling of end-of-recovery. Startup process now indicates endHeikki Linnakangas
of recovery by exiting with exit code 0, like in previous releases. Per Tom's suggestion.
2009-02-18Start background writer during archive recovery. Background writer now performsHeikki Linnakangas
its usual buffer cleaning duties during archive recovery, and it's responsible for performing restartpoints. This requires some changes in postmaster. When the startup process has done all the initialization and is ready to start WAL redo, it signals the postmaster to launch the background writer. The postmaster is signaled again when the point in recovery is reached where we know that the database is in consistent state. Postmaster isn't interested in that at the moment, but that's the point where we could let other backends in to perform read-only queries. The postmaster is signaled third time when the recovery has ended, so that postmaster knows that it's safe to start accepting connections. The startup process now traps SIGTERM, and performs a "clean" shutdown. If you do a fast shutdown during recovery, a shutdown restartpoint is performed, like a shutdown checkpoint, and postmaster kills the processes cleanly. You still have to continue the recovery at next startup, though. Currently, the background writer is only launched during archive recovery. We could launch it during crash recovery as well, but it seems better to keep that codepath as simple as possible, for the sake of robustness. And it couldn't do any restartpoints during crash recovery anyway, so it wouldn't be that useful. log_restartpoints is gone. Use log_checkpoints instead. This is yet to be documented. This whole operation is a pre-requisite for Hot Standby, but has some value of its own whether the hot standby patch makes 8.4 or not. Simon Riggs, with lots of modifications by me.
2009-02-07Fix obsolete comment. Zdenek KotalaHeikki Linnakangas
2009-01-23Put back fast-path for the case that there's no backup blocks inHeikki Linnakangas
RestoreBkpBlocks. Went missing in my recent refactoring patch, as pointed out by Simon's hot standby patch.
2009-01-20Add a new option to RestoreBkpBlocks() to indicate if a cleanup lock shouldHeikki Linnakangas
be used instead of the normal exclusive lock, and make WAL redo functions responsible for calling RestoreBkpBlocks(). They know better what kind of a lock they need. At the moment, this just moves things around with no functional change, but makes the hot standby patch that's under review cleaner.
2009-01-11Re-enable the old code in xlog.c that tried to use posix_fadvise(), so thatTom Lane
we can get some buildfarm feedback about whether that function is still problematic. (Note that the planned async-preread patch will not really prove anything one way or the other in buildfarm testing, since it will be inactive with default GUC settings.)
2009-01-01Update copyright for 2009.Bruce Momjian
2008-12-24Change the name of dtrace wal tracepoints:Bruce Momjian
TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY Robert Lor
2008-12-17The attached patch contains a couple of fixes in the existing probes andBruce Momjian
includes a few new ones. - Fixed compilation errors on OS X for probes that use typedefs - Fixed a number of probes to pass ForkNumber per the relation forks patch - The new probes are those that were taken out from the previous submitted patch and required simple fixes. Will submit the other probes that may require more discussion in a separate patch. Robert Lor
2008-12-11To reduce confusion over whether VACUUM FULL is needed for anti-wraparoundTom Lane
vacuuming (it's not), say "database-wide VACUUM" instead of "full-database VACUUM" in the relevant hint messages. Also, document the permissions needed to do this. Per today's discussion.
2008-12-09Revert SIGUSR1 multiplexing patch, per Tom's objection.Heikki Linnakangas
2008-12-09Provide support for multiplexing SIGUSR1 signal. The upcoming synchronousHeikki Linnakangas
replication patch needs a signal, but we've already used SIGUSR1 and SIGUSR2 in normal backends. This patch allows reusing SIGUSR1 for that, and for other purposes too if the need arises.
2008-12-04Fix a couple of snapshot management bugs in the new ResourceOwner world:Alvaro Herrera
non-writable large objects need to have their snapshots registered on the transaction resowner, not the current portal's, because it must persist until the large object is closed (which the portal does not). Also, ensure that the serializable snapshot is recorded by the transaction resource owner too, even when a subtransaction has changed the current resource owner before serializable is taken. Per bug reports from Pavan Deolasee.
2008-12-03Introduce visibility map. The visibility map is a bitmap with one bit perHeikki Linnakangas
heap page, where a set bit indicates that all tuples on the page are visible to all transactions, and the page therefore doesn't need vacuuming. It is stored in a new relation fork. Lazy vacuum uses the visibility map to skip pages that don't need vacuuming. Vacuum is also responsible for setting the bits in the map. In the future, this can hopefully be used to implement index-only-scans, but we can't currently guarantee that the visibility map is always 100% up-to-date. In addition to the visibility map, there's a new PD_ALL_VISIBLE flag on each heap page, also indicating that all tuples on the page are visible to all transactions. It's important that this flag is kept up-to-date. It is also used to skip visibility tests in sequential scans, which gives a small performance gain on seqscans.
2008-12-03If pg_stop_backup() is called just after switching to a new xlog file,Heikki Linnakangas
wait for the previous instead of the new file to be archived. Based on patch by Simon Riggs.
2008-11-26Rely on relcache invalidation to update the cached size of the FSM.Heikki Linnakangas
2008-11-19Rethink the way FSM truncation works. Instead of WAL-logging FSMHeikki Linnakangas
truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To make that cleaner from modularity point of view, move the WAL-logging one level up to RelationTruncate, and move RelationTruncate and all the related WAL-logging to new src/backend/catalog/storage.c file. Introduce new RelationCreateStorage and RelationDropStorage functions that are used instead of calling smgrcreate/smgrscheduleunlink directly. Move the pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new functions. This leaves smgr.c as a thin wrapper around md.c; all the transactional stuff is now in storage.c. This will make it easier to add new forks with similar truncation logic, like the visibility map.
2008-11-11Fix sloppy omission of now-required #include's.Tom Lane
2008-11-11Change error messages to print the physical path, likeHeikki Linnakangas
"base/11517/3767_fsm", instead of symbolic names like "1663/11517/3767/1", per Alvaro's suggestion. I didn't change the messages in the higher-level index, heap and FSM routines, though, where the fork is implicit.