path: root/src/backend/access/transam/xlog.c
Age | Commit message | Author
2005-12-29 | Get rid of the SpinLockAcquire/SpinLockAcquire_NoHoldoff distinction | Tom Lane
in favor of having just one set of macros that don't do HOLD/RESUME_INTERRUPTS (hence, these correspond to the old SpinLockAcquire_NoHoldoff case). Given our coding rules for spinlock use, there is no reason to allow CHECK_FOR_INTERRUPTS to be done while holding a spinlock, and also there is no situation where ImmediateInterruptOK will be true while holding a spinlock. Therefore doing HOLD/RESUME_INTERRUPTS while taking/releasing a spinlock is just a waste of cycles. Qingqing Zhou and Tom Lane.
2005-12-28 | Arrange to set the LC_XXX environment variables to match our locale | Tom Lane
setup. This protects against undesired changes in locale behavior if someone carelessly does setlocale(LC_ALL, "") (and we know who you are, perl guys).
2005-11-22 | Re-run pgindent, fixing a problem where comment lines after a blank | Bruce Momjian
comment line were output as too long, and update typedefs for /lib directory. Also fix case where identifiers were used as variable names in the backend, but as typedefs in ecpg (favor the backend for indenting). Backpatch to 8.1.X.
2005-10-29 | Message corrections | Peter Eisentraut
2005-10-22 | Make code for selecting default WAL sync method less confusing. | Tom Lane
2005-10-15 | Standard pgindent run for 8.1. | Bruce Momjian
2005-10-03 | Expand pg_control information so that we can verify that the database | Tom Lane
was created on a machine with alignment rules and floating-point format similar to the current machine. Per recent discussion, this seems like a good idea with the increasing prevalence of 32/64 bit environments.
2005-08-22 | Rewrite gather-write patch into something less obviously bolted on | Tom Lane
after the fact. Fix bug with incorrect test for whether we are at end of logfile segment. Arrange for writes triggered by XLogInsert's is-cache-more-than-half-full test to synchronize with the cache boundaries, so that in long transactions we tend to write alternating halves of the cache rather than randomly chosen portions of it; this saves one more write syscall per cache load.
2005-08-22 | Fix some inconsistent choices of datatypes in xlog.c. Make buffer | Tom Lane
indexes all be int, rather than variously int, uint16 and uint32; add some casts where necessary to support large buffer arrays.
2005-08-20 | Convert the arithmetic for shared memory size calculation from 'int' | Tom Lane
to 'Size' (that is, size_t), and install overflow detection checks in it. This allows us to remove the former arbitrary restrictions on NBuffers etc. It won't make any difference in a 32-bit machine, but in a 64-bit machine you could theoretically have terabytes of shared buffers. (How efficiently we could manage 'em remains to be seen.) Similarly, num_temp_buffers, work_mem, and maintenance_work_mem can be set above 2Gb on a 64-bit machine. Original patch from Koichi Suzuki, additional work by moi.
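The overflow detection described in this entry can be illustrated with a pair of checked helpers. This is only a sketch under stated assumptions: the names `checked_add_size` and `checked_mul_size` are hypothetical, and they return a flag instead of raising an error the way backend code would.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: add two sizes. Returns false instead of wrapping
 * around when the sum does not fit in size_t.
 */
static bool
checked_add_size(size_t a, size_t b, size_t *result)
{
    if (b > SIZE_MAX - a)
        return false;           /* a + b would overflow */
    *result = a + b;
    return true;
}

/*
 * Hypothetical helper: multiply two sizes. Returns false instead of
 * wrapping around when the product does not fit in size_t.
 */
static bool
checked_mul_size(size_t a, size_t b, size_t *result)
{
    if (a != 0 && b > SIZE_MAX / a)
        return false;           /* a * b would overflow */
    *result = a * b;
    return true;
}
```

A caller sizing an array of buffers would chain these helpers and stop at the first failure, rather than computing the size in `int` and silently truncating.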
2005-08-11 | Autovacuum loose end mop-up. Provide autovacuum-specific vacuum cost | Tom Lane
delay and limit, both as global GUCs and as table-specific entries in pg_autovacuum. stats_reset_on_server_start is now OFF by default, but a reset is forced if we did WAL replay. XID-wrap vacuums do not ANALYZE, but do FREEZE if it's a template database. Alvaro Herrera
2005-07-30 | Fix compile for no O_SYNC, but introduced with O_DIRECT. | Bruce Momjian
2005-07-29 | Clean up a number of autovacuum loose ends. Make the stats collector | Tom Lane
track shared relations in a separate hashtable, so that operations done from different databases are counted correctly. Add proper support for anti-XID-wraparound vacuuming, even in databases that are never connected to and so have no stats entries. Miscellaneous other bug fixes. Alvaro Herrera, some additional fixes by Tom Lane.
2005-07-29 | Update O_DIRECT comment. | Bruce Momjian
2005-07-29 | Use O_DIRECT if available when using O_SYNC for wal_sync_method. | Bruce Momjian
Also, write multiple WAL buffers out in one write() operation. ITAGAKI Takahiro

---------------------------------------------------------------------------

> If we disable writeback-cache and use open_sync, the per-page writing
> behavior in WAL module will show up as bad result. O_DIRECT is similar
> to O_DSYNC (at least on linux), so that the benefit of it will disappear
> behind the slow disk revolution.
>
> In the current source, WAL is written as:
>     for (i = 0; i < N; i++) { write(&buffers[i], BLCKSZ); }
> Is this intentional? Can we rewrite it as follows?
>     write(&buffers[0], N * BLCKSZ);
>
> In order to achieve it, I wrote a 'gather-write' patch (xlog.gw.diff).
> Aside from this, I'll also send the fixed direct io patch (xlog.dio.diff).
> These two patches are independent, so they can be applied either or both.
>
>
> I tested them on my machine and the results as follows. It shows that
> direct-io and gather-write is the best choice when writeback-cache is off.
> Are these two patches worth trying if they are used together?
>
>
>             | writeback | fsync= | fdata | open_ | fsync_ | open_
> patch       | cache     | false  | sync  | sync  | direct | direct
> ------------+-----------+--------+-------+-------+--------+---------
> direct io   | off       | 124.2  | 105.7 | 48.3  | 48.3   | 48.2
> direct io   | on        | 129.1  | 112.3 | 114.1 | 142.9  | 144.5
> gather-write| off       | 124.3  | 108.7 | 105.4 | (N/A)  | (N/A)
> both        | off       | 131.5  | 115.5 | 114.4 | 145.4  | 145.2
>
> - 20runs * pgbench -s 100 -c 50 -t 200
> - with tuning (wal_buffers=64, commit_delay=500, checkpoint_segments=8)
> - using 2 ATA disks:
>   - hda(reiserfs) includes system and wal.
>   - hdc(jfs) includes database files. writeback-cache is always on.
>
> ---
> ITAGAKI Takahiro
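The difference between the two write patterns quoted in this entry can be sketched as follows. This is a minimal illustration, not the patch itself: the helper names are hypothetical, `BLCKSZ` is assumed to be 8192, and the buffers are assumed to be contiguous in memory. Note that "gather-write" here means one `write()` over adjacent pages, not `writev()`.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192             /* assumed page size for this sketch */

/*
 * Write all of buf[0..len), retrying after short writes.
 * Returns the number of write() calls issued, or -1 on error.
 */
static int
write_fully(int fd, const char *buf, size_t len)
{
    int         calls = 0;

    while (len > 0)
    {
        ssize_t     n = write(fd, buf, len);

        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t) n;
        calls++;
    }
    return calls;
}

/* Per-page pattern: at least one write() per page. */
static int
write_per_page(int fd, const char *buffers, int npages)
{
    int         calls = 0;

    for (int i = 0; i < npages; i++)
    {
        int         c = write_fully(fd, buffers + (size_t) i * BLCKSZ, BLCKSZ);

        if (c < 0)
            return -1;
        calls += c;
    }
    return calls;
}

/* Gathered pattern: one write() covering all contiguous pages. */
static int
write_gathered(int fd, const char *buffers, int npages)
{
    return write_fully(fd, buffers, (size_t) npages * BLCKSZ);
}
```

Both functions put the same bytes in the file. The gathered form simply issues fewer system calls, which matters most when every call is synchronous, as with open_sync.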
2005-07-23 | Remove unintended code addition. | Bruce Momjian
2005-07-23 | Macro alignment cleanup. | Bruce Momjian
2005-07-08 | Even though I'd like to see full_page_writes go away before 8.1, | Tom Lane
a minimum requirement is that it not completely break the system meanwhile. Put the test in the right place.
2005-07-05 | Add GUC full_page_writes to control writing full pages to WAL. | Bruce Momjian
2005-07-04 | Arrange for the postmaster (and standalone backends, initdb, etc) to | Tom Lane
chdir into PGDATA and subsequently use relative paths instead of absolute paths to access all files under PGDATA. This seems to give a small performance improvement, and it should make the system more robust against naive DBAs doing things like moving a database directory that has a live postmaster in it. Per recent discussion.
2005-06-30 | Improve the checkpoint signaling mechanism so that the bgwriter can tell | Tom Lane
the difference between checkpoints forced due to WAL segment consumption and checkpoints forced for other reasons (such as CREATE DATABASE). Avoid generating 'checkpoints are occurring too frequently' messages when the checkpoint wasn't caused by WAL segment consumption. Per gripe from Chris K-L.
2005-06-29 | Clean up the rather historically encumbered interface to now() and | Tom Lane
current time: provide a GetCurrentTimestamp() function that returns current time in the form of a TimestampTz, instead of separate time_t and microseconds fields. This is what all the callers really want anyway, and it eliminates low-level dependencies on AbsoluteTime, which is a deprecated datatype that will have to disappear eventually.
2005-06-19 | Simplify uses of readdir() by creating a function ReadDir() that | Tom Lane
includes error checking and an appropriate ereport(ERROR) message. This gets rid of rather tedious and error-prone manipulation of errno, as well as a Windows-specific bug workaround, at more than a dozen call sites. After an idea in a recent patch by Heikki Linnakangas.
2005-06-19 | Arrange to fsync two-phase-commit state files only during checkpoints; | Tom Lane
given reasonably short lifespans for prepared transactions, this should mean that only a small minority of state files ever need to be fsynced at all. Per discussion with Heikki Linnakangas.
2005-06-17 | Two-phase commit. Original patch by Heikki Linnakangas, with additional | Tom Lane
hacking by Alvaro Herrera and Tom Lane.
2005-06-15 | Remove old *.backup files when we do pg_stop_backup(). This | Bruce Momjian
prevents a large number of *.backup files from existing in pg_xlog/
2005-06-09 | Free buffer allocated via malloc (process is short-lived, but fix it anyway). | Bruce Momjian
2005-06-08 | Change WAL-logging scheme for multixacts to be more like regular | Tom Lane
transaction IDs, rather than like subtrans; in particular, the information now survives a database restart. Per previous discussion, this is essential for PITR log shipping and for 2PC.
2005-06-06 | Modify XLogInsert API to make callers specify whether pages to be backed | Tom Lane
up have the standard layout with unused space between pd_lower and pd_upper. When this is set, XLogInsert will omit the unused space without bothering to scan it to see if it's zero. That saves time in XLogInsert, and also allows reversion of my earlier patch to make PageRepairFragmentation et al explicitly re-zero freed space. Per suggestion by Heikki Linnakangas.
2005-06-06 | Remove the mostly-stubbed-out-anyway support routines for WAL UNDO. | Tom Lane
That code is never going to be used in the foreseeable future, and where it's more than a stub it's making the redo routines harder to read.
2005-06-02 | Change CRCs in WAL records from 64bit to 32bit for performance reasons. | Tom Lane
Instead of a separate CRC on each backup block, include backup blocks in their parent WAL record's CRC; this is important to ensure that the backup block really goes with the WAL record, ie there was not a page tear right at the start of the backup block. Implement a simple form of compression of backup blocks: drop any run of zeroes starting at pd_lower, so as not to store the unused 'hole' that commonly exists in PG heap and index pages. Tweak PageRepairFragmentation and related routines to ensure they keep the unused space zeroed, so that the above compression method remains effective. All per recent discussions.
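The backup-block compression described in this entry (dropping the run of zeroes that starts at pd_lower) might look roughly like the sketch below. Everything here is an assumption made for illustration: the function names are hypothetical, `BLCKSZ` is taken to be 8192, and the page is treated as a plain byte array rather than a real page header structure.

```c
#include <stddef.h>
#include <string.h>

#define BLCKSZ 8192             /* assumed page size for this sketch */

/* Length of the run of zero bytes that begins at offset `start`. */
static size_t
zero_run_length(const unsigned char *page, size_t start)
{
    size_t      len = 0;

    while (start + len < BLCKSZ && page[start + len] == 0)
        len++;
    return len;
}

/*
 * Copy the page into `out` with the hole removed.
 * Returns the number of bytes stored.
 */
static size_t
store_without_hole(const unsigned char *page, size_t hole_offset,
                   size_t hole_length, unsigned char *out)
{
    size_t      tail = BLCKSZ - (hole_offset + hole_length);

    memcpy(out, page, hole_offset);
    memcpy(out + hole_offset, page + hole_offset + hole_length, tail);
    return BLCKSZ - hole_length;
}

/* Rebuild the full page by re-inserting zeroes where the hole was. */
static void
restore_with_hole(const unsigned char *in, size_t hole_offset,
                  size_t hole_length, unsigned char *page)
{
    size_t      tail = BLCKSZ - (hole_offset + hole_length);

    memcpy(page, in, hole_offset);
    memset(page + hole_offset, 0, hole_length);
    memcpy(page + hole_offset + hole_length, in + hole_offset, tail);
}
```

The scheme only pays off if the unused space really is zeroed, which is why the entry also mentions tweaking PageRepairFragmentation and related routines.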
2005-05-31 | Add test to WAL replay to verify that xl_prev points back to the previous | Tom Lane
WAL record; this is necessary to be sure we recognize stale WAL records when a WAL page was only partially written during a system crash.
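The back-pointer check in this entry can be modelled with a toy record chain. This is a sketch, not the replay code: `SketchRecord` and `count_valid_records` are hypothetical names, and record positions are reduced to plain 64-bit integers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record header: own position plus back-link. */
typedef struct SketchRecord
{
    uint64_t    pos;            /* where this record starts */
    uint64_t    prev;           /* where the previous record started */
} SketchRecord;

/*
 * Walk the records in order and stop at the first one whose back-link
 * does not match the record read just before it. Returns how many
 * records were accepted.
 */
static size_t
count_valid_records(const SketchRecord *recs, size_t n, uint64_t start_prev)
{
    uint64_t    expected_prev = start_prev;
    size_t      i;

    for (i = 0; i < n; i++)
    {
        if (recs[i].prev != expected_prev)
            break;              /* stale or torn record: end of valid WAL */
        expected_prev = recs[i].pos;
    }
    return i;
}
```

A record left over from an earlier use of the same page would carry a back-link into the old sequence, so it fails the comparison even if its own contents look intact.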
2005-05-20 | Add support for wal_fsync_writethrough for Darwin, and restructure the | Bruce Momjian
code to better handle writethrough. Chris Campbell
2005-05-19 | Split the shared-memory array of PGPROC pointers out of the sinval | Tom Lane
communication structure, and make it its own module with its own lock. This should reduce contention at least a little, and it definitely makes the code seem cleaner. Per my recent proposal.
2005-05-10 | Back out check for unreferenced files. | Bruce Momjian
Heikki Linnakangas
2005-05-02 | Check the file system on postmaster startup and report any unreferenced | Bruce Momjian
files in the server log. Heikki Linnakangas
2005-04-28 | Implement sharable row-level locks, and use them for foreign key references | Tom Lane
to eliminate unnecessary deadlocks. This commit adds SELECT ... FOR SHARE paralleling SELECT ... FOR UPDATE. The implementation uses a new SLRU data structure (managed much like pg_subtrans) to represent multiple- transaction-ID sets. When more than one transaction is holding a shared lock on a particular row, we create a MultiXactId representing that set of transactions and store its ID in the row's XMAX. This scheme allows an effectively unlimited number of row locks, just as we did before, while not costing any extra overhead except when a shared lock actually has to be shared. Still TODO: use the regular lock manager to control the grant order when multiple backends are waiting for a row lock. Alvaro Herrera and Tom Lane.
2005-04-23 | Add comment about checkpoint panic behavior during shutdown, per | Tom Lane
suggestion from Qingqing Zhou.
2005-04-17 | Fix comment typo. | Bruce Momjian
2005-04-15 | Reduce PANIC to ERROR in several xlog routines that are used in both | Tom Lane
critical and noncritical contexts (an example of noncritical being post-checkpoint removal of dead xlog segments). In the critical cases the CRIT_SECTION mechanism will cause ERROR to be promoted to PANIC anyway, and in the noncritical cases we shouldn't let an error take down the entire database. Arguably there should be *no* explicit PANIC errors in this module, only more START/END_CRIT_SECTION calls, but I didn't go that far. (Yet.)
2005-04-15 | Modify MoveOfflineLogs/InstallXLogFileSegment to avoid O(N^2) behavior | Tom Lane
when recycling a large number of xlog segments during checkpoint. The former behavior searched from the same start point each time, requiring O(checkpoint_segments^2) stat() calls to relocate all the segments. Instead keep track of where we stopped last time through.
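The quadratic behavior and its fix can be shown with a toy model. This is not the MoveOfflineLogs/InstallXLogFileSegment code: `install_segment` is a hypothetical function, the array of flags stands in for segment files on disk, and `probes` counts what would be stat() calls.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Find the first unused slot at or after *cursor, mark it used, and
 * advance *cursor past it so the next call resumes from there.
 * Each slot examined increments *probes (a stand-in for a stat() call).
 * Returns the slot index, or -1 if no free slot remains.
 */
static long
install_segment(bool *used, size_t nslots, size_t *cursor, size_t *probes)
{
    for (size_t i = *cursor; i < nslots; i++)
    {
        (*probes)++;
        if (!used[i])
        {
            used[i] = true;
            *cursor = i + 1;    /* remember where we stopped */
            return (long) i;
        }
    }
    return -1;
}
```

If the caller resets the cursor to 0 before every call, filling N slots costs 1 + 2 + ... + N probes. If the caller keeps the cursor between calls, the same work costs N probes.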
2005-04-13 | Simplify initdb-time assignment of OIDs as I proposed yesterday, and | Tom Lane
avoid encroaching on the 'user' range of OIDs by allowing automatic OID assignment to use values below 16k until we reach normal operation. initdb not forced since this doesn't make any incompatible change; however a lot of stuff will have different OIDs after your next initdb.
2005-03-29 | Officially decouple FUNC_MAX_ARGS from INDEX_MAX_KEYS, and set the | Tom Lane
former to 100 by default. Clean up some of the less necessary dependencies on FUNC_MAX_ARGS; however, the biggie (FunctionCallInfoData) remains.
2005-03-24 | Change Win32 O_SYNC method to O_DSYNC because that is what the method | Bruce Momjian
currently does. This is now the default Win32 wal sync method because we prefer o_datasync to fsync. Also, change Win32 fsync to a new wal sync method called fsync_writethrough because that is the behavior of _commit, which is what is used for fsync on Win32. Backpatch to 8.0.X.
2005-02-12 | Move plpgsql DEBUG from DEBUG2 to DEBUG1 because it is a user-requested | Bruce Momjian
DEBUG. Fix a few places where DEBUG1 crept in that should have been DEBUG2.
2004-12-31 | Tag appropriate files for rc3 | PostgreSQL Daemon
Also performed an initial run through of upgrading our Copyright date to extend to 2005 ... first run here was very simple ... change everything where: grep 1996-2004 && the word 'Copyright' ... scanned through the generated list with 'less' first, and after, to make sure that I only picked up the right entries ...
2004-12-17 | Fix is-it-time-for-a-checkpoint logic so that checkpoint_segments can | Tom Lane
usefully be larger than 255. Per gripe from Simon Riggs.
2004-11-17 | Minor adjustment of message style. | Tom Lane
2004-11-17 | Don't allow pg_start_backup() to be invoked if archive_command has not | Neil Conway
been defined. Patch from Gavin Sherry, editorializing by Neil Conway.
2004-11-05 | Small message clarifications | Peter Eisentraut