summaryrefslogtreecommitdiff
path: root/src/backend/access/transam/xlog.c
AgeCommit message (Collapse)Author
2022-04-11Remove dead code in do_pg_backup_start().Tom Lane
As of commit 39969e2a1, no caller of do_pg_backup_start() passes NULL for labelfile or tblspcmapfile, nor is it plausible that any would do so in the future. Remove the code that coped with that case, as (a) it's dead and (b) it causes Coverity to bleat about possibly leaked storage. While here, do some janitorial work on the function's header comment.
2022-04-08Rename delayChkpt to delayChkptFlags.Robert Haas
Before commit 412ad7a55639516f284cd0ef9757d6ae5c7abd43, delayChkpt was a Boolean. Now it's an integer. Extensions using it need to be appropriately updated, so let's rename the field to make sure that a hard compilation failure occurs. Replacing delayChkpt with delayChkptFlags made a few comments extend past 80 characters, so I reflowed them and changed some wording very slightly. The back-branches will need a different change to restore compatibility with existing minor releases; this is just for master. Per suggestion from Tom Lane. Discussion: http://postgr.es/m/a7880f4d-1d74-582a-ada7-dad168d046d1@enterprisedb.com
2022-04-07Prefetch data referenced by the WAL, take II.Thomas Munro
Introduce a new GUC recovery_prefetch. When enabled, look ahead in the WAL and try to initiate asynchronous reading of referenced data blocks that are not yet cached in our buffer pool. For now, this is done with posix_fadvise(), which has several caveats. Since not all OSes have that system call, "try" is provided so that it can be enabled where available. Better mechanisms for asynchronous I/O are possible in later work. Set to "try" for now for test coverage. Default setting to be finalized before release. The GUC wal_decode_buffer_size limits the distance we can look ahead in bytes of decoded data. The existing GUC maintenance_io_concurrency is used to limit the number of concurrent I/Os allowed, based on pessimistic heuristics used to infer that I/Os have begun and completed. We'll also not look more than maintenance_io_concurrency * 4 block references ahead. Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (earlier version) Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> (earlier version) Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (earlier version) Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com> (earlier version) Tested-by: Dmitry Dolgov <9erthalion6@gmail.com> (earlier version) Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com> (earlier version) Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
2022-04-06pgstat: store statistics in shared memory.Andres Freund
Previously the statistics collector received statistics updates via UDP and shared statistics data by writing them out to temporary files regularly. These files can reach tens of megabytes and are written out up to twice a second. This has repeatedly prevented us from adding additional useful statistics. Now statistics are stored in shared memory. Statistics for variable-numbered objects are stored in a dshash hashtable (backed by dynamic shared memory). Fixed-numbered stats are stored in plain shared memory. The header for pgstat.c contains an overview of the architecture. The stats collector is not needed anymore, remove it. By utilizing the transactional statistics drop infrastructure introduced in a prior commit statistics entries cannot "leak" anymore. Previously leaked statistics were dropped by pgstat_vacuum_stat(), called from [auto-]vacuum. On systems with many small relations pgstat_vacuum_stat() could be quite expensive. Now that replicas drop statistics entries for dropped objects, it is not necessary anymore to reset stats when starting from a cleanly shut down replica. Subsequent commits will perform some further code cleanup, adapt docs and add tests. Bumps PGSTAT_FILE_FORMAT_ID. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: Justin Pryzby <pryzby@telsasoft.com> Reviewed-By: "David G. Johnston" <david.g.johnston@gmail.com> Reviewed-By: Tomas Vondra <tomas.vondra@2ndquadrant.com> (in a much earlier version) Reviewed-By: Arthur Zakirov <a.zakirov@postgrespro.ru> (in a much earlier version) Reviewed-By: Antonin Houska <ah@cybertec.at> (in a much earlier version) Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de Discussion: https://postgr.es/m/20220308205351.2xcn6k4x5yivcxyd@alap3.anarazel.de Discussion: https://postgr.es/m/20210319235115.y3wz7hpnnrshdyv6@alap3.anarazel.de
2022-04-06Remove exclusive backup modeStephen Frost
Exclusive-mode backups have been deprecated since 9.6 (when non-exclusive backups were introduced) due to the issues they can cause should the system crash while one is running and generally because non-exclusive provides a much better interface. Further, exclusive backup mode wasn't really being tested (nor was most of the related code- like being able to log in just to stop an exclusive backup and the bits of the state machine related to that) and having to possibly deal with an exclusive backup and the backup_label file existing during pg_basebackup, pg_rewind, etc, added other complexities that we are better off without. This patch removes the exclusive backup mode, the various special cases for dealing with it, and greatly simplifies the online backup code and documentation. Authors: David Steele, Nathan Bossart Reviewed-by: Chapman Flack Discussion: https://postgr.es/m/ac7339ca-3718-3c93-929f-99e725d1172c@pgmasters.net https://postgr.es/m/CAHg+QDfiM+WU61tF6=nPZocMZvHDzCK47Kneyb0ZRULYzV5sKQ@mail.gmail.com
2022-03-24Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.Robert Haas
If TRUNCATE causes some buffers to be invalidated and thus the checkpoint does not flush them, TRUNCATE must also ensure that the corresponding files are truncated on disk. Otherwise, a replay from the checkpoint might find that the buffers exist but have the wrong contents, which may cause replay to fail. Report by Teja Mupparti. Patch by Kyotaro Horiguchi, per a design suggestion from Heikki Linnakangas, with some changes to the comments by me. Review of this and a prior patch that approached the issue differently by Heikki Linnakangas, Andres Freund, Álvaro Herrera, Masahiko Sawada, and Tom Lane. Discussion: http://postgr.es/m/BYAPR06MB6373BF50B469CA393C614257ABF00@BYAPR06MB6373.namprd06.prod.outlook.com
2022-03-18Add circular WAL decoding buffer, take II.Thomas Munro
Teach xlogreader.c to decode the WAL into a circular buffer. This will support optimizations based on looking ahead, to follow in a later commit. * XLogReadRecord() works as before, decoding records one by one, and allowing them to be examined via the traditional XLogRecGetXXX() macros and certain traditional members like xlogreader->ReadRecPtr. * An alternative new interface XLogReadAhead()/XLogNextRecord() is added that returns pointers to DecodedXLogRecord objects so that it's now possible to look ahead in the WAL stream while replaying. * In order to be able to use the new interface effectively while streaming data, support is added for the page_read() callback to respond to a new nonblocking mode with XLREAD_WOULDBLOCK instead of waiting for more data to arrive. No direct user of the new interface is included in this commit, though XLogReadRecord() uses it internally. Existing code doesn't need to change, except in a few places where it was accessing reader internals directly and now needs to go through accessor macros. Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions) Discussion: https://postgr.es/m/CA+hUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq=AovOddfHpA@mail.gmail.com
2022-03-16Fix race between DROP TABLESPACE and checkpointing.Thomas Munro
Commands like ALTER TABLE SET TABLESPACE may leave files for the next checkpoint to clean up. If such files are not removed by the time DROP TABLESPACE is called, we request a checkpoint so that they are deleted. However, there is presently a window before checkpoint start where new unlink requests won't be scheduled until the following checkpoint. This means that the checkpoint forced by DROP TABLESPACE might not remove the files we expect it to remove, and the following ERROR will be emitted: ERROR: tablespace "mytblspc" is not empty To fix, add a call to AbsorbSyncRequests() just before advancing the unlink cycle counter. This ensures that any unlink requests forwarded prior to checkpoint start (i.e., when ckpt_started is incremented) will be processed by the current checkpoint. Since AbsorbSyncRequests() performs memory allocations, it cannot be called within a critical section, so we also need to move SyncPreCheckpoint() to before CreateCheckPoint()'s critical section. This is an old bug, so back-patch to all supported versions. Author: Nathan Bossart <nathandbossart@gmail.com> Reported-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20220215235845.GA2665318%40nathanxps13
2022-03-15Fix collection of typos in the code and the documentationMichael Paquier
Some words were duplicated while other places were grammatically incorrect, including one variable name in the code. Author: Otto Kekalainen, Justin Pryzby Discussion: https://postgr.es/m/7DDBEFC5-09B6-4325-B942-B563D1A24BDC@amazon.com
2022-03-15Fix pg_basebackup with in-place tablespaces.Thomas Munro
Previously, pg_basebackup from a cluster that contained an 'in-place' tablespace, as introduced by commit 7170f215, would produce a harmless warning on Unix and fail completely on Windows. Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/20220304.165449.1200020258723305904.horikyota.ntt%40gmail.com
2022-02-20Fix uninitialized variable.Heikki Linnakangas
I'm very surprised the compiler didn't warn about it. But Coverity and Valgrind did.
2022-02-16Fix read beyond buffer bug introduced by the split xlog.c patch.Heikki Linnakangas
FinishWalRecovery() copied the valid part of the last WAL block into a palloc'd buffer, and the code in StartupXLOG() copied it to the WAL buffer. But the memcpy in StartupXLOG() copied a full 8kB block, not just the valid part, i.e. it copied from beyond the end of the buffer. The invalid part was cleared immediately afterwards, so as long as the memory was allocated and didn't segfault, it didn't do any harm, but it can definitely segfault. Discussion: https://www.postgresql.org/message-id/efc12e32-5af2-3485-5b1d-5af9f707491a@iki.fi
2022-02-16Split xlog.c into xlog.c and xlogrecovery.c.Heikki Linnakangas
This moves the functions related to performing WAL recovery into the new xlogrecovery.c source file, leaving xlog.c responsible for maintaining the WAL buffers, coordinating the startup and switch from recovery to normal operations, and other miscellaneous stuff that have always been in xlog.c. Reviewed-by: Andres Freund, Kyotaro Horiguchi, Robert Haas Discussion: https://www.postgresql.org/message-id/a31f27b4-a31d-f976-6217-2b03be646ffa%40iki.fi
2022-02-16Move code around in StartupXLOG().Heikki Linnakangas
This is in preparation for the next commit, which will split off recovery-related code from xlog.c into a new source file. This is the order that things will happen with the next commit, and the point of this commit is to make these ordering changes more explicit, while the next commit mechanically moves the source code to the new file. To aid review, I added "BEGIN/END function" comments to mark which blocks of code are moved to which functions in the next commit. They will be gone in the next commit. Reviewed-by: Andres Freund, Kyotaro Horiguchi, Robert Haas Discussion: https://www.postgresql.org/message-id/a31f27b4-a31d-f976-6217-2b03be646ffa%40iki.fi
2022-02-16Refactor setting XLP_FIRST_IS_OVERWRITE_CONTRECORD.Heikki Linnakangas
Set it directly in CreateOverwriteContrecordRecord(). That way, AdvanceXLInsertBuffer() doesn't need the missingContrecPtr global variable. This is in preparation for splitting xlog.c into multiple files. Reviewed-by: Robert Haas Discussion: https://www.postgresql.org/message-id/a462d79c-cb5a-47cc-e9ac-616b5003965f%40iki.fi
2022-02-16Run pgindent on xlog.c.Heikki Linnakangas
To tidy up after some recent refactorings in xlog.c. These would be fixed by the pgindent run we do at the end of the development cycle, but I want to clean these up now as I'm about to do some more big refactorings on xlog.c.
2022-02-03Allow archiving via loadable modules.Robert Haas
Running a shell command for each file to be archived has a lot of overhead and may not offer as much error checking as you want, or the exact semantics that you want. So, offer the option to call a loadable module for each file to be archived, rather than running a shell command. Also, add a 'basic_archive' contrib module as an example implementation that archives to a local directory. Nathan Bossart, with a little bit of kibitzing by me. Discussion: http://postgr.es/m/20220202224433.GA1036711@nathanxps13
2022-01-25Improve errors related to incorrect TLI on checkpoint record replayMichael Paquier
WAL replay would cause a hard crash if the timeline expected by a XLOG_END_OF_RECOVERY, a XLOG_CHECKPOINT_ONLINE, or a XLOG_CHECKPOINT_SHUTDOWN record is not the same as the timeline being replayed, using the same error message for all three of them. This commit changes those error messages to use different wordings, adapted to each record type, which is useful when it comes to the debugging of an issue in this area. Author: Amul Sul Reviewed-by: Nathan Bossart, Robert Haas Discussion: https://postgr.es/m/CAAJ_b97i1ZerYC_xW6o_AiDSW5n+sGi8k91Yc8KS8bKWKxjqwQ@mail.gmail.com
2022-01-25Fix various typos, grammar and code style in comments and docsMichael Paquier
This fixes a set of issues that have accumulated over the past months (or years) in various code areas. Most fixes are related to some recent additions, as of the development of v15. Author: Justin Pryzby Discussion: https://postgr.es/m/20220124030001.GQ23027@telsasoft.com
2022-01-07Update copyright for 2022Bruce Momjian
Backpatch-through: 10
2021-12-29Fix incorrect format placeholdersPeter Eisentraut
2021-12-16Change ProcSendSignal() to take pgprocno.Thomas Munro
Instead of referring to target backends by pid, use pgprocno. This means that we don't have to scan the ProcArray and we can drop some special case code for dealing with the startup process. Discussion: https://postgr.es/m/CA%2BhUKGLYRyDaneEwz5Uya_OgFLMx5BgJfkQSD%3Dq9HmwsfRRb-w%40mail.gmail.com Reviewed-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com> Reviewed-by: Ashwin Agrawal <ashwinstar@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de>
2021-12-13Remove InitXLOGAccess().Robert Haas
It's not great that RecoveryInProgress() calls InitXLOGAccess(), because a status inquiry function typically shouldn't have the side effect of performing initializations. We could fix that by calling InitXLOGAccess() from some other place, but instead, let's remove it altogether. One thing InitXLogAccess() did is initialize wal_segment_size, but it doesn't need to do that. In the postmaster, PostmasterMain() calls LocalProcessControlFile(), and all child processes will inherit that value -- except in EXEC_BACKEND bulds, but then each backend runs SubPostmasterMain() which also calls LocalProcessControlFile(). The other thing InitXLOGAccess() did is update RedoRecPtr and doPageWrites, but that's not critical, because all code that uses them will just retry if it turns out that they've changed. The only difference is that most code will now see an initial value that is definitely invalid instead of one that might have just been way out of date, but that will only happen once per backend lifetime, so it shouldn't be a big deal. Patch by me, reviewed by Nathan Bossart, Michael Paquier, Andres Freund, Heikki Linnakangas, and Álvaro Herrera. Discussion: http://postgr.es/m/CA+TgmoY7b65qRjzHN_tWUk8B4sJqk1vj1d31uepVzmgPnZKeLg@mail.gmail.com
2021-12-13Default to log_checkpoints=on, log_autovacuum_min_duration=10mRobert Haas
The idea here is that when a performance problem is known to have occurred at a certain point in time, it's a good thing if there is some information available from the logs to help figure out what might have happened around that time. This change attracted an above-average amount of dissent, because it means that a server with default settings will produce some amount of log output even if nothing has gone wrong. However, by my count, the mailing list discussion had about twice as many people in favor of the change as opposed. The reasons for believing that the extra log output is not an issue in practice are: (1) the rate at which messages can be generated by this setting is bounded to one every few minutes on a properly-configured system and (2) production systems tend to have a lot more junk in the log from that due to failed connection attempts, ERROR messages generated by application activity, and the like. Bharath Rupireddy, reviewed by Fujii Masao and by me. Many other people commented on the thread, but as far as I can see that was discussion of the merits of the change rather than review of the patch. Discussion: https://postgr.es/m/CALj2ACX-rW_OeDcp4gqrFUAkf1f50Fnh138dmkd0JkvCNQRKGA@mail.gmail.com
2021-12-01Remove mention of TimeLineID update from commentsDaniel Gustafsson
Commit 4a92a1c3d removed the TimeLineID update from RecoveryInProgress, update comments accordingly. Author: Amul Sul <sulamul@gmail.com> Discussion: https://postgr.es/m/CAAJ_b96wyzs8N45jc-kYd-bTE02hRWQieLZRpsUtNbhap7_PuQ@mail.gmail.com
2021-11-29Centralize timestamp computation of control file on updatesMichael Paquier
This commit moves the timestamp computation of the control file within the routine of src/common/ in charge of updating the backend's control file, which is shared by multiple frontend tools (pg_rewind, pg_checksums and pg_resetwal) and the backend itself. This change has as direct effect to update the control file's timestamp when writing the control file in pg_rewind and pg_checksums, something that is helpful to keep track of control file updates for those operations, something also tracked by the backend at startup within its logs. This part is arguably a bug, as ControlFileData->time should be updated each time a new version of the control file is written, but this is a behavior change so no backpatch is done. Author: Amul Sul Reviewed-by: Nathan Bossart, Michael Paquier, Bharath Rupireddy Discussion: https://postgr.es/m/CAAJ_b97nd_ghRpyFV9Djf9RLXkoTbOUqnocq11WGq9TisX09Fw@mail.gmail.com
2021-11-24Replace straggling uses of ReadRecPtr/EndRecPtr.Andres Freund
d2ddfa681db removed ReadRecPtr/EndRecPtr, but two uses within an #ifdef WAL_DEBUG escaped. Discussion: https://postgr.es/m/20211124231206.gbadj5bblcljb6d5@alap3.anarazel.de
2021-11-24xlog.c: Remove global variables ReadRecPtr and EndRecPtr.Robert Haas
In most places, the variables necessarily store the same value as the eponymous members of the XLogReaderState that we use during WAL replay, because ReadRecord() assigns the values from the structure members to the global variables just after XLogReadRecord() returns. However, XLogBeginRead() adjusts the structure members but not the global variables, so after XLogBeginRead() and before the completion of XLogReadRecord() the values can differ. Otherwise, they must be identical. According to my analysis, the only place where either variable is referenced at a point where it might not have the same value as the structure member is the refrence to EndRecPtr within XLogPageRead. Therefore, at every other place where we are using the global variable, we can just switch to using the structure member instead, and remove the global variable. However, we can, and in fact should, do this in XLogPageRead() as well, because at that point in the code, the global variable will actually store the start of the record we want to read - either because it's where the last WAL record ended, or because the read position has been changed using XLogBeginRead since the last record was read. The structure member, on the other hand, will already have been updated to point to the end of the record we just read. Elsewhere, the latter is what we use as an argument to emode_for_corrupt_record(), so we should do the same here. This part of the patch is perhaps a bug fix, but I don't think it has any important consequences, so no back-patch. The point here is just to continue to whittle down the entirely excessive use of global variables in xlog.c. Discussion: http://postgr.es/m/CA+Tgmoao96EuNeSPd+hspRKcsCddu=b1h-QNRuKfY8VmfNQdfg@mail.gmail.com
2021-11-24Fix corner-case failure to detect improper timeline switch.Robert Haas
rescanLatestTimeLine() contains a guard against switching to a timeline that forked off from the current one prior to the current recovery point, but that guard does not work if the timeline switch occurs before the first WAL recod (which must be the checkpoint record) is read. Without this patch, an improper timeline switch is therefore possible in such cases. This happens because rescanLatestTimeLine() relies on the global variable EndRecPtr to understand the current position of WAL replay. However, EndRecPtr at this point in the code contains the endpoint of the last-replayed record, not the startpoint or endpoint of the record being replayed now. Thus, before any records have been replayed, it's zero, which causes the sanity check to always pass. To fix, pass down the correct timeline explicitly. The EndRecPtr value we want is the one from the xlogreader, which will be the starting position of the record we're about to try to read, rather than the global variable, which is the ending position of the last record we successfully read. They're usually the same, but not in the corner case described here. No back-patch, because in v14 and earlier branhes, we were using the wrong TLI here as well as the wrong LSN. In master, that was fixed by commit 4a92a1c3d1c361ffb031ed05bf65b801241d7cdd, but that and it's prerequisite patches are too invasive to back-patch for such a minor issue. Patch by me, reviewed by Amul Sul. Discussion: http://postgr.es/m/CA+Tgmoao96EuNeSPd+hspRKcsCddu=b1h-QNRuKfY8VmfNQdfg@mail.gmail.com
2021-11-22Be more specific about OOM in XLogReaderAllocateAlvaro Herrera
A couple of spots can benefit from an added errdetail(), which matches what we were already doing in other places; and those that cannot withstand errdetail() can get a more descriptive primary message. Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Discussion: https://postgr.es/m/CALj2ACV+cX1eM03GfcA=ZMLXh5fSn1X1auJLz3yuS1duPSb9QA@mail.gmail.com
2021-11-22Report wait events for local shell commands like archive_command.Fujii Masao
This commit introduces new wait events for archive_command, archive_cleanup_command, restore_command and recovery_end_command. Author: Fujii Masao Reviewed-by: Bharath Rupireddy, Michael Paquier Discussion: https://postgr.es/m/4ca4f920-6b48-638d-08b2-93598356f5d3@oss.nttdata.com
2021-11-17Remove global variable "LastRec" in xlog.cMichael Paquier
This variable is used only by StartupXLOG() now, so let's make it local to simplify the code. Author: Amul Sul Reviewed-by: Tom Lane, Michael Paquier Discussion: https://postgr.es/m/CAAJ_b96Qd023itERBRN9Z7P2saNDT3CYvGuMO8RXwndVNN6z7g@mail.gmail.com
2021-11-16Move InitXLogInsert() call from InitXLOGAccess() to BaseInit().Robert Haas
At present, there is an undocumented coding rule that you must call RecoveryInProgress(), or do something else that results in a call to InitXLogInsert(), before trying to write WAL. Otherwise, the WAL construction buffers won't be initialized, resulting in failures. Since it's not good to rely on a status inquiry function like RecoveryInProgress() having the side effect of initializing critical data structures, instead do the initialization eariler, when the backend first starts up. Patch by me. Reviewed by Nathan Bossart and Michael Paquier. Discussion: http://postgr.es/m/CA+TgmoY7b65qRjzHN_tWUk8B4sJqk1vj1d31uepVzmgPnZKeLg@mail.gmail.com
2021-11-10More cleanup of 'ThisTimeLineID'.Robert Haas
In XLogCtlData, rename the structure member ThisTimeLineID to InsertTimeLineID and update the comments to make clear that it's only expected to be set after recovery is complete. In StartupXLOG, replace the local variables ThisTimeLineID and PrevTimeLineID with new local variables replayTLI and newTLI. In the old scheme, ThisTimeLineID was the replay TLI until we created a new timeline, and after that the replay TLI was in PrevTimeLineID. Now, replayTLI is the TLI from which we last replayed WAL throughout the entire function, and newTLI is either that, or the new timeline created upon promotion. Remove some misleading comments from the comment block just above where recoveryTargetTimeLineGoal and friends are declared. It's become incorrect, not only because ThisTimeLineID as a variable is now gone, but also because the rmgr code does not care about ThisTimeLineID and has not since what used to be the TLI field in the page header was repurposed to store the page checksum. Add a comment GetFlushRecPtr that it's only supposed to be used in normal running, and an assertion to verify that this is so. Per some ideas from Michael Paquier and some of my own. Review by Michael Paquier also. Discussion: http://postgr.es/m/CA+TgmoY1a2d1AnVR3tJcKmGGkhj7GGrwiNwjtKr21dxOuLBzCQ@mail.gmail.com
2021-11-07Silence uninitialized-variable warning.Tom Lane
Quite a few buildfarm animals are warning about this, and lapwing is actually failing (because -Werror). It's a false positive AFAICS, so no need to do more than zero the variable to start with. Discussion: https://postgr.es/m/YYXJnUxgw9dZKxlX@paquier.xyz
2021-11-05Change ThisTimeLineID from a global variable to a local variable.Robert Haas
StartupXLOG() still has ThisTimeLineID as a local variable, but the remaining code in xlog.c now needs to the relevant TimeLineID by some other means. Mostly, this means that we now pass it as a function parameter to a bunch of functions where we didn't previously. However, a few cases require special handling: - In functions that might be called by outside callers who wouldn't necessarily know what timeline to specify, we get the timeline ID from shared memory. XLogCtl->ThisTimeLineID can be used in most cases since recovery is known to have completed by the time those functions are called. In xlog_redo(), we can use XLogCtl->replayEndTLI. - XLogFileClose() needs to know the TLI of the open logfile. Do that with a new global variable openLogTLI. While someone could argue that this is just trading one global variable for another, the new one has a far more narrow purposes and is referenced in just a few places. - read_backup_label() now returns the TLI that it obtains by parsing the backup_label file. Previously, ReadRecord() could be called to parse the checkpoint record without ThisTimeLineID having been initialized. Now, the timeline is passed down, and I didn't want to pass an uninitialized variable; this change lets us avoid that. The old coding didn't seem to have any practical consequences that we need to worry about, but this is cleaner. - In BootstrapXLOG(), it's just a constant. Patch by me, reviewed and tested by Michael Paquier, Amul Sul, and Álvaro Herrera. Discussion: https://postgr.es/m/CA+TgmobfAAqhfWa1kaFBBFvX+5CjM=7TE=n4r4Q1o2bjbGYBpA@mail.gmail.com
2021-11-05Remove all use of ThisTimeLineID global variable outside of xlog.cRobert Haas
All such code deals with this global variable in one of three ways. Sometimes the same functions use it in more than one of these ways at the same time. First, sometimes it's an implicit argument to one or more functions being called in xlog.c or elsewhere, and must be set to the appropriate value before calling those functions lest they misbehave. In those cases, it is now passed as an explicit argument instead. Second, sometimes it's used to obtain the current timeline after the end of recovery, i.e. the timeline to which WAL is being written and flushed. Such code now calls GetWALInsertionTimeLine() or relies on the new out parameter added to GetFlushRecPtr(). Third, sometimes it's used during recovery to store the current replay timeline. That can change, so such code must generally update the value before each use. It can still do that, but must now use a local variable instead. The net effect of these changes is to reduce by a fair amount the amount of code that is directly accessing this global variable. That's good, because history has shown that we don't always think clearly about which timeline ID it's supposed to contain at any given point in time, or indeed, whether it has been or needs to be initialized at any given point in the code. Patch by me, reviewed and tested by Michael Paquier, Amul Sul, and Álvaro Herrera. Discussion: https://postgr.es/m/CA+TgmobfAAqhfWa1kaFBBFvX+5CjM=7TE=n4r4Q1o2bjbGYBpA@mail.gmail.com
2021-11-02Move MarkCurrentTransactionIdLoggedIfAny() out of the critical section.Amit Kapila
We don't modify any shared state in this function which could cause problems for any concurrent session. This will make it look similar to the other updates for the same structure (TransactionState) which avoids confusion for future readers of code. Author: Dilip Kumar Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/E1mSoYz-0007Fh-D9@gemulon.postgresql.org
2021-11-02Replace XLOG_INCLUDE_XID flag with a more localized flag.Amit Kapila
Commit 0bead9af484c introduced XLOG_INCLUDE_XID flag to indicate that the WAL record contains subXID-to-topXID association. It uses that flag later to mark in CurrentTransactionState that top-xid is logged so that we should not try to log it again with the next WAL record in the current subtransaction. However, we can use a localized variable to pass that information. In passing, change the related function and variable names to make them consistent with what the code is actually doing. Author: Dilip Kumar Reviewed-by: Alvaro Herrera, Amit Kapila Discussion: https://postgr.es/m/E1mSoYz-0007Fh-D9@gemulon.postgresql.org
2021-10-25Initialize variable to placate compiler.Robert Haas
Per Nathan Bossart. Discussion: http://postgr.es/m/FECEE7FC-CB74-45A9-BB24-89FEE52A9585@amazon.com
2021-10-25Report progress of startup operations that take a long time.Robert Haas
Users sometimes get concerned whe they start the server and it emits a few messages and then doesn't emit any more messages for a long time. Generally, what's happening is either that the system is taking a long time to apply WAL, or it's taking a long time to reset unlogged relations, or it's taking a long time to fsync the data directory, but it's not easy to tell which is the case. To fix that, add a new 'log_startup_progress_interval' setting, by default 10s. When an operation that is known to be potentially long-running takes more than this amount of time, we'll log a status update each time this interval elapses. To avoid undesirable log chatter, don't log anything about WAL replay when in standby mode. Nitin Jadhav and Robert Haas, reviewed by Amul Sul, Bharath Rupireddy, Justin Pryzby, Michael Paquier, and Álvaro Herrera. Discussion: https://postgr.es/m/CA+TgmoaHQrgDFOBwgY16XCoMtXxsrVGFB2jNCvb7-ubuEe1MGg@mail.gmail.com Discussion: https://postgr.es/m/CAMm1aWaHF7VE69572_OLQ+MgpT5RUiUDgF1x5RrtkJBLdpRj3Q@mail.gmail.com
2021-10-25StartupXLOG: Don't repeatedly disable/enable local xlog insertion.Robert Haas
All the code that runs in the startup process to write WAL records before that's allowed generally is now consecutive, so there's no reason to shut the facility to write WAL locally off and then turn it on again three times in a row. Unfortunately, this requires a slight kludge in the checkpointer, which needs to separately enable writing WAL in order to write the checkpoint record. Because that code might run in the same process as StartupXLOG() if we are in single-user mode, we must save/restore the state of the LocalXLogInsertAllowed flag. Hopefully, we'll be able to eliminate this wart in further refactoring, but it's not too bad anyway. Amul Sul, with modifications by me. Discussion: http://postgr.es/m/CAAJ_b97fysj6sRSQEfOHj-y8Jfd5uPqOgO74qast89B4WfD+TA@mail.gmail.com
2021-10-25StartupXLOG: Call CleanupAfterArchiveRecovery after XLogReportParameters.Robert Haas
This does a better job grouping related operations together, since all of the WAL records that we need to write prior to allowing WAL writes generally and written by a single uninterrupted stretch of code. Since CleanupAfterArchiveRecovery() just (1) runs recovery_end_command, (2) removes non-parent xlog files, and (3) archives any final partial segment, this should be safe, because all of those things are pretty much unrelated to the WAL record written by XLogReportParameters(). Amul Sul, per a suggestion from me Discussion: http://postgr.es/m/CAAJ_b97fysj6sRSQEfOHj-y8Jfd5uPqOgO74qast89B4WfD+TA@mail.gmail.com
2021-10-14Postpone some end-of-recovery operations related to allowing WAL.Robert Haas
CreateOverwriteContrecordRecord(), UpdateFullPageWrites(), PerformRecoveryXLogAction(), and CleanupAfterArchiveRecovery() are moved somewhat later in StartupXLOG(). This is preparatory work for a future patch that wants to allow recovery to end at one time and only later start to allow WAL writes. To do that, it's necessary to separate code that has to do with allowing WAL writes from other things that need to happen simply because recovery is ending, such as initializing shared memory data structures that depend on information that might not be accurate before redo is complete. This commit does not achieve that goal, but it is a step in that direction. For example, there are a few different bits of code that write things into WAL once we have finished recovery, and with this change, those bits of code are closer to each other than previously, with fewer unrelated bits of code interspersed. Robert Haas and Amul Sul Discussion: http://postgr.es/m/CAAJ_b97abMuq=470Wahun=aS1PHTSbStHtrjjPaD-C0YQ1AqVw@mail.gmail.com
2021-10-13Refactor some end-of-recovery code out of StartupXLOG().Robert Haas
Create a new function PerformRecoveryXLogAction() and move the code which either writes an end-of-recovery record or requests a checkpoint there. Also create a new function CleanupAfterArchiveRecovery() to perform a few tasks that we want to do after we've actually exited archive recovery but before we start accepting new WAL writes. More refactoring of this file is planned, but this commit is just straightforward code movement to make StartupXLOG() a little bit shorter and a little bit easier to understand. Robert Haas and Amul Sul Discussion: http://postgr.es/m/CAAJ_b97abMuq=470Wahun=aS1PHTSbStHtrjjPaD-C0YQ1AqVw@mail.gmail.com
2021-10-06Make recovery report error message when invalid page header is found.Fujii Masao
Commit 0668719801 changed XLogPageRead() so that it validated the page header, if invalid page header was found reset the error message and retried reading the page, to fix the scenario where streaming standby got stuck at a continuation record. This change hid the error message about invalid page header, which would make it harder for users to investigate what the actual issue was found in WAL. To fix the issue, this commit makes XLogPageRead() report the error message when invalid page header is found. When not in standby mode, an invalid page header should cause recovery to end, not retry reading the page, so XLogPageRead() doesn't need to validate the page header for the retry. Instead, ReadPageInternal() should be responsible for the validation in that case. Therefore this commit changes XLogPageRead() so that if not in standby mode it doesn't validate the page header for the retry. Reported-by: Yugo Nagata Author: Yugo Nagata, Kyotaro Horiguchi Reviewed-by: Ranier Vilela, Fujii Masao Discussion: https://postgr.es/m/20210718045505.32f463ed6c227111038d8ae4@sraoss.co.jp
2021-10-04Fix snapshot builds during promotion of hot standby node with 2PCMichael Paquier
Some specific logic is done at the end of recovery when involving 2PC transactions: 1) Call RecoverPreparedTransactions(), to recover the state of 2PC transactions into memory (re-acquire locks, etc.). 2) ShutdownRecoveryTransactionEnvironment(), to move back to normal operations, mainly cleaning up recovery locks and KnownAssignedXids (including any 2PC transaction tracked previously). 3) Switch XLogCtl->SharedRecoveryState to RECOVERY_STATE_DONE, which is the tipping point for any process calling RecoveryInProgress() to check if the cluster is still in recovery or not. Any snapshot taken between steps 2) and 3) would be empty, causing any transaction relying on a snapshot at this point to potentially corrupt data as there could still be some 2PC transactions to track, with RecentXmin moving backwards on successive calls to GetSnapshotData() in the same transaction. As SharedRecoveryState is the point to take into account to know if it is safe to discard KnownAssignedXids, this commit moves step 2) after step 3), so as we can never finish with empty snapshots. This exists since the introduction of hot standby, so backpatch all the way down. The window with incorrect snapshots is extremely small, but I have seen it when running 023_pitr_prepared_xact.pl, as did buildfarm member fairywren. Thomas Munro also found it independently. Special thanks to Andres Freund for taking the time to analyze this issue. Reported-by: Thomas Munro, Michael Paquier Analyzed-by: Andres Freund Discussion: https://postgr.es/m/20210422203603.fdnh3fu2mmfp2iov@alap3.anarazel.de Backpatch-through: 9.6
2021-10-01Remove unstable, unnecessary test; fix typoAlvaro Herrera
Commit ff9f111bce24 added some test code that's unportable and doesn't add meaningful coverage. Remove it rather than try and get it to work everywhere. While at it, fix a typo in a log message added by the aforementioned commit. Backpatch to 14. Discussion: https://postgr.es/m/3000074.1632947632@sss.pgh.pa.us
2021-09-29Fix WAL replay in presence of an incomplete recordAlvaro Herrera
Physical replication always ships WAL segment files to replicas once they are complete. This is a problem if one WAL record is split across a segment boundary and the primary server crashes before writing down the segment with the next portion of the WAL record: WAL writing after crash recovery would happily resume at the point where the broken record started, overwriting that record ... but any standby or backup may have already received a copy of that segment, and they are not rewinding. This causes standbys to stop following the primary after the latter crashes: LOG: invalid contrecord length 7262 at A8/D9FFFBC8 because the standby is still trying to read the continuation record (contrecord) for the original long WAL record, but it is not there and it will never be. A workaround is to stop the replica, delete the WAL file, and restart it -- at which point a fresh copy is brought over from the primary. But that's pretty labor intensive, and I bet many users would just give up and re-clone the standby instead. A fix for this problem was already attempted in commit 515e3d84a0b5, but it only addressed the case for the scenario of WAL archiving, so streaming replication would still be a problem (as well as other things such as taking a filesystem-level backup while the server is down after having crashed), and it had performance scalability problems too; so it had to be reverted. This commit fixes the problem using an approach suggested by Andres Freund, whereby the initial portion(s) of the split-up WAL record are kept, and a special type of WAL record is written where the contrecord was lost, so that WAL replay in the replica knows to skip the broken parts. With this approach, we can continue to stream/archive segment files as soon as they are complete, and replay of the broken records will proceed across the crash point without a hitch. Because a new type of WAL record is added, users should be careful to upgrade standbys first, primaries later. Otherwise they risk the standby being unable to start if the primary happens to write such a record. A new TAP test that exercises this is added, but the portability of it is yet to be seen. This has been wrong since the introduction of physical replication, so backpatch all the way back. In stable branches, keep the new XLogReaderState members at the end of the struct, to avoid an ABI break. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Nathan Bossart <bossartn@amazon.com> Discussion: https://postgr.es/m/202108232252.dh7uxf6oxwcy@alvherre.pgsql
2021-09-04Revert "Avoid creating archive status ".ready" files too early"Alvaro Herrera
This reverts commit 515e3d84a0b5 and equivalent commits in back branches. This solution to the problem has a number of problems, so we'll try again with a different approach. Per note from Andres Freund Discussion: https://postgr.es/m/20210831042949.52eqp5xwbxgrfank@alap3.anarazel.de