From dc212340058b4e7ecfc5a7a81ec50e7a207bf288 Mon Sep 17 00:00:00 2001 From: Robert Haas Date: Wed, 20 Dec 2023 09:49:12 -0500 Subject: Add support for incremental backup. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit To take an incremental backup, you use the new replication command UPLOAD_MANIFEST to upload the manifest for the prior backup. This prior backup could either be a full backup or another incremental backup. You then use BASE_BACKUP with the INCREMENTAL option to take the backup. pg_basebackup now has an --incremental=PATH_TO_MANIFEST option to trigger this behavior. An incremental backup is like a regular full backup except that some relation files are replaced with files with names like INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains additional lines identifying it as an incremental backup. The new pg_combinebackup tool can be used to reconstruct a data directory from a full backup and a series of incremental backups. Patch by me. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter Eisentraut, and Álvaro Herrera. Thanks especially to Jakub for incredibly helpful and extensive testing. Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com --- doc/src/sgml/backup.sgml | 89 +++++++++++- doc/src/sgml/config.sgml | 2 - doc/src/sgml/protocol.sgml | 24 ++++ doc/src/sgml/ref/allfiles.sgml | 1 + doc/src/sgml/ref/pg_basebackup.sgml | 37 ++++- doc/src/sgml/ref/pg_combinebackup.sgml | 240 +++++++++++++++++++++++++++++++++ doc/src/sgml/reference.sgml | 1 + 7 files changed, 383 insertions(+), 11 deletions(-) create mode 100644 doc/src/sgml/ref/pg_combinebackup.sgml (limited to 'doc/src') diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml index 8cb24d6ae54..b3468eea3cb 100644 --- a/doc/src/sgml/backup.sgml +++ b/doc/src/sgml/backup.sgml @@ -857,12 +857,79 @@ test ! 
-f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0 + + Making an Incremental Backup + + + You can use to take an incremental + backup by specifying the --incremental option. You must + supply, as an argument to --incremental, the backup + manifest to an earlier backup from the same server. In the resulting + backup, non-relation files will be included in their entirety, but some + relation files may be replaced by smaller incremental files which contain + only the blocks which have been changed since the earlier backup and enough + metadata to reconstruct the current version of the file. + + + + To figure out which blocks need to be backed up, the server uses WAL + summaries, which are stored in the data directory, inside the directory + pg_wal/summaries. If the required summary files are not + present, an attempt to take an incremental backup will fail. The summaries + present in this directory must cover all LSNs from the start LSN of the + prior backup to the start LSN of the current backup. Since the server looks + for WAL summaries just after establishing the start LSN of the current + backup, the necessary summary files probably won't be instantly present + on disk, but the server will wait for any missing files to show up. + This also helps if the WAL summarization process has fallen behind. + However, if the necessary files have already been removed, or if the WAL + summarizer doesn't catch up quickly enough, the incremental backup will + fail. + + + + When restoring an incremental backup, it will be necessary to have not + only the incremental backup itself but also all earlier backups that + are required to supply the blocks omitted from the incremental backup. + See for further information about + this requirement. + + + + Note that all of the requirements for making use of a full backup also + apply to an incremental backup. 
For instance, you still need all of the + WAL segment files generated during and after the file system backup, and + any relevant WAL history files. And you still need to create a + recovery.signal (or standby.signal) + and perform recovery, as described in + . The requirement to have earlier + backups available at restore time and to use + pg_combinebackup is an additional requirement on top of + everything else. Keep in mind that PostgreSQL + has no built-in mechanism to figure out which backups are still needed as + a basis for restoring later incremental backups. You must keep track of + the relationships between your full and incremental backups on your own, + and be certain not to remove earlier backups if they might be needed when + restoring later incremental backups. + + + + Incremental backups typically only make sense for relatively large + databases where a significant portion of the data does not change, or only + changes slowly. For a small database, it's simpler to ignore the existence + of incremental backups and simply take full backups, which are simpler + to manage. For a large database all of which is heavily modified, + incremental backups won't be much smaller than full backups. + + + Making a Base Backup Using the Low Level API - The procedure for making a base backup using the low level - APIs contains a few more steps than - the method, but is relatively + Instead of taking a full or incremental base backup using + , you can take a base backup using the + low-level API. This procedure contains a few more steps than + the pg_basebackup method, but is relatively simple. It is very important that these steps are executed in sequence, and that the success of a step is verified before proceeding to the next step. @@ -1118,7 +1185,8 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true); - Restore the database files from your file system backup. 
Be sure that they + If you're restoring a full backup, you can restore the database files + directly into the target directories. Be sure that they are restored with the right ownership (the database system user, not root!) and with the right permissions. If you are using tablespaces, @@ -1126,6 +1194,19 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true); were correctly restored. + + + If you're restoring an incremental backup, you'll need to restore the + incremental backup and all earlier backups upon which it directly or + indirectly depends to the machine where you are performing the restore. + These backups will need to be placed in separate directories, not the + target directories where you want the running server to end up. + Once this is done, use to pull + data from the full backup and all of the subsequent incremental backups + and write out a synthetic full backup to the target directories. As above, + verify that permissions and tablespace links are correct. + + Remove any files present in pg_wal/; these came from the diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index ee985850275..b5624ca8847 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4153,13 +4153,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows WAL Summarization - diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index af3f016f746..9a66918171a 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -2599,6 +2599,19 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;" + + + UPLOAD_MANIFEST + UPLOAD_MANIFEST + + + + Uploads a backup manifest in preparation for taking an incremental + backup. + + + + BASE_BACKUP [ ( option [, ...] ) ] BASE_BACKUP @@ -2838,6 +2851,17 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;" + + + INCREMENTAL + + + Requests an incremental backup. 
The + UPLOAD_MANIFEST command must be executed + before running a base backup with this option. + + + diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml index 54b5f22d6ec..fda4690eab5 100644 --- a/doc/src/sgml/ref/allfiles.sgml +++ b/doc/src/sgml/ref/allfiles.sgml @@ -202,6 +202,7 @@ Complete list of usable sgml source files in this directory. + diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml index 0b87fd2d4d6..7c183a5cfd2 100644 --- a/doc/src/sgml/ref/pg_basebackup.sgml +++ b/doc/src/sgml/ref/pg_basebackup.sgml @@ -38,11 +38,25 @@ PostgreSQL documentation - pg_basebackup makes an exact copy of the database - cluster's files, while making sure the server is put into and - out of backup mode automatically. Backups are always taken of the entire - database cluster; it is not possible to back up individual databases or - database objects. For selective backups, another tool such as + pg_basebackup can take a full or incremental + base backup of the database. When used to take a full backup, it makes an + exact copy of the database cluster's files. When used to take an incremental + backup, some files that would have been part of a full backup may be + replaced with incremental versions of the same files, containing only those + blocks that have been modified since the reference backup. An incremental + backup cannot be used directly; instead, + must first + be used to combine it with the previous backups upon which it depends. + See for more information + about incremental backups, and + for steps to recover from a backup. + + + + In any mode, pg_basebackup makes sure the server + is put into and out of backup mode automatically. Backups are always taken of + the entire database cluster; it is not possible to back up individual + databases or database objects. For selective backups, another tool such as must be used. 
@@ -197,6 +211,19 @@ PostgreSQL documentation + + + + + + Performs an incremental + backup. The backup manifest for the reference + backup must be provided, and will be uploaded to the server, which will + respond by sending the requested incremental backup. + + + + diff --git a/doc/src/sgml/ref/pg_combinebackup.sgml b/doc/src/sgml/ref/pg_combinebackup.sgml new file mode 100644 index 00000000000..e1729671a5d --- /dev/null +++ b/doc/src/sgml/ref/pg_combinebackup.sgml @@ -0,0 +1,240 @@ + + + + + pg_combinebackup + + + + pg_combinebackup + 1 + Application + + + + pg_combinebackup + reconstruct a full backup from an incremental backup and dependent backups + + + + + pg_combinebackup + option + backup_directory + + + + + Description + + pg_combinebackup is used to reconstruct a + synthetic full backup from an + incremental backup and the + earlier backups upon which it depends. + + + + Specify all of the required backups on the command line from oldest to newest. + That is, the first backup directory should be the path to the full backup, and + the last should be the path to the final incremental backup + that you wish to restore. The reconstructed backup will be written to the + output directory specified by the option. + + + + Although pg_combinebackup will attempt to verify + that the backups you specify form a legal backup chain from which a correct + full backup can be reconstructed, it is not designed to help you keep track + of which backups depend on which other backups. If you remove one or + more of the previous backups upon which your incremental + backup relies, you will not be able to restore it. + + + + Since the output of pg_combinebackup is a + synthetic full backup, it can be used as an input to a future invocation of + pg_combinebackup. The synthetic full backup would + be specified on the command line in lieu of the chain of backups from which + it was reconstructed. + + + + + Options + + + + + + + + + Print lots of debug logging output on stderr. 
+ + + + + + + + + + The / option instructs + pg_combinebackup to figure out what would be done + without actually creating the target directory or any output files. + It is particularly useful in combination with . + + + + + + + + + + By default, pg_combinebackup will wait for all files + to be written safely to disk. This option causes + pg_combinebackup to return without waiting, which is + faster, but means that a subsequent operating system crash can leave + the output backup corrupt. Generally, this option is useful for testing + but should not be used when creating a production installation. + + + + + + + + + + Specifies the output directory to which the synthetic full backup + should be written. Currently, this argument is required. + + + + + + + + + + Relocates the tablespace in directory olddir + to newdir during the backup. + olddir is the absolute path of the tablespace + as it exists in the first backup specified on the command line, + and newdir is the absolute path to use for the + tablespace in the reconstructed backup. If either path needs to contain + an equal sign (=), precede that with a backslash. + This option can be specified multiple times for multiple tablespaces. + + + + + + + + + Like , + pg_combinebackup writes a backup manifest + in the output directory. This option specifies the checksum algorithm + that should be applied to each file included in the backup manifest. + Currently, the available algorithms are NONE, + CRC32C, SHA224, + SHA256, SHA384, + and SHA512. The default is CRC32C. + + + + + + + + + Disables generation of a backup manifest. If this option is not + specified, a backup manifest for the reconstructed backup will be + written to the output directory. + + + + + + + + + When set to fsync, which is the default, + pg_combinebackup will recursively open and synchronize + all files in the backup directory. 
When the plain format is used, the + search for files will follow symbolic links for the WAL directory and + each configured tablespace. + + + On Linux, syncfs may be used instead to ask the + operating system to synchronize the whole file system that contains the + backup directory. When the plain format is used, + pg_combinebackup will also synchronize the file systems + that contain the WAL files and each tablespace. See + for more information about using + syncfs(). + + + This option has no effect when is used. + + + + + + + + + + Prints the pg_combinebackup version and + exits. + + + + + + + + + + Shows help about pg_combinebackup command + line arguments, and exits. + + + + + + + + + + + Environment + + + This utility, like most other PostgreSQL utilities, + uses the environment variables supported by libpq + (see ). + + + + The environment variable PG_COLOR specifies whether to use + color in diagnostic messages. Possible values are + always, auto and + never. + + + + + See Also + + + + + + + diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml index e11b4b61307..a07d2b5e01e 100644 --- a/doc/src/sgml/reference.sgml +++ b/doc/src/sgml/reference.sgml @@ -250,6 +250,7 @@ &pgamcheck; &pgBasebackup; &pgbench; + &pgCombinebackup; &pgConfig; &pgDump; &pgDumpall; -- cgit v1.2.3
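
Illustrative note, not part of the patch: the workflow described in the commit message (take a full backup, take an incremental backup against the prior backup's manifest, then combine the chain from oldest to newest) might be sketched as follows. This is a minimal sketch under stated assumptions. The helper names basebackup_argv and combinebackup_argv are hypothetical, the directory paths are placeholders, and the manifest is assumed to sit in the backup directory under the name backup_manifest. The -D, -o and --dry-run spellings are taken from the released reference pages for the two tools rather than from the text above, where the option names were lost. The code only builds argument lists; it does not run anything.

```python
from typing import List, Optional, Sequence


def basebackup_argv(target_dir: str, prior_manifest: Optional[str] = None) -> List[str]:
    """Build a pg_basebackup command line (hypothetical helper).

    With no prior manifest this describes a full backup; with one, it
    describes an incremental backup relative to that earlier backup.
    """
    argv = ["pg_basebackup", "-D", target_dir]
    if prior_manifest is not None:
        # --incremental=PATH_TO_MANIFEST is the option named in the commit message.
        argv.append(f"--incremental={prior_manifest}")
    return argv


def combinebackup_argv(chain: Sequence[str], output_dir: str, dry_run: bool = False) -> List[str]:
    """Build a pg_combinebackup command line (hypothetical helper).

    `chain` must be ordered oldest to newest: the full backup first, the
    incremental backup to be restored last.
    """
    if not chain:
        raise ValueError("at least one backup directory is required")
    argv = ["pg_combinebackup", "-o", output_dir]
    if dry_run:
        argv.append("--dry-run")
    # Options go first, backup directories last, matching the synopsis order.
    argv.extend(chain)
    return argv


if __name__ == "__main__":
    # Placeholder paths, chosen only for illustration.
    print(basebackup_argv("/backups/full"))
    print(basebackup_argv("/backups/incr1", "/backups/full/backup_manifest"))
    print(combinebackup_argv(["/backups/full", "/backups/incr1"], "/restore/data"))
```

Keeping the chain as an ordered list makes the "oldest to newest" requirement explicit at the call site. Tracking which backups belong to which chain remains the operator's job, as the documentation above points out.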