| field | value | date |
|---|---|---|
| author | Robert Haas <rhaas@postgresql.org> | 2023-12-20 09:49:12 -0500 |
| committer | Robert Haas <rhaas@postgresql.org> | 2023-12-20 09:49:12 -0500 |
| commit | dc212340058b4e7ecfc5a7a81ec50e7a207bf288 | |
| tree | 79ffec15f6a8d9fce1333b99dd0b587e2459d38f /doc/src | |
| parent | 174c480508ac25568561443e6d4a82d5c1103487 | |
Add support for incremental backup.
To take an incremental backup, you use the new replication command
UPLOAD_MANIFEST to upload the manifest for the prior backup. This
prior backup could either be a full backup or another incremental
backup. You then use BASE_BACKUP with the INCREMENTAL option to take
the backup. pg_basebackup now has an --incremental=PATH_TO_MANIFEST
option to trigger this behavior.
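As a rough illustration of the pg_basebackup side of this workflow, the Python sketch below only builds argument lists and does not run anything. The helper names and the directory paths are hypothetical; it assumes that `-D` names the target directory and that the prior backup's manifest is the file `backup_manifest` inside that backup's directory.

```python
from pathlib import PurePosixPath


def full_backup_cmd(target_dir):
    """Argument list for a regular full backup written to target_dir."""
    return ["pg_basebackup", "-D", str(target_dir)]


def incremental_backup_cmd(target_dir, prior_backup_dir):
    """Argument list for an incremental backup relative to a prior backup.

    The prior backup may be a full backup or another incremental backup;
    only its manifest is handed to --incremental.
    """
    # Assumption: the manifest lives at <prior_backup_dir>/backup_manifest.
    manifest = PurePosixPath(prior_backup_dir) / "backup_manifest"
    return [
        "pg_basebackup",
        "-D", str(target_dir),
        f"--incremental={manifest}",
    ]


if __name__ == "__main__":
    # Hypothetical paths, shown only to make the shape of the commands visible.
    print(" ".join(full_backup_cmd("/backups/full")))
    print(" ".join(incremental_backup_cmd("/backups/incr1", "/backups/full")))
```

Chaining works the same way: a second incremental backup would pass the first incremental backup's directory as `prior_backup_dir`.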
An incremental backup is like a regular full backup except that
some relation files are replaced with files with names like
INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains
additional lines identifying it as an incremental backup. The new
pg_combinebackup tool can be used to reconstruct a data directory
from a full backup and a series of incremental backups.
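The reconstruction step can be sketched in the same spirit. The Python below is an illustration under stated assumptions, not the tool's implementation: the function names and paths are hypothetical, the backup chain is assumed to be listed oldest to newest (full backup first), and `-o` is assumed to name the output directory, as described in the pg_combinebackup reference page added by this commit. The small helper only mirrors the `INCREMENTAL.${ORIGINAL_NAME}` naming pattern mentioned above.

```python
INCREMENTAL_PREFIX = "INCREMENTAL."


def original_name(file_name):
    """Return the original relation file name if file_name follows the
    INCREMENTAL.${ORIGINAL_NAME} pattern, otherwise None."""
    if file_name.startswith(INCREMENTAL_PREFIX) and len(file_name) > len(INCREMENTAL_PREFIX):
        return file_name[len(INCREMENTAL_PREFIX):]
    return None


def combine_cmd(backup_chain, output_dir):
    """Argument list for reconstructing a synthetic full backup.

    backup_chain must be ordered oldest to newest: the full backup first,
    then each incremental backup, ending with the one to restore.
    """
    chain = [str(path) for path in backup_chain]
    if not chain:
        raise ValueError("at least one backup directory is required")
    return ["pg_combinebackup", "-o", str(output_dir), *chain]


if __name__ == "__main__":
    # Hypothetical directories for a full backup plus two incrementals.
    print(" ".join(combine_cmd(
        ["/backups/full", "/backups/incr1", "/backups/incr2"],
        "/restore/pgdata",
    )))
    print(original_name("INCREMENTAL.16384"))
```

Nothing here checks that the directories really form a valid chain; keeping track of which backups depend on which remains the operator's job.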
Patch by me. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub
Wartak, Peter Eisentraut, and Álvaro Herrera. Thanks especially to
Jakub for incredibly helpful and extensive testing.
Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
Diffstat (limited to 'doc/src')
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | doc/src/sgml/backup.sgml | 89 |
| -rw-r--r-- | doc/src/sgml/config.sgml | 2 |
| -rw-r--r-- | doc/src/sgml/protocol.sgml | 24 |
| -rw-r--r-- | doc/src/sgml/ref/allfiles.sgml | 1 |
| -rw-r--r-- | doc/src/sgml/ref/pg_basebackup.sgml | 37 |
| -rw-r--r-- | doc/src/sgml/ref/pg_combinebackup.sgml | 240 |
| -rw-r--r-- | doc/src/sgml/reference.sgml | 1 |
7 files changed, 383 insertions, 11 deletions
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml index 8cb24d6ae54..b3468eea3cb 100644 --- a/doc/src/sgml/backup.sgml +++ b/doc/src/sgml/backup.sgml @@ -857,12 +857,79 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0 </para> </sect2> + <sect2 id="backup-incremental-backup"> + <title>Making an Incremental Backup</title> + + <para> + You can use <xref linkend="app-pgbasebackup"/> to take an incremental + backup by specifying the <literal>--incremental</literal> option. You must + supply, as an argument to <literal>--incremental</literal>, the backup + manifest to an earlier backup from the same server. In the resulting + backup, non-relation files will be included in their entirety, but some + relation files may be replaced by smaller incremental files which contain + only the blocks which have been changed since the earlier backup and enough + metadata to reconstruct the current version of the file. + </para> + + <para> + To figure out which blocks need to be backed up, the server uses WAL + summaries, which are stored in the data directory, inside the directory + <literal>pg_wal/summaries</literal>. If the required summary files are not + present, an attempt to take an incremental backup will fail. The summaries + present in this directory must cover all LSNs from the start LSN of the + prior backup to the start LSN of the current backup. Since the server looks + for WAL summaries just after establishing the start LSN of the current + backup, the necessary summary files probably won't be instantly present + on disk, but the server will wait for any missing files to show up. + This also helps if the WAL summarization process has fallen behind. + However, if the necessary files have already been removed, or if the WAL + summarizer doesn't catch up quickly enough, the incremental backup will + fail. 
+ </para> + + <para> + When restoring an incremental backup, it will be necessary to have not + only the incremental backup itself but also all earlier backups that + are required to supply the blocks omitted from the incremental backup. + See <xref linkend="app-pgcombinebackup"/> for further information about + this requirement. + </para> + + <para> + Note that all of the requirements for making use of a full backup also + apply to an incremental backup. For instance, you still need all of the + WAL segment files generated during and after the file system backup, and + any relevant WAL history files. And you still need to create a + <literal>recovery.signal</literal> (or <literal>standby.signal</literal>) + and perform recovery, as described in + <xref linkend="backup-pitr-recovery" />. The requirement to have earlier + backups available at restore time and to use + <literal>pg_combinebackup</literal> is an additional requirement on top of + everything else. Keep in mind that <application>PostgreSQL</application> + has no built-in mechanism to figure out which backups are still needed as + a basis for restoring later incremental backups. You must keep track of + the relationships between your full and incremental backups on your own, + and be certain not to remove earlier backups if they might be needed when + restoring later incremental backups. + </para> + + <para> + Incremental backups typically only make sense for relatively large + databases where a significant portion of the data does not change, or only + changes slowly. For a small database, it's simpler to ignore the existence + of incremental backups and simply take full backups, which are simpler + to manage. For a large database all of which is heavily modified, + incremental backups won't be much smaller than full backups. 
+ </para> + </sect2> + <sect2 id="backup-lowlevel-base-backup"> <title>Making a Base Backup Using the Low Level API</title> <para> - The procedure for making a base backup using the low level - APIs contains a few more steps than - the <xref linkend="app-pgbasebackup"/> method, but is relatively + Instead of taking a full or incremental base backup using + <xref linkend="app-pgbasebackup"/>, you can take a base backup using the + low-level API. This procedure contains a few more steps than + the <application>pg_basebackup</application> method, but is relatively simple. It is very important that these steps are executed in sequence, and that the success of a step is verified before proceeding to the next step. @@ -1118,7 +1185,8 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true); </listitem> <listitem> <para> - Restore the database files from your file system backup. Be sure that they + If you're restoring a full backup, you can restore the database files + directly into the target directories. Be sure that they are restored with the right ownership (the database system user, not <literal>root</literal>!) and with the right permissions. If you are using tablespaces, @@ -1128,6 +1196,19 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true); </listitem> <listitem> <para> + If you're restoring an incremental backup, you'll need to restore the + incremental backup and all earlier backups upon which it directly or + indirectly depends to the machine where you are performing the restore. + These backups will need to be placed in separate directories, not the + target directories where you want the running server to end up. + Once this is done, use <xref linkend="app-pgcombinebackup"/> to pull + data from the full backup and all of the subsequent incremental backups + and write out a synthetic full backup to the target directories. As above, + verify that permissions and tablespace links are correct. 
+ </para> + </listitem> + <listitem> + <para> Remove any files present in <filename>pg_wal/</filename>; these came from the file system backup and are therefore probably obsolete rather than current. If you didn't archive <filename>pg_wal/</filename> at all, then recreate diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index ee985850275..b5624ca8847 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4153,13 +4153,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows <sect2 id="runtime-config-wal-summarization"> <title>WAL Summarization</title> - <!-- <para> These settings control WAL summarization, a feature which must be enabled in order to perform an <link linkend="backup-incremental-backup">incremental backup</link>. </para> - --> <variablelist> <varlistentry id="guc-summarize-wal" xreflabel="summarize_wal"> diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index af3f016f746..9a66918171a 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -2599,6 +2599,19 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;" </listitem> </varlistentry> + <varlistentry id="protocol-replication-upload-manifest"> + <term> + <literal>UPLOAD_MANIFEST</literal> + <indexterm><primary>UPLOAD_MANIFEST</primary></indexterm> + </term> + <listitem> + <para> + Uploads a backup manifest in preparation for taking an incremental + backup. + </para> + </listitem> + </varlistentry> + <varlistentry id="protocol-replication-base-backup" xreflabel="BASE_BACKUP"> <term><literal>BASE_BACKUP</literal> [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] <indexterm><primary>BASE_BACKUP</primary></indexterm> @@ -2838,6 +2851,17 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;" </para> </listitem> </varlistentry> + + <varlistentry> + <term><literal>INCREMENTAL</literal></term> + <listitem> + <para> + Requests an incremental backup. 
The + <literal>UPLOAD_MANIFEST</literal> command must be executed + before running a base backup with this option. + </para> + </listitem> + </varlistentry> </variablelist> </para> diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml index 54b5f22d6ec..fda4690eab5 100644 --- a/doc/src/sgml/ref/allfiles.sgml +++ b/doc/src/sgml/ref/allfiles.sgml @@ -202,6 +202,7 @@ Complete list of usable sgml source files in this directory. <!ENTITY pgBasebackup SYSTEM "pg_basebackup.sgml"> <!ENTITY pgbench SYSTEM "pgbench.sgml"> <!ENTITY pgChecksums SYSTEM "pg_checksums.sgml"> +<!ENTITY pgCombinebackup SYSTEM "pg_combinebackup.sgml"> <!ENTITY pgConfig SYSTEM "pg_config-ref.sgml"> <!ENTITY pgControldata SYSTEM "pg_controldata.sgml"> <!ENTITY pgCtl SYSTEM "pg_ctl-ref.sgml"> diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml index 0b87fd2d4d6..7c183a5cfd2 100644 --- a/doc/src/sgml/ref/pg_basebackup.sgml +++ b/doc/src/sgml/ref/pg_basebackup.sgml @@ -38,11 +38,25 @@ PostgreSQL documentation </para> <para> - <application>pg_basebackup</application> makes an exact copy of the database - cluster's files, while making sure the server is put into and - out of backup mode automatically. Backups are always taken of the entire - database cluster; it is not possible to back up individual databases or - database objects. For selective backups, another tool such as + <application>pg_basebackup</application> can take a full or incremental + base backup of the database. When used to take a full backup, it makes an + exact copy of the database cluster's files. When used to take an incremental + backup, some files that would have been part of a full backup may be + replaced with incremental versions of the same files, containing only those + blocks that have been modified since the reference backup. 
An incremental + backup cannot be used directly; instead, + <xref linkend="app-pgcombinebackup"/> must first + be used to combine it with the previous backups upon which it depends. + See <xref linkend="backup-incremental-backup" /> for more information + about incremental backups, and <xref linkend="backup-pitr-recovery" /> + for steps to recover from a backup. + </para> + + <para> + In any mode, <application>pg_basebackup</application> makes sure the server + is put into and out of backup mode automatically. Backups are always taken of + the entire database cluster; it is not possible to back up individual + databases or database objects. For selective backups, another tool such as <xref linkend="app-pgdump"/> must be used. </para> @@ -198,6 +212,19 @@ PostgreSQL documentation </varlistentry> <varlistentry> + <term><option>-i <replaceable class="parameter">old_manifest_file</replaceable></option></term> + <term><option>--incremental=<replaceable class="parameter">old_meanifest_file</replaceable></option></term> + <listitem> + <para> + Performs an <link linkend="backup-incremental-backup">incremental + backup</link>. The backup manifest for the reference + backup must be provided, and will be uploaded to the server, which will + respond by sending the requested incremental backup. 
+ </para> + </listitem> + </varlistentry> + + <varlistentry> <term><option>-R</option></term> <term><option>--write-recovery-conf</option></term> <listitem> diff --git a/doc/src/sgml/ref/pg_combinebackup.sgml b/doc/src/sgml/ref/pg_combinebackup.sgml new file mode 100644 index 00000000000..e1729671a5d --- /dev/null +++ b/doc/src/sgml/ref/pg_combinebackup.sgml @@ -0,0 +1,240 @@ +<!-- +doc/src/sgml/ref/pg_combinebackup.sgml +PostgreSQL documentation +--> + +<refentry id="app-pgcombinebackup"> + <indexterm zone="app-pgcombinebackup"> + <primary>pg_combinebackup</primary> + </indexterm> + + <refmeta> + <refentrytitle><application>pg_combinebackup</application></refentrytitle> + <manvolnum>1</manvolnum> + <refmiscinfo>Application</refmiscinfo> + </refmeta> + + <refnamediv> + <refname>pg_combinebackup</refname> + <refpurpose>reconstruct a full backup from an incremental backup and dependent backups</refpurpose> + </refnamediv> + + <refsynopsisdiv> + <cmdsynopsis> + <command>pg_combinebackup</command> + <arg rep="repeat"><replaceable>option</replaceable></arg> + <arg rep="repeat"><replaceable>backup_directory</replaceable></arg> + </cmdsynopsis> + </refsynopsisdiv> + + <refsect1> + <title>Description</title> + <para> + <application>pg_combinebackup</application> is used to reconstruct a + synthetic full backup from an + <link linkend="backup-incremental-backup">incremental backup</link> and the + earlier backups upon which it depends. + </para> + + <para> + Specify all of the required backups on the command line from oldest to newest. + That is, the first backup directory should be the path to the full backup, and + the last should be the path to the final incremental backup + that you wish to restore. The reconstructed backup will be written to the + output directory specified by the <option>-o</option> option. 
+ </para> + + <para> + Although <application>pg_combinebackup</application> will attempt to verify + that the backups you specify form a legal backup chain from which a correct + full backup can be reconstructed, it is not designed to help you keep track + of which backups depend on which other backups. If you remove the one or + more of the previous backups upon which your incremental + backup relies, you will not be able to restore it. + </para> + + <para> + Since the output of <application>pg_combinebackup</application> is a + synthetic full backup, it can be used as an input to a future invocation of + <application>pg_combinebackup</application>. The synthetic full backup would + be specified on the command line in lieu of the chain of backups from which + it was reconstructed. + </para> + </refsect1> + + <refsect1> + <title>Options</title> + + <para> + <variablelist> + <varlistentry> + <term><option>-d</option></term> + <term><option>--debug</option></term> + <listitem> + <para> + Print lots of debug logging output on <filename>stderr</filename>. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-n</option></term> + <term><option>--dry-run</option></term> + <listitem> + <para> + The <option>-n</option>/<option>--dry-run</option> option instructs + <command>pg_cominebackup</command> to figure out what would be done + without actually creating the target directory or any output files. + It is particularly useful in comination with <option>--debug</option>. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-N</option></term> + <term><option>--no-sync</option></term> + <listitem> + <para> + By default, <command>pg_combinebackup</command> will wait for all files + to be written safely to disk. This option causes + <command>pg_combinebackup</command> to return without waiting, which is + faster, but means that a subsequent operating system crash can leave + the output backup corrupt. 
Generally, this option is useful for testing + but should not be used when creating a production installation. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-o <replaceable class="parameter">outputdir</replaceable></option></term> + <term><option>--output=<replaceable class="parameter">outputdir</replaceable></option></term> + <listitem> + <para> + Specifies the output directory to which the synthetic full backup + should be written. Currently, this argument is required. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-T <replaceable class="parameter">olddir</replaceable>=<replaceable class="parameter">newdir</replaceable></option></term> + <term><option>--tablespace-mapping=<replaceable class="parameter">olddir</replaceable>=<replaceable class="parameter">newdir</replaceable></option></term> + <listitem> + <para> + Relocates the tablespace in directory <replaceable>olddir</replaceable> + to <replaceable>newdir</replaceable> during the backup. + <replaceable>olddir</replaceable> is the absolute path of the tablespace + as it exists in the first backup specified on the command line, + and <replaceable>newdir</replaceable> is the absolute path to use for the + tablespace in the reconstructed backup. If either path needs to contain + an equal sign (<literal>=</literal>), precede that with a backslash. + This option can be specified multiple times for multiple tablespaces. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>--manifest-checksums=<replaceable class="parameter">algorithm</replaceable></option></term> + <listitem> + <para> + Like <xref linkend="app-pgbasebackup"/>, + <application>pg_combinebackup</application> writes a backup manifest + in the output directory. This option specifies the checksum algorithm + that should be applied to each file included in the backup manifest. 
+ Currently, the available algorithms are <literal>NONE</literal>, + <literal>CRC32C</literal>, <literal>SHA224</literal>, + <literal>SHA256</literal>, <literal>SHA384</literal>, + and <literal>SHA512</literal>. The default is <literal>CRC32C</literal>. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>--no-manifest</option></term> + <listitem> + <para> + Disables generation of a backup manifest. If this option is not + specified, a backup manifest for the reconstructed backup will be + written to the output directory. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>--sync-method=<replaceable class="parameter">method</replaceable></option></term> + <listitem> + <para> + When set to <literal>fsync</literal>, which is the default, + <command>pg_combinebackup</command> will recursively open and synchronize + all files in the backup directory. When the plain format is used, the + search for files will follow symbolic links for the WAL directory and + each configured tablespace. + </para> + <para> + On Linux, <literal>syncfs</literal> may be used instead to ask the + operating system to synchronize the whole file system that contains the + backup directory. When the plain format is used, + <command>pg_combinebackup</command> will also synchronize the file systems + that contain the WAL files and each tablespace. See + <xref linkend="syncfs"/> for more information about using + <function>syncfs()</function>. + </para> + <para> + This option has no effect when <option>--no-sync</option> is used. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-V</option></term> + <term><option>--version</option></term> + <listitem> + <para> + Prints the <application>pg_combinebackup</application> version and + exits. 
+ </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-?</option></term> + <term><option>--help</option></term> + <listitem> + <para> + Shows help about <application>pg_combinebackup</application> command + line arguments, and exits. + </para> + </listitem> + </varlistentry> + + </variablelist> + </para> + + </refsect1> + + <refsect1> + <title>Environment</title> + + <para> + This utility, like most other <productname>PostgreSQL</productname> utilities, + uses the environment variables supported by <application>libpq</application> + (see <xref linkend="libpq-envars"/>). + </para> + + <para> + The environment variable <envar>PG_COLOR</envar> specifies whether to use + color in diagnostic messages. Possible values are + <literal>always</literal>, <literal>auto</literal> and + <literal>never</literal>. + </para> + </refsect1> + + <refsect1> + <title>See Also</title> + + <simplelist type="inline"> + <member><xref linkend="app-pgbasebackup"/></member> + </simplelist> + </refsect1> + +</refentry> diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml index e11b4b61307..a07d2b5e01e 100644 --- a/doc/src/sgml/reference.sgml +++ b/doc/src/sgml/reference.sgml @@ -250,6 +250,7 @@ &pgamcheck; &pgBasebackup; &pgbench; + &pgCombinebackup; &pgConfig; &pgDump; &pgDumpall; |
