author Robert Haas <rhaas@postgresql.org> 2020-04-03 14:59:47 -0400
committer Robert Haas <rhaas@postgresql.org> 2020-04-03 15:05:59 -0400
commit 0d8c9c1210c44b36ec2efcb223a1dfbe897a3661
tree af5225aa493720c40a8d6142f043dde444a131af /doc/src/sgml/ref/pg_validatebackup.sgml
parent ce77abe63cfc85fb0bc236deb2cc34ae35cb5324
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size, last modification time, and an optional checksum for each file backed up, (2) timelines and LSNs for whatever WAL will need to be replayed to make the backup consistent, and (3) a checksum for the manifest itself. By default, we use CRC-32C when checksumming data files, because we are trying to detect corruption and user error, not foil an adversary. However, pg_basebackup and the server-side BASE_BACKUP command now have options to select a different algorithm, so users wanting a cryptographic hash function can select SHA-224, SHA-256, SHA-384, or SHA-512. Users not wanting file checksums at all can disable them, or disable generating of the backup manifest altogether. Using a cryptographic hash function in place of CRC-32C consumes significantly more CPU cycles, which may slow down backups in some cases.

A new tool called pg_validatebackup can validate a backup against the manifest. If no checksums are present, it can still check that the right files exist and that they have the expected sizes. If checksums are present, it can also verify that each file has the expected checksum. Additionally, it calls pg_waldump to verify that the expected WAL files are present and parseable. Only plain format backups can be validated directly, but tar format backups can be validated after extracting them.

Robert Haas, with help, ideas, review, and testing from David Steele, Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan Chalke, Amit Kapila, Andres Freund, and Noah Misch.

Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
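
The manifest-driven check described in the commit message can be sketched roughly as follows. This is an illustrative Python sketch, not the tool's implementation: the JSON key names (`Files`, `Path`, `Size`, `Checksum-Algorithm`, `Checksum`) and the function names `sha256_of` and `check_files` are assumptions made for the example, since the commit message does not spell out the manifest layout. Only SHA-256 is handled, because CRC-32C is not available in the Python standard library.

```python
import hashlib
import os


def sha256_of(path):
    """Hash a file in chunks so large data files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_files(backup_dir, manifest, skip_checksums=False):
    """Return a list of problems found for the files listed in the manifest.

    `manifest` is an already-parsed JSON document; its key names are
    hypothetical and chosen only for this sketch.
    """
    problems = []
    for entry in manifest["Files"]:
        rel = entry["Path"]
        path = os.path.join(backup_dir, rel)
        if not os.path.isfile(path):
            problems.append(f"{rel}: missing")
            continue
        if os.path.getsize(path) != entry["Size"]:
            problems.append(f"{rel}: size mismatch")
            continue  # files that already failed are not checksummed
        if skip_checksums or "Checksum" not in entry:
            continue  # existence and size can still be checked
        if entry.get("Checksum-Algorithm") == "SHA256":
            if sha256_of(path) != entry["Checksum"]:
                problems.append(f"{rel}: checksum mismatch")
    return problems
```

Collecting problems in a list instead of stopping at the first one mirrors the default behavior of reporting every problem found; an early return after the first `append` would approximate `--exit-on-error`.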
Diffstat (limited to 'doc/src/sgml/ref/pg_validatebackup.sgml')
-rw-r--r-- doc/src/sgml/ref/pg_validatebackup.sgml 291
1 files changed, 291 insertions, 0 deletions
diff --git a/doc/src/sgml/ref/pg_validatebackup.sgml b/doc/src/sgml/ref/pg_validatebackup.sgml
new file mode 100644
index 00000000000..19888dc1966
--- /dev/null
+++ b/doc/src/sgml/ref/pg_validatebackup.sgml
@@ -0,0 +1,291 @@
+<!--
+doc/src/sgml/ref/pg_validatebackup.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="app-pgvalidatebackup">
+ <indexterm zone="app-pgvalidatebackup">
+ <primary>pg_validatebackup</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>pg_validatebackup</refentrytitle>
+ <manvolnum>1</manvolnum>
+ <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>pg_validatebackup</refname>
+ <refpurpose>verify the integrity of a base backup of a
+ <productname>PostgreSQL</productname> cluster</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <cmdsynopsis>
+ <command>pg_validatebackup</command>
+ <arg rep="repeat"><replaceable>option</replaceable></arg>
+ </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>
+ Description
+ </title>
+ <para>
+ <application>pg_validatebackup</application> is used to check the
+ integrity of a database cluster backup taken using
+ <command>pg_basebackup</command> against a
+ <literal>backup_manifest</literal> generated by the server at the time
+ of the backup. The backup must be stored in the "plain"
+ format; a "tar" format backup can be checked after extracting it.
+ </para>
+
+ <para>
+ It is important to note that the validation which is performed by
+ <application>pg_validatebackup</application> does not and cannot include
+ every check which will be performed by a running server when attempting
+ to make use of the backup. Even if you use this tool, you should still
+ perform test restores and verify that the resulting databases work as
+ expected and that they appear to contain the correct data. However,
+ <application>pg_validatebackup</application> can detect many problems
+ that commonly occur due to storage problems or user error.
+ </para>
+
+ <para>
+ Backup verification proceeds in four stages. First,
+ <literal>pg_validatebackup</literal> reads the
+ <literal>backup_manifest</literal> file. If that file
+ does not exist, cannot be read, is malformed, or fails verification
+ against its own internal checksum, <literal>pg_validatebackup</literal>
+ will terminate with a fatal error.
+ </para>
+
+ <para>
+ Second, <literal>pg_validatebackup</literal> will attempt to verify that
+ the data files currently stored on disk are exactly the same as the data
+ files which the server intended to send, with some exceptions that are
+ described below. Extra and missing files will be detected, apart from
+ those exceptions. This step will ignore the presence or absence of, or any
+ modifications to, <literal>postgresql.auto.conf</literal>,
+ <literal>standby.signal</literal>, and <literal>recovery.signal</literal>,
+ because it is expected that these files may have been created or modified
+ as part of the process of taking the backup. It also won't complain about
+ a <literal>backup_manifest</literal> file in the target directory or
+ about anything inside <literal>pg_wal</literal>, even though these
+ files won't be listed in the backup manifest. Only files are checked;
+ the presence or absence of directories is not verified, except
+ indirectly: if a directory is missing, any files it should have contained
+ will necessarily also be missing.
+ </para>
+
+ <para>
+ Next, <literal>pg_validatebackup</literal> will checksum all the files,
+ compare the checksums against the values in the manifest, and emit errors
+ for any files for which the computed checksum does not match the
+ checksum stored in the manifest. This step is not performed for any files
+ which produced errors in the previous step, since they are already known
+ to have problems. Files which were ignored in the previous step are
+ also ignored in this step.
+ </para>
+
+ <para>
+ Finally, <literal>pg_validatebackup</literal> will use the manifest to
+ verify that the write-ahead log records which will be needed to recover
+ the backup are present and that they can be read and parsed. The
+ <literal>backup_manifest</literal> contains information about which
+ write-ahead log records will be needed, and
+ <literal>pg_validatebackup</literal> will use that information to
+ invoke <literal>pg_waldump</literal> to parse those write-ahead log
+ records. The <literal>--quiet</literal> flag will be used, so that
+ <literal>pg_waldump</literal> will only report errors, without producing
+ any other output. While this level of verification is sufficient to
+ detect obvious problems such as a missing file or one whose internal
+ checksums do not match, it isn't extensive enough to detect every
+ possible problem that might occur when attempting to recover. For
+ instance, a server bug that produces write-ahead log records that have
+ the correct checksums but specify nonsensical actions can't be detected
+ by this method.
+ </para>
+
+ <para>
+ Note that if extra WAL files which are not required to recover the backup
+ are present, they will not be checked by this tool, although
+ a separate invocation of <literal>pg_waldump</literal> could be used for
+ that purpose. Also note that WAL verification is version-specific: you
+ must use the version of <literal>pg_validatebackup</literal>, and thus of
+ <literal>pg_waldump</literal>, which pertains to the backup being checked.
+ In contrast, the data file integrity checks should work with any version
+ of the server that generates a <literal>backup_manifest</literal> file.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Options</title>
+
+ <para>
+ The following command-line options control the behavior.
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-e</option></term>
+ <term><option>--exit-on-error</option></term>
+ <listitem>
+ <para>
+ Exit as soon as a problem with the backup is detected. If this option
+ is not specified, <literal>pg_validatebackup</literal> will continue
+ checking the backup even after a problem has been detected, and will
+ report all problems detected as errors.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-i <replaceable class="parameter">path</replaceable></option></term>
+ <term><option>--ignore=<replaceable class="parameter">path</replaceable></option></term>
+ <listitem>
+ <para>
+ Ignore the specified file or directory, which should be expressed
+ as a relative pathname, when comparing the list of data files
+ actually present in the backup to those listed in the
+ <literal>backup_manifest</literal> file. If a directory is
+ specified, this option affects the entire subtree rooted at that
+ location. Complaints about extra files, missing files, file size
+ differences, or checksum mismatches will be suppressed if the
+ relative pathname matches the specified pathname. This option
+ can be specified multiple times.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-m <replaceable class="parameter">path</replaceable></option></term>
+ <term><option>--manifest-path=<replaceable class="parameter">path</replaceable></option></term>
+ <listitem>
+ <para>
+ Use the manifest file at the specified path, rather than one located
+ in the root of the backup directory.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-n</option></term>
+ <term><option>--no-parse-wal</option></term>
+ <listitem>
+ <para>
+ Don't attempt to parse write-ahead log data that will be needed
+ to recover from this backup.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-q</option></term>
+ <term><option>--quiet</option></term>
+ <listitem>
+ <para>
+ Don't print anything when a backup is successfully validated.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-s</option></term>
+ <term><option>--skip-checksums</option></term>
+ <listitem>
+ <para>
+ Do not validate data file checksums. The presence or absence of
+ files and the sizes of those files will still be checked. This is
+ much faster, because the files themselves do not need to be read.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-w <replaceable class="parameter">path</replaceable></option></term>
+ <term><option>--wal-directory=<replaceable class="parameter">path</replaceable></option></term>
+ <listitem>
+ <para>
+ Try to parse WAL files stored in the specified directory, rather than
+ in <literal>pg_wal</literal>. This may be useful if the backup is
+ stored in a separate location from the WAL archive.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ Other options are also available:
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-V</option></term>
+ <term><option>--version</option></term>
+ <listitem>
+ <para>
+ Print the <application>pg_validatebackup</application> version and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-?</option></term>
+ <term><option>--help</option></term>
+ <listitem>
+ <para>
+ Show help about <application>pg_validatebackup</application> command
+ line arguments, and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Examples</title>
+
+ <para>
+ To create a base backup of the server at <literal>mydbserver</literal> and
+ validate the integrity of the backup:
+<screen>
+<prompt>$</prompt> <userinput>pg_basebackup -h mydbserver -D /usr/local/pgsql/data</userinput>
+<prompt>$</prompt> <userinput>pg_validatebackup /usr/local/pgsql/data</userinput>
+</screen>
+ </para>
+
+ <para>
+ To create a base backup of the server at <literal>mydbserver</literal>, move
+ the manifest somewhere outside the backup directory, and validate the
+ backup:
+<screen>
+<prompt>$</prompt> <userinput>pg_basebackup -h mydbserver -D /usr/local/pgsql/backup1234</userinput>
+<prompt>$</prompt> <userinput>mv /usr/local/pgsql/backup1234/backup_manifest /my/secure/location/backup_manifest.1234</userinput>
+<prompt>$</prompt> <userinput>pg_validatebackup -m /my/secure/location/backup_manifest.1234 /usr/local/pgsql/backup1234</userinput>
+</screen>
+ </para>
+
+ <para>
+ To validate a backup while ignoring a file that was added manually to the
+ backup directory, and also skipping checksum verification:
+<screen>
+<prompt>$</prompt> <userinput>pg_basebackup -h mydbserver -D /usr/local/pgsql/data</userinput>
+<prompt>$</prompt> <userinput>edit /usr/local/pgsql/data/note.to.self</userinput>
+<prompt>$</prompt> <userinput>pg_validatebackup --ignore=note.to.self --skip-checksums /usr/local/pgsql/data</userinput>
+</screen>
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="app-pgbasebackup"/></member>
+ </simplelist>
+ </refsect1>
+
+</refentry>
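
The file-list comparison and the `--ignore` behavior documented in the reference page above can be sketched as follows. This is a hedged Python illustration, assuming paths are relative, `/`-separated strings; `ALWAYS_IGNORED`, `is_ignored`, and `compare_file_lists` are hypothetical names, and the tool's real matching logic may differ.

```python
# Entries the reference page says are never complained about.
ALWAYS_IGNORED = (
    "postgresql.auto.conf",
    "standby.signal",
    "recovery.signal",
    "backup_manifest",
    "pg_wal",
)


def is_ignored(relpath, ignore_paths):
    """True if relpath equals an ignored path or lies in its subtree."""
    parts = relpath.split("/")
    for ignored in ignore_paths:
        prefix = ignored.rstrip("/").split("/")
        # Compare whole path components, so "base" does not match "base2".
        if parts[:len(prefix)] == prefix:
            return True
    return False


def compare_file_lists(on_disk, in_manifest, user_ignores=()):
    """Return (extra, missing) relative paths, honoring the ignore rules."""
    on_disk, in_manifest = set(on_disk), set(in_manifest)
    ignores = ALWAYS_IGNORED + tuple(user_ignores)
    extra = sorted(p for p in on_disk - in_manifest
                   if not is_ignored(p, ignores))
    missing = sorted(p for p in in_manifest - on_disk
                     if not is_ignored(p, ignores))
    return extra, missing
```

Matching on path components rather than raw string prefixes is what lets an ignored directory cover its entire subtree without also swallowing sibling names that merely share a prefix.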