diff options
| author | Peter Eisentraut <peter_e@gmx.net> | 2017-03-23 08:36:36 -0400 |
|---|---|---|
| committer | Peter Eisentraut <peter_e@gmx.net> | 2017-03-23 08:55:37 -0400 |
| commit | 7c4f52409a8c7d85ed169bbbc1f6092274d03920 (patch) | |
| tree | fa3dc592bb2855e5cc0a200f4c408b4c8d299be5 /doc/src | |
| parent | 707576b571f05ec5b89adb65964d55f3ccccbd1b (diff) | |
Logical replication support for initial data copy
Add functionality for a new subscription to copy the initial data in the
tables and then sync with the ongoing apply process.
For the copying, add a new internal COPY option to have the COPY source
data provided by a callback function. The initial data copy works on
the subscriber by receiving COPY data from the publisher and then
providing it locally into a COPY that writes to the destination table.
A WAL receiver can now execute full SQL commands. This is used here to
obtain information about tables and publications.
Several new options were added to CREATE and ALTER SUBSCRIPTION to
control whether and when initial table syncing happens.
Change pg_dump option --no-create-subscription-slots to
--no-subscription-connect and use the new CREATE SUBSCRIPTION
... NOCONNECT option for that.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Tested-by: Erik Rijkers <er@xs4all.nl>
Diffstat (limited to 'doc/src')
| -rw-r--r-- | doc/src/sgml/catalogs.sgml | 78 | ||||
| -rw-r--r-- | doc/src/sgml/config.sgml | 25 | ||||
| -rw-r--r-- | doc/src/sgml/logical-replication.sgml | 55 | ||||
| -rw-r--r-- | doc/src/sgml/monitoring.sgml | 9 | ||||
| -rw-r--r-- | doc/src/sgml/protocol.sgml | 9 | ||||
| -rw-r--r-- | doc/src/sgml/ref/alter_subscription.sgml | 50 | ||||
| -rw-r--r-- | doc/src/sgml/ref/create_subscription.sgml | 38 | ||||
| -rw-r--r-- | doc/src/sgml/ref/pg_dump.sgml | 15 |
8 files changed, 249 insertions, 30 deletions
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index df0435c3f00..228ec780318 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -301,6 +301,11 @@ </row> <row> + <entry><link linkend="catalog-pg-subscription-rel"><structname>pg_subscription_rel</structname></link></entry> + <entry>relation state for subscriptions</entry> + </row> + + <row> <entry><link linkend="catalog-pg-tablespace"><structname>pg_tablespace</structname></link></entry> <entry>tablespaces within this database cluster</entry> </row> @@ -6418,6 +6423,79 @@ </table> </sect1> + <sect1 id="catalog-pg-subscription-rel"> + <title><structname>pg_subscription_rel</structname></title> + + <indexterm zone="catalog-pg-subscription-rel"> + <primary>pg_subscription_rel</primary> + </indexterm> + + <para> + The catalog <structname>pg_subscription_rel</structname> contains the + state for each replicated relation in each subscription. This is a + many-to-many mapping. + </para> + + <para> + This catalog only contains tables known to the subscription after running + either <command>CREATE SUBSCRIPTION</command> or + <command>ALTER SUBSCRIPTION ... REFRESH</command>. + </para> + + <table> + <title><structname>pg_subscription_rel</structname> Columns</title> + + <tgroup cols="4"> + <thead> + <row> + <entry>Name</entry> + <entry>Type</entry> + <entry>References</entry> + <entry>Description</entry> + </row> + </thead> + + <tbody> + <row> + <entry><structfield>srsubid</structfield></entry> + <entry><type>oid</type></entry> + <entry><literal><link linkend="catalog-pg-subscription"><structname>pg_subscription</structname></link>.oid</literal></entry> + <entry>Reference to subscription</entry> + </row> + + <row> + <entry><structfield>srrelid</structfield></entry> + <entry><type>oid</type></entry> + <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry> + <entry>Reference to relation</entry> + </row> + + <row> + <entry><structfield>srsubstate</structfield></entry> + <entry><type>char</type></entry> + <entry></entry> + <entry> + State code: + <literal>i</> = initialize, + <literal>d</> = data is being copied, + <literal>s</> = synchronized, + <literal>r</> = ready (normal replication) + </entry> + </row> + + <row> + <entry><structfield>srsublsn</structfield></entry> + <entry><type>pg_lsn</type></entry> + <entry></entry> + <entry> + End LSN for <literal>s</> and <literal>r</> states. + </entry> + </row> + </tbody> + </tgroup> + </table> + </sect1> + <sect1 id="catalog-pg-tablespace"> <title><structname>pg_tablespace</structname></title> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index b379b67b30a..2de3540def7 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -3449,6 +3449,31 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class=" </listitem> </varlistentry> + <varlistentry id="guc-max-sync-workers-per-subscription" xreflabel="max_sync_workers_per_subscription"> + <term><varname>max_sync_workers_per_subscription</varname> (<type>integer</type>) + <indexterm> + <primary><varname>max_sync_workers_per_subscription</> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Maximum number of synchronization workers per subscription. This + parameter controls the amount of paralelism of the initial data copy + during the subscription initialization or when new tables are added. + </para> + <para> + Currently, there can be only one synchronization worker per table. + </para> + <para> + The synchronization workers are taken from the pool defined by + <varname>max_logical_replication_workers</varname>. + </para> + <para> + The default value is 2. + </para> + </listitem> + </varlistentry> + </variablelist> </sect2> diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml index 44cd78563d3..48db9cd08b7 100644 --- a/doc/src/sgml/logical-replication.sgml +++ b/doc/src/sgml/logical-replication.sgml @@ -24,9 +24,11 @@ </para> <para> - Logical replication sends changes on the publisher to the subscriber as - they occur in real-time. The subscriber applies the data in the same order - as the publisher so that transactional consistency is guaranteed for + Logical replication of a table typically starts with a taking a snapshot + of the data on the publisher database and copying that to the subscriber. + Once that is done, the changes on the publisher are sent to the subscriber + as they occur in real-time. The subscriber applies the data in the same + order as the publisher so that transactional consistency is guaranteed for publications within a single subscription. This method of data replication is sometimes referred to as transactional replication. </para> @@ -159,7 +161,9 @@ <para> Each subscription will receive changes via one replication slot (see - <xref linkend="streaming-replication-slots">). + <xref linkend="streaming-replication-slots">). Additional temporary + replication slots may be required for the initial data synchronization + of pre-existing table data. </para> <para> @@ -264,9 +268,25 @@ to <literal>replica</literal>, which produces the usual effects on triggers and constraints. </para> + + <sect2 id="logical-replication-snapshot"> + <title>Initial Snapshot</title> + <para> + The initial data in existing subscribed tables are snapshotted and + copied in a parallel instance of a special kind of apply process. + This process will create its own temporary replication slot and + copy the existing data. Once existing data is copied, the worker + enters synchronization mode, which ensures that the table is brought + up to a synchronized state with the main apply process by streaming + any changes that happened during the initial data copy using standard + logical replication. Once the synchronization is done, the control + of the replication of the table is given back to the main apply + process where the replication continues as normal. + </para> + </sect2> </sect1> - <sect1 id="logical-replication-monitoring"> + <sect1 id="logical-replication-monitoring"> <title>Monitoring</title> <para> @@ -287,7 +307,9 @@ <para> Normally, there is a single apply process running for an enabled subscription. A disabled subscription or a crashed subscription will have - zero rows in this view. + zero rows in this view. If the initial data synchronization of any + table is in progress, there will be additional workers for the tables + being synchronized. </para> </sect1> @@ -337,10 +359,11 @@ <para> On the publisher side, <varname>wal_level</varname> must be set to <literal>logical</literal>, and <varname>max_replication_slots</varname> - must be set to at least the number of subscriptions expected to connect. - And <varname>max_wal_senders</varname> should be set to at least the same - as <varname>max_replication_slots</varname> plus the number of physical replicas - that are connected at the same time. + must be set to at least the number of subscriptions expected to connect, + plus some reserve for table synchronization. And + <varname>max_wal_senders</varname> should be set to at least the same as + <varname>max_replication_slots</varname> plus the number of physical + replicas that are connected at the same time. </para> <para> @@ -348,9 +371,9 @@ to be set. In this case it should be set to at least the number of subscriptions that will be added to the subscriber. <varname>max_logical_replication_workers</varname> must be set to at - least the number of subscriptions. Additionally the - <varname>max_worker_processes</varname> may need to be adjusted to - accommodate for replication workers, at least + least the number of subscriptions, again plus some reserve for the table + synchronization. Additionally the <varname>max_worker_processes</varname> + may need to be adjusted to accommodate for replication workers, at least (<varname>max_logical_replication_workers</varname> + <literal>1</literal>). Note that some extensions and parallel queries also take worker slots from <varname>max_worker_processes</varname>. @@ -393,8 +416,10 @@ CREATE SUBSCRIPTION mysub CONNECTION 'dbname=foo host=bar user=repuser' PUBLICAT </para> <para> - The above will start the replication process of changes to - <literal>users</literal> and <literal>departments</literal> tables. + The above will start the replication process, which synchronizes the + initial table contents of the tables <literal>users</literal> and + <literal>departments</literal> and then starts replicating + incremental changes to those tables. </para> </sect1> </chapter> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index dcb2d3303c1..eb6f4866773 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -1864,6 +1864,12 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i <entry>Process ID of the subscription worker process</entry> </row> <row> + <entry><structfield>relid</></entry> + <entry><type>Oid</></entry> + <entry>OID of the relation that the worker is synchronizing; null for the + main apply worker</entry> + </row> + <row> <entry><structfield>received_lsn</></entry> <entry><type>pg_lsn</></entry> <entry>Last transaction log position received, the initial value of @@ -1899,7 +1905,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i <para> The <structname>pg_stat_subscription</structname> view will contain one row per subscription for main worker (with null PID if the worker is - not running). + not running), and additional rows for workers handling the initial data + copy of the subscribed tables. </para> <table id="pg-stat-ssl-view" xreflabel="pg_stat_ssl"> diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 244e381de9a..48ca4140312 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -1487,7 +1487,7 @@ The commands accepted in walsender mode are: </varlistentry> <varlistentry id="protocol-replication-create-slot" xreflabel="CREATE_REPLICATION_SLOT"> - <term><literal>CREATE_REPLICATION_SLOT</literal> <replaceable class="parameter">slot_name</> [ <literal>TEMPORARY</> ] { <literal>PHYSICAL</> [ <literal>RESERVE_WAL</> ] | <literal>LOGICAL</> <replaceable class="parameter">output_plugin</> [ <literal>EXPORT_SNAPSHOT</> | <literal>NOEXPORT_SNAPSHOT</> ] } + <term><literal>CREATE_REPLICATION_SLOT</literal> <replaceable class="parameter">slot_name</> [ <literal>TEMPORARY</> ] { <literal>PHYSICAL</> [ <literal>RESERVE_WAL</> ] | <literal>LOGICAL</> <replaceable class="parameter">output_plugin</> [ <literal>EXPORT_SNAPSHOT</> | <literal>NOEXPORT_SNAPSHOT</> | <literal>USE_SNAPSHOT</> ] } <indexterm><primary>CREATE_REPLICATION_SLOT</primary></indexterm> </term> <listitem> @@ -1542,12 +1542,17 @@ The commands accepted in walsender mode are: <varlistentry> <term><literal>EXPORT_SNAPSHOT</></term> <term><literal>NOEXPORT_SNAPSHOT</></term> + <term><literal>USE_SNAPSHOT</></term> <listitem> <para> Decides what to do with the snapshot created during logical slot initialization. <literal>EXPORT_SNAPSHOT</>, which is the default, will export the snapshot for use in other sessions. This option can't - be used inside a transaction. <literal>NOEXPORT_SNAPSHOT</> will + be used inside a transaction. <literal>USE_SNAPSHOT</> will use the + snapshot for the current transaction executing the command. This + option must be used in a transaction, and + <literal>CREATE_REPLICATION_SLOT</literal> must be the first command + run in that transaction. Finally, <literal>NOEXPORT_SNAPSHOT</> will just use the snapshot for logical decoding as normal but won't do anything else with it. </para> diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml index 5e18e2ff6c1..6f94247b923 100644 --- a/doc/src/sgml/ref/alter_subscription.sgml +++ b/doc/src/sgml/ref/alter_subscription.sgml @@ -21,15 +21,21 @@ PostgreSQL documentation <refsynopsisdiv> <synopsis> -ALTER SUBSCRIPTION <replaceable class="PARAMETER">name</replaceable> WITH ( <replaceable class="PARAMETER">option</replaceable> [, ... ] ) ] +ALTER SUBSCRIPTION <replaceable class="PARAMETER">name</replaceable> WITH ( <replaceable class="PARAMETER">suboption</replaceable> [, ... ] ) ] -<phrase>where <replaceable class="PARAMETER">option</replaceable> can be:</phrase> +<phrase>where <replaceable class="PARAMETER">suboption</replaceable> can be:</phrase> - SLOT NAME = <replaceable class="PARAMETER">slot_name</replaceable> + SLOT NAME = <replaceable class="PARAMETER">slot_name</replaceable> + +ALTER SUBSCRIPTION <replaceable class="PARAMETER">name</replaceable> SET PUBLICATION <replaceable class="PARAMETER">publication_name</replaceable> [, ...] { REFRESH WITH ( <replaceable class="PARAMETER">puboption</replaceable> [, ... ] ) | NOREFRESH } +ALTER SUBSCRIPTION <replaceable class="PARAMETER">name</replaceable> REFRESH PUBLICATION WITH ( <replaceable class="PARAMETER">puboption</replaceable> [, ... ] ) + +<phrase>where <replaceable class="PARAMETER">puboption</replaceable> can be:</phrase> + + COPY DATA | NOCOPY DATA ALTER SUBSCRIPTION <replaceable class="PARAMETER">name</replaceable> OWNER TO { <replaceable>new_owner</replaceable> | CURRENT_USER | SESSION_USER } ALTER SUBSCRIPTION <replaceable class="PARAMETER">name</replaceable> CONNECTION '<replaceable>conninfo</replaceable>' -ALTER SUBSCRIPTION <replaceable class="PARAMETER">name</replaceable> SET PUBLICATION <replaceable>publication_name</replaceable> [, ...] ALTER SUBSCRIPTION <replaceable class="PARAMETER">name</replaceable> ENABLE ALTER SUBSCRIPTION <replaceable class="PARAMETER">name</replaceable> DISABLE </synopsis> @@ -65,7 +71,6 @@ ALTER SUBSCRIPTION <replaceable class="PARAMETER">name</replaceable> DISABLE <varlistentry> <term><literal>CONNECTION '<replaceable class="parameter">conninfo</replaceable>'</literal></term> - <term><literal>SET PUBLICATION <replaceable class="parameter">publication_name</replaceable></literal></term> <term><literal>SLOT NAME = <replaceable class="parameter">slot_name</replaceable></literal></term> <listitem> <para> @@ -77,6 +82,40 @@ ALTER SUBSCRIPTION <replaceable class="PARAMETER">name</replaceable> DISABLE </varlistentry> <varlistentry> + <term><literal>SET PUBLICATION <replaceable class="parameter">publication_name</replaceable></literal></term> + <listitem> + <para> + Changes list of subscribed publications. See + <xref linkend="SQL-CREATESUBSCRIPTION"> for more information. + </para> + <para> + When <literal>REFRESH</literal> is specified, this command will also + act like <literal>REFRESH PUBLICATION</literal>. When + <literal>NOREFRESH</literal> is specified, the comamnd will not try to + refresh table information. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term>REFRESH PUBLICATION</term> + <listitem> + <para> + Fetch missing table information from publisher. This will start + replication of tables that were added to the subscribed-to publications + since the last invocation of <command>REFRESH PUBLICATION</command> or + since <command>CREATE SUBSCRIPTION</command>. + </para> + <para> + The <literal>COPY DATA</literal> and <literal>NOCOPY DATA</literal> + options specify if the existing data in the publications that are being + subscribed to should be copied. <literal>COPY DATA</literal> is the + default. + </para> + </listitem> + </varlistentry> + + <varlistentry> <term><literal>ENABLE</literal></term> <listitem> <para> @@ -95,6 +134,7 @@ ALTER SUBSCRIPTION <replaceable class="PARAMETER">name</replaceable> DISABLE </para> </listitem> </varlistentry> + </variablelist> </refsect1> diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml index e2000767003..8f3c30b9b01 100644 --- a/doc/src/sgml/ref/create_subscription.sgml +++ b/doc/src/sgml/ref/create_subscription.sgml @@ -31,6 +31,8 @@ CREATE SUBSCRIPTION <replaceable class="PARAMETER">subscription_name</replaceabl | ENABLED | DISABLED | CREATE SLOT | NOCREATE SLOT | SLOT NAME = <replaceable class="PARAMETER">slot_name</replaceable> + | COPY DATA | NOCOPY DATA + | NOCONNECT </synopsis> </refsynopsisdiv> @@ -132,6 +134,42 @@ CREATE SUBSCRIPTION <replaceable class="PARAMETER">subscription_name</replaceabl </para> </listitem> </varlistentry> + + <varlistentry> + <term>COPY DATA</term> + <term>NOCOPY DATA</term> + <listitem> + <para> + Specifies if the existing data in the publications that are being + subscribed to should be copied once the replication starts. + <literal>COPY DATA</literal> is the default. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term>NOCONNECT</term> + <listitem> + <para> + Instructs <command>CREATE SUBSCRIPTION</command> to skip the initial + connection to the provider. This will change default values of other + options to <literal>DISABLED</literal>, + <literal>NOCREATE SLOT</literal>, and <literal>NOCOPY DATA</literal>. + </para> + <para> + It's not allowed to combine <literal>NOCONNECT</literal> and + <literal>ENABLED</literal>, <literal>CREATE SLOT</literal>, or + <literal>COPY DATA</literal>. + </para> + <para> + Since no connection is made when this option is specified, the tables + are not subscribed, so after you enable the subscription nothing will + be replicated. It is required to run + <literal>ALTER SUBSCRIPTION ... REFRESH PUBLICATION</> in order for + tables to be subscribed. + </para> + </listitem> + </varlistentry> </variablelist> </refsect1> diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml index bb32fb12e0b..4f19b892321 100644 --- a/doc/src/sgml/ref/pg_dump.sgml +++ b/doc/src/sgml/ref/pg_dump.sgml @@ -799,22 +799,23 @@ PostgreSQL documentation </varlistentry> <varlistentry> - <term><option>--no-create-subscription-slots</option></term> + <term><option>--no-security-labels</option></term> <listitem> <para> - When dumping logical replication subscriptions, - generate <command>CREATE SUBSCRIPTION</command> commands that do not - create the remote replication slot. That way, the dump can be - restored without requiring network access to the remote servers. + Do not dump security labels. </para> </listitem> </varlistentry> <varlistentry> - <term><option>--no-security-labels</option></term> + <term><option>--no-subscription-connect</option></term> <listitem> <para> - Do not dump security labels. + When dumping logical replication subscriptions, + generate <command>CREATE SUBSCRIPTION</command> commands that do not + make remote connections for creating replication slot or initial table + copy. That way, the dump can be restored without requiring network + access to the remote servers. </para> </listitem> </varlistentry> |
