From 7c4f52409a8c7d85ed169bbbc1f6092274d03920 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Thu, 23 Mar 2017 08:36:36 -0400 Subject: Logical replication support for initial data copy Add functionality for a new subscription to copy the initial data in the tables and then sync with the ongoing apply process. For the copying, add a new internal COPY option to have the COPY source data provided by a callback function. The initial data copy works on the subscriber by receiving COPY data from the publisher and then providing it locally into a COPY that writes to the destination table. A WAL receiver can now execute full SQL commands. This is used here to obtain information about tables and publications. Several new options were added to CREATE and ALTER SUBSCRIPTION to control whether and when initial table syncing happens. Change pg_dump option --no-create-subscription-slots to --no-subscription-connect and use the new CREATE SUBSCRIPTION ... NOCONNECT option for that. Author: Petr Jelinek Tested-by: Erik Rijkers --- doc/src/sgml/catalogs.sgml | 78 +++++++++++++++++++++++++++++++ doc/src/sgml/config.sgml | 25 ++++++++++ doc/src/sgml/logical-replication.sgml | 55 ++++++++++++++++------ doc/src/sgml/monitoring.sgml | 9 +++- doc/src/sgml/protocol.sgml | 9 +++- doc/src/sgml/ref/alter_subscription.sgml | 50 ++++++++++++++++++-- doc/src/sgml/ref/create_subscription.sgml | 38 +++++++++++++++ doc/src/sgml/ref/pg_dump.sgml | 15 +++--- 8 files changed, 249 insertions(+), 30 deletions(-) (limited to 'doc/src') diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index df0435c3f00..228ec780318 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -300,6 +300,11 @@ logical replication subscriptions + + pg_subscription_rel + relation state for subscriptions + + pg_tablespace tablespaces within this database cluster @@ -6418,6 +6423,79 @@ + + <structname>pg_subscription_rel</structname> + + + pg_subscription_rel + + + + The catalog pg_subscription_rel contains the + state for each replicated relation in each subscription. This is a + many-to-many mapping. + + + + This catalog only contains tables known to the subscription after running + either CREATE SUBSCRIPTION or + ALTER SUBSCRIPTION ... REFRESH. + + + + <structname>pg_subscription_rel</structname> Columns + + + + + Name + Type + References + Description + + + + + + srsubid + oid + pg_subscription.oid + Reference to subscription + + + + srrelid + oid + pg_class.oid + Reference to relation + + + + srsubstate + char + + + State code: + i = initialize, + d = data is being copied, + s = synchronized, + r = ready (normal replication) + + + + + srsublsn + pg_lsn + + + End LSN for s and r states. + + + + +
+
+ <structname>pg_tablespace</structname> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index b379b67b30a..2de3540def7 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -3449,6 +3449,31 @@ ANY num_sync ( + max_sync_workers_per_subscription (integer) + + max_sync_workers_per_subscription configuration parameter + + + + + Maximum number of synchronization workers per subscription. This + parameter controls the amount of paralelism of the initial data copy + during the subscription initialization or when new tables are added. + + + Currently, there can be only one synchronization worker per table. + + + The synchronization workers are taken from the pool defined by + max_logical_replication_workers. + + + The default value is 2. + + + + diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml index 44cd78563d3..48db9cd08b7 100644 --- a/doc/src/sgml/logical-replication.sgml +++ b/doc/src/sgml/logical-replication.sgml @@ -24,9 +24,11 @@ - Logical replication sends changes on the publisher to the subscriber as - they occur in real-time. The subscriber applies the data in the same order - as the publisher so that transactional consistency is guaranteed for + Logical replication of a table typically starts with a taking a snapshot + of the data on the publisher database and copying that to the subscriber. + Once that is done, the changes on the publisher are sent to the subscriber + as they occur in real-time. The subscriber applies the data in the same + order as the publisher so that transactional consistency is guaranteed for publications within a single subscription. This method of data replication is sometimes referred to as transactional replication. @@ -159,7 +161,9 @@ Each subscription will receive changes via one replication slot (see - ). + ). Additional temporary + replication slots may be required for the initial data synchronization + of pre-existing table data. @@ -264,9 +268,25 @@ to replica, which produces the usual effects on triggers and constraints. + + + Initial Snapshot + + The initial data in existing subscribed tables are snapshotted and + copied in a parallel instance of a special kind of apply process. + This process will create its own temporary replication slot and + copy the existing data. Once existing data is copied, the worker + enters synchronization mode, which ensures that the table is brought + up to a synchronized state with the main apply process by streaming + any changes that happened during the initial data copy using standard + logical replication. Once the synchronization is done, the control + of the replication of the table is given back to the main apply + process where the replication continues as normal. + + - + Monitoring @@ -287,7 +307,9 @@ Normally, there is a single apply process running for an enabled subscription. A disabled subscription or a crashed subscription will have - zero rows in this view. + zero rows in this view. If the initial data synchronization of any + table is in progress, there will be additional workers for the tables + being synchronized. @@ -337,10 +359,11 @@ On the publisher side, wal_level must be set to logical, and max_replication_slots - must be set to at least the number of subscriptions expected to connect. - And max_wal_senders should be set to at least the same - as max_replication_slots plus the number of physical replicas - that are connected at the same time. + must be set to at least the number of subscriptions expected to connect, + plus some reserve for table synchronization. And + max_wal_senders should be set to at least the same as + max_replication_slots plus the number of physical + replicas that are connected at the same time. @@ -348,9 +371,9 @@ to be set. In this case it should be set to at least the number of subscriptions that will be added to the subscriber. max_logical_replication_workers must be set to at - least the number of subscriptions. Additionally the - max_worker_processes may need to be adjusted to - accommodate for replication workers, at least + least the number of subscriptions, again plus some reserve for the table + synchronization. Additionally the max_worker_processes + may need to be adjusted to accommodate for replication workers, at least (max_logical_replication_workers + 1). Note that some extensions and parallel queries also take worker slots from max_worker_processes. @@ -393,8 +416,10 @@ CREATE SUBSCRIPTION mysub CONNECTION 'dbname=foo host=bar user=repuser' PUBLICAT - The above will start the replication process of changes to - users and departments tables. + The above will start the replication process, which synchronizes the + initial table contents of the tables users and + departments and then starts replicating + incremental changes to those tables. diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index dcb2d3303c1..eb6f4866773 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -1863,6 +1863,12 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i integer Process ID of the subscription worker process
+ + relid + Oid + OID of the relation that the worker is synchronizing; null for the + main apply worker + received_lsn pg_lsn @@ -1899,7 +1905,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i The pg_stat_subscription view will contain one row per subscription for main worker (with null PID if the worker is - not running). + not running), and additional rows for workers handling the initial data + copy of the subscribed tables. diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 244e381de9a..48ca4140312 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -1487,7 +1487,7 @@ The commands accepted in walsender mode are: - CREATE_REPLICATION_SLOT slot_name [ TEMPORARY ] { PHYSICAL [ RESERVE_WAL ] | LOGICAL output_plugin [ EXPORT_SNAPSHOT | NOEXPORT_SNAPSHOT ] } + CREATE_REPLICATION_SLOT slot_name [ TEMPORARY ] { PHYSICAL [ RESERVE_WAL ] | LOGICAL output_plugin [ EXPORT_SNAPSHOT | NOEXPORT_SNAPSHOT | USE_SNAPSHOT ] } CREATE_REPLICATION_SLOT @@ -1542,12 +1542,17 @@ The commands accepted in walsender mode are: EXPORT_SNAPSHOT NOEXPORT_SNAPSHOT + USE_SNAPSHOT Decides what to do with the snapshot created during logical slot initialization. EXPORT_SNAPSHOT, which is the default, will export the snapshot for use in other sessions. This option can't - be used inside a transaction. NOEXPORT_SNAPSHOT will + be used inside a transaction. USE_SNAPSHOT will use the + snapshot for the current transaction executing the command. This + option must be used in a transaction, and + CREATE_REPLICATION_SLOT must be the first command + run in that transaction. Finally, NOEXPORT_SNAPSHOT will just use the snapshot for logical decoding as normal but won't do anything else with it. diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml index 5e18e2ff6c1..6f94247b923 100644 --- a/doc/src/sgml/ref/alter_subscription.sgml +++ b/doc/src/sgml/ref/alter_subscription.sgml @@ -21,15 +21,21 @@ PostgreSQL documentation -ALTER SUBSCRIPTION name WITH ( option [, ... ] ) ] +ALTER SUBSCRIPTION name WITH ( suboption [, ... ] ) ] -where option can be: +where suboption can be: - SLOT NAME = slot_name + SLOT NAME = slot_name + +ALTER SUBSCRIPTION name SET PUBLICATION publication_name [, ...] { REFRESH WITH ( puboption [, ... ] ) | NOREFRESH } +ALTER SUBSCRIPTION name REFRESH PUBLICATION WITH ( puboption [, ... ] ) + +where puboption can be: + + COPY DATA | NOCOPY DATA ALTER SUBSCRIPTION name OWNER TO { new_owner | CURRENT_USER | SESSION_USER } ALTER SUBSCRIPTION name CONNECTION 'conninfo' -ALTER SUBSCRIPTION name SET PUBLICATION publication_name [, ...] ALTER SUBSCRIPTION name ENABLE ALTER SUBSCRIPTION name DISABLE @@ -65,7 +71,6 @@ ALTER SUBSCRIPTION name DISABLE CONNECTION 'conninfo' - SET PUBLICATION publication_name SLOT NAME = slot_name @@ -76,6 +81,40 @@ ALTER SUBSCRIPTION name DISABLE + + SET PUBLICATION publication_name + + + Changes list of subscribed publications. See + for more information. + + + When REFRESH is specified, this command will also + act like REFRESH PUBLICATION. When + NOREFRESH is specified, the comamnd will not try to + refresh table information. + + + + + + REFRESH PUBLICATION + + + Fetch missing table information from publisher. This will start + replication of tables that were added to the subscribed-to publications + since the last invocation of REFRESH PUBLICATION or + since CREATE SUBSCRIPTION. + + + The COPY DATA and NOCOPY DATA + options specify if the existing data in the publications that are being + subscribed to should be copied. COPY DATA is the + default. + + + + ENABLE @@ -95,6 +134,7 @@ ALTER SUBSCRIPTION name DISABLE + diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml index e2000767003..8f3c30b9b01 100644 --- a/doc/src/sgml/ref/create_subscription.sgml +++ b/doc/src/sgml/ref/create_subscription.sgml @@ -31,6 +31,8 @@ CREATE SUBSCRIPTION subscription_nameslot_name + | COPY DATA | NOCOPY DATA + | NOCONNECT @@ -132,6 +134,42 @@ CREATE SUBSCRIPTION subscription_name + + + COPY DATA + NOCOPY DATA + + + Specifies if the existing data in the publications that are being + subscribed to should be copied once the replication starts. + COPY DATA is the default. + + + + + + NOCONNECT + + + Instructs CREATE SUBSCRIPTION to skip the initial + connection to the provider. This will change default values of other + options to DISABLED, + NOCREATE SLOT, and NOCOPY DATA. + + + It's not allowed to combine NOCONNECT and + ENABLED, CREATE SLOT, or + COPY DATA. + + + Since no connection is made when this option is specified, the tables + are not subscribed, so after you enable the subscription nothing will + be replicated. It is required to run + ALTER SUBSCRIPTION ... REFRESH PUBLICATION in order for + tables to be subscribed. + + + diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml index bb32fb12e0b..4f19b892321 100644 --- a/doc/src/sgml/ref/pg_dump.sgml +++ b/doc/src/sgml/ref/pg_dump.sgml @@ -799,22 +799,23 @@ PostgreSQL documentation - + - When dumping logical replication subscriptions, - generate CREATE SUBSCRIPTION commands that do not - create the remote replication slot. That way, the dump can be - restored without requiring network access to the remote servers. + Do not dump security labels. - + - Do not dump security labels. + When dumping logical replication subscriptions, + generate CREATE SUBSCRIPTION commands that do not + make remote connections for creating replication slot or initial table + copy. That way, the dump can be restored without requiring network + access to the remote servers. -- cgit v1.2.3