Age | Commit message (Collapse) | Author |
|
Commit 216a784829c introduced parallel apply workers, allowing multiple
processes to share a replication origin. To support this,
replorigin_session_setup() was extended to accept a pid argument
identifying the process using the origin.
This commit exposes that capability through the SQL interface function
pg_replication_origin_session_setup() by adding an optional pid parameter.
This enables multiple processes to coordinate replication using the same
origin when using SQL-level replication functions.
This change allows the non-builtin logical replication solutions to
implement parallel apply for large transactions.
Additionally, an existing internal error was made user-facing, as it can
now be triggered via the exposed SQL API.
Author: Doruk Yilmaz <doruk@mixrank.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/CAMPB6wfe4zLjJL8jiZV5kjjpwBM2=rTRme0UCL7Ra4L8MTVdOg@mail.gmail.com
Discussion: https://postgr.es/m/CAE2gYzyTSNvHY1+iWUwykaLETSuAZsCWyryokjP6rG46ZvRgQA@mail.gmail.com
|
|
Based on suggestions by Tom Lane
Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/20250916.114644.275726106301941878.horikyota.ntt@gmail.com
|
|
This commit resumes automatic retention of conflict-relevant data for a
subscription. Previously, retention would stop if the apply process failed
to advance its xmin (oldest_nonremovable_xid) within the configured
max_retention_duration and user needs to manually re-enable
retain_dead_tuples option. With this change, retention will resume
automatically once the apply worker catches up and begins advancing its
xmin (oldest_nonremovable_xid) within the configured threshold.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
|
|
Users of logical decoding can encounter an unexpected change of
CurrentResourceOwner and CurrentMemoryContext. The problem is that,
unlike other call sites of RollbackAndReleaseCurrentSubTransaction(), in
reorderbuffer.c we fail to restore the original values of these global
variables after being clobbered by subtransaction abort. This patch
saves the values prior to the call and restores them eventually.
In addition, logical.c and logicalfuncs.c had a hack to restore resource
owner, presumably because of lack of this restore. Remove that.
Instead, because the test coverage here is not very consistent, add an
Assert() to ensure that the resowner is kept identical; this would make
it easy to detect other cases of bugs were we fail to restore resowner
properly. This could be removed later.
This is arguably an old bug, but there appears to be no reason to
backpatch it and it's risky to do so, so refrain for now.
Author: Antonin Houska <ah@cybertec.at>
Reported-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/119497.1756892972@localhost
|
|
The Sun Studio compiler complains about an empty declaration here.
Note for future historians: This does not mean that this compiler is
still of current interest for anyone using PostgreSQL. But we can let
this small fix be its parting gift.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/a0f817ee-fb86-483a-8a14-b6f7f5991b6e%40eisentraut.org
|
|
All the calls replaced by this commit use 4-byte integers for their
variables used in input of my_log2(). Hence, the limit against
too-large inputs does not really apply. Thresholds are also applied, as
of:
- In nodeAgg.c, the number of partitions is limited by
HASHAGG_MAX_PARTITIONS.
- In nodeHash.c, ExecChooseHashTableSize() caps its maximum number of
buckets based on HashJoinTuple and palloc() allocation limit.
- In worker.c, the number of subxacts tracked by ApplySubXactData uses
uint32, making pg_ceil_log2_64() safe to use directly.
Several approaches have been discussed, like an integration with
thresholds in pg_bitutils.h, but it was found confusing. This uses
Dean's idea, which gives a simpler result than what I came up with to be
able to remove dynahash.h. dynahash.h will be removed in a follow-up
commit, removing some duplication with the ceil log2 routines.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAEZATCUJPQD_7sC-wErak2CQGNa6bj2hY-mr8wsBki=kX7f2_A@mail.gmail.com
|
|
Address a potential SIGSEGV that may occur when the tablesync worker
attempts to locate a deleted row while applying changes. This situation
arises during conflict detection for update-deleted scenarios.
To prevent this crash, ensure that the operation is errored out early if
the leader apply worker is unavailable. Since the leader worker maintains
the necessary conflict detection metadata, proceeding without it serves no
purpose and risks reporting incorrect conflict type.
In the passing, improve a nearby comment.
Reported by Tom Lane as per Coverity
Author: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/334468.1757280992@sss.pgh.pa.us
|
|
This commit fixes three issues:
1) When a disabled subscription is created with retain_dead_tuples set to true,
the launcher is not woken up immediately, which may lead to delays in creating
the conflict detection slot.
Creating the conflict detection slot is essential even when the subscription is
not enabled. This ensures that dead tuples are retained, which is necessary for
accurately identifying the type of conflict during replication.
2) Conflict-related data was unnecessarily retained when the subscription does
not have a table.
3) Conflict-relevant data could be prematurely removed before applying
prepared transactions on the publisher that are in the commit critical section.
This issue occurred because the backend executing COMMIT PREPARED was not
accounted for during the computation of oldestXid in the commit phase on
the publisher. As a result, the subscriber could advance the conflict
slot's xmin without waiting for such COMMIT PREPARED transactions to
complete.
We fixed this issue by identifying prepared transactions that are in the
commit critical section during computation of oldestXid in commit phase.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS9PR01MB16913DACB64E5721872AA5C02943BA@OS9PR01MB16913.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/OS9PR01MB16913F67856B0DA2A909788129400A@OS9PR01MB16913.jpnprd01.prod.outlook.com
|
|
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 18, where it was introduced
Discussion: https://postgr.es/m/CANhcyEXMrcEdzj-RNGJam0nJHM4y+ttdWsgUCFmXciM7BNKc7A@mail.gmail.com
|
|
This commit introduces a new subscription parameter,
max_retention_duration, aimed at mitigating excessive accumulation of dead
tuples when retain_dead_tuples is enabled and the apply worker lags behind
the publisher.
When the time spent advancing a non-removable transaction ID exceeds the
max_retention_duration threshold, the apply worker will stop retaining
conflict detection information. In such cases, the conflict slot's xmin
will be set to InvalidTransactionId, provided that all apply workers
associated with the subscription (with retain_dead_tuples enabled) confirm
the retention duration has been exceeded.
To ensure retention status persists across server restarts, a new column
subretentionactive has been added to the pg_subscription catalog. This
prevents unnecessary reactivation of retention logic after a restart.
The conflict detection slot will not be automatically re-initialized
unless a new subscription is created with retain_dead_tuples = true, or
the user manually re-enables retain_dead_tuples.
A future patch will introduce support for automatic slot re-initialization
once at least one apply worker confirms that the retention duration is
within the configured max_retention_duration.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
|
|
Oversight in commit 93db6cbda0.
Author: ChangAo Chen <cca5507@qq.com>
Discussion: https://postgr.es/m/tencent_7B42BBE8D0A5C28DDAB91436192CBCCB8307%40qq.com
|
|
This has been done historically because of get_database_name (which
since commit cb98e6fb8fd4 belongs in lsyscache.c/h, so let's move it
there) and get_database_oid (which is in the right place, but whose
declaration should appear in pg_database.h rather than dbcommands.h).
Clean this up.
Also, xlogreader.h and stringinfo.h are no longer needed by dbcommands.h
since commit f1fd515b393a, so remove them.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/202508191031.5ipojyuaswzt@alvherre.pgsql
|
|
Use "row" instead of "tuple" for user-facing information for
logical replication conflicts.
|
|
Oversight in commit f4b54e1ed9.
Author: Ranier Vilela <ranier.vf@gmail.com>
Discussion: https://postgr.es/m/CAEudQAobFsHaLMypA6C96-9YExvF4AcU1xNPoPuNYRVm3mq4dg%40mail.gmail.com
|
|
Some replication slot manipulations (logical decoding via SQL,
advancing) were failing an assertion when releasing a slot in
single-user mode, because active_pid was not set in a ReplicationSlot
when its slot is acquired.
ReplicationSlotAcquire() has some logic to be able to work with the
single-user mode. This commit sets ReplicationSlot->active_pid to
MyProcPid, to let the slot-related logic fall-through, considering the
single process as the one holding the slot.
Some TAP tests are added for various replication slot functions with the
single-user mode, while on it, for slot creation, drop, advancing, copy
and logical decoding with multiple slot types (temporary, physical vs
logical). These tests are skipped on Windows, as direct calls of
postgres --single would fail on permission failures. There is no
platform-specific behavior that needs to be checked, so living with this
restriction should be fine. The CI is OK with that, now let's see what
the buildfarm tells.
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Mutaamba Maasha <maasha@gmail.com>
Discussion: https://postgr.es/m/OSCPR01MB14966ED588A0328DAEBE8CB25F5FA2@OSCPR01MB14966.jpnprd01.prod.outlook.com
Backpatch-through: 13
|
|
The DROP SUBSCRIPTION command performs several operations: it stops the
subscription workers, removes subscription-related entries from system
catalogs, and deletes the replication slot on the publisher server.
Previously, this command acquired an AccessExclusiveLock on
pg_subscription before initiating these steps.
However, while holding this lock, the command attempts to connect to the
publisher to remove the replication slot. In cases where the connection is
made to a newly created database on the same server as subscriber, the
cache-building process during connection tries to acquire an
AccessShareLock on pg_subscription, resulting in a self-deadlock.
To resolve this issue, we reduce the lock level on pg_subscription during
DROP SUBSCRIPTION from AccessExclusiveLock to RowExclusiveLock. Earlier,
the higher lock level was used to prevent the launcher from starting a new
worker during the drop operation, as a restarted worker could become
orphaned.
Now, instead of relying on a strict lock, we acquire an AccessShareLock on
the specific subscription being dropped and re-validate its existence
after acquiring the lock. If the subscription is no longer valid, the
worker exits gracefully. This approach avoids the deadlock while still
ensuring that orphan workers are not created.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/18988-7312c868be2d467f@postgresql.org
|
|
Oversight in commit fd5a1a0c3e56.
Author: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAHewXNmTT3M_w4NngG=6G3mdT3iJ6DdncTqV9YnGXBPHW8XYtA@mail.gmail.com
|
|
Commit 2633dae2e48 standardized all existing messages to use `%X/%08X`
for LSNs, but this one crept back in after the commit.
|
|
Commit 2633dae2e48 standardized LSN formatting but mistakenly changed
the logical snapshot filename format in SnapBuildSnapshotExists() from
"%X-%X.snap" to "%08X-%08X.snap". Other code still used the original
"%X-%X.snap" format, causing the replication slot synchronization worker
to fail to find existing snapshot files and produce excessive log messages.
This commit restores the original "%X-%X.snap" format
in SnapBuildSnapshotExists() to resolve the issue.
Author: Shveta Malik <shveta.malik@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwHuHPB-ucAk_Tq3uSs4Fdziu1Jp_AA_RD3m5Ycky7m48w@mail.gmail.com
|
|
This commit makes use of the existing PqMsg_* macros in more places
and adds new PqReplMsg_* and PqBackupMsg_* macros for use in
special replication and backup messages, respectively.
Author: Dave Cramer <davecramer@gmail.com>
Co-authored-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/aIECfYfevCUpenBT@nathan
Discussion: https://postgr.es/m/CAFcNs%2Br73NOUb7%2BqKrV4HHEki02CS96Z%2Bx19WaFgE087BWwEng%40mail.gmail.com
|
|
Macros like VARDATA() and VARSIZE() should be thought of as taking
values of type pointer to struct varlena or some other related struct.
The way they are implemented, you can pass anything to it and it will
cast it right. But this is in principle incorrect. To fix, add the
required DatumGetPointer() calls. Or in a couple of cases, remove
superfluous PointerGetDatum() calls.
It is planned in a subsequent patch to change macros like VARDATA()
and VARSIZE() to inline functions, which will enforce stricter typing.
This is in preparation for that.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/928ea48f-77c6-417b-897c-621ef16685a6%40eisentraut.org
|
|
Previously, enabling sync_replication_slots while wal_level was not set
to logical could cause the server to shut down. This was because
the postmaster performed a configuration check before launching
the slot synchronization worker and raised an ERROR if the settings
were incompatible. Since ERROR is treated as FATAL in the postmaster,
this resulted in the entire server shutting down unexpectedly.
This commit changes the postmaster to log that message with a LOG-level
instead of raising an ERROR, allowing the server to continue running
even with the misconfiguration.
Back-patch to v17, where slot synchronization was introduced.
Reported-by: Hugo DUBOIS <hdubois@scaleway.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Hugo DUBOIS <hdubois@scaleway.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Discussion: https://postgr.es/m/CAH0PTU_pc3oHi__XESF9ZigCyzai1Mo3LsOdFyQA4aUDkm01RA@mail.gmail.com
Backpatch-through: 17
|
|
This enhancement builds upon the infrastructure introduced in commit
228c370868, which enables the preservation of deleted tuples and their
origin information on the subscriber. This capability is crucial for
handling concurrent transactions replicated from remote nodes.
The update introduces support for detecting update_deleted conflicts
during the application of update operations on the subscriber. When an
update operation fails to locate the target row-typically because it has
been concurrently deleted-we perform an additional table scan. This scan
uses the SnapshotAny mechanism and we do this additional scan only when
the retain_dead_tuples option is enabled for the relevant subscription.
The goal of this scan is to locate the most recently deleted tuple-matching
the old column values from the remote update-that has not yet been removed
by VACUUM and is still visible according to our slot (i.e., its deletion
is not older than conflict-detection-slot's xmin). If such a tuple is
found, the system reports an update_deleted conflict, including the origin
and transaction details responsible for the deletion.
This provides a groundwork for more robust and accurate conflict
resolution process, preventing unexpected behavior by correctly
identifying cases where a remote update clashes with a deletion from
another origin.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
|
|
In the current system architecture, none of these are worth obsessing
over; most are once-per-process leaks. However, Valgrind complains
about all of them, and if we get to using threads rather than
processes for backend sessions, it will become more interesting to
avoid per-session leaks.
* Fix leaks in StartupXLOG() and ShutdownWalRecovery().
* Fix leakage of pq_mq_handle in a parallel worker.
While at it, move mq_putmessage's "Assert(pq_mq_handle != NULL)"
to someplace where it's not trivially useless.
* Fix leak in logicalrep_worker_detach().
* Don't leak the startup-packet buffer in ProcessStartupPacket().
* Fix leak in evtcache.c's DecodeTextArrayToBitmapset().
If the presented array is toasted, this neglected to free the
detoasted copy, which was then leaked into EventTriggerCacheContext.
* I'm distressed by the amount of code that BuildEventTriggerCache
is willing to run while switched into a long-lived cache context.
Although the detoasted array is the only leak that Valgrind reports,
let's tighten things up while we're here. (DecodeTextArrayToBitmapset
is still run in the cache context, so doing this doesn't remove the
need for the detoast fix. But it reduces the surface area for other
leaks.)
* load_domaintype_info() intentionally leaked some intermediate cruft
into the long-lived DomainConstraintCache's memory context, reasoning
that the amount of leakage will typically not be much so it's not
worth doing a copyObject() of the final tree to avoid that. But
Valgrind knows nothing of engineering tradeoffs and complains anyway.
On the whole, the copyObject doesn't cost that much and this is surely
not a performance-critical code path, so let's do it the clean way.
* MarkGUCPrefixReserved didn't bother to clean up removed placeholder
GUCs at all, which shows up as a leak in one regression test.
It seems appropriate for it to do as much cleanup as
define_custom_variable does when replacing placeholders, so factor
that code out into a helper function. define_custom_variable's logic
was one brick shy of a load too: it forgot to free the separate
allocation for the placeholder's name.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
|
|
In ReorderBufferProcessTXN(), used to send the data of a transaction to
an output plugin, INSERT ON CONFLICT changes (INTERNAL_SPEC_INSERT) are
delayed until a confirmation record arrives (INTERNAL_SPEC_CONFIRM),
updating the change being processed.
8c58624df462 has added an extra step after processing a change to update
the progress of the transaction, by calling the callback
update_progress_txn() based on the LSN stored in a change after a
threshold of CHANGES_THRESHOLD (100) is reached. This logic has missed
the fact that for an INSERT ON CONFLICT change the data is freed once
processed, hence update_progress_txn() could be called pointing to a LSN
value that's already been freed. This could result in random crashes,
depending on the workload.
Per discussion, this issue is fixed by reusing in update_progress_txn()
the LSN from the change processed found at the beginning of the loop,
meaning that for a INTERNAL_SPEC_CONFIRM change the progress is updated
using the LSN of the INTERNAL_SPEC_CONFIRM change, and not the LSN from
its INTERNAL_SPEC_INSERT change. This is actually more correct, as we
want to update the progress to point to the INTERNAL_SPEC_CONFIRM
change.
Masahiko Sawada has found a nice trick to reproduce the issue: hardcode
CHANGES_THRESHOLD at 1 and run test_decoding (test "ddl" being enough)
on an instance running valgrind. The bug has been analyzed by Ethan
Mertz, who also originally suggested the solution used in this patch.
Issue introduced by 8c58624df462, so backpatch down to v16.
Author: Ethan Mertz <ethan.mertz@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aIsQqDZ7x4LAQ6u1@paquier.xyz
Backpatch-through: 16
|
|
A deadlock can occur when the DDL command and the apply worker acquire
catalog locks in different orders while dropping replication origins.
The issue is rare in PG16 and higher branches because, in most cases, the
tablesync worker performs the origin drop in those branches, and its
locking sequence does not conflict with DDL operations.
This patch ensures consistent lock acquisition to prevent such deadlocks.
As per buildfarm.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Ajin Cherian <itsajin@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 14, where it was introduced
Discussion: https://postgr.es/m/bab95e12-6cc5-4ebb-80a8-3e41956aa297@gmail.com
|
|
Commit 473a575e05979b4dbb28b3f2544f4ec8f184ce65 purported to make this
function stash the error message in *syncrep_parse_result_p, but
it didn't actually.
As a result, an attempt to set synchronous_standby_names to any value
that does not parse resulted in a generic "parser failed." message
rather than anything more specific. This fixes that.
Discussion: http://postgr.es/m/CA+TgmoYF9wPNZ-Q_EMfib_espgHycY-eX__6Tzo2GpYpVXqCdQ@mail.gmail.com
Backpatch-through: 18
|
|
Remove a bunch of PG_TRY constructs, de-volatilize related
variables, remove some PQclear calls in error paths.
Aside from making the code simpler and shorter, this should
provide some marginal performance gains.
For ease of review, I did not re-indent code within the removed
PG_TRY constructs. That'll be done in a separate patch.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/2976982.1748049023@sss.pgh.pa.us
|
|
Oversights in commits f4b54e1ed9, dc21234005, and 228c370868.
Author: Dave Cramer <davecramer@gmail.com>
Discussion: https://postgr.es/m/CADK3HH%2BowWVdnbmWH4NHG8%3D%2BkXA_wjsyEVLoY719iJnb%3D%2BtT6A%40mail.gmail.com
|
|
Logical replication requires reliable conflict detection to maintain data
consistency across nodes. To achieve this, we must prevent premature
removal of tuples deleted by other origins and their associated commit_ts
data by VACUUM, which could otherwise lead to incorrect conflict reporting
and resolution.
This patch introduces a mechanism to retain deleted tuples on the
subscriber during the application of concurrent transactions from remote
nodes. Retaining these tuples allows us to correctly ignore concurrent
updates to the same tuple. Without this, an UPDATE might be misinterpreted
as an INSERT during resolutions due to the absence of the original tuple.
Additionally, we ensure that origin metadata is not prematurely removed by
vacuum freeze, which is essential for detecting update_origin_differs and
delete_origin_differs conflicts.
To support this, a new replication slot named pg_conflict_detection is
created and maintained by the launcher on the subscriber. Each apply
worker tracks its own non-removable transaction ID, which the launcher
aggregates to determine the appropriate xmin for the slot, thereby
retaining necessary tuples.
Conflict information retention (deleted tuples and commit_ts) can be
enabled per subscription via the retain_conflict_info option. This is
disabled by default to avoid unnecessary overhead for configurations that
do not require conflict resolution or logging.
During upgrades, if any subscription on the old cluster has
retain_conflict_info enabled, a conflict detection slot will be created to
protect relevant tuples from deletion when the new cluster starts.
This is a foundational work to correctly detect update_deleted conflict
which will be done in a follow-up patch.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
|
|
Commit 112faf1378e introduced a translation marker in libpq-be-fe-helpers.h,
but this caused build failures on some platforms—such as the one reported
by buildfarm member indri—due to linker issues with dblink. This is the same
problem previously addressed in commit 213c959a294.
To fix the issue, this commit removes the translation marker from
libpq-be-fe-helpers.h, following the approach used in 213c959a294.
It also removes the associated gettext_noop() calls added in commit
112faf1378e, as they are no longer needed.
While reviewing this, a gettext_noop() call was also found in
contrib/basic_archive. Since contrib modules don't support translation,
this call has been removed as well.
Per buildfarm member indri.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/0e6299d9-608a-4ffa-aeb1-40cb8a99000b@oss.nttdata.com
|
|
Previously, NOTICE, WARNING, and similar messages received from remote
servers over replication, postgres_fdw, or dblink connections were printed
directly to stderr on the local server (e.g., the subscriber). As a result,
these messages lacked log prefixes (e.g., timestamp), making them harder
to trace and correlate with other log entries.
This commit addresses the issue by introducing a custom notice receiver
for replication, postgres_fdw, and dblink connections. These messages
are now logged via ereport(), ensuring they appear in the logs with proper
formatting and context, which improves clarity and aids in debugging.
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CALDaNm2xsHpWRtLm-VL_HJCsaE3+1Y_n-jDEAr3-suxVqc3xoQ@mail.gmail.com
|
|
The pgoutput plugin initializes optional parameters like "binary" with
default values at the start of processing. However, the "origin"
parameter was previously missed and left without explicit initialization.
Although the PGOutputData struct, which holds these settings,
is zero-initialized at allocation (resulting in publish_no_origin field
for "origin" parameter being false by default), this default was not
set explicitly, unlike other parameters.
This commit adds explicit initialization of the "origin" parameter to
ensure consistency and clarity in how defaults are handled.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/d2790f10-238d-4cb5-a743-d9d2a9dd900f@oss.nttdata.com
|
|
Previously, the check_hook functions for max_slot_wal_keep_size and
idle_replication_slot_timeout would incorrectly raise an ERROR for values
set in postgresql.conf during upgrade, even though those values were not
actively used in the upgrade process.
To prevent logical slot invalidation during upgrade, we used to set
special values for these GUCs. Now, instead of relying on those values, we
directly prevent WAL removal and logical slot invalidation caused by
max_slot_wal_keep_size and idle_replication_slot_timeout.
Note: PostgreSQL 17 does not include the idle_replication_slot_timeout
GUC, so related changes were not backported.
BUG #18979
Reported-by: jorsol <jorsol@gmail.com>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed by: vignesh C <vignesh21@gmail.com>
Reviewed by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/219561.1751826409@sss.pgh.pa.us
Discussion: https://postgr.es/m/18979-a1b7fdbb7cd181c6@postgresql.org
|
|
This commit updates the documentation to clarify that "idle" in
idle_replication_slot_timeout means the replication slot is inactive,
that is, not currently used by any replication connection.
Without this clarification, "idle" could be misinterpreted to mean
that the slot is not advancing or that no data is being streamed,
even if a connection exists.
Back-patch to v18 where idle_replication_slot_timeout was added.
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Gunnar Morling <gunnar.morling@googlemail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CADGJaX_0+FTguWpNSpgVWYQP_7MhoO0D8=cp4XozSQgaZ40Odw@mail.gmail.com
Backpatch-through: 18
|
|
Previously, the idle_replication_slot_timeout parameter used minutes
as its unit, based on the assumption that values would typically exceed
one minute in production environments. However, this caused unexpected
behavior: specifying a value below 30 seconds would round down to 0,
effectively disabling the timeout. This could be surprising to users.
To allow finer-grained control and avoid such confusion, this commit changes
the unit of idle_replication_slot_timeout to seconds. Larger values can
still be specified easily using standard time suffixes, for example,
'24h' for 24 hours.
Back-patch to v18 where idle_replication_slot_timeout was added.
Reported-by: Gunnar Morling <gunnar.morling@googlemail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CADGJaX_0+FTguWpNSpgVWYQP_7MhoO0D8=cp4XozSQgaZ40Odw@mail.gmail.com
Backpatch-through: 18
|
|
This commit standardizes the output format for LSNs to ensure consistent
representation across various tools and messages. Previously, LSNs were
inconsistently printed as `%X/%X` in some contexts, while others used
zero-padding. This often led to confusion when comparing.
To address this, the LSN format is now uniformly set to `%X/%08X`,
ensuring the lower 32-bit part is always zero-padded to eight
hexadecimal digits.
Author: Japin Li <japinli@hotmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/ME0P300MB0445CA53CA0E4B8C1879AF84B641A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
|
|
A few places were accessing bh_size directly instead of via these
handy macros.
Author: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://postgr.es/m/CAJ7c6TPQMVL%2B028T4zuw9ZqL5Du9JavOLhBQLkJeK0RznYx_6w%40mail.gmail.com
|
|
|
|
ca307d5cec90 made CheckPointReplicationSlots() unconditionally call
ReplicationSlotsComputeRequiredLSN(). It causes an assertion trap when
max_replication_slots equals 0. This commit makes
CheckPointReplicationSlots() call ReplicationSlotsComputeRequiredLSN() only
when at least one slot gets its last_saved_restart_lsn updated. That avoids
an assert trap and also saves some cycles when no one slot has
last_saved_restart_lsn updated.
Based on ideas from Dilip Kumar <dilipbalaut@gmail.com> and
Hayato Kuroda <kuroda.hayato@fujitsu.com>.
Reported-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BB506AF934376FF3A8BB947BA%40OS0PR01MB5716.jpnprd01.prod.outlook.com
|
|
The logical replication launcher process would sometimes sleep
for as much as 3 minutes before noticing that it is supposed
to launch a new worker. This could happen if
(1) WaitForReplicationWorkerAttach absorbed a process latch wakeup
that was meant to cause ApplyLauncherMain to do work, or
(2) logicalrep_worker_launch reported failure, either because of
resource limits or because the new worker terminated immediately.
In case (2), the expected behavior is that we retry the launch after
wal_retrieve_retry_interval, but that didn't reliably happen.
It's not clear how often such conditions would occur in the field,
but in our subscription test suite they are somewhat common,
especially in tests that exercise cases that cause quick worker
failure. That causes the tests to take substantially longer than
they ought to do on typical setups.
To fix (1), make WaitForReplicationWorkerAttach re-set the latch
before returning if it cleared it while looping. To fix (2), ensure
that we reduce wait_time to no more than wal_retrieve_retry_interval
when logicalrep_worker_launch reports failure. In passing, fix a
couple of perhaps-hypothetical race conditions, e.g. examining
worker->in_use without a lock.
Backpatch to v16. Problem (2) didn't exist before commit 5a3a95385
because the previous code always set wait_time to
wal_retrieve_retry_interval when launching a worker, regardless of
success or failure of the launch. That behavior also greatly
mitigated problem (1), so I'm not excited about adapting the remainder
of the patch to the substantially-different code in older branches.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/817604.1750723007@sss.pgh.pa.us
Backpatch-through: 16
|
|
Remove the part of comment that says we don't allow toggling two_phase
option as that is supported in commit 1462aad2e4.
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Author: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OSCPR01MB1496656725F3951AEE8749EBDF579A@OSCPR01MB14966.jpnprd01.prod.outlook.com
|
|
ca307d5cec90 introduced keeping WAL segments by slot's last saved restart LSN.
It also added an assertion that the slot's restart LSN never goes backward.
However, situations when the restart LSN goes backward have been spotted by
buildfarm animals and investigated in the thread.
When pg_receivewal starts the replication, it sets the last replayed LSN to
the beginning of the segment, which is older than what
ReplicationSlotReserveWal() set for the slot. A similar situation can happen
to pg_basebackup. When standby reconnects to the primary, it sends the last
replayed LSN, which might be older than the last confirmed flush LSN. In
both these situations, a concurrent checkpoint may trigger an assert trap.
Based on ideas from Vitaly Davydov <v.davydov@postgrespro.ru>,
Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com>,
Vignesh C <vignesh21@gmail.com>,
Amit Kapila <amit.kapila16@gmail.com>.
Reported-by: Vignesh C <vignesh21@gmail.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CALDaNm3s-jpQTe1MshsvQ8GO%3DTLj233JCdkQ7uZ6pwqRVpxAdw%40mail.gmail.com
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
|
|
Improve the clarity of LOG messages when a failover logical slot
synchronization fails, making the reasons more explicit for easier
debugging.
Update the documentation to outline scenarios where slot synchronization
can fail, especially during the initial sync, and emphasize that
pg_sync_replication_slot() is primarily intended for testing and
debugging purposes.
We also discussed improving the functionality of
pg_sync_replication_slot() so that it can be used reliably, but we would
take up that work for next version after some more discussion and review.
Reported-by: Suraj Kharage <suraj.kharage@enterprisedb.com>
Author: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/CAF1DzPWTcg+m+x+oVVB=y4q9=PYYsL_mujVp7uJr-_oUtWNGbA@mail.gmail.com
|
|
logical decoding.
Commit 4909b38af0 introduced logic to distribute invalidation messages
from catalog-modifying transactions to all concurrent in-progress
transactions. However, since each transaction distributes not only its
original invalidation messages but also previously distributed
messages to other transactions, this leads to an exponential increase
in allocation request size for invalidation messages, ultimately
causing memory allocation failure.
This commit fixes this issue by tracking distributed invalidation
messages separately per decoded transaction and not redistributing
these messages to other in-progress transactions. The maximum size of
distributed invalidation messages that one transaction can store is
limited to MAX_DISTR_INVAL_MSG_PER_TXN (8MB). Once the size of the
distributed invalidation messages exceeds this threshold, we
invalidate all caches in locations where distributed invalidation
messages need to be executed.
Back-patch to all supported versions where we introduced the fix by
commit 4909b38af0.
Note that this commit adds two new fields to ReorderBufferTXN to store
the distributed transactions. This change breaks ABI compatibility in
back branches, affecting third-party extensions that depend on the
size of the ReorderBufferTXN struct, though this scenario seems
unlikely.
Additionally, it adds a new flag to the txn_flags field of
ReorderBufferTXN to indicate distributed invalidation message
overflow. This should not affect existing implementations, as it is
unlikely that third-party extensions use unused bits in the txn_flags
field.
Bug: #18938 #18942
Author: vignesh C <vignesh21@gmail.com>
Reported-by: Duncan Sands <duncan.sands@deepbluecap.com>
Reported-by: John Hutchins <john.hutchins@wicourts.gov>
Reported-by: Laurence Parry <greenreaper@hotmail.com>
Reported-by: Max Madden <maxmmadden@gmail.com>
Reported-by: Braulio Fdo Gonzalez <brauliofg@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/680bdaf6-f7d1-4536-b580-05c2760c67c6@deepbluecap.com
Discussion: https://postgr.es/m/18942-0ab1e5ae156613ad@postgresql.org
Discussion: https://postgr.es/m/18938-57c9a1c463b68ce0@postgresql.org
Discussion: https://postgr.es/m/CAD1FGCT2sYrP_70RTuo56QTizyc+J3wJdtn2gtO3VttQFpdMZg@mail.gmail.com
Discussion: https://postgr.es/m/CANO2=B=2BT1hSYCE=nuuTnVTnjidMg0+-FfnRnqM6kd23qoygg@mail.gmail.com
Backpatch-through: 13
|
|
The new tests verify that logical and physical replication slots are still
valid after an immediate restart on checkpoint completion when the slot was
advanced during the checkpoint.
This commit introduces two new injection points to make these tests possible:
* checkpoint-before-old-wal-removal - triggered in the checkpointer process
just before old WAL segments cleanup;
* logical-replication-slot-advance-segment - triggered in
LogicalConfirmReceivedLocation() when restart_lsn was changed enough to
point to the next WAL segment.
Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 17
|
|
The patch fixes the issue with the unexpected removal of old WAL segments
after checkpoint, followed by an immediate restart. The issue occurs when
a slot is advanced after the start of the checkpoint and before old WAL
segments are removed at the end of the checkpoint.
The patch introduces a new in-memory state for slots: last_saved_restart_lsn,
which is used to calculate the oldest LSN for removing WAL segments. This
state is updated every time with the current restart_lsn at the moment when
the slot is saved to disk.
This fix changes the shared memory layout. It's applied to HEAD only because
we don't have to preserve ABI compatibility during the beta stage. Another
fix that doesn't affect the ABI is committed to back branches.
Discussion: https://postgr.es/m/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
|
|
Commit 5fe08c006c fixed this for calls to dshash_create(). This
commit fixes calls to dshash_attach() and dsa_create_in_place().
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aECi_gSD9JnVWQ8T%40nathan
|
|
A cascading WAL sender doing logical decoding (as known as doing its
work on a standby) has been using as flush LSN the value returned by
GetStandbyFlushRecPtr() (last position safely flushed to disk). This is
incorrect as such processes are only able to decode changes up to the
LSN that has been replayed by the startup process.
This commit changes cascading logical WAL senders to use the replay LSN,
as returned by GetXLogReplayRecPtr(). This distinction is important
particularly during shutdown, when WAL senders need to send any
remaining available data to their clients, switching WAL senders to a
caught-up state. Using the latest flush LSN rather than the replay LSN
could cause the WAL senders to be stuck in an infinite loop preventing
them to shut down, as the startup process does not run when WAL senders
attempt to catch up, so they could keep waiting for work that would
never happen.
Backpatch down to v16, where logical decoding on standbys has been
introduced.
Author: Alexey Makhmutov <a.makhmutov@postgrespro.ru>
Reviewed-by: Ajin Cherian <itsajin@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/52138028-7246-421c-9161-4fa108b88070@postgrespro.ru
Backpatch-through: 16
|
|
A few places that access system catalogs don't set up an active
snapshot before potentially accessing their TOAST tables. To fix,
push an active snapshot just before each section of code that might
require accessing one of these TOAST tables, and pop it shortly
afterwards. While at it, this commit adds some rather strict
assertions in an attempt to prevent such issues in the future.
Commit 16bf24e0e4 recently removed pg_replication_origin's TOAST
table in order to fix the same problem for that catalog. On the
back-branches, those bugs are left in place. We cannot easily
remove a catalog's TOAST table on released major versions, and only
replication origins with extremely long names are affected. Given
the low severity of the issue, fixing older versions doesn't seem
worth the trouble of significantly modifying the patch.
Also, on v13 and v14, the aforementioned strict assertions have
been omitted because commit 2776922201, which added
HaveRegisteredOrActiveSnapshot(), was not back-patched. While we
could probably back-patch it now, I've opted against it because it
seems unlikely that new TOAST snapshot issues will be introduced in
the oldest supported versions.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/18127-fe54b6a667f29658%40postgresql.org
Discussion: https://postgr.es/m/18309-c0bf914950c46692%40postgresql.org
Discussion: https://postgr.es/m/ZvMSUPOqUU-VNADN%40nathan
Backpatch-through: 13
|