Age | Commit message (Collapse) | Author |
|
This adds a few more functions to int128.h, allowing more of numeric.c
to use 128-bit integers on all platforms.
Specifically, int64_div_fast_to_numeric() and the following aggregate
functions can now use 128-bit integers for improved performance on all
platforms, rather than just platforms with native support for int128:
- SUM(int8)
- AVG(int8)
- STDDEV_POP(int2 or int4)
- STDDEV_SAMP(int2 or int4)
- VAR_POP(int2 or int4)
- VAR_SAMP(int2 or int4)
In addition to improved performance on platforms lacking native
128-bit integer support, this significantly simplifies this numeric
code by allowing a lot of conditionally compiled code to be deleted.
A couple of numeric functions (div_var_int64() and sqrt_var()) still
contain conditionally compiled 128-bit integer code that only works on
platforms with native 128-bit integer support. Making those work more
generally would require rolling our own higher precision 128-bit
division, which isn't supported for now.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWgBMc9ZwKMYqQpaQz2X6gaamYRB+RnMsUNcdMcL2Mj_w@mail.gmail.com
|
|
Recent ICU versions have added U_SHOW_CPLUSPLUS_HEADER_API, and we need
to set this to zero as well to hide the ICU C++ APIs from pg_locale.h
Per discussion, we want cpluspluscheck to work cleanly in backbranches,
so backpatch both this and its predecessor commit ed26c4e25a4 to all
supported versions.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1115793.1754414782%40sss.pgh.pa.us
Backpatch-through: 13
|
|
In the non-native code in int128_add_int64_mul_int64(), use signed
64-bit integer multiplication instead of unsigned multiplication for
the first three product terms. This simplifies the code needed to add
each product term to the result, leading to more compact and efficient
code. The actual performance gain is quite modest, but it seems worth
it to improve the code's readability.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWgBMc9ZwKMYqQpaQz2X6gaamYRB+RnMsUNcdMcL2Mj_w@mail.gmail.com
|
|
On platforms without native 128-bit integer support, simplify the test
for carry in int128_add_uint64() by noting that the low-part addition
is unsigned integer arithmetic, which is just modular arithmetic.
Therefore the test for carry can simply be written as "new value < old
value" (i.e., a test for modular wrap-around). This can then be made
branchless so that on modern compilers it produces the same machine
instructions as native 128-bit addition, making it significantly
simpler and faster.
Similarly, the test for carry in int128_add_int64() can be written in
much the same way, but with an extra term to compensate for the sign
of the value being added. Again, on modern compilers this leads to
branchless code, often identical to the native 128-bit integer
addition machine code.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWgBMc9ZwKMYqQpaQz2X6gaamYRB+RnMsUNcdMcL2Mj_w@mail.gmail.com
|
|
This commit makes use of the existing PqMsg_* macros in more places
and adds new PqReplMsg_* and PqBackupMsg_* macros for use in
special replication and backup messages, respectively.
Author: Dave Cramer <davecramer@gmail.com>
Co-authored-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/aIECfYfevCUpenBT@nathan
Discussion: https://postgr.es/m/CAFcNs%2Br73NOUb7%2BqKrV4HHEki02CS96Z%2Bx19WaFgE087BWwEng%40mail.gmail.com
|
|
The name "namspace" looks like a typo, but it was presumably meant
to avoid using the "namespace" C++ keyword. This commit renames
the parameter to "nameSpace" to prevent future confusion while
still avoiding the keyword.
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/aJJxpfsDfiQ1VbJ5%40nathan
|
|
This rearranges the code in src/include/common/int128.h, so that the
native and non-native implementations of each function are together
inside the function body (as they are in src/include/common/int.h),
rather than being in separate parts of the file.
This improves readability and maintainability, making it easier to
compare the native and non-native implementations, and avoiding the
need to duplicate every function comment and declaration.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWgBMc9ZwKMYqQpaQz2X6gaamYRB+RnMsUNcdMcL2Mj_w@mail.gmail.com
|
|
These were introduced (commit efdc7d74753) at the same time as we were
moving to using the standard inttypes.h format macros (commit
a0ed19e0a9e). It doesn't seem useful to keep a new already-deprecated
interface like this with only a few users, so remove the new symbols
again and have the callers use PRIx64.
(Also, INT64_HEX_FORMAT was kind of a misnomer, since hex formats all
use unsigned types.)
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/0ac47b5d-e5ab-4cac-98a7-bdee0e2831e4%40eisentraut.org
|
|
This creates a new test module src/test/modules/test_int128 and moves
src/tools/testint128.c into it so that it can be built using the
normal build system, allowing the 128-bit integer arithmetic functions
in src/include/common/int128.h to be tested automatically. For now,
the tests are skipped on platforms that don't have native int128
support.
While at it, fix the test128 union in the test code: the "hl" member
of test128 was incorrectly defined to be a union instead of a struct,
which meant that the tests were only ever setting and checking half of
each 128-bit integer value.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWgBMc9ZwKMYqQpaQz2X6gaamYRB+RnMsUNcdMcL2Mj_w@mail.gmail.com
|
|
This commit introduces a new column backup_type that indicates the
type of backup being performed: either 'full' or 'incremental'.
Bump catalog version.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Yugo Nagata <nagata@sraoss.co.jp>
Discussion: https://postgr.es/m/CAOzEurQuzbHwTj1ehk1a+eeQDidJPyrE5s6mYumkjwjZnurhkQ@mail.gmail.com
|
|
We've only bothered converting the external interfaces, not the
endian-dependent internal macros (which should not be used by any
callers other than the interface functions in this header, anyway).
The VARTAG_1B_E() changes are required for C++ compatibility.
Author: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/928ea48f-77c6-417b-897c-621ef16685a6@eisentraut.org
|
|
This enhancement builds upon the infrastructure introduced in commit
228c370868, which enables the preservation of deleted tuples and their
origin information on the subscriber. This capability is crucial for
handling concurrent transactions replicated from remote nodes.
The update introduces support for detecting update_deleted conflicts
during the application of update operations on the subscriber. When an
update operation fails to locate the target row-typically because it has
been concurrently deleted-we perform an additional table scan. This scan
uses the SnapshotAny mechanism and we do this additional scan only when
the retain_dead_tuples option is enabled for the relevant subscription.
The goal of this scan is to locate the most recently deleted tuple-matching
the old column values from the remote update-that has not yet been removed
by VACUUM and is still visible according to our slot (i.e., its deletion
is not older than conflict-detection-slot's xmin). If such a tuple is
found, the system reports an update_deleted conflict, including the origin
and transaction details responsible for the deletion.
This provides a groundwork for more robust and accurate conflict
resolution process, preventing unexpected behavior by correctly
identifying cases where a remote update clashes with a deletion from
another origin.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
|
|
These changes don't actually fix any leaks. They just make sure that
Valgrind will find pointers to data structures that remain allocated
at process exit, and thus not falsely complain about leaks. In
particular, we are trying to avoid situations where there is no
pointer to the beginning of an allocated block (except possibly
within the block itself, which Valgrind won't count).
* Because dynahash.c never frees hashtable storage except by deleting
the whole hashtable context, it doesn't bother to track the individual
blocks of elements allocated by element_alloc(). This results in
"possibly lost" complaints from Valgrind except when the first element
of each block is actively in use. (Otherwise it'll be on a freelist,
but very likely only reachable via "interior pointers" within element
blocks, which doesn't satisfy Valgrind.)
To fix, if we're building with USE_VALGRIND, expend an extra pointer's
worth of space in each element block so that we can chain them all
together from the HTAB header. Skip this in shared hashtables though:
Valgrind doesn't track those, and we'd need additional locking to make
it safe to manipulate a shared chain.
While here, update a comment obsoleted by 9c911ec06.
* Put the dlist_node fields of catctup and catclist structs first.
This ensures that the dlist pointers point to the starts of these
palloc blocks, and thus that Valgrind won't consider them
"possibly lost".
* The postmaster's PMChild structs and the autovac launcher's
avl_dbase structs also have the dlist_node-is-not-first problem,
but putting it first still wouldn't silence the warning because we
bulk-allocate those structs in an array, so that Valgrind sees a
single allocation. Commonly the first array element will be pointed
to only from some later element, so that the reference would be an
interior pointer even if it pointed to the array start. (This is the
same issue as for dynahash elements.) Since these are pretty simple
data structures, I don't feel too bad about faking out Valgrind by
just keeping a static pointer to the array start.
(This is all quite hacky, and it's not hard to imagine usages where
we'd need some other idea in order to have reasonable leak tracking of
structures that are only accessible via dlist_node lists. But these
changes seem to be enough to silence this class of leakage complaints
for the moment.)
* Free a couple of data structures manually near the end of an
autovacuum worker's run when USE_VALGRIND, and ensure that the final
vac_update_datfrozenxid() call is done in a non-permanent context.
This doesn't have any real effect on the process's total memory
consumption, since we're going to exit as soon as that last
transaction is done. But it does pacify Valgrind.
* Valgrind complains about the postmaster's socket-files and
lock-files lists being leaked, which we can silence by just
not nulling out the static pointers to them.
* Valgrind seems not to consider the global "environ" variable as
a valid root pointer; so when we allocate a new environment array,
it claims that data is leaked. To fix that, keep our own
statically-allocated copy of the pointer, similarly to the previous
item.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
|
|
When determining whether an allocated chunk is still reachable,
Valgrind will consider only pointers within what it believes to be
allocated chunks. Normally, all of a block obtained from malloc()
would be considered "allocated" --- but it turns out that if we use
VALGRIND_MEMPOOL_ALLOC to designate sub-section(s) of a malloc'ed
block as allocated, all the rest of that malloc'ed block is ignored.
This leads to lots of false positives of course. In particular,
in any multi-malloc-block context, all but the primary block were
reported as leaked. We also had a problem with context "ident"
strings, which were reported as leaked unless there was some other
pointer to them besides the one in the context header.
To fix, we need to use VALGRIND_MEMPOOL_ALLOC to designate
a context's management structs (the context struct itself and
any per-block headers) as allocated chunks. That forces moving
the VALGRIND_CREATE_MEMPOOL/VALGRIND_DESTROY_MEMPOOL calls into
the per-context-type code, so that the pool identifier can be
made as soon as we've allocated the initial block, but otherwise
it's fairly straightforward. Note that in Valgrind's eyes there
is no distinction between these allocations and the allocations
that the mmgr modules hand out to user code. That's fine for
now, but perhaps someday we'll want to do better yet.
When reading this patch, it's helpful to start with the comments
added at the head of mcxt.c.
Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
Discussion: https://postgr.es/m/20210317181531.7oggpqevzz6bka3g@alap3.anarazel.de
|
|
A deadlock can occur when the DDL command and the apply worker acquire
catalog locks in different orders while dropping replication origins.
The issue is rare in PG16 and higher branches because, in most cases, the
tablesync worker performs the origin drop in those branches, and its
locking sequence does not conflict with DDL operations.
This patch ensures consistent lock acquisition to prevent such deadlocks.
As per buildfarm.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Ajin Cherian <itsajin@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 14, where it was introduced
Discussion: https://postgr.es/m/bab95e12-6cc5-4ebb-80a8-3e41956aa297@gmail.com
|
|
Commit 719dcf3c42 introduced a field called CachedPlanType in
PlannedStmt to allow extensions to determine whether a cached plan is
generic or custom.
After discussion, the concepts that we want to track are a bit wider
than initially anticipated, as it is closer to knowing from which
"source" or "origin" a PlannedStmt has been generated or retrieved.
Custom and generic cached plans are a subset of that.
Based on the state of HEAD, we have been able to define two more
origins:
- "standard", for the case where PlannedStmt is generated in
standard_planner(), the most common case.
- "internal", for the fake PlannedStmt generated internally by some
query patterns.
This could be tuned in the future depending on what is needed. This
looks like a good starting point, at least. The default value is called
"UNKNOWN", provided as fallback value. This value is not used in the
core code, the idea is to let extensions building their own PlannedStmts
know about this new field.
Author: Michael Paquier <michael@paquier.xyz>
Co-authored-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/aILaHupXbIGgF2wJ@paquier.xyz
|
|
There've been a few complaints that it can be overly difficult to figure
out why the planner picked a Memoize plan. To help address that, here we
adjust the EXPLAIN output to display the following additional details:
1) The estimated number of cache entries that can be stored at once
2) The estimated number of unique lookup keys that we expect to see
3) The number of lookups we expect
4) The estimated hit ratio
Technically #4 can be calculated using #1, #2 and #3, but it's not a
particularly obvious calculation, so we opt to display it explicitly.
The original patch by Lukas Fittl only displayed the hit ratio, but
there was a fear that might lead to more questions about how that was
calculated. The idea with displaying all 4 is to be transparent which
may allow queries to be tuned more easily. For example, if #2 isn't
correct then maybe extended statistics or a manual n_distinct estimate can
be used to help fix poor plan choices.
Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAP53Pky29GWAVVk3oBgKBDqhND0BRBN6yTPeguV_qSivFL5N_g%40mail.gmail.com
|
|
The callback added in fc415edf8ca8 used to check if there is any pending
data to flush for fixed-numbered statistics, done by looping across all
the builtin and custom stats kinds with a call to have_fixed_pending_cb,
is proving to able to show in workloads that do not report any stats
(read-only, no function calls, no WAL, no IO, etc). The code used in
v17 was cheaper than that what HEAD has introduced, relying on three
boolean checks for WAL, SLRU and IO stats.
This commit switches the code to use a more efficient approach than
fc415edf8ca8, with a single boolean flag that can be switched to "true"
by any fixed-numbered stats kinds to force pgstat_report_stat() to go
through one round of reports. The flag is reset by pgstat_report_stat()
once a full round of reports is done. The flag being false means that
fixed-numbered stats kinds saw no activity, and that there is no pending
data to flush.
ac000fca743e took one step in improving the performance by reducing the
number of stats kinds that the backend can hold. This commit takes a
more drastic step by bringing back the code efficiency to what it was
before v18 with a cheap check at the beginning of pgstat_report_stat()
for its fast-exit path.
The callback have_static_pending_cb is removed as an effect of all that.
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/eb224uegsga2hgq7dfq3ps5cduhpqej7ir2hjxzzozjthrekx5@dysei6buqthe
Backpatch-through: 18
|
|
This step can be checked mechanically.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/2976982.1748049023@sss.pgh.pa.us
|
|
Remove a bunch of PG_TRY constructs, de-volatilize related
variables, remove some PQclear calls in error paths.
Aside from making the code simpler and shorter, this should
provide some marginal performance gains.
For ease of review, I did not re-indent code within the removed
PG_TRY constructs. That'll be done in a separate patch.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/2976982.1748049023@sss.pgh.pa.us
|
|
Commit 232d8caea fixed a case where postgres_fdw could lose track
of a PGresult object, resulting in a process-lifespan memory leak.
But I have little faith that there aren't other potential PGresult
leakages, now or in future, in the backend modules that use libpq.
Therefore, this patch proposes infrastructure that makes all
PGresults returned from libpq act as though they are palloc'd
in the CurrentMemoryContext (with the option to relocate them to
another context later). This should greatly reduce the risk of
careless leaks, and it also permits removal of a bunch of code
that attempted to prevent such leaks via PG_TRY blocks.
This patch adds infrastructure that wraps each PGresult in a
"libpqsrv_PGresult" that provides a memory context reset callback
to PQclear the PGresult. Code using this abstraction is inherently
memory-safe to the same extent as we are accustomed to in most backend
code. Furthermore, we add some macros that automatically redirect
calls of the libpq functions concerned with PGresults to use this
infrastructure, so that almost no source-code changes are needed to
wheel this infrastructure into place in all the backend code that
uses libpq.
Perhaps in future we could create similar infrastructure for
PGconn objects, but there seems less need for that.
This patch just creates the infrastructure and makes relevant code
use it, including reverting 232d8caea in favor of this mechanism.
A good deal of follow-on simplification is possible now that we don't
have to be so cautious about freeing PGresults, but I'll put that in
a separate patch.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/2976982.1748049023@sss.pgh.pa.us
|
|
This commit changes stats kinds to have the following bounds, making
their handling in core cheaper by default:
- PGSTAT_KIND_CUSTOM_MIN 128 -> 24
- PGSTAT_KIND_MAX 256 -> 32
The original numbers were rather high, and showed an impact on
performance in pgstat_report_stat() for the case of simple queries with
its early-exit path if there are no pending statistics to flush. This
logic will be improved more in a follow-up commit to bring the
performance of pgstat_report_stat() on par with v17 and older versions.
Lowering the bounds is a change worth doing on its own, independently of
the other improvement.
These new numbers should be enough to leave some room for the following
years for built-in and custom stats kinds, with stable ID numbers. At
least that should be enough to start with this facility for extension
developers. It can be always increased in the tree depending on the
requirements wanted.
Per discussion with Andres Freund and Bertrand Drouvot.
Discussion: https://postgr.es/m/eb224uegsga2hgq7dfq3ps5cduhpqej7ir2hjxzzozjthrekx5@dysei6buqthe
Backpatch-through: 18
|
|
PlannedStmt gains a new field, called CachedPlanType, able to track if a
given plan tree originates from the cache and if we are dealing with a
generic or custom cached plan.
This field can be used for monitoring or statistical purposes, in the
executor hooks, for example, based on the planned statement attached to
a QueryDesc. A patch is under discussion for pg_stat_statements to
provide an equivalent of the counters in pg_prepared_statements for
custom and generic plans, to provide a more global view of such data, as
this data is now restricted to the current session.
The concept introduced in this commit is useful on its own, and has been
extracted from a larger patch by the same author.
Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAA5RZ0uFw8Y9GCFvafhC=OA8NnMqVZyzXPfv_EePOt+iv1T-qQ@mail.gmail.com
|
|
Solaris has never bothered to add "const" to the second argument
of PAM conversation procs, as all other Unixen did decades ago.
This resulted in an "incompatible pointer" compiler warning when
building --with-pam, but had no more serious effect than that,
so we never did anything about it. However, as of GCC 14 the
case is an error not warning by default.
To complicate matters, recent OpenIndiana (and maybe illumos
in general?) *does* supply the "const" by default, so we can't
just assume that platforms using our solaris template need help.
What we can do, short of building a configure-time probe,
is to make solaris.h #define _PAM_LEGACY_NONCONST, which
causes OpenIndiana's pam_appl.h to revert to the traditional
definition, and hopefully will have no effect anywhere else.
Then we can use that same symbol to control whether we include
"const" in the declaration of pam_passwd_conv_proc().
Bug: #18995
Reported-by: Andrew Watkins <awatkins1966@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18995-82058da9ab4337a7@postgresql.org
Backpatch-through: 13
|
|
lwlock.c, lwlock.h, and wait_event_names.txt each contain a list of
built-in LWLock tranches. It is easy to miss one or the other when
adding or removing tranches, and discrepancies have adverse effects
(e.g., breaking JOINs between pg_stat_activity and pg_wait_events).
This commit moves the lists of built-in tranches in lwlock.{c,h} to
lwlocklist.h and adds a cross-check to the script that generates
lwlocknames.h. If the lists do not match exactly, building will
fail.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aHpOgwuFQfcFMZ/B%40ip-10-97-1-34.eu-west-3.compute.internal
|
|
Commit 70e81861fadd split out xlogrecovery.c/h and moved some enums
related to recovery targets to xlogrecovery.h. However, it seems that
the enum RecoveryTargetAction was inadvertently left out by that commit.
This commit moves it to xlogrecovery.h for consistency.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20240904.173013.1132986940678039655.horikyota.ntt@gmail.com
|
|
Logical replication requires reliable conflict detection to maintain data
consistency across nodes. To achieve this, we must prevent premature
removal of tuples deleted by other origins and their associated commit_ts
data by VACUUM, which could otherwise lead to incorrect conflict reporting
and resolution.
This patch introduces a mechanism to retain deleted tuples on the
subscriber during the application of concurrent transactions from remote
nodes. Retaining these tuples allows us to correctly ignore concurrent
updates to the same tuple. Without this, an UPDATE might be misinterpreted
as an INSERT during resolutions due to the absence of the original tuple.
Additionally, we ensure that origin metadata is not prematurely removed by
vacuum freeze, which is essential for detecting update_origin_differs and
delete_origin_differs conflicts.
To support this, a new replication slot named pg_conflict_detection is
created and maintained by the launcher on the subscriber. Each apply
worker tracks its own non-removable transaction ID, which the launcher
aggregates to determine the appropriate xmin for the slot, thereby
retaining necessary tuples.
Conflict information retention (deleted tuples and commit_ts) can be
enabled per subscription via the retain_conflict_info option. This is
disabled by default to avoid unnecessary overhead for configurations that
do not require conflict resolution or logging.
During upgrades, if any subscription on the old cluster has
retain_conflict_info enabled, a conflict detection slot will be created to
protect relevant tuples from deletion when the new cluster starts.
This is a foundational work to correctly detect update_deleted conflict
which will be done in a follow-up patch.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
|
|
Commit 112faf1378e introduced a translation marker in libpq-be-fe-helpers.h,
but this caused build failures on some platforms—such as the one reported
by buildfarm member indri—due to linker issues with dblink. This is the same
problem previously addressed in commit 213c959a294.
To fix the issue, this commit removes the translation marker from
libpq-be-fe-helpers.h, following the approach used in 213c959a294.
It also removes the associated gettext_noop() calls added in commit
112faf1378e, as they are no longer needed.
While reviewing this, a gettext_noop() call was also found in
contrib/basic_archive. Since contrib modules don't support translation,
this call has been removed as well.
Per buildfarm member indri.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/0e6299d9-608a-4ffa-aeb1-40cb8a99000b@oss.nttdata.com
|
|
The assertion wouldn't have triggered for a long while yet, but this won't
accidentally fail to detect the issue if/when it occurs.
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAEze2Wj-43JV4YufW23gm=Uwr7Lkj+p0yKctKHxNm1rwFC+_DQ@mail.gmail.com
Backpatch-through: 18
|
|
This index AM callback has been introduced in c1ec02be1d79 and it is
optional, currently only being used by BRIN. Optional callbacks are
documented with NULL as possible value in amapi.h and indexam.sgml, but
this callback has missed this part of the description.
Reported-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/CAHut+PvgYcPmPDi1YdHMJY5upnyGRpc0N8pk1xNB11xDSBwNog@mail.gmail.com
Backpatch-through: 17
|
|
Previously, NOTICE, WARNING, and similar messages received from remote
servers over replication, postgres_fdw, or dblink connections were printed
directly to stderr on the local server (e.g., the subscriber). As a result,
these messages lacked log prefixes (e.g., timestamp), making them harder
to trace and correlate with other log entries.
This commit addresses the issue by introducing a custom notice receiver
for replication, postgres_fdw, and dblink connections. These messages
are now logged via ereport(), ensuring they appear in the logs with proper
formatting and context, which improves clarity and aids in debugging.
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CALDaNm2xsHpWRtLm-VL_HJCsaE3+1Y_n-jDEAr3-suxVqc3xoQ@mail.gmail.com
|
|
In commit b262ad440, we introduced an optimization that reduces an IS
[NOT] NULL qual on a NOT NULL column to constant true or constant
false, provided we can prove that the input expression of the NullTest
is not nullable by any outer joins or grouping sets. This deduction
happens quite late in the planner, during the distribution of quals to
rels in query_planner. However, this approach has some drawbacks: we
can't perform any further folding with the constant, and it turns out
to be prone to bugs.
Ideally, this deduction should happen during constant folding.
However, the per-relation information about which columns are defined
as NOT NULL is not available at that point. This information is
currently collected from catalogs when building RelOptInfos for base
or "other" relations.
This patch moves the collection of NOT NULL attribute information for
relations before pull_up_sublinks, storing it in a hash table keyed by
relation OID. It then uses this information to perform the NullTest
deduction for Vars during constant folding. This also makes it
possible to leverage this information to pull up NOT IN subqueries.
Note that this patch does not get rid of restriction_is_always_true
and restriction_is_always_false. Removing them would prevent us from
reducing some IS [NOT] NULL quals that we were previously able to
reduce, because (a) the self-join elimination may introduce new IS NOT
NULL quals after constant folding, and (b) if some outer joins are
converted to inner joins, previously irreducible NullTest quals may
become reducible.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMbWs4-bFJ1At4btk5wqbezdu8PLtQ3zv-aiaY3ry9Ymm=jgFQ@mail.gmail.com
|
|
There are several pieces of catalog information that need to be
retrieved for a relation during the early stage of planning. These
include relhassubclass, which is used to clear the inh flag if the
relation has no children, as well as a column's attgenerated and
default value, which are needed to expand virtual generated columns.
More such information may be required in the future.
Currently, these pieces of catalog data are collected in multiple
places, resulting in repeated table_open/table_close calls for each
relation in the rangetable. This patch centralizes the collection of
all required early-stage catalog information into a single loop over
the rangetable, allowing each relation to be opened and closed only
once.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMbWs4-bFJ1At4btk5wqbezdu8PLtQ3zv-aiaY3ry9Ymm=jgFQ@mail.gmail.com
|
|
Currently, we expand virtual generated columns after we have pulled up
any SubLinks within the query's quals. This ensures that the virtual
generated column references within SubLinks that should be transformed
into joins are correctly expanded. This approach works well and has
posed no issues.
In an upcoming patch, we plan to centralize the collection of catalog
information needed early in the planner. This will help avoid
repeated table_open/table_close calls for relations in the rangetable.
Since this information is required during sublink pull-up, we are
moving the expansion of virtual generated columns to occur beforehand.
To achieve this, if any EXISTS SubLinks can be pulled up, their
rangetables are processed just before pulling them up.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMbWs4-bFJ1At4btk5wqbezdu8PLtQ3zv-aiaY3ry9Ymm=jgFQ@mail.gmail.com
|
|
Document that restart_lsn can go backwards and explain why this could happen.
Discussion: https://postgr.es/m/1d12d2-67235980-35-19a406a0%4063439497
Discussion: https://postgr.es/m/CAPpHfdvuyMrUg0Vs5jPfwLOo1M9B-GP5j_My9URnBX0B%3DnrHKw%40mail.gmail.com
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Co-authored-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
|
|
If a MERGE inside a CTE attempts an UPDATE or DELETE on a table with
BEFORE ROW triggers, and a concurrent UPDATE or DELETE happens, the
merge code would fail (crashing in the case of an UPDATE action, and
potentially executing the wrong action for a DELETE action).
This is the same issue that 9321c79c86 attempted to fix, except now
for a MERGE inside a CTE. As noted in 9321c79c86, what needs to happen
is for the trigger code to exit early, returning the TM_Result and
TM_FailureData information to the merge code, if a concurrent
modification is detected, rather than attempting to do an EPQ
recheck. The merge code will then do its own rechecking, and rescan
the action list, potentially executing a different action in light of
the concurrent update. In particular, the trigger code must never call
ExecGetUpdateNewTuple() for MERGE, since that is bound to fail because
MERGE has its own per-action projection information.
Commit 9321c79c86 did this using estate->es_plannedstmt->commandType
in the trigger code to detect that a MERGE was being executed, which
is fine for a plain MERGE command, but does not work for a MERGE
inside a CTE. Fix by passing that information to the trigger code as
an additional parameter passed to ExecBRUpdateTriggers() and
ExecBRDeleteTriggers().
Back-patch as far as v17 only, since MERGE cannot appear inside a CTE
prior to that. Additionally, take care to preserve the trigger ABI in
v17 (though not in v18, which is still in beta).
Bug: #18986
Reported-by: Yaroslav Syrytsia <me@ys.lc>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/18986-e7a8aac3d339fa47@postgresql.org
Backpatch-through: 17
|
|
TransactionIdIsActive() has not been used since bb38fb0d43c, in 2014. There
are no known uses in extensions either and it's hard to see valid uses for
it. Therefore remove TransactionIdIsActive().
Discussion: https://postgr.es/m/odgftbtwp5oq7cxjgf4kjkmyq7ypoftmqy7eqa7w3awnouzot6@hrwnl5tdqrgu
|
|
This commit adds the boilerplate code for supporting a list of
options in CHECKPOINT commands. No actual options are supported
yet, but follow-up commits will add support for MODE and
FLUSH_UNLOGGED. While at it, this commit refactors the code for
executing CHECKPOINT commands to its own function since it's about
to become significantly larger.
Author: Christoph Berg <myon@debian.org>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Discussion: https://postgr.es/m/aDnaKTEf-0dLiEfz%40msg.df7cb.de
|
|
The new name more accurately reflects the effects of this flag on a
requested checkpoint. Checkpoint-related log messages (i.e., those
controlled by the log_checkpoints configuration parameter) will now
say "fast" instead of "immediate", too. Likewise, references to
"immediate" checkpoints in the documentation have been updated to
say "fast". This is preparatory work for a follow-up commit that
will add a MODE option to the CHECKPOINT command.
Author: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aDnaKTEf-0dLiEfz%40msg.df7cb.de
|
|
The new name more accurately relects the effects of this flag on a
requested checkpoint. Checkpoint-related log messages (i.e., those
controlled by the log_checkpoints configuration parameter) will now
say "flush-unlogged" instead of "flush-all", too. This is
preparatory work for a follow-up commit that will add a
FLUSH_UNLOGGED option to the CHECKPOINT command.
Author: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aDnaKTEf-0dLiEfz%40msg.df7cb.de
|
|
Previously, the check_hook functions for max_slot_wal_keep_size and
idle_replication_slot_timeout would incorrectly raise an ERROR for values
set in postgresql.conf during upgrade, even though those values were not
actively used in the upgrade process.
To prevent logical slot invalidation during upgrade, we used to set
special values for these GUCs. Now, instead of relying on those values, we
directly prevent WAL removal and logical slot invalidation caused by
max_slot_wal_keep_size and idle_replication_slot_timeout.
Note: PostgreSQL 17 does not include the idle_replication_slot_timeout
GUC, so related changes were not backported.
BUG #18979
Reported-by: jorsol <jorsol@gmail.com>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed by: vignesh C <vignesh21@gmail.com>
Reviewed by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/219561.1751826409@sss.pgh.pa.us
Discussion: https://postgr.es/m/18979-a1b7fdbb7cd181c6@postgresql.org
|
|
Previously, the idle_replication_slot_timeout parameter used minutes
as its unit, based on the assumption that values would typically exceed
one minute in production environments. However, this caused unexpected
behavior: specifying a value below 30 seconds would round down to 0,
effectively disabling the timeout. This could be surprising to users.
To allow finer-grained control and avoid such confusion, this commit changes
the unit of idle_replication_slot_timeout to seconds. Larger values can
still be specified easily using standard time suffixes, for example,
'24h' for 24 hours.
Back-patch to v18 where idle_replication_slot_timeout was added.
Reported-by: Gunnar Morling <gunnar.morling@googlemail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CADGJaX_0+FTguWpNSpgVWYQP_7MhoO0D8=cp4XozSQgaZ40Odw@mail.gmail.com
Backpatch-through: 18
|
|
This macro can be used to avoid compiler warnings, particularly when using -O3
and not using assertions, and to get the compiler to generate better code.
A subsequent commit introduces a first user.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3prdb6hkep3duglhsujrn52bkvnlkvhc54fzvph2emrsm4vodl@77yy6j4hkemb
Discussion: https://postgr.es/m/20230316172818.x6375uvheom3ibt2%40awork3.anarazel.de
Discussion: https://postgr.es/m/20240207203138.sknifhlppdtgtxnk%40awork3.anarazel.de
|
|
These are libc-specific functions, so should require a locale_t rather
than a pg_locale_t (which could use another provider).
Discussion: https://postgr.es/m/a8666c391dfcabe79868d95f7160eac533ace718.camel%40j-davis.com
|
|
This commit adds a new system view that provides information about
entries in the dynamic shared memory (DSM) registry. Specifically,
it returns the name, type, and size of each entry. Note that since
we cannot discover the size of dynamic shared memory areas (DSAs)
and hash tables backed by DSAs (dshashes) without first attaching
to them, the size column is left as NULL for those.
Bumps catversion.
Author: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Sungwoo Chang <swchangdev@gmail.com>
Discussion: https://postgr.es/m/4D445D3E-81C5-4135-95BB-D414204A0AB4%40gmail.com
|
|
The cpluspluscheck script wraps our headers in `extern "C"`. This
disables name mangling, which is necessary for the C++ templates
in system ICU headers. cpluspluscheck thus fails when the build is
configured with ICU (the default). CI worked around this by disabling
ICU, but let's make it work so others can run the script.
We can specify we only want the C APIs by defining U_SHOW_CPLUSPLUS_API
to be 0 in pg_locale.h. Extensions that want the C++ APIs can include
ICU headers separately before including PostgreSQL headers.
ICU documentation:
https://github.com/unicode-org/icu/blob/main/docs/processes/release/tasks/healthy-code.md#test-icu4c-headers
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220323002024.f2g6tivduzrktgfa%40alap3.anarazel.de
Discussion: https://postgr.es/m/CANWCAZbgiaz1_0-F4SD%2B%3D-e9onwAnQdBGJbhg94EqUu4Gb7WyA%40mail.gmail.com
|
|
By default io_uring creates a shared memory mapping for each io_uring
instance, leading to a large number of memory mappings. Unfortunately a large
number of memory mappings slows things down, backend exit is particularly
affected. To address that, newer kernels (6.5) support using user-provided
memory for the memory. By putting the relevant memory into shared memory we
don't need any additional mappings.
On a system with a new enough kernel and liburing, there is no discernible
overhead when doing a pgbench -S -C anymore.
Reported-by: MARK CALLAGHAN <mdcallag@gmail.com>
Reviewed-by: "Burd, Greg" <greg@burd.me>
Reviewed-by: Jim Nasby <jnasby@upgrade.com>
Discussion: https://postgr.es/m/CAFbpF8OA44_UG+RYJcWH9WjF7E3GA6gka3gvH6nsrSnEe9H0NA@mail.gmail.com
Backpatch-through: 18
|
|
For an ordered Append or MergeAppend, we need to inject an explicit
sort into any subpath that is not already well enough ordered.
Currently, only explicit full sorts are considered; incremental sorts
are not yet taken into account.
In this patch, for subpaths of an ordered Append or MergeAppend, we
choose to use explicit incremental sort if it is enabled and there are
presorted keys.
The rationale is based on the assumption that incremental sort is
always faster than full sort when there are presorted keys, a premise
that has been applied in various parts of the code. In addition, the
current cost model tends to favor incremental sort as being cheaper
than full sort in the presence of presorted keys, making it reasonable
not to consider full sort in such cases.
No backpatch as this could result in plan changes.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4_V7a2enTR+T3pOY_YZ-FU8ZsFYym2swOz4jNMqmSgyuw@mail.gmail.com
|
|
Functions to bootstrap and zero pages in various SLRU callers were
fairly duplicative. We can slash almost two hundred lines with a couple
of simple helpers:
- SimpleLruZeroAndWritePage: Does the equivalent of SimpleLruZeroPage
followed by flushing the page to disk
- XLogSimpleInsertInt64: Does a XLogBeginInsert followed by XLogInsert
of a trivial record whose data is just an int64.
Author: Evgeny Voropaev <evgeny.voropaev@tantorlabs.com>
Reviewed by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/flat/97820ce8-a1cd-407f-a02b-47368fadb14b%40tantorlabs.com
|
|
This commit standardizes the output format for LSNs to ensure consistent
representation across various tools and messages. Previously, LSNs were
inconsistently printed as `%X/%X` in some contexts, while others used
zero-padding. This often led to confusion when comparing.
To address this, the LSN format is now uniformly set to `%X/%08X`,
ensuring the lower 32-bit part is always zero-padded to eight
hexadecimal digits.
Author: Japin Li <japinli@hotmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/ME0P300MB0445CA53CA0E4B8C1879AF84B641A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
|