| Age | Commit message (Collapse) | Author |
|
Up to now, index amhandlers were expected to produce a new, palloc'd
struct on each call. That requires palloc/pfree overhead, and creates
a risk of memory leaks if the caller fails to pfree, and the time
taken to fill such a large structure isn't nil. Moreover, we were
storing these things in the relcache, eating several hundred bytes for
each cached index. There is not anything in these structs that needs
to vary at runtime, so let's change the definition so that an
amhandler can return a pointer to a "static const" struct of which
there's only one copy per index AM. Mark all the core code's
IndexAmRoutine pointers const so that we catch anyplace that might
still try to change or pfree one.
(This is similar to the way we were already handling TableAmRoutine
structs. This commit does fix one comment that was infelicitously
copied-and-pasted into tableamapi.c.)
This commit needs to be called out in the v19 release notes as an API
change for extension index AMs. An un-updated AM will still work
(as of now, anyway) but it risks memory leaks and will be slower than
necessary.
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAEoWx2=vApYk2LRu8R0DdahsPNEhWUxGBZ=rbZo1EXE=uA+opQ@mail.gmail.com
|
|
The idea is to encourage more the use of these new routines across the
tree, as these offer stronger type safety guarantees than palloc().
This batch of changes includes most of the trivial changes suggested by
the author for src/backend/.
A total of 334 files are updated here. Among these files, 48 of them
have their build change slightly; these are caused by line number
changes as the new allocation formulas are simpler, shaving around 100
lines of code in total.
Similar work has been done in 0c3c5c3b06a3 and 31d3847a37be.
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
|
|
Many of these are probably left over from before use of %zu/%zd was
portable.
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/07fa29f9-42d7-4aac-8834-197918cbbab6%40eisentraut.org
|
|
Plus a similar fix to the README.
Backpatch as far back as the sgml issue exists. The README issue does
exist in v14, but that seems unlikely to harm anyone.
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ed3db7ea-55b4-4809-86af-81ad3bb2c7d3@gmail.com
Backpatch-through: 15
|
|
This removes some casts where the input already has the same type as
the type specified by the cast. Their presence could cause risks of
hiding actual type mismatches in the future or silently discarding
qualifiers. It also improves readability. Same kind of idea as
7f798aca1d5 and ef8fe693606. (This does not change all such
instances, but only those hand-picked by the author.)
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/aSQy2JawavlVlEB0%40ip-10-97-1-34.eu-west-3.compute.internal
|
|
Instead of complicated pointer arithmetic, overlay a uint32 array and
just access the array members. That's safe thanks to
XLogRecGetBlockData() returning a MAXALIGNed buffer.
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/aSQy2JawavlVlEB0%40ip-10-97-1-34.eu-west-3.compute.internal
|
|
Several functions in the codebase accept "Datum *" parameters but do
not modify the pointed-to data. These have been updated to take
"const Datum *" instead, improving type safety and making the
interfaces clearer about their intent. This change helps the compiler
catch accidental modifications and better documents immutability of
arguments.
Most of "Datum *" parameters have a pairing "bool *isnull" parameter,
they are constified as well.
No functional behavior is changed by this patch.
Author: Chao Li <lic@highgo.com>
Discussion: https://www.postgresql.org/message-id/flat/CAEoWx2msfT0knvzUa72ZBwu9LR_RLY4on85w2a9YpE-o2By5HQ@mail.gmail.com
|
|
This is a follow up 991295f. I searched over src/ and made all
ItemPointer arguments as const as much as possible.
Note: We cut out from the original patch the pieces that would have
created incompatibilities in the index or table AM APIs. Those could
be considered separately.
Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAEoWx2nBaypg16Z5ciHuKw66pk850RFWw9ACS2DqqJ_AkKeRsw%40mail.gmail.com
|
|
This type is just char * underneath, it provides no real value, no
type safety, and just makes the code one level more mysterious. It is
more idiomatic to refer to blobs of memory by a combination of void *
and size_t, so change it to that.
Also, since this type hides the pointerness, we can't apply qualifiers
to what is pointed to, which requires some unconstify nonsense. This
change allows fixing that.
Extension code that uses the Item type can change its code to use
void * to be backward compatible.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/c75cccf5-5709-407b-a36a-2ae6570be766@eisentraut.org
|
|
BufferGetPage() already returns type Page, so casting it to Page
doesn't achieve anything. A sizable number of call sites does this
casting; remove that.
This was already done inconsistently in the code in the first import
in 1996 (but didn't exist in the pre-1995 code), and it was then
apparently just copied around.
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/CALdSSPgFhc5=vLqHdk-zCcnztC0zEY3EU_Q6a9vPEaw7FkE9Vw@mail.gmail.com
|
|
Expose the count of index searches/index descents in EXPLAIN ANALYZE's
output for index scan/index-only scan/bitmap index scan nodes. This
information is particularly useful with scans that use ScalarArrayOp
quals, where the number of index searches can be unpredictable due to
implementation details that interact with physical index characteristics
(at least with nbtree SAOP scans, since Postgres 17 commit 5bf748b8).
The information shown also provides useful context when EXPLAIN ANALYZE
runs a plan with an index scan node that successfully applied the skip
scan optimization (set to be added to nbtree by an upcoming patch).
The instrumentation works by teaching all index AMs to increment a new
nsearches counter whenever a new index search begins. The counter is
incremented at exactly the same point that index AMs already increment
the pg_stat_*_indexes.idx_scan counter (we're counting the same event,
but at the scan level rather than the relation level). Parallel queries
have workers copy their local counter struct into shared memory when an
index scan node ends -- even when it isn't a parallel aware scan node.
An earlier version of this patch that only worked with parallel aware
scans became commit 5ead85fb (though that was quickly reverted by commit
d00107cd following "debug_parallel_query=regress" buildfarm failures).
Our approach doesn't match the approach used when tracking other index
scan related costs (e.g., "Rows Removed by Filter:"). It is comparable
to the approach used in similar cases involving costs that are only
readily accessible inside an access method, not from the executor proper
(e.g., "Heap Blocks:" output for a Bitmap Heap Scan, which was recently
enhanced to show per-worker costs by commit 5a1e6df3, using essentially
the same scheme as the one used here). It is necessary for index AMs to
have direct responsibility for maintaining the new counter, since the
counter might need to be incremented multiple times per amgettuple call
(or per amgetbitmap call). But it is also necessary for the executor
proper to manage the shared memory now used to transfer each worker's
counter struct to the leader.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
|
|
Duplicate assignment in commit af4002b381d should have been a
different field. (But it didn't affect the outcome.)
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org
|
|
After more discussion about commit ce62f2f2a0a, rename the index AM
property amcancrosscompare to two separate properties
amconsistentequality and amconsistentordering. Also improve the
documentation and update some comments that were previously missed.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org
|
|
This reverts commit 5ead85fbc81162ab1594f656b036a22e814f96b3.
This commit shows test failures with debug_parallel_query=regress. The
underlying issue needs to be debugged, so revert for now.
|
|
Expose the count of index searches/index descents in EXPLAIN ANALYZE's
output for index scan nodes. This information is particularly useful
with scans that use ScalarArrayOp quals, where the number of index scans
isn't predictable in advance (at least not with optimizations like the
one added to nbtree by Postgres 17 commit 5bf748b8). It will also be
useful when EXPLAIN ANALYZE shows details of an nbtree index scan that
uses skip scan optimizations set to be introduced by an upcoming patch.
The instrumentation works by teaching index AMs to increment a new
nsearches counter whenever a new index search begins. The counter is
incremented at exactly the same point that index AMs must already
increment the index's pg_stat_*_indexes.idx_scan counter (we're counting
the same event, but at the scan level rather than the relation level).
The new counter is stored in the scan descriptor (IndexScanDescData),
which explain.c reaches by going through the scan node's PlanState.
This approach doesn't match the approach used when tracking other index
scan specific costs (e.g., "Rows Removed by Filter:"). It is similar to
the approach used in other cases where we must track costs that are only
readily accessible inside an access method, and not from the executor
(e.g., "Heap Blocks:" output for a Bitmap Heap Scan). It is inherently
necessary to maintain a counter that can be incremented multiple times
during a single amgettuple call (or amgetbitmap call), and directly
exposing PlanState.instrument to index access methods seems unappealing.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com
|
|
This was missed in the refactoring in patch ce62f2f2a0a, which thus
created a regression.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org
|
|
Stop comparing access method OID values against HASH_AM_OID and
BTREE_AM_OID, and instead check the IndexAmRoutine for an index to see
if it advertises its ability to perform the necessary ordering,
hashing, or cross-type comparing functionality. A field amcanorder
already existed, this uses it more widely. Fields amcanhash and
amcancrosscompare are added for the other purposes.
Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
|
|
The type argument wasn't actually really necessary. It was a remnant
of converting the API of the gist strategy translation from using
opclass to using opfamily+opcintype (commits c09e5a6a016,
622f678c102). For looking up the gist translation function, we used
the convention "amproclefttype = amprocrighttype = opclass's
opcintype" (see pg_amproc.h). But each operator family should only
have one translation function, and getting the right type for the
lookup is sometimes cumbersome and fragile, so this is all
unnecessarily complicated.
To simplify this, change the gist stategy support procedure to take
"any", "any" as argument. (This is arbitrary but seems intuitive.
The alternative of using InvalidOid as argument(s) upsets various DDL
commands, so it's not practical.) Then we don't need opcintype for
the lookup, and we can remove it from all the API layers introduced by
commit c09e5a6a016.
This also adds some more documentation about the correct signature of
the gist support function and adds more checks in gistvalidate().
This was previously underspecified. (It relied implicitly on
convention mentioned above.)
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
|
|
Remove (char *) casts no longer needed after XLogRegisterData() and
XLogRegisterBufData() argument type change.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
|
|
This function is used in both vacuum and analyze code paths, and a
follow-up commit will require distinguishing between the two. This
commit forces callers to specify whether they are in a vacuum or
analyze path, but it does not use that information for anything
yet.
Author: Nathan Bossart <nathandbossart@gmail.com>
Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
|
|
For each Index AM, provide a mapping between operator strategies and
the system-wide generic concept of a comparison type. For example,
for btree, BTLessStrategyNumber maps to and from COMPARE_LT. Numerous
places in the planner and executor think directly in terms of btree
strategy numbers (and a few in terms of hash strategy numbers.) These
should be converted over subsequent commits to think in terms of
CompareType instead. (This commit doesn't make any use of this API
yet.)
Author: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
|
|
This refactors and simplifies various existing code to make use of the
new function.
Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
|
|
Consistently use "Size" (or size_t, or in some places int64 or double)
as the type for variables holding memory allocation sizes. In most
places variables' data types were fine already, but we had an ancient
habit of computing bytes from kilobytes-units GUCs with code like
"work_mem * 1024L". That risks overflow on Win64 where they did not
make "long" as wide as "size_t". We worked around that by restricting
such GUCs' ranges, so you couldn't set work_mem et al higher than 2GB
on Win64. This patch removes that restriction, after replacing such
calculations with "work_mem * (Size) 1024" or variants of that.
It should be noted that this patch was constructed by searching
outwards from the GUCs that have MAX_KILOBYTES as upper limit.
So I can't positively guarantee there are no other places doing
memory-size arithmetic in int or long variables. I do however feel
pretty confident that increasing MAX_KILOBYTES on Win64 is safe now.
Also, nothing in our code should be dealing in multiple-gigabyte
allocations without authorization from a relevant GUC, so it seems
pretty likely that this search caught everything that could be at
risk of overflow.
Author: Vladlen Popolitov <v.popolitov@postgrespro.ru>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1a01f0-66ec2d80-3b-68487680@27595217
|
|
Backpatch-through: 13
|
|
Many of them just seem to have been copied around for no real reason.
Their presence causes (small) risks of hiding actual type mismatches
or silently discarding qualifiers
Discussion: https://www.postgresql.org/message-id/flat/461ea37c-8b58-43b4-9736-52884e862820@eisentraut.org
|
|
When we are building a hash index that is large enough to need
pre-sorting (larger than either maintenance_work_mem or NBuffers),
the initial sorting phase is interruptible, but the insertion
phase wasn't. Add the missing CHECK_FOR_INTERRUPTS().
Per bug #18616 from Alexander Lakhin. Back-patch to all
supported branches.
Pavel Borisov
Discussion: https://postgr.es/m/18616-acbb9e5caf41e964@postgresql.org
|
|
Remove redundant checks for locale->collate_is_c now that we always
have a valid pg_locale_t.
Also, remove pg_locale_deterministic() wrapper, which is no longer
useful after commit e9931bfb75. Just check the field directly,
consistent with other fields in pg_locale_t.
Author: Andreas Karlsson
Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se
|
|
hashvalidate(), which validates the signatures of support functions
for the hash AM, contained several hardcoded exceptions. For example,
hash/date_ops support function 1 was hashint4(), which would
ordinarily fail validation because the function argument is int4, not
date. But this works internally because int4 and date are of the same
size. There are several more exceptions like this that happen to work
and were allowed historically but would now fail the function
signature validation.
This patch removes those exceptions by providing new support functions
that have the proper declared signatures. They internally share most
of the code with the "wrong" functions they replace, so the behavior
is still the same.
With the exceptions gone, hashvalidate() is now simplified and relies
fully on check_amproc_signature().
hashvarlena() and hashvarlenaextended() are kept in pg_proc.dat
because some extensions currently use them to build hash functions for
their own types, and we need to keep exposing these functions as
"LANGUAGE internal" functions for that to continue to work.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/29c3b746-69e7-482a-b37c-dbbf7e5b009b@eisentraut.org
|
|
The index access methods all had similar code that copied the
passed-in scan keys to local storage. They all used memmove() for
that, which is not wrong, but it seems confusing not to use memcpy()
when that would work. Presumably, this was all once copied from
ancient code and never adjusted.
Discussion: https://www.postgresql.org/message-id/flat/f8c739d9-f48d-4187-b214-df3391ba41ab@eisentraut.org
|
|
The only current implementation is for btree where it calls
_bt_getrootheight(). Other index types can now also use this to pass
information to their amcostestimate routine. Previously, btree was
hardcoded and other index types could not hook into the optimizer at
this point.
Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
|
|
Previously, passing NULL for pg_locale_t meant "use the libc provider
and the server environment". Now that the database collation is
represented as a proper pg_locale_t (not dependent on setlocale()),
remove special cases for NULL.
Leave wchar2char() and char2wchar() unchanged for now, because the
callers don't always have a libc-based pg_locale_t available.
Discussion: https://postgr.es/m/cfd9eb85-c52a-4ec9-a90e-a5e4de56e57d@eisentraut.org
Reviewed-by: Peter Eisentraut, Andreas Karlsson
|
|
strxfrm() is not guaranteed to return the exact number of bytes needed
to store the result; it may return a higher value.
Discussion: https://postgr.es/m/32f85d88d1f64395abfe5a10dd97a62a4d3474ce.camel@j-davis.com
Reviewed-by: Heikki Linnakangas
Backpatch-through: 16
|
|
aa5edbe379d6 has tweaked _hash_freeovflpage() so as the write buffer's
LSN is updated only when necessary, when REGBUF_NO_CHANGE is not used.
The replay code was not consistent with that, causing the write buffer's
LSN to be updated and its page to be marked as dirty even if the buffer
was registered in a "clean" state. This was possible for the case of a
squeeze record when there are no tuples to add to the write buffer, for
(is_prim_bucket_same_wrt && !is_prev_bucket_same_wrt).
I have performed some validation of this commit with
wal_consistency_checking and a change in WAL that logs REGBUF_NO_CHANGE
to a new BKPIMAGE_*. Thanks to that, it is possible to know at replay
if a buffer was clean when it was registered, then cross-checked the LSN
of the "clean" page copy coming from WAL with the LSN of the block once
the record has been replayed. This eats one bit in bimg_info, which is
not acceptable to be integrated as-is, but it could become handy in the
future. I didn't spot other areas than the one fixed by this commit at
the extent of what the main regression test suite covers.
As this is an oversight in aa5edbe379d6, no backpatch is required.
Reported-by: Zubeyr Eryilmaz
Author: Hayato Kuroda
Reviewed-by: Amit Kapila, Michael Paquier
Discussion: https://postgr.es/m/ZbyVVG_7eW3YD5-A@paquier.xyz
|
|
The new combined WAL record is now used for pruning, freezing and 2nd
pass of vacuum. This is in preparation for changing VACUUM to write a
combined prune+freeze record per page, instead of separate two
records. The new WAL record format now supports that, but the code
still always writes separate records for pruning and freezing.
This reserves separate XLOG_HEAP2_* info codes for when the pruning
record is emitted for on-access pruning or VACUUM, per Peter
Geoghegan's suggestion. The record format is identical, but having
separate info codes makes it easier analyze pruning and vacuuming with
pg_waldump.
The function to emit the new WAL record, log_heap_prune_and_freeze(),
is in pruneheap.c. The existing heap_log_freeze_plan() and its
subroutines are moved to pruneheap.c without changes, to keep them
together with log_heap_prune_and_freeze().
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAAKRu_azf-zH%3DDgVbquZ3tFWjMY1w5pO8m-TXJaMdri8z3933g@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAAKRu_b2oE4GL%3Dq4g9mcByS9yT7wTQvEH9OLpabj28e%2BWKFi2A@mail.gmail.com
|
|
Similar to commit 7e735035f20.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4-WhpCFMbXCjtJ%2BFzmjfPrp7Hw1pk4p%2BZpU95Kh3ofZ1A%40mail.gmail.com
|
|
Oversight in commit c2fe139c20.
|
|
as determined by include-what-you-use (IWYU)
While IWYU also suggests to *add* a bunch of #include's (which is its
main purpose), this patch does not do that. In some cases, a more
specific #include replaces another less specific one.
Some manual adjustments of the automatic result:
- IWYU currently doesn't know about includes that provide global
variable declarations (like -Wmissing-variable-declarations), so
those includes are being kept manually.
- All includes for port(ability) headers are being kept for now, to
play it safe.
- No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the
patch from exploding in size.
Note that this patch touches just *.c files, so nothing declared in
header files changes in hidden ways.
As a small example, in src/backend/access/transam/rmgr.c, some IWYU
pragma annotations are added to handle a special case there.
Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
|
|
Commit 861f86beea used REGBUF_NO_CHANGE at one of the places in the hash
index to register the clean buffers but forgot to avoid setting LSN in
that case.
Reported-by: Michael Paquier
Author: Kuroda Hayato
Reviewed-by: Amit Kapila, Michael Paquier
Discussion: https://postgr.es/m/ZbyVVG_7eW3YD5-A@paquier.xyz
|
|
Reported-by: Michael Paquier
Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz
Backpatch-through: 12
|
|
Allow using multiple worker processes to build BRIN index, which until
now was supported only for BTREE indexes. For large tables this often
results in significant speedup when the build is CPU-bound.
The work is split in a simple way - each worker builds BRIN summaries on
a subset of the table, determined by the regular parallel scan used to
read the data, and feeds them into a shared tuplesort which sorts them
by blkno (start of the range). The leader then reads this sorted stream
of ranges, merges duplicates (which may happen if the parallel scan does
not align with BRIN pages_per_range), and adds the resulting ranges into
the index.
The number of duplicate results produced by workers (requiring merging
in the leader process) should be fairly small, thanks to how parallel
scans assign chunks to workers. The likelihood of duplicate results may
increase for higher pages_per_range values, but then there are fewer
page ranges in total. In any case, we expect the merging to be much
cheaper than summarization, so this should be a win.
Most of the parallelism infrastructure is a simplified copy of the code
used by BTREE indexes, omitting the parts irrelevant for BRIN indexes
(e.g. uniqueness checks).
This also introduces a new index AM flag amcanbuildparallel, determining
whether to attempt to start parallel workers for the index build.
Original patch by me, with reviews and substantial reworks by Matthias
van de Meent, certainly enough to make him a co-author.
Author: Tomas Vondra, Matthias van de Meent
Reviewed-by: Matthias van de Meent
Discussion: https://postgr.es/m/c2ee7d69-ce17-43f2-d1a0-9811edbda6e6%40enterprisedb.com
|
|
Commit 861f86beea changed hash_xlog_squeeze_page() to start reading
the write buffer conditionally but forgot to initialize it leading to an
uninitialized access.
Reported-by: Alexander Lakhin
Author: Hayato Kuroda
Reviewed-by: Alexander Lakhin, Amit Kapila
Discussion: http://postgr.es/m/62ed1a9f-746a-8e86-904b-51b9b806a1d9@gmail.com
|
|
The brininsert code used to initialize (and destroy) BrinDesc and
BrinRevmap for each tuple, which is not free. This patch initializes
these structures only once, and reuses them for all inserts in the same
command. The data is passed through indexInfo->ii_AmCache.
This also introduces an optional AM callback "aminsertcleanup" that
allows performing custom cleanup in case simply pfree-ing ii_AmCache is
not sufficient (which is the case when the cache contains TupleDesc,
Buffers, and so on).
Author: Soumyadeep Chakraborty
Reviewed-by: Alvaro Herrera, Matthias van de Meent, Tomas Vondra
Discussion: https://postgr.es/m/CAE-ML%2B9r2%3DaO1wwji1sBN9gvPz2xRAtFUGfnffpd0ZqyuzjamA%40mail.gmail.com
|
|
Commit 00d7fb5e2e started to use REGBUF_NO_CHANGE at a few places in the
code where we register the buffer before marking it dirty but missed
updating one of the code flows in the hash index where we free the overflow
page without any live tuples on it.
Author: Amit Kapila and Hayato Kuroda
Discussion: http://postgr.es/m/f045c8f7-ee24-ead6-3679-c04a43d21351@gmail.com
|
|
Enforce the rule from transam/README in XLogRegisterBuffer(), and
update callers to follow the rule.
Hash indexes sometimes register clean pages as a part of the locking
protocol, so provide a REGBUF_NO_CHANGE flag to support that use.
Discussion: https://postgr.es/m/c84114f8-c7f1-5b57-f85a-3adc31e1a904@iki.fi
Reviewed-by: Heikki Linnakangas
|
|
This excludes any changes that would change the external AM APIs.
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/flat/14c31f4a-0347-0805-dce8-93a9072c05a5%40eisentraut.org
|
|
Remove the old_snapshot_threshold setting and mechanism for producing
the error "snapshot too old", originally added by commit 848ef42b.
Unfortunately it had a number of known problems in terms of correctness
and performance, mostly reported by Andres in the course of his work on
snapshot scalability. We agreed to remove it, after a long period
without an active plan to fix it.
This is certainly a desirable feature, and someone might propose a new
or improved implementation in the future.
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CACG%3DezYV%2BEvO135fLRdVn-ZusfVsTY6cH1OZqWtezuEYH6ciQA%40mail.gmail.com
Discussion: https://postgr.es/m/20200401064008.qob7bfnnbu4w5cw4%40alap3.anarazel.de
Discussion: https://postgr.es/m/CA%2BTgmoY%3Daqf0zjTD%2B3dUWYkgMiNDegDLFjo%2B6ze%3DWtpik%2B3XqA%40mail.gmail.com
|
|
Commit 31966b15 invented a way for functions dealing with relation
extension to accept a Relation in online code and an SMgrRelation in
recovery code. It seems highly likely that future bufmgr.c interfaces
will face the same problem, and need to do something similar.
Generalize the names so that each interface doesn't have to re-invent
the wheel.
Back-patch to 16. Since extension AM authors might start using the
constructor macros once 16 ships, we agreed to do the rename in 16
rather than waiting for 17.
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2B6tLD2BhpRWycEoti6LVLyQq457UL4ticP5xd8LqHySA%40mail.gmail.com
|
|
Run pgindent, pgperltidy, and reformat-dat-files.
This set of diffs is a bit larger than typical. We've updated to
pg_bsd_indent 2.1.2, which properly indents variable declarations that
have multi-line initialization expressions (the continuation lines are
now indented one tab stop). We've also updated to perltidy version
20230309 and changed some of its settings, which reduces its desire to
add whitespace to lines to make assignments etc. line up. Going
forward, that should make for fewer random-seeming changes to existing
code.
Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
|
|
During WAL replay on the standby, when a conflict with a logical slot is
identified, invalidate such slots. There are two sources of conflicts:
1) Using the information added in 6af1793954e, logical slots are invalidated if
required rows are removed
2) wal_level on the primary server is reduced to below logical
Uses the infrastructure introduced in the prior commit. FIXME: add commit
reference.
Change InvalidatePossiblyObsoleteSlot() to use a recovery conflict to
interrupt use of a slot, if called in the startup process. The new recovery
conflict is added to pg_stat_database_conflicts, as confl_active_logicalslot.
See 6af1793954e for an overall design of logical decoding on a standby.
Bumps catversion for the addition of the pg_stat_database_conflicts column.
Bumps PGSTAT_FILE_FORMAT_ID for the same reason.
Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version)
Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de
|
|
In order to have the option to use O_DIRECT/FILE_FLAG_NO_BUFFERING in a
later commit, we need the addresses of user space buffers to be well
aligned. The exact requirements vary by OS and file system (typically
sectors and/or memory pages). The address alignment size is set to
4096, which is enough for currently known systems: it matches modern
sectors and common memory page size. There is no standard governing
O_DIRECT's requirements so we might eventually have to reconsider this
with more information from the field or future systems.
Aligning I/O buffers on memory pages is also known to improve regular
buffered I/O performance.
Three classes of I/O buffers for regular data pages are adjusted:
(1) Heap buffers are now allocated with the new palloc_aligned() or
MemoryContextAllocAligned() functions introduced by commit 439f6175.
(2) Stack buffers now use a new struct PGIOAlignedBlock to respect
PG_IO_ALIGN_SIZE, if possible with this compiler. (3) The buffer
pool is also aligned in shared memory.
WAL buffers were already aligned on XLOG_BLCKSZ. It's possible for
XLOG_BLCKSZ to be configured smaller than PG_IO_ALIGNED_SIZE and thus
for O_DIRECT WAL writes to fail to be well aligned, but that's a
pre-existing condition and will be addressed by a later commit.
BufFiles are not yet addressed (there's no current plan to use O_DIRECT
for those, but they could potentially get some incidental speedup even
in plain buffered I/O operations through better alignment).
If we can't align stack objects suitably using the compiler extensions
we know about, we disable the use of O_DIRECT by setting PG_O_DIRECT to
0. This avoids the need to consider systems that have O_DIRECT but
can't align stack objects the way we want; such systems could in theory
be supported with more work but we don't currently know of any such
machines, so it's easier to pretend there is no O_DIRECT support
instead. That's an existing and tested class of system.
Add assertions that all buffers passed into smgrread(), smgrwrite() and
smgrextend() are correctly aligned, unless PG_O_DIRECT is 0 (= stack
alignment tricks may be unavailable) or the block size has been set too
small to allow arrays of buffers to be all aligned.
Author: Thomas Munro <thomas.munro@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CA+hUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg@mail.gmail.com
|