| Age | Commit message (Collapse) | Author |
|
We only need to do it for WAIT_LSN_TYPE_REPLAY. WAIT_LSN_TYPE_FLUSH can work
for both primary and follower.
|
|
Now that commit 06edbed47862 has introduced XLogRecPtrIsValid(), we can
use that instead of:
- XLogRecPtrIsInvalid()
- direct comparisons with InvalidXLogRecPtr
- direct comparisons with literal 0
This makes the code more consistent.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aQB7EvGqrbZXrMlg@ip-10-97-1-34.eu-west-3.compute.internal
|
|
Fix a typo, add some missing conjunctions, and make a sentence flow
more smoothly.
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://postgr.es/m/CA%2BrenyXZgwzegmO5t%3DUSU%3D9Wo5bc-YqNf-6E7Nv7e577DCmYXA%40mail.gmail.com
|
|
Various places that were using StringInfo but didn't need that
StringInfo to exist beyond the scope of the function were using
makeStringInfo(), which allocates both a StringInfoData and the buffer it
uses as two separate allocations. It's more efficient for these cases to
use a StringInfoData on the stack and initialize it with initStringInfo(),
which only allocates the string buffer. This also simplifies the cleanup,
in a few cases.
Author: Mats Kindahl <mats.kindahl@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/4379aac8-26f1-42f2-a356-ff0e886228d3@gmail.com
|
|
WAIT FOR is to be used on standby and specifies waiting for
the specific WAL location to be replayed. This option is useful when
the user makes some data changes on primary and needs a guarantee to see
these changes are on standby.
WAIT FOR needs to wait without any snapshot held. Otherwise, the snapshot
could prevent the replay of WAL records, implying a kind of self-deadlock.
This is why separate utility command seems appears to be the most robust
way to implement this functionality. It's not possible to implement this as
a function. Previous experience shows that stored procedures also have
limitation in this aspect.
Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com
Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
|
|
Implement a new facility that allows processes to wait for WAL to reach
specific LSNs, both on primary (waiting for flush) and standby (waiting
for replay) servers.
The implementation uses shared memory with per-backend information
organized into pairing heaps, allowing O(1) access to the minimum
waited LSN. This enables fast-path checks: after replaying or flushing
WAL, the startup process or WAL writer can quickly determine if any
waiters need to be awakened.
Key components:
- New xlogwait.c/h module with WaitForLSNReplay() and WaitForLSNFlush()
- Separate pairing heaps for replay and flush waiters
- WaitLSN lightweight lock for coordinating shared state
- Wait events WAIT_FOR_WAL_REPLAY and WAIT_FOR_WAL_FLUSH for monitoring
This infrastructure can be used by features that need to wait for WAL
operations to complete.
Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com
Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
|
|
These assertions may prove to become useful to make sure that no process
other than the startup process calls the routines where these checks are
added, as we expect that these do not interfere with a WAL receiver
switched to a "stopping" state by a startup process.
The assumption that only the startup process can use this code has
existed for many years, without a check enforcing it.
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/aQmGeVLYl51y1m_0@paquier.xyz
|
|
The parallel GIN builds perform "freezing" of TID lists when merging
chunks built earlier. This means determining what part of the list can
no longer change, depending on the last received chunk. The frozen part
can be evicted from memory and written out.
The code attempted to freeze items right before merging the old and new
TID list, after already attempting to trim the current buffer. That
means part of the data may get frozen based on the new TID list, but
will be trimmed later (on next loop). This increases memory usage.
This inverts the order, so that we freeze data first (before trimming).
The benefits are likely relatively small, but it's also virtually free
with no other downsides.
Discussion: https://postgr.es/m/CAHLJuCWDwn-PE2BMZE4Kux7x5wWt_6RoWtA0mUQffEDLeZ6sfA@mail.gmail.com
|
|
When building intermediate TID lists during parallel GIN builds, split
the sorted lists into smaller chunks, to limit the amount of memory
needed when merging the chunks later.
The leader may need to keep in memory up to one chunk per worker, and
possibly one extra chunk (before evicting some of the data). The code
processing item pointers uses regular palloc/repalloc calls, which means
it's subject to the MaxAllocSize (1GB) limit.
We could fix this by allowing huge allocations, but that'd require
changes in many places without much benefit. Larger chunks do not
actually improve performance, so the memory usage would be wasted.
Fixed by limiting the chunk size to not hit MaxAllocSize. Each worker
gets a fair share.
This requires remembering the number of participating workers, in a
place that can be accessed from the callback. Luckily, the bs_worker_id
field in GinBuildState was unused, so repurpose that.
Report by Greg Smith, investigation and fix by me. Batchpatched to 18,
where parallel GIN builds were introduced.
Reported-by: Gregory Smith <gregsmithpgsql@gmail.com>
Discussion: https://postgr.es/m/CAHLJuCWDwn-PE2BMZE4Kux7x5wWt_6RoWtA0mUQffEDLeZ6sfA@mail.gmail.com
Backpatch-through: 18
|
|
Reported-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2kAkNaDa01O0nKsQmkfEmxsDvm09SU=f1T0CV8ew3qJEA@mail.gmail.com
|
|
When the startup process switches from streaming to archive as WAL
source, we avoid calling ShutdownWalRcv() if the WAL receiver is not
streaming, based on WalRcvStreaming(). WALRCV_STOPPING is a state set
by ShutdownWalRcv(), called only by the startup process, meaning that it
should not be possible to reach this state while in
WaitForWALToBecomeAvailable().
This commit adds an assertion to make sure that a WAL receiver is never
in a WALRCV_STOPPING state should the startup process attempt to reset
InstallXLogFileSegmentActive.
Idea suggested by Noah Misch.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/19093-c4fff49a608f82a0@postgresql.org
|
|
Commit b4f584f9d2a1 (affecting v15~, later backpatched down to 13 as of
3635a0a35aaf) introduced an unconditional WAL receiver shutdown when
switching from streaming to archive WAL sources. This causes problems
during a timeline switch, when a WAL receiver enters WALRCV_WAITING
state but remains alive, waiting for instructions.
The unconditional shutdown can break some monitoring scenarios as the
WAL receiver gets repeatedly terminated and re-spawned, causing
pg_stat_wal_receiver.status to show a "streaming" instead of "waiting"
status, masking the fact that the WAL receiver is waiting for a new TLI
and a new LSN to be able to continue streaming.
This commit changes the WAL receiver behavior so as the shutdown becomes
conditional, with InstallXLogFileSegmentActive being always reset to
prevent the regression fixed by b4f584f9d2a1: only terminate the WAL
receiver when it is actively streaming (WALRCV_STREAMING,
WALRCV_STARTING, or WALRCV_RESTARTING). When in WALRCV_WAITING state,
just reset InstallXLogFileSegmentActive flag to allow archive
restoration without killing the process. WALRCV_STOPPED and
WALRCV_STOPPING are not reachable states in this code path. For the
latter, the startup process is the one in charge of setting
WALRCV_STOPPING via ShutdownWalRcv(), waiting for the WAL receiver to
reach a WALRCV_STOPPED state after switching walRcvState, so
WaitForWALToBecomeAvailable() cannot be reached while a WAL receiver is
in a WALRCV_STOPPING state.
A regression test is added to check that a WAL receiver is not stopped
on timeline jump, that fails when the fix of this commit is reverted.
Reported-by: Ryan Bird <ryanzxg@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/19093-c4fff49a608f82a0@postgresql.org
Backpatch-through: 13
|
|
The new wal_fpi_bytes counter calculates the total amount of full page
images inserted in WAL records, in bytes. This commit adds this
information to VACUUM and ANALYZE logs alongside the existing counters,
building upon f9a09aa29520.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aQMMSSlFXy4Evxn3@paquier.xyz
|
|
Add comments explaining when and where it is safe for nbtree to treat
row compare keys as if they were simple scalar inequality keys on the
row's most significant column. This is particularly important within
_bt_advance_array_keys, which deals with required inequality keys in a
general and uniform way, without any special handling for row compares.
Also spell out the implications of _bt_check_rowcompare's approach of
_conditionally_ evaluating lower-order row compare subkeys, particularly
when one of its lower-order subkeys might see NULL index tuple values
(these may or may not affect whether the qual as a whole is satisfied).
The behavior in this area isn't particularly intuitive, so these issues
seem worth going into.
In passing, add a few more defensive/documenting row comparison related
assertions to _bt_first and _bt_check_rowcompare.
Follow-up to commits bd3f59fd and ec986020.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Reviewed-By: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wznwkak_K7pcAdv9uH8ZfNo8QO7+tHXOaCUddMeTfaCCFw@mail.gmail.com
Backpatch-through: 18
|
|
_bt_first reliably uses the same equality key (on each index column) for
initial positioning purposes as the one that _bt_checkkeys can use to
end the scan following commit f09816a0. _bt_first no longer applies its
own independent rules to determine which initial positioning key to use
on each column (for equality and inequality keys alike). Preprocessing
is now fully in control of determining which keys start and end each
scan, ensuring that _bt_first and _bt_checkkeys have symmetric behavior.
Remove obsolete comments that described why _bt_first was expected to
use at least one of the available required equality keys for initial
positioning purposes. The rules in this area are now maximally strict
and uniform, so there's no reason to draw attention to equality keys.
Any column with a required equality key cannot have a redundant required
inequality key (nor can it have a redundant required equality key).
Oversight in commit f09816a0, which removed similar comments from
_bt_first, but missed these comments.
Author: Peter Geoghegan <pg@bowt.ie>
Backpatch-through: 18
|
|
Several functions in the codebase accept "Datum *" parameters but do
not modify the pointed-to data. These have been updated to take
"const Datum *" instead, improving type safety and making the
interfaces clearer about their intent. This change helps the compiler
catch accidental modifications and better documents immutability of
arguments.
Most of "Datum *" parameters have a pairing "bool *isnull" parameter,
they are constified as well.
No functional behavior is changed by this patch.
Author: Chao Li <lic@highgo.com>
Discussion: https://www.postgresql.org/message-id/flat/CAEoWx2msfT0knvzUa72ZBwu9LR_RLY4on85w2a9YpE-o2By5HQ@mail.gmail.com
|
|
This is a follow up 991295f. I searched over src/ and made all
ItemPointer arguments as const as much as possible.
Note: We cut out from the original patch the pieces that would have
created incompatibilities in the index or table AM APIs. Those could
be considered separately.
Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAEoWx2nBaypg16Z5ciHuKw66pk850RFWw9ACS2DqqJ_AkKeRsw%40mail.gmail.com
|
|
There are a few places where we have
typedef struct FooData { ... } FooData;
typedef FooData *Foo;
and then function declarations with
bar(const Foo x)
which isn't incorrect but probably meant
bar(const FooData *x)
meaning that the thing x points to is immutable, not x itself.
This patch makes those changes where appropriate. In one
case (execGrouping.c), the thing being pointed to was not immutable,
so in that case remove the const altogether, to avoid further
confusion.
Co-authored-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAEoWx2m2E0xE8Kvbkv31ULh_E%2B5zph-WA_bEdv3UR9CLhw%2B3vg%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAEoWx2kTDz%3Db6T2xHX78vy_B_osDeCC5dcTCi9eG0vXHp5QpdQ%40mail.gmail.com
|
|
XLogRecordAssemble() may be called multiple times before inserting a
record in XLogInsertRecord(), and the amount of FPIs generated inside
a record whose insertion is attempted multiple times may vary.
The logic added in f9a09aa29520 touched directly pgWalUsage in
XLogRecordAssemble(), meaning that it could be possible for pgWalUsage
to be incremented multiple times for a single record. This commit
changes the code to use the same logic as the number of FPIs added to a
record, where XLogRecordAssemble() returns this information and feeds it
to XLogInsertRecord(), updating pgWalUsage only when a record is
inserted.
Reported-by: Shinya Kato <shinya11.kato@gmail.com>
Discussion: https://postgr.es/m/CAOzEurSiSr+rusd0GzVy8Bt30QwLTK=ugVMnF6=5WhsSrukvvw@mail.gmail.com
|
|
This new counter, called "wal_fpi_bytes", tracks the total amount in
bytes of full page images (FPIs) generated in WAL. This data becomes
available globally via pg_stat_wal, and for backend statistics via
pg_stat_get_backend_wal().
Previously, this information could only be retrieved with pg_waldump or
pg_walinspect, which may not be available depending on the environment,
and are expensive to execute. It offers hints about how much FPIs
impact the WAL generated, which could be a large percentage for some
workloads, as well as the effects of wal_compression or page holes.
Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, due to the addition of wal_fpi_bytes in
PgStat_WalCounters.
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAOzEurQtZEAfg6P0kU3Wa-f9BWQOi0RzJEMPN56wNTOmJLmfaQ@mail.gmail.com
|
|
Add some const qualifications afforded by the previous change that
added a const qualification to PageAddItemExtended().
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/c75cccf5-5709-407b-a36a-2ae6570be766@eisentraut.org
|
|
This type is just char * underneath, it provides no real value, no
type safety, and just makes the code one level more mysterious. It is
more idiomatic to refer to blobs of memory by a combination of void *
and size_t, so change it to that.
Also, since this type hides the pointerness, we can't apply qualifiers
to what is pointed to, which requires some unconstify nonsense. This
change allows fixing that.
Extension code that uses the Item type can change its code to use
void * to be backward compatible.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/c75cccf5-5709-407b-a36a-2ae6570be766@eisentraut.org
|
|
Previously, if primary_slot_name was set to an invalid slot name and
the configuration file was reloaded, both the postmaster and all other
backend processes reported a WARNING. With many processes running,
this could produce a flood of duplicate messages. The problem was that
the GUC check hook for primary_slot_name reported errors at WARNING
level via ereport().
This commit changes the check hook to use GUC_check_errdetail() and
GUC_check_errhint() for error reporting. As with other GUC parameters,
this causes non-postmaster processes to log the message at DEBUG3,
so by default, only the postmaster's message appears in the log file.
Backpatch to all supported versions.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/CAHGQGwFud-cvthCTfusBfKHBS6Jj6kdAPTdLWKvP2qjUX6L_wA@mail.gmail.com
Backpatch-through: 13
|
|
5983a4cff added CompactAttribute for storing commonly used fields from
FormData_pg_attribute. 5983a4cff didn't go to the trouble of adjusting
every location where we can use CompactAttribute rather than
FormData_pg_attribute, so here we change the remaining ones.
There are some locations where I've left the code using
FormData_pg_attribute. These are mostly in the ALTER TABLE code. Using
CompactAttribute here seems more risky as often the TupleDesc is being
changed and those changes may not have been flushed to the
CompactAttribute yet.
I've also left record_recv(), record_send(), record_cmp(), record_eq()
and record_image_eq() alone as it's not clear to me that accessing the
CompactAttribute is a win here due to the FormData_pg_attribute still
having to be accessed for most cases. Switching the relevant parts to
use CompactAttribute would result in having to access both for common
cases. Careful benchmarking may reveal that something can be done to
make this better, but in absence of that, the safer option is to leave
these alone.
In ReorderBufferToastReplace(), there was a check to skip attnums < 0
while looping over the TupleDesc. Doing this is redundant since
TupleDescs don't store < 0 attnums. Removing that code allows us to
move to using CompactAttribute.
The change in validateDomainCheckConstraint() just moves fetching the
FormData_pg_attribute into the ERROR path, which is cold due to calling
errstart_cold() and results in code being moved out of the common path.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAApHDvrMy90o1Lgkt31F82tcSuwRFHq3vyGewSRN=-QuSEEvyQ@mail.gmail.com
|
|
Backpatch-through: 13
|
|
A BlockNumber (32-bit) might not be large enough to add bo_pagesPerRange
to when the table contains close to 2^32 pages. At worst, this could
result in a cancellable infinite loop during the BRIN index scan with
power-of-2 pagesPerRange, and slow (inefficient) BRIN index scans and
scanning of unneeded heap blocks for non power-of-2 pagesPerRange.
Backpatch to all supported versions.
Author: sunil s <sunilfeb26@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAOG6S4-tGksTQhVzJM19NzLYAHusXsK2HmADPZzGQcfZABsvpA@mail.gmail.com
Backpatch-through: 13
|
|
The log output functionality of log_autovacuum_min_duration applies to
both VACUUM and ANALYZE, so it is not possible to separate the VACUUM
and ANALYZE log output thresholds. Logs are likely to be output only for
VACUUM and not for ANALYZE.
Therefore, we decided to separate the threshold for log output of VACUUM
by autovacuum (log_autovacuum_min_duration) and the threshold for log
output of ANALYZE by autovacuum (log_autoanalyze_min_duration).
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Kasahara Tatsuhito <kasaharatt@oss.nttdata.com>
Discussion: https://www.postgresql.org/message-id/flat/CAOzEurQtfV4MxJiWT-XDnimEeZAY+rgzVSLe8YsyEKhZcajzSA@mail.gmail.com
|
|
This function only requires a few fields from LVRelState, so pass them
in individually.
This change allows calling heap_page_is_all_visible() from code such as
pruneheap.c, which does not have access to an LVRelState.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl%40mdyyqpqzxjek
|
|
These functions appeared prominently in a profile of a patch that sets
the visibility map on-access. Inline them to remove call overhead and
make them cheaper to use in hot paths.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl%40mdyyqpqzxjek
|
|
After scanning the line pointers on a heap page during the first phase
of vacuum, we use the information collected to decide whether to use
the assembled freeze plans.
Move this decision logic into a helper function to improve readability.
While here, rename a PruneState member and disambiguate some local
variables in heap_page_prune_and_freeze().
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl%40mdyyqpqzxjek
|
|
Instead of emitting a separate XLOG_HEAP2_VISIBLE WAL record for each
page that becomes all-visible in vacuum's third phase, specify the VM
changes in the already emitted XLOG_HEAP2_PRUNE_VACUUM_CLEANUP record.
Visibility checks are now performed before marking dead items unused.
This is safe because the heap page is held under exclusive lock for the
entire operation.
This reduces the number of WAL records generated by VACUUM phase III by
up to 50%.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
|
|
Remove a variable that is no longer in use following commit 9a2e2a28.
It's not immediately clear why there were no compiler warnings about
this oversight.
Author: Peter Geoghegan <pg@bowt.ie>
Backpatch-through: 18
|
|
An assertion in _bt_killitems expected the scan's currPos state to
contain a valid LSN, saved from when currPos's page was initially read.
The assertion failed to account for the fact that even logged relations
can have leaf pages with an invalid LSN when built with wal_level set to
"minimal". Remove the faulty assertion.
Oversight in commit e6eed40e (though note that the assertion was
backpatched to stable branches before 18 by commit 7c319f54).
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Matthijs van der Vleuten <postgresql@zr40.nl>
Bug: #19082
Discussion: https://postgr.es/m/19082-628e62160dbbc1c1@postgresql.org
Backpatch-through: 13
|
|
Issue found while browsing this area of the code, introduced and
copy-pasted around by 2258e76f90bf.
Backpatch-through: 15
|
|
Instead of emitting a separate WAL XLOG_HEAP2_VISIBLE record for setting
bits in the VM, specify the VM block changes in the
XLOG_HEAP2_MULTI_INSERT record.
This halves the number of WAL records emitted by COPY FREEZE.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
|
|
We're planning to merge buffer content locks into BufferDesc.state. To reduce
the size of that patch, centralize calls to BufferDescriptorGetContentLock().
The biggest part of the change is in assertions, by introducing
BufferIsLockedByMe[InMode]() (and removing BufferIsExclusiveLocked()). This
seems like an improvement even without aforementioned plans.
Additionally replace some direct calls to LWLockAcquire() with calls to
LockBuffer().
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
|
|
The following information is added in the description of some GIN
records:
- In INSERT_LISTPAGE, the number of tuples and the right link block.
- In UPDATE_META_PAGE, the number of tuples, the previous tail block,
and the right link block.
- In SPLIT, the left and right children blocks.
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CALdSSPgnAt5L=D_xGXRXLYO5FK1H31_eYEESxdU1n-r4g+6GqA@mail.gmail.com
|
|
Reported-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA+TgmoZYh_nw-2j_Fi9y6ZAvrpN+W1aSOFNM7Rus2Q-zTkCsQw@mail.gmail.com
|
|
Previously, we attempted to form a posting list tuple even when
ginCompressPostingList() failed to compress the posting list due to
its size. While there was no functional failure, it always wasted one
GinFormTuple() call when item pointers didn't fit in a posting list
tuple.
This commit ensures that a GIN index tuple is formed only when all
item pointers in the posting list are successfully compressed.
Author: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAE7r3M+C=jcpTD93f_RBHrQp3C+=TAXFs+k4tTuZuuxboK8AvA@mail.gmail.com
|
|
The WAL records XLOG_GIN_INSERT and XLOG_GIN_VACUUM_DATA_LEAF_PAGE
included some information about the blocks added to the record.
This information is already provided by XLogRecGetBlockRefInfo() with
much more details about the blocks included in each record, like the
compression information, for example. This commit removes the block
information that existed in the record descriptions specific to GIN.
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CALdSSPgk=9WRoXhZy5fdk+T1hiau7qbL_vn94w_L1N=gtEdbsg@mail.gmail.com
|
|
This is not at all needed; I suspect it was a simple mistake in commit
5408e233f066. It causes htup_details.h to bleed into a huge number of
places via execnodes.h. Remove it and fix fallout.
Discussion: https://postgr.es/m/202510021240.ptc2zl5cvwen@alvherre.pgsql
|
|
These headers don't need execnodes.h for anything. I think they never
have.
Discussion: https://postgr.es/m/202510021240.ptc2zl5cvwen@alvherre.pgsql
|
|
In similar vein to commit ccc8194e427, a reset instance of a shared
memory TID store happened to occupy the same private memory as the old
one for the entry point, since the chunk freed after the last round
of index vacuuming was put on the context's freelist. The failure
to update the vacrel->dead_items pointer was evident by nudging the
system to allocate memory in a different area. This was not discovered
at the time of the earlier commit since our regression tests didn't
cover multiple index passes with parallel vacuum.
Backpatch to v17, when TidStore came in.
Author: Kevin Oommen Anish <kevin.o@zohocorp.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Tested-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/199a07cbdfc.7a1c4aac25838.1675074408277594551%40zohocorp.com
Backpatch-through: 17
|
|
Oversight in 2c03216d8311, when the redo code of GIN got refactored for
the new WAL format where block information has been standardized, as the
payload data got tracked for each block after the change, and not in the
whole record. This is just a cleanup.
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CALdSSPgnAt5L=D_xGXRXLYO5FK1H31_eYEESxdU1n-r4g+6GqA@mail.gmail.com
|
|
During recovery, XLogNeedsFlush() checks the minimum recovery LSN point
instead of the flush LSN point. The same condition checks are used when
updating the minimum recovery point in UpdateMinRecoveryPoint(), but are
written in reverse order.
This commit makes the order of the checks consistent between
XLogNeedsFlush() and UpdateMinRecoveryPoint(), improving the code
clarity. Note that the second check (as ordered by this commit) relies
on InRecovery, which is true only in the startup process. So this makes
XLogNeedsFlush() cheaper in the startup process with the first check
acting as a shortcut while doing crash recovery, where
LocalMinRecoveryPoint is an invalid LSN.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/aMIHNRTP6Wj6vw1s%40paquier.xyz
|
|
This improves the stability of VACUUM when processing btree indexes,
which was previously able to trigger an assertion failure in
_bt_lock_subtree_parent() when an error was previously thrown outside
the scope of _bt_split() when splitting a btree page. VACUUM would
consider the index as in a corrupted state as the right page would not
be zeroed for the error thrown (allocation failure is one pattern).
In a non-assert build, VACUUM is able to succeed, reporting what it sees
as a corruption while attempting to fix the index. This would manifest
as a LOG message, as of:
LOG: failed to re-find parent key in index "idx" for deletion target
page N
CONTEXT: while vacuuming index "idx" of relation "public.tab"
This commit improves the code to rely on two PGAlignedBlocks that are
used as a temporary space for the left and right pages. The main change
concerns the right page, whose contents are now copied into the
"temporary" PGAlignedBlock page while its original space is zeroed. Its
contents are moved from the PGAlignedBlock page back to the page once we
enter in the critical section used for the split. This simplifies the
split logic, as it is not necessary to zero the right page before
throwing an error anymore. Hence errors can now be thrown outside the
split code. For the left page, this shaves one allocation, with
PageGetTempPage() being previously used.
The previous logic originates from commit 8fa30f906b, at a point where
PGAlignedBlock did not exist yet. This could be argued as something
that should be backpatched, but the lack of complaints indicates that it
may not be necessary.
Author: Konstantin Knizhnik <knizhnik@garret.ru>
Discussion: https://postgr.es/m/566dacaf-5751-47e4-abc6-73de17a5d42a@garret.ru
|
|
When defining an injection point there is no need to wrap the definition
with USE_INJECTION_POINT guards, the INJECTION_POINT macro is available
in all builds. Remove to make the code consistent.
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/OSCPR01MB14966C8015DEB05ABEF2CE077F51FA@OSCPR01MB14966.jpnprd01.prod.outlook.com
Backpatch-through: 17
|
|
... which silently propagates a lot of headers into many places
via pgstat.h, as evidenced by the variety of headers that this patch
needs to add to seemingly random places. Add a minimum of typedefs to
conflict.h to be able to remove execnodes.h, and fix the fallout.
Backpatch to 18, where conflict.h first appeared.
Discussion: https://postgr.es/m/202509191927.uj2ijwmho7nv@alvherre.pgsql
|
|
As commit d4d1fc527bdb.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/202509191025.22agk3fvpilc@alvherre.pgsql
|
|
f83d709760d8 incorrectly refers to a XLOG_HEAP2_PRUNE_FREEZE WAL record
opcode. No such code exists. The relevant opcodes are
XLOG_HEAP2_PRUNE_ON_ACCESS, XLOG_HEAP2_PRUNE_VACUUM_SCAN, and
XLOG_HEAP2_PRUNE_VACUUM_CLEANUP. Correct it.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc%407uw6jyyxuyf7
|