summaryrefslogtreecommitdiff
path: root/src/backend/access
AgeCommit message (Collapse)Author
8 daysReplace internal C function pg_hypot() by standard hypot()Peter Eisentraut
The code comment said, "It is expected that this routine will eventually be replaced with the C99 hypot() function.", so let's do that now. This function is tested via the geometry regression test, so if it is faulty on any platform, it will show up there. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/170308e6-a7a3-4484-87b2-f960bb564afa%40eisentraut.org
8 daysAssert that cutoffs are provided if freezing will be attemptedMelanie Plageman
heap_page_prune_and_freeze() requires the caller to initialize PruneFreezeParams->cutoffs so that the function can correctly evaluate whether tuples should be frozen. This requirement previously existed only in comments and was easy to miss, especially after “cutoffs” was converted from a direct function parameter to a field of the newly introduced PruneFreezeParams struct (added in 1937ed70621). Adding an assert makes this requirement explicit and harder to violate. Also, fix a minor typo while we're at it. Author: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/0AC177F5-5E26-45EE-B273-357C51212AC5%40gmail.com
13 daysC11 alignas instead of unionsPeter Eisentraut
This changes a few union members that only existed to ensure alignments and replaces them with the C11 alignas specifier. This change only uses fundamental alignments (meaning approximately alignments of basic types), which all C11 compilers must support. There are opportunities for similar changes using extended alignments, for example in PGIOAlignedBlock, but these are not necessarily supported by all compilers, so they are kept as a separate change. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org
13 daysSplit PruneFreezeParams initializers to one field per lineMelanie Plageman
This conforms more closely with the style of other struct initializers in the code base. Initializing multiple fields on a single line is unpopular in part because pgindent won't permit a space after the comma before the next field's period. Author: Melanie Plageman <melanieplageman@gmail.com> Reported-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://postgr.es/m/87see87fnq.fsf%40wibble.ilmari.org
13 daysUpdate PruneState.all_[visible|frozen] earlier in pruningMelanie Plageman
During pruning and freezing in phase I of vacuum, we delay clearing all_visible and all_frozen in the presence of dead items. This allows opportunistic freezing if the page would otherwise be fully frozen, since those dead items are later removed in vacuum phase III. To move the VM update into the same WAL record that prunes and freezes tuples, we must know whether the page will be marked all-visible/all-frozen before emitting WAL. Previously we waited until after emitting WAL to update all_visible/all_frozen to their correct values. The only barrier to updating these flags immediately after deciding whether to opportunistically freeze was that while emitting WAL for a record freezing tuples, we use the pre-corrected value of all_frozen to compute the snapshot conflict horizon. By determining the conflict horizon earlier, we can update the flags immediately after making the opportunistic freeze decision. This is required to set the VM in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record emitted by pruning and freezing. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
13 daysKeep all_frozen updated in heap_page_prune_and_freezeMelanie Plageman
Previously, we relied on all_visible and all_frozen being used together to ensure that all_frozen was correct, but it is better to keep both fields updated. Future changes will separate their usage, so we should not depend on all_visible for the validity of all_frozen. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
13 daysRefactor heap_page_prune_and_freeze() parameters into a structMelanie Plageman
heap_page_prune_and_freeze() had accumulated an unwieldy number of input parameters and upcoming work to handle VM updates in this function will add even more. Introduce a new PruneFreezeParams struct to group the function’s input parameters, improving readability and maintainability. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc%407uw6jyyxuyf7
2025-11-18Optimize shared memory usage for WaitLSNProcInfoAlexander Korotkov
We need separate pairing heaps for different WaitLSNType's, because there might be waiters for different LSN's at the same time. However, one process can wait only for one type of LSN at a time. So, no need for inHeap and heapNode fields to be arrays. Discussion: https://postgr.es/m/CAPpHfdsBR-7sDtXFJ1qpJtKiohfGoj%3DvqzKVjWxtWsWidx7G_A%40mail.gmail.com Author: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
2025-11-17Use streaming read I/O in BRIN vacuum scan.Masahiko Sawada
This commit implements streaming read I/O for BRIN vacuum scans. Although BRIN indexes tend to be relatively small by design, performance tests have shown performance improvements. Author: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> Discussion: https://postgr.es/m/CAE7r3ML01aiq9Th_1OSz7U7Aq2pWbhMLoz5T%2BPXcg8J9ZAPFFA%40mail.gmail.com
2025-11-15Fix WaitLSNWakeup() fast-path check for InvalidXLogRecPtrAlexander Korotkov
WaitLSNWakeup() incorrectly returned early when called with InvalidXLogRecPtr (meaning "wake all waiters"), because the fast-path check compared minWaitedLSN > 0 without validating currentLSN first. This caused WAIT FOR LSN commands to wait indefinitely during standby promotion until random signals woke them. Add an XLogRecPtrIsValid() check before the comparison so that InvalidXLogRecPtr bypasses the fast-path and wakes all waiters immediately. Discussion: https://postgr.es/m/CABPTF7UieOYbOgH3EnQCasaqcT1T4N6V2wammwrWCohQTnD_Lw%40mail.gmail.com Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-11-13Add some missing #include <limits.h>.Thomas Munro
These files relied on transitive inclusion via port/atomics.h for constants CHAR_BIT and INT_MAX. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/536409d2-c9df-4ef3-808d-1ffc3182868c@iki.fi
2025-11-13Replace off_t by pgoff_t in I/O routinesMichael Paquier
PostgreSQL's Windows port has never been able to handle files larger than 2GB due to the use of off_t for file offsets, only 32-bit on Windows. This causes signed integer overflow at exactly 2^31 bytes when trying to handle files larger than 2GB, for the routines touched by this commit. Note that large files are forbidden by ./configure (3c6248a828af) and meson (recent change, see 79cd66f28c65). This restriction also exists in v16 and older versions for the now-dead MSVC scripts. The code base already defines pgoff_t as __int64 (64-bit) on Windows for this purpose, and some function declarations in headers use it, but many internals still rely on off_t. This commit switches more routines to use pgoff_t, offering more portability, for areas mainly related to file extensions and storage. These are not critical for WAL segments yet, which have currently a maximum size allowed of 1GB (well, this opens the door at allowing a larger size for them). This matters more for segment files if we want to lift the large file restriction in ./configure and meson in the future, which would make sense to remove once/if all traces of off_t are gone from the tree. This can additionally matter for out-of-core code that may want files larger than 2GB in places where off_t is four bytes in size. Note that off_t is still used in other parts of the tree like buffile.c, WAL sender/receiver, base backup, pg_combinebackup, etc. These other code paths can be addressed separately, and their update will be required if we want to remove the large file restriction in the future. This commit is a good first cut in itself towards more portability, hopefully. On Unix-like systems, pgoff_t is defined as off_t, so this change only affects Windows behavior. Author: Bryan Green <dbryan.green@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/0f238ff4-c442-42f5-adb8-01b762c94ca1@gmail.com
2025-11-12Prefer spelling "cacheable" over "cachable".Thomas Munro
Previously we had both in code and comments. Keep the more common and accepted variant. Author: Chao Li <lic@highgo.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/5EBF1771-0566-4D08-9F9B-CDCDEF4BDC98@gmail.com
2025-11-10Move SLRU_PAGES_PER_SEGMENT to pg_config_manual.hHeikki Linnakangas
It seems plausible that someone might want to experiment with different values. The pressing reason though is that I'm reviewing a patch that requires pg_upgrade to manipulate SLRU files. That patch needs to access SLRU_PAGES_PER_SEGMENT from pg_upgrade code, and slru.h, where SLRU_PAGES_PER_SEGMENT is currently defined, cannot be included from frontend code. Moving it to pg_config_manual.h makes it accessible. Now that it's a little more likely that someone might change SLRU_PAGES_PER_SEGMENT, add a cluster compatibility check for it. Bump catalog version because of the new field in the control file. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://www.postgresql.org/message-id/c7a4ea90-9f7b-4953-81be-b3fcb47db057@iki.fi
2025-11-07Fix checking for recovery state in WaitForLSN()Alexander Korotkov
We only need to do it for WAIT_LSN_TYPE_REPLAY. WAIT_LSN_TYPE_FLUSH can work for both primary and follower.
2025-11-06Use XLogRecPtrIsValid() in various placesÁlvaro Herrera
Now that commit 06edbed47862 has introduced XLogRecPtrIsValid(), we can use that instead of: - XLogRecPtrIsInvalid() - direct comparisons with InvalidXLogRecPtr - direct comparisons with literal 0 This makes the code more consistent. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/aQB7EvGqrbZXrMlg@ip-10-97-1-34.eu-west-3.compute.internal
2025-11-06Cosmetic fixes in GiST READMEJohn Naylor
Fix a typo, add some missing conjunctions, and make a sentence flow more smoothly. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Discussion: https://postgr.es/m/CA%2BrenyXZgwzegmO5t%3DUSU%3D9Wo5bc-YqNf-6E7Nv7e577DCmYXA%40mail.gmail.com
2025-11-06Use stack allocated StringInfoDatas, where possibleDavid Rowley
Various places that were using StringInfo but didn't need that StringInfo to exist beyond the scope of the function were using makeStringInfo(), which allocates both a StringInfoData and the buffer it uses as two separate allocations. It's more efficient for these cases to use a StringInfoData on the stack and initialize it with initStringInfo(), which only allocates the string buffer. This also simplifies the cleanup, in a few cases. Author: Mats Kindahl <mats.kindahl@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/4379aac8-26f1-42f2-a356-ff0e886228d3@gmail.com
2025-11-05Implement WAIT FOR commandAlexander Korotkov
WAIT FOR is to be used on standby and specifies waiting for the specific WAL location to be replayed. This option is useful when the user makes some data changes on primary and needs a guarantee to see these changes are on standby. WAIT FOR needs to wait without any snapshot held. Otherwise, the snapshot could prevent the replay of WAL records, implying a kind of self-deadlock. This is why separate utility command seems appears to be the most robust way to implement this functionality. It's not possible to implement this as a function. Previous experience shows that stored procedures also have limitation in this aspect. Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: jian he <jian.universality@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
2025-11-05Add infrastructure for efficient LSN waitingAlexander Korotkov
Implement a new facility that allows processes to wait for WAL to reach specific LSNs, both on primary (waiting for flush) and standby (waiting for replay) servers. The implementation uses shared memory with per-backend information organized into pairing heaps, allowing O(1) access to the minimum waited LSN. This enables fast-path checks: after replaying or flushing WAL, the startup process or WAL writer can quickly determine if any waiters need to be awakened. Key components: - New xlogwait.c/h module with WaitForLSNReplay() and WaitForLSNFlush() - Separate pairing heaps for replay and flush waiters - WaitLSN lightweight lock for coordinating shared state - Wait events WAIT_FOR_WAL_REPLAY and WAIT_FOR_WAL_FLUSH for monitoring This infrastructure can be used by features that need to wait for WAL operations to complete. Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
2025-11-05Add assertions checking for the startup process in WAL replay routinesMichael Paquier
These assertions may prove to become useful to make sure that no process other than the startup process calls the routines where these checks are added, as we expect that these do not interfere with a WAL receiver switched to a "stopping" state by a startup process. The assumption that only the startup process can use this code has existed for many years, without a check enforcing it. Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/aQmGeVLYl51y1m_0@paquier.xyz
2025-11-04Trim TIDs during parallel GIN builds more eagerlyTomas Vondra
The parallel GIN builds perform "freezing" of TID lists when merging chunks built earlier. This means determining what part of the list can no longer change, depending on the last received chunk. The frozen part can be evicted from memory and written out. The code attempted to freeze items right before merging the old and new TID list, after already attempting to trim the current buffer. That means part of the data may get frozen based on the new TID list, but will be trimmed later (on next loop). This increases memory usage. This inverts the order, so that we freeze data first (before trimming). The benefits are likely relatively small, but it's also virtually free with no other downsides. Discussion: https://postgr.es/m/CAHLJuCWDwn-PE2BMZE4Kux7x5wWt_6RoWtA0mUQffEDLeZ6sfA@mail.gmail.com
2025-11-04Limit the size of TID lists during parallel GIN buildTomas Vondra
When building intermediate TID lists during parallel GIN builds, split the sorted lists into smaller chunks, to limit the amount of memory needed when merging the chunks later. The leader may need to keep in memory up to one chunk per worker, and possibly one extra chunk (before evicting some of the data). The code processing item pointers uses regular palloc/repalloc calls, which means it's subject to the MaxAllocSize (1GB) limit. We could fix this by allowing huge allocations, but that'd require changes in many places without much benefit. Larger chunks do not actually improve performance, so the memory usage would be wasted. Fixed by limiting the chunk size to not hit MaxAllocSize. Each worker gets a fair share. This requires remembering the number of participating workers, in a place that can be accessed from the callback. Luckily, the bs_worker_id field in GinBuildState was unused, so repurpose that. Report by Greg Smith, investigation and fix by me. Batchpatched to 18, where parallel GIN builds were introduced. Reported-by: Gregory Smith <gregsmithpgsql@gmail.com> Discussion: https://postgr.es/m/CAHLJuCWDwn-PE2BMZE4Kux7x5wWt_6RoWtA0mUQffEDLeZ6sfA@mail.gmail.com Backpatch-through: 18
2025-11-04Remove redundant memset() introduced by a0942f4.Jeff Davis
Reported-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAEoWx2kAkNaDa01O0nKsQmkfEmxsDvm09SU=f1T0CV8ew3qJEA@mail.gmail.com
2025-11-04Add assertion check for WAL receiver state during stream-archive transitionMichael Paquier
When the startup process switches from streaming to archive as WAL source, we avoid calling ShutdownWalRcv() if the WAL receiver is not streaming, based on WalRcvStreaming(). WALRCV_STOPPING is a state set by ShutdownWalRcv(), called only by the startup process, meaning that it should not be possible to reach this state while in WaitForWALToBecomeAvailable(). This commit adds an assertion to make sure that a WAL receiver is never in a WALRCV_STOPPING state should the startup process attempt to reset InstallXLogFileSegmentActive. Idea suggested by Noah Misch. Author: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/19093-c4fff49a608f82a0@postgresql.org
2025-11-04Fix unconditional WAL receiver shutdown during stream-archive transitionMichael Paquier
Commit b4f584f9d2a1 (affecting v15~, later backpatched down to 13 as of 3635a0a35aaf) introduced an unconditional WAL receiver shutdown when switching from streaming to archive WAL sources. This causes problems during a timeline switch, when a WAL receiver enters WALRCV_WAITING state but remains alive, waiting for instructions. The unconditional shutdown can break some monitoring scenarios as the WAL receiver gets repeatedly terminated and re-spawned, causing pg_stat_wal_receiver.status to show a "streaming" instead of "waiting" status, masking the fact that the WAL receiver is waiting for a new TLI and a new LSN to be able to continue streaming. This commit changes the WAL receiver behavior so as the shutdown becomes conditional, with InstallXLogFileSegmentActive being always reset to prevent the regression fixed by b4f584f9d2a1: only terminate the WAL receiver when it is actively streaming (WALRCV_STREAMING, WALRCV_STARTING, or WALRCV_RESTARTING). When in WALRCV_WAITING state, just reset InstallXLogFileSegmentActive flag to allow archive restoration without killing the process. WALRCV_STOPPED and WALRCV_STOPPING are not reachable states in this code path. For the latter, the startup process is the one in charge of setting WALRCV_STOPPING via ShutdownWalRcv(), waiting for the WAL receiver to reach a WALRCV_STOPPED state after switching walRcvState, so WaitForWALToBecomeAvailable() cannot be reached while a WAL receiver is in a WALRCV_STOPPING state. A regression test is added to check that a WAL receiver is not stopped on timeline jump, that fails when the fix of this commit is reverted. Reported-by: Ryan Bird <ryanzxg@gmail.com> Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/19093-c4fff49a608f82a0@postgresql.org Backpatch-through: 13
2025-11-03Add wal_fpi_bytes to VACUUM and ANALYZE logsMichael Paquier
The new wal_fpi_bytes counter calculates the total amount of full page images inserted in WAL records, in bytes. This commit adds this information to VACUUM and ANALYZE logs alongside the existing counters, building upon f9a09aa29520. Author: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/aQMMSSlFXy4Evxn3@paquier.xyz
2025-11-02Document nbtree row comparison design.Peter Geoghegan
Add comments explaining when and where it is safe for nbtree to treat row compare keys as if they were simple scalar inequality keys on the row's most significant column. This is particularly important within _bt_advance_array_keys, which deals with required inequality keys in a general and uniform way, without any special handling for row compares. Also spell out the implications of _bt_check_rowcompare's approach of _conditionally_ evaluating lower-order row compare subkeys, particularly when one of its lower-order subkeys might see NULL index tuple values (these may or may not affect whether the qual as a whole is satisfied). The behavior in this area isn't particularly intuitive, so these issues seem worth going into. In passing, add a few more defensive/documenting row comparison related assertions to _bt_first and _bt_check_rowcompare. Follow-up to commits bd3f59fd and ec986020. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Victor Yegorov <vyegorov@gmail.com> Reviewed-By: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAH2-Wznwkak_K7pcAdv9uH8ZfNo8QO7+tHXOaCUddMeTfaCCFw@mail.gmail.com Backpatch-through: 18
2025-11-02Remove obsolete nbtree equality key comments.Peter Geoghegan
_bt_first reliably uses the same equality key (on each index column) for initial positioning purposes as the one that _bt_checkkeys can use to end the scan following commit f09816a0. _bt_first no longer applies its own independent rules to determine which initial positioning key to use on each column (for equality and inequality keys alike). Preprocessing is now fully in control of determining which keys start and end each scan, ensuring that _bt_first and _bt_checkkeys have symmetric behavior. Remove obsolete comments that described why _bt_first was expected to use at least one of the available required equality keys for initial positioning purposes. The rules in this area are now maximally strict and uniform, so there's no reason to draw attention to equality keys. Any column with a required equality key cannot have a redundant required inequality key (nor can it have a redundant required equality key). Oversight in commit f09816a0, which removed similar comments from _bt_first, but missed these comments. Author: Peter Geoghegan <pg@bowt.ie> Backpatch-through: 18
2025-10-31Mark function arguments of type "Datum *" as "const Datum *" where possiblePeter Eisentraut
Several functions in the codebase accept "Datum *" parameters but do not modify the pointed-to data. These have been updated to take "const Datum *" instead, improving type safety and making the interfaces clearer about their intent. This change helps the compiler catch accidental modifications and better documents immutability of arguments. Most of "Datum *" parameters have a pairing "bool *isnull" parameter, they are constified as well. No functional behavior is changed by this patch. Author: Chao Li <lic@highgo.com> Discussion: https://www.postgresql.org/message-id/flat/CAEoWx2msfT0knvzUa72ZBwu9LR_RLY4on85w2a9YpE-o2By5HQ@mail.gmail.com
2025-10-30Mark ItemPointer arguments as const throughoutPeter Eisentraut
This is a follow up 991295f. I searched over src/ and made all ItemPointer arguments as const as much as possible. Note: We cut out from the original patch the pieces that would have created incompatibilities in the index or table AM APIs. Those could be considered separately. Author: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/CAEoWx2nBaypg16Z5ciHuKw66pk850RFWw9ACS2DqqJ_AkKeRsw%40mail.gmail.com
2025-10-30Fix some confusing uses of constPeter Eisentraut
There are a few places where we have typedef struct FooData { ... } FooData; typedef FooData *Foo; and then function declarations with bar(const Foo x) which isn't incorrect but probably meant bar(const FooData *x) meaning that the thing x points to is immutable, not x itself. This patch makes those changes where appropriate. In one case (execGrouping.c), the thing being pointed to was not immutable, so in that case remove the const altogether, to avoid further confusion. Co-authored-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/CAEoWx2m2E0xE8Kvbkv31ULh_E%2B5zph-WA_bEdv3UR9CLhw%2B3vg%40mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAEoWx2kTDz%3Db6T2xHX78vy_B_osDeCC5dcTCi9eG0vXHp5QpdQ%40mail.gmail.com
2025-10-29Fix correctness issue with computation of FPI size in WAL statsMichael Paquier
XLogRecordAssemble() may be called multiple times before inserting a record in XLogInsertRecord(), and the amount of FPIs generated inside a record whose insertion is attempted multiple times may vary. The logic added in f9a09aa29520 touched directly pgWalUsage in XLogRecordAssemble(), meaning that it could be possible for pgWalUsage to be incremented multiple times for a single record. This commit changes the code to use the same logic as the number of FPIs added to a record, where XLogRecordAssemble() returns this information and feeds it to XLogInsertRecord(), updating pgWalUsage only when a record is inserted. Reported-by: Shinya Kato <shinya11.kato@gmail.com> Discussion: https://postgr.es/m/CAOzEurSiSr+rusd0GzVy8Bt30QwLTK=ugVMnF6=5WhsSrukvvw@mail.gmail.com
2025-10-28Add wal_fpi_bytes to pg_stat_wal and pg_stat_get_backend_wal()Michael Paquier
This new counter, called "wal_fpi_bytes", tracks the total amount in bytes of full page images (FPIs) generated in WAL. This data becomes available globally via pg_stat_wal, and for backend statistics via pg_stat_get_backend_wal(). Previously, this information could only be retrieved with pg_waldump or pg_walinspect, which may not be available depending on the environment, and are expensive to execute. It offers hints about how much FPIs impact the WAL generated, which could be a large percentage for some workloads, as well as the effects of wal_compression or page holes. Bump catalog version. Bump PGSTAT_FILE_FORMAT_ID, due to the addition of wal_fpi_bytes in PgStat_WalCounters. Author: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAOzEurQtZEAfg6P0kU3Wa-f9BWQOi0RzJEMPN56wNTOmJLmfaQ@mail.gmail.com
2025-10-27Add some const qualificationsPeter Eisentraut
Add some const qualifications afforded by the previous change that added a const qualification to PageAddItemExtended(). Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://www.postgresql.org/message-id/flat/c75cccf5-5709-407b-a36a-2ae6570be766@eisentraut.org
2025-10-27Remove Item typePeter Eisentraut
This type is just char * underneath, it provides no real value, no type safety, and just makes the code one level more mysterious. It is more idiomatic to refer to blobs of memory by a combination of void * and size_t, so change it to that. Also, since this type hides the pointerness, we can't apply qualifiers to what is pointed to, which requires some unconstify nonsense. This change allows fixing that. Extension code that uses the Item type can change its code to use void * to be backward compatible. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://www.postgresql.org/message-id/flat/c75cccf5-5709-407b-a36a-2ae6570be766@eisentraut.org
2025-10-22Make invalid primary_slot_name follow standard GUC error reporting.Fujii Masao
Previously, if primary_slot_name was set to an invalid slot name and the configuration file was reloaded, both the postmaster and all other backend processes reported a WARNING. With many processes running, this could produce a flood of duplicate messages. The problem was that the GUC check hook for primary_slot_name reported errors at WARNING level via ereport(). This commit changes the check hook to use GUC_check_errdetail() and GUC_check_errhint() for error reporting. As with other GUC parameters, this causes non-postmaster processes to log the message at DEBUG3, so by default, only the postmaster's message appears in the log file. Backpatch to all supported versions. Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Chao Li <lic@highgo.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Discussion: https://postgr.es/m/CAHGQGwFud-cvthCTfusBfKHBS6Jj6kdAPTdLWKvP2qjUX6L_wA@mail.gmail.com Backpatch-through: 13
2025-10-22Use CompactAttribute more often, when possibleDavid Rowley
5983a4cff added CompactAttribute for storing commonly used fields from FormData_pg_attribute. 5983a4cff didn't go to the trouble of adjusting every location where we can use CompactAttribute rather than FormData_pg_attribute, so here we change the remaining ones. There are some locations where I've left the code using FormData_pg_attribute. These are mostly in the ALTER TABLE code. Using CompactAttribute here seems more risky as often the TupleDesc is being changed and those changes may not have been flushed to the CompactAttribute yet. I've also left record_recv(), record_send(), record_cmp(), record_eq() and record_image_eq() alone as it's not clear to me that accessing the CompactAttribute is a win here due to the FormData_pg_attribute still having to be accessed for most cases. Switching the relevant parts to use CompactAttribute would result in having to access both for common cases. Careful benchmarking may reveal that something can be done to make this better, but in absence of that, the safer option is to leave these alone. In ReorderBufferToastReplace(), there was a check to skip attnums < 0 while looping over the TupleDesc. Doing this is redundant since TupleDescs don't store < 0 attnums. Removing that code allows us to move to using CompactAttribute. The change in validateDomainCheckConstraint() just moves fetching the FormData_pg_attribute into the ERROR path, which is cold due to calling errstart_cold() and results in code being moved out of the common path. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAApHDvrMy90o1Lgkt31F82tcSuwRFHq3vyGewSRN=-QuSEEvyQ@mail.gmail.com
2025-10-21Re-pgindent brin.c.Nathan Bossart
Backpatch-through: 13
2025-10-21Fix BRIN 32-bit counter wrap issue with huge tablesDavid Rowley
A BlockNumber (32-bit) might not be large enough to add bo_pagesPerRange to when the table contains close to 2^32 pages. At worst, this could result in a cancellable infinite loop during the BRIN index scan with power-of-2 pagesPerRange, and slow (inefficient) BRIN index scans and scanning of unneeded heap blocks for non power-of-2 pagesPerRange. Backpatch to all supported versions. Author: sunil s <sunilfeb26@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAOG6S4-tGksTQhVzJM19NzLYAHusXsK2HmADPZzGQcfZABsvpA@mail.gmail.com Backpatch-through: 13
2025-10-15Add log_autoanalyze_min_durationPeter Eisentraut
The log output functionality of log_autovacuum_min_duration applies to both VACUUM and ANALYZE, so it is not possible to separate the VACUUM and ANALYZE log output thresholds. Logs are likely to be output only for VACUUM and not for ANALYZE. Therefore, we decided to separate the threshold for log output of VACUUM by autovacuum (log_autovacuum_min_duration) and the threshold for log output of ANALYZE by autovacuum (log_autoanalyze_min_duration). Author: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: Kasahara Tatsuhito <kasaharatt@oss.nttdata.com> Discussion: https://www.postgresql.org/message-id/flat/CAOzEurQtfV4MxJiWT-XDnimEeZAY+rgzVSLe8YsyEKhZcajzSA@mail.gmail.com
2025-10-14Make heap_page_is_all_visible independent of LVRelStateMelanie Plageman
This function only requires a few fields from LVRelState, so pass them in individually. This change allows calling heap_page_is_all_visible() from code such as pruneheap.c, which does not have access to an LVRelState. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl%40mdyyqpqzxjek
2025-10-14Inline TransactionIdFollows/Precedes[OrEquals]()Melanie Plageman
These functions appeared prominently in a profile of a patch that sets the visibility map on-access. Inline them to remove call overhead and make them cheaper to use in hot paths. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl%40mdyyqpqzxjek
2025-10-14Add helper for freeze determination to heap_page_prune_and_freezeMelanie Plageman
After scanning the line pointers on a heap page during the first phase of vacuum, we use the information collected to decide whether to use the assembled freeze plans. Move this decision logic into a helper function to improve readability. While here, rename a PruneState member and disambiguate some local variables in heap_page_prune_and_freeze(). Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl%40mdyyqpqzxjek
2025-10-13Eliminate XLOG_HEAP2_VISIBLE from vacuum phase IIIMelanie Plageman
Instead of emitting a separate XLOG_HEAP2_VISIBLE WAL record for each page that becomes all-visible in vacuum's third phase, specify the VM changes in the already emitted XLOG_HEAP2_PRUNE_VACUUM_CLEANUP record. Visibility checks are now performed before marking dead items unused. This is safe because the heap page is held under exclusive lock for the entire operation. This reduces the number of WAL records generated by VACUUM phase III by up to 50%. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
2025-10-12Remove unused nbtree array advancement variable.Peter Geoghegan
Remove a variable that is no longer in use following commit 9a2e2a28. It's not immediately clear why there were no compiler warnings about this oversight. Author: Peter Geoghegan <pg@bowt.ie> Backpatch-through: 18
2025-10-10Remove overzealous _bt_killitems assertion.Peter Geoghegan
An assertion in _bt_killitems expected the scan's currPos state to contain a valid LSN, saved from when currPos's page was initially read. The assertion failed to account for the fact that even logged relations can have leaf pages with an invalid LSN when built with wal_level set to "minimal". Remove the faulty assertion. Oversight in commit e6eed40e (though note that the assertion was backpatched to stable branches before 18 by commit 7c319f54). Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Matthijs van der Vleuten <postgresql@zr40.nl> Bug: #19082 Discussion: https://postgr.es/m/19082-628e62160dbbc1c1@postgresql.org Backpatch-through: 13
2025-10-10Fix two typos in xlogstats.h and xlogstats.cMichael Paquier
Issue found while browsing this area of the code, introduced and copy-pasted around by 2258e76f90bf. Backpatch-through: 15
2025-10-09Eliminate COPY FREEZE use of XLOG_HEAP2_VISIBLEMelanie Plageman
Instead of emitting a separate WAL XLOG_HEAP2_VISIBLE record for setting bits in the VM, specify the VM block changes in the XLOG_HEAP2_MULTI_INSERT record. This halves the number of WAL records emitted by COPY FREEZE. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
2025-10-08bufmgr: fewer calls to BufferDescriptorGetContentLockAndres Freund
We're planning to merge buffer content locks into BufferDesc.state. To reduce the size of that patch, centralize calls to BufferDescriptorGetContentLock(). The biggest part of the change is in assertions, by introducing BufferIsLockedByMe[InMode]() (and removing BufferIsExclusiveLocked()). This seems like an improvement even without aforementioned plans. Additionally replace some direct calls to LWLockAcquire() with calls to LockBuffer(). Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff