path: root/src/backend/access/heap
Age  Commit message  Author
2024-04-08  Use streaming I/O in ANALYZE.  (Thomas Munro)
The ANALYZE command prefetches and reads sample blocks chosen by a BlockSampler algorithm. Instead of calling [Prefetch|Read]Buffer() for each block, ANALYZE now uses the streaming API introduced in b5a9b18cd0. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com
2024-04-08  Use streaming I/O in sequential scans.  (Thomas Munro)
Instead of calling ReadBuffer() for each block, heap sequential scans and TID range scans now use the streaming API introduced in b5a9b18cd0. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg%3DgEQ%40mail.gmail.com
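Both commits above drive the b5a9b18cd0 read stream machinery through the same basic pattern: a callback supplies block numbers and the stream hands back pinned buffers. A rough sketch, assuming the read_stream.h interface (the StreamPos struct, next_block_cb() and scan_with_stream() below are invented for illustration; flag and signature details should be checked against the header):

    /* Hypothetical illustration of driving a ReadStream (see read_stream.h). */
    #include "postgres.h"
    #include "storage/bufmgr.h"
    #include "storage/read_stream.h"
    #include "utils/rel.h"

    typedef struct StreamPos
    {
        BlockNumber next;       /* next block to hand out */
        BlockNumber nblocks;    /* stop after this many blocks */
    } StreamPos;

    /* Callback: return the next block number, or InvalidBlockNumber to stop. */
    static BlockNumber
    next_block_cb(ReadStream *stream, void *callback_private_data,
                  void *per_buffer_data)
    {
        StreamPos  *pos = (StreamPos *) callback_private_data;

        if (pos->next >= pos->nblocks)
            return InvalidBlockNumber;
        return pos->next++;
    }

    static void
    scan_with_stream(Relation rel, BlockNumber nblocks)
    {
        StreamPos   pos = {0, nblocks};
        ReadStream *stream;
        Buffer      buf;

        stream = read_stream_begin_relation(READ_STREAM_SEQUENTIAL,
                                            NULL,   /* no access strategy */
                                            rel, MAIN_FORKNUM,
                                            next_block_cb, &pos,
                                            0 /* per_buffer_data_size */ );

        /* The stream issues prefetches and combines reads behind the scenes. */
        while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
        {
            /* ... examine the page held in buf ... */
            ReleaseBuffer(buf);
        }
        read_stream_end(stream);
    }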
2024-04-06  Reduce branches in heapgetpage()'s per-tuple loop  (Andres Freund)
Until now, heapgetpage()'s loop over all tuples performed some conditional checks for each tuple, even though the conditions did not change across the loop. This commit fixes that by moving the loop into an inline function. By calling it with different constant arguments, the compiler can generate an optimized loop for the different conditions, at the price of two per-page checks. For cases of all-visible tables and an isolation level other than serializable, speedups of up to 25% have been measured. Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Zhang Mingli <zmlpostgres@gmail.com> Tested-by: Quan Zongliang <quanzongliang@yeah.net> Discussion: https://postgr.es/m/20230716015656.xjvemfbp5fysjiea@awork3.anarazel.de Discussion: https://postgr.es/m/2ef7ff1b-3d18-2283-61b1-bbd25fc6c7ce@yeah.net
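The same trick can be sketched generically in a few lines of C, assuming GCC/Clang-style always_inline; this is an illustration of the specialization technique, not the actual heapam.c code:

    #include <stdbool.h>
    #include <stddef.h>

    /* Per-element loop in an always-inline helper; the invariant condition is
     * a parameter, so constant call sites compile to specialized loops with no
     * per-element branch. */
    static inline __attribute__((always_inline)) size_t
    filter_loop(const int *vals, size_t n, int *out, bool skip_negative)
    {
        size_t      nout = 0;

        for (size_t i = 0; i < n; i++)
        {
            if (skip_negative && vals[i] < 0)   /* folded away when the constant is false */
                continue;
            out[nout++] = vals[i];
        }
        return nout;
    }

    size_t
    copy_all(const int *vals, size_t n, int *out)
    {
        return filter_loop(vals, n, out, false);    /* specialized: no per-element test */
    }

    size_t
    copy_nonnegative(const int *vals, size_t n, int *out)
    {
        return filter_loop(vals, n, out, true);     /* specialized: test kept */
    }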
2024-04-06  Optimize visibilitymap_count() with AVX-512 instructions.  (Nathan Bossart)
Commit 792752af4e added infrastructure for using AVX-512 intrinsic functions, and this commit uses that infrastructure to optimize visibilitymap_count(). Specifically, a new pg_popcount_masked() function is introduced that applies a bitmask to every byte in the buffer prior to calculating the population count, which is used to filter out the all-visible or all-frozen bits as needed. Platforms without AVX-512 support should also see a nice speedup due to the reduced number of calls to a function pointer. Co-authored-by: Ants Aasma Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
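The core idea can be sketched portably; the loop below is only the fallback shape (the committed pg_popcount_masked() also has hardware-popcount and AVX-512 paths, and its exact signature may differ):

    #include <stddef.h>
    #include <stdint.h>

    /* AND a mask into every byte before counting bits, e.g. to count only the
     * all-visible or only the all-frozen bits of a visibility map page. */
    static uint64_t
    popcount_masked(const char *buf, size_t bytes, uint8_t mask)
    {
        uint64_t    popcnt = 0;

        for (size_t i = 0; i < bytes; i++)
        {
            uint8_t     b = ((uint8_t) buf[i]) & mask;

            while (b)
            {
                b &= (uint8_t) (b - 1);     /* clear the lowest set bit */
                popcnt++;
            }
        }
        return popcnt;
    }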
2024-04-07  BitmapHeapScan: Push skip_fetch optimization into table AM  (Tomas Vondra)
Commit 7c70996ebf0949b142 introduced an optimization to allow bitmap scans to operate like index-only scans by not fetching a block from the heap if none of the underlying data is needed and the block is marked all visible in the visibility map. With the introduction of table AMs, a FIXME was added to this code indicating that the skip_fetch logic should be pushed into the table AM-specific code, as not all table AMs may use a visibility map in the same way. This commit resolves this FIXME for the current block. The layering violation is still present in BitmapHeapScan's prefetching code, which uses the visibility map to decide whether or not to prefetch a block. However, this can be addressed independently. Author: Melanie Plageman Reviewed-by: Andres Freund, Heikki Linnakangas, Tomas Vondra, Mark Dilger Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2024-04-04  Secondary refactor of heap scanning functions  (David Rowley)
Similar to 44086b097, refactor heap scanning functions to be more suitable for the read stream API. Author: Melanie Plageman Discussion: https://postgr.es/m/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg=gEQ@mail.gmail.com
2024-04-04  Preliminary refactor of heap scanning functions  (David Rowley)
To allow the use of the read stream API added in b5a9b18cd for sequential scans on heap tables, here we make some adjustments to make that change less invasive and perhaps make the code easier to follow in the process. Here heapgetpage() gets broken into two functions: 1) The part which reads the block has now been moved into a function named heapfetchbuf(). 2) The part which performed pruning and populated the scan's rs_vistuples[] array is now moved into a new function named heap_prepare_pagescan(). The functionality provided by heap_prepare_pagescan() was only ever required by SO_ALLOW_PAGEMODE scans, so the branching that was previously done in heapgetpage() is no longer needed as we simply don't call heap_prepare_pagescan() from heapgettup() in the refactored code. Author: Melanie Plageman Discussion: https://postgr.es/m/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg=gEQ@mail.gmail.com
2024-04-03  Combine freezing and pruning steps in VACUUM  (Heikki Linnakangas)
Execute both freezing and pruning of tuples in the same heap_page_prune() function, now called heap_page_prune_and_freeze(), and emit a single WAL record containing all changes. That reduces the overall amount of WAL generated. This moves the freezing logic from vacuumlazy.c to the heap_page_prune_and_freeze() function. The main difference in the coding is that in vacuumlazy.c, we looked at the tuples after the pruning had already happened, but in heap_page_prune_and_freeze() we operate on the tuples before pruning. The heap_prepare_freeze_tuple() function is now invoked after we have determined that a tuple is not going to be pruned away. VACUUM no longer needs to loop through the items on the page after pruning. heap_page_prune_and_freeze() does all the work. It now returns the list of dead offsets, including existing LP_DEAD items, to the caller. Similarly it's now responsible for tracking 'all_visible', 'all_frozen', and 'hastup' on the caller's behalf. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
2024-04-03  Refactor how heap_prune_chain() updates prunable_xid  (Heikki Linnakangas)
In preparation for freezing and counting tuples which are not candidates for pruning, split heap_prune_record_unchanged() into multiple functions, depending on the kind of line pointer. That's not too interesting right now, but makes the next commit smaller. Recording the lowest soon-to-be prunable xid is one of the actions we take for unchanged LP_NORMAL item pointers but not for others, so move that to the new heap_prune_record_unchanged_lp_normal() function. The next commit will add more actions to these functions. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
2024-04-02  Revert "Custom reloptions for table AM"  (Alexander Korotkov)
This reverts commit c95c25f9af4bc77f2f66a587735c50da08c12b37 due to multiple design issues spotted after commit. Reported-by: Jeff Davis Discussion: https://postgr.es/m/11550b536211d5748bb2865ed6cb3502ff073bf7.camel%40j-davis.com
2024-04-02  Use TidStore for dead tuple TIDs storage during lazy vacuum.  (Masahiko Sawada)
Previously, we used a simple array for storing dead tuple IDs during lazy vacuum, which had a number of problems: * The array used a single allocation and so was limited to 1GB. * The allocation was pessimistically sized according to table size. * Lookup with binary search was slow because of poor CPU cache and branch prediction behavior. This commit replaces that array with the TID store from commit 30e144287a. Since the backing radix tree makes small allocations as needed, the 1GB limit is now gone. Further, the total memory used is now often smaller by an order of magnitude or more, depending on the distribution of blocks and offsets. These two features should make multiple rounds of heap scanning and index cleanup an extremely rare event. TID lookup during index cleanup is also several times faster, even more so when index order is correlated with heap tuple order. Since there is no longer a predictable relationship between the number of dead tuples vacuumed and the space taken up by their TIDs, the number of tuples no longer provides any meaningful insights for users, nor is the maximum number predictable. For that reason this commit also changes to byte-based progress reporting, with the relevant columns of pg_stat_progress_vacuum renamed accordingly to max_dead_tuple_bytes and dead_tuple_bytes. For parallel vacuum, both the TID store and supplemental information specific to vacuum are shared among the parallel vacuum workers. As with the previous array, we don't take any locks on TidStore during parallel vacuum since writes are still only done by the leader process. Bump catalog version. Reviewed-by: John Naylor, (in an earlier version) Dilip Kumar Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com
2024-04-02  Introduce 'options' argument to heap_page_prune()  (Heikki Linnakangas)
Currently there is only one option, HEAP_PAGE_PRUNE_MARK_UNUSED_NOW which replaces the old boolean argument, but upcoming patches will introduce at least one more. Having a lot of boolean arguments makes it hard to see at the call sites what the arguments mean, so prefer a bitmask of options with human-readable names. Author: Melanie Plageman <melanieplageman@gmail.com> Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> Discussion: https://www.postgresql.org/message-id/20240401172219.fngjosaqdgqqvg4e@liskov
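A minimal sketch of the resulting calling convention (only HEAP_PAGE_PRUNE_MARK_UNUSED_NOW is named by this commit; the second flag and the call shapes in the comment are illustrative, not the exact heap_page_prune() prototype):

    /* Options are single bits that can be OR'ed together as more are added. */
    #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW     (1 << 0)
    /* #define HEAP_PAGE_PRUNE_SOMETHING_ELSE   (1 << 1)   -- hypothetical future option */

    /*
     * Before: heap_page_prune(rel, buf, vistest, true, ...);
     *         -- what does "true" mean at the call site?
     * After:  heap_page_prune(rel, buf, vistest,
     *                         HEAP_PAGE_PRUNE_MARK_UNUSED_NOW, ...);
     *         -- the intent is readable without checking the prototype.
     */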
2024-04-01  Handle non-chain tuples outside of heap_prune_chain()  (Heikki Linnakangas)
Handle dead branches of aborted HOT chains outside heap_prune_chain() as a separate phase. This simplifies the logic in heap_prune_chain(), as well as allowing us to clean up more RECENTLY_DEAD -> DEAD chains. To accomplish this efficiently, partition tuples into HOT and non-HOT while first collecting visibility information for each tuple in heap_page_prune(). Then call heap_prune_chain() only on potential chain members. Then mop up the leftover HOT tuples afterwards. As part of this, keep track of which items on the page have already been processed, in the 'processed' array. This replaces the 'marked' array which was only set for tuples marked for removal or redirection. The 'processed' array is updated also for items that are left unchanged, when we conclude that an item can be left unchanged. At the end of pruning, every item on the page should be marked as processed in the array; an assertion is added for that. Author: Melanie Plageman <melanieplageman@gmail.com> Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
2024-04-01  Refactor heap_prune_chain()  (Heikki Linnakangas)
Keep track of the number of deleted tuples in PruneState and record this information when recording a tuple dead, unused or redirected. This removes a special case from the traversal and chain processing logic and sets a precedent of recording the impact of prune actions in the record functions themselves. This paradigm will be used in future commits which move tracking of additional statistics on pruning actions from lazy_scan_prune() to heap_prune_chain(). Simplify heap_prune_chain()'s chain traversal logic by handling each case explicitly. That is, do not attempt to share code when processing different types of chains. For each category of chain, process it specifically and procedurally: first handling the root, then any intervening tuples, and, finally, the end of the chain. While we are at it, add a few new comments to heap_prune_chain() clarifying some special cases involving RECENTLY_DEAD tuples. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
2024-04-01  Minor refactoring in heap_page_prune  (Heikki Linnakangas)
Pass 'page', 'blockno' and 'maxoff' to heap_prune_chain() as arguments, so that it doesn't need to fetch them from the buffer. This saves a few cycles per chain. Remove the "if (off_loc != NULL)" checks, and require the caller to pass a non-NULL 'off_loc'. Pass a pointer to a dummy local variable when it's not needed. Those checks are cheap, but it's still better to avoid them in the per-chain loops when we can do so easily. The CPU time savings from these changes are hardly measurable, but fewer instructions is good anyway, so why not. I spotted the potential for these while reviewing Melanie Plageman's patch set to combine prune and freeze records. Discussion: https://www.postgresql.org/message-id/CAAKRu_abm2tHhrc0QSQa%3D%3DsHe%3DVA1%3Doz1dJMQYUOKuHmu%2B9Xrg%40mail.gmail.com
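The "dummy local" idiom looks roughly like this generic sketch (names invented; OffsetNumber re-declared only to keep the example self-contained):

    typedef unsigned short OffsetNumber;    /* stand-in for the PostgreSQL typedef */

    /* The callee stores through the pointer unconditionally: no NULL test in
     * the per-item loop. */
    static void
    process_item(OffsetNumber item, OffsetNumber *off_loc)
    {
        *off_loc = item;
        /* ... work on the item ... */
    }

    static void
    process_all(const OffsetNumber *items, int nitems)
    {
        OffsetNumber dummy_off_loc;     /* this caller doesn't need the position */

        for (int i = 0; i < nitems; i++)
            process_item(items[i], &dummy_off_loc);
    }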
2024-03-30  Let table AM insertion methods control index insertion  (Alexander Korotkov)
Previously, the executor did index insert unconditionally after calling table AM interface methods tuple_insert() and multi_insert(). This commit introduces the new parameter insert_indexes for these two methods. Setting '*insert_indexes' to true preserves the current behavior. Setting it to false indicates that the table AM takes care of index inserts itself and doesn't want the caller to do that. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Reviewed-by: Pavel Borisov, Matthias van de Meent, Mark Dilger
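A hypothetical sketch of how an AM that leaves index maintenance to the executor might use the new parameter; the surrounding parameter list is an approximation, not the exact tableam.h definition:

    #include "postgres.h"
    #include "access/tableam.h"

    /* Sketch only: parameter list approximated, not copied from tableam.h. */
    static void
    my_am_tuple_insert(Relation rel, TupleTableSlot *slot, CommandId cid,
                       int options, struct BulkInsertStateData *bistate,
                       bool *insert_indexes)
    {
        /* ... store the tuple the AM's own way ... */

        *insert_indexes = true;     /* ask the executor to insert index entries,
                                     * i.e. keep the pre-existing behavior */
    }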
2024-03-30  Custom reloptions for table AM  (Alexander Korotkov)
Let table AM define custom reloptions for its tables. This allows AM-specific parameters to be specified in the WITH clause when creating a table. The code may use some parts from prior work by Hao Wu. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Discussion: https://postgr.es/m/AMUA1wBBBxfc3tKRLLdU64rb.1.1683276279979.Hmail.wuhao%40hashdata.cn Reviewed-by: Pavel Borisov, Matthias van de Meent
2024-03-30  Generalize relation analyze in table AM interface  (Alexander Korotkov)
Currently, there is just one algorithm for sampling tuples from a table written in acquire_sample_rows(). A custom table AM can only redefine the way to get the next block/tuple by implementing scan_analyze_next_block() and scan_analyze_next_tuple() API functions. This approach doesn't seem general enough. For instance, it's unclear how index-organized tables could be sampled this way. This commit allows a table AM to encapsulate the whole sampling algorithm (currently implemented in acquire_sample_rows()) into the relation_analyze() API function. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Reviewed-by: Pavel Borisov, Matthias van de Meent
2024-03-28  Remove obsolete comment about VACUUM retrying pruning  (Heikki Linnakangas)
Commit 1ccc1e05ae removed the retry logic that the comment talked about. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/20240328015326.x5gnzsohl6j23b42@liskov
2024-03-27  Fix some typos and grammar issues from commit 87985cc92522  (Alexander Korotkov)
Reported-by: Alexander Lakhin
2024-03-26  Allow locking updated tuples in tuple_update() and tuple_delete()  (Alexander Korotkov)
Currently, in read committed transaction isolation mode (default), we have the following sequence of actions when tuple_update()/tuple_delete() finds the tuple updated by a concurrent transaction. 1. Attempt to update/delete tuple with tuple_update()/tuple_delete(), which returns TM_Updated. 2. Lock tuple with tuple_lock(). 3. Re-evaluate plan qual (recheck if we still need to update/delete and calculate the new tuple for update). 4. Second attempt to update/delete tuple with tuple_update()/tuple_delete(). This attempt should be successful, since the tuple was previously locked. This commit eliminates step 2 by taking the lock during the first tuple_update()/tuple_delete() call. The heap table access method saves some effort by checking the updated tuple once instead of twice. Future undo-based table access methods, which will start from the latest row version, can immediately place a lock there. Also, this commit makes tuple_update()/tuple_delete() optionally save the old tuple into the dedicated slot. That saves effort on re-fetching tuples in certain cases. The code in nodeModifyTable.c is simplified by removing the nested switch/case. Discussion: https://postgr.es/m/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com Reviewed-by: Aleksander Alekseev, Pavel Borisov, Vignesh C, Mason Sharp Reviewed-by: Andres Freund, Chris Travers
2024-03-25  Merge prune, freeze and vacuum WAL record formats  (Heikki Linnakangas)
The new combined WAL record is now used for pruning, freezing and 2nd pass of vacuum. This is in preparation for changing VACUUM to write a combined prune+freeze record per page, instead of two separate records. The new WAL record format now supports that, but the code still always writes separate records for pruning and freezing. This reserves separate XLOG_HEAP2_* info codes for when the pruning record is emitted for on-access pruning or VACUUM, per Peter Geoghegan's suggestion. The record format is identical, but having separate info codes makes it easier to analyze pruning and vacuuming with pg_waldump. The function to emit the new WAL record, log_heap_prune_and_freeze(), is in pruneheap.c. The existing heap_log_freeze_plan() and its subroutines are moved to pruneheap.c without changes, to keep them together with log_heap_prune_and_freeze(). Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/CAAKRu_azf-zH%3DDgVbquZ3tFWjMY1w5pO8m-TXJaMdri8z3933g@mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAAKRu_b2oE4GL%3Dq4g9mcByS9yT7wTQvEH9OLpabj28e%2BWKFi2A@mail.gmail.com
2024-03-21  Allow table AM tuple_insert() method to return the different slot  (Alexander Korotkov)
This allows a table AM to return a native tuple slot even if VirtualTupleTableSlot is given as an input. Native tuple slots have knowledge about system attributes, which could be accessed in the future. The table_multi_insert() method can already modify the input 'slots' array. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Reviewed-by: Matthias van de Meent, Mark Dilger, Pavel Borisov Reviewed-by: Nikita Malakhov, Japin Li
2024-03-21  Allow table AM to store complex data structures in rd_amcache  (Alexander Korotkov)
The new table AM method free_rd_amcache is responsible for freeing all the memory related to rd_amcache and setting rd_amcache to NULL. If the new method is not specified, we still assume rd_amcache to be a single chunk of memory, which can simply be pfree'd. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Reviewed-by: Matthias van de Meent, Mark Dilger, Pavel Borisov Reviewed-by: Nikita Malakhov, Japin Li
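A hedged sketch of an AM-side callback under the new method; the callback signature and the MyAmCache structure are assumptions for illustration, and only rd_amcache plus the need to leave it NULL come from the commit:

    #include "postgres.h"
    #include "utils/rel.h"

    typedef struct MyAmCache
    {
        void       *lookup_table;   /* separately allocated pieces ... */
        void       *stats;          /* ... that a single pfree couldn't release */
    } MyAmCache;

    static void
    my_am_free_rd_amcache(Relation rel)     /* signature assumed, not verified */
    {
        MyAmCache  *cache = (MyAmCache *) rel->rd_amcache;

        if (cache == NULL)
            return;

        pfree(cache->lookup_table);
        pfree(cache->stats);
        pfree(cache);
        rel->rd_amcache = NULL;     /* the method must leave the field cleared */
    }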
2024-03-20  Remove unused PruneState member rel  (Heikki Linnakangas)
PruneState->rel is no longer being used, so just remove it. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/20240320013602.6sypr4cx6sefpemg@liskov
2024-03-20  Reorganize heap_page_prune() function comment  (Heikki Linnakangas)
heap_page_prune()'s function header comment didn't explain the parameters in the same order they appear in the function. Fix that. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/20240320013602.6sypr4cx6sefpemg@liskov
2024-03-13  Make the order of the header file includes consistent  (Peter Eisentraut)
Similar to commit 7e735035f20. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4-WhpCFMbXCjtJ%2BFzmjfPrp7Hw1pk4p%2BZpU95Kh3ofZ1A%40mail.gmail.com
2024-03-11  Remove unneeded vacuum_delay_point from heap_vac_scan_get_next_block  (Heikki Linnakangas)
heap_vac_scan_get_next_block() does relatively little work, so there is no need to call vacuum_delay_point(). A future commit will call heap_vac_scan_get_next_block() from a callback, and we would like to avoid calling vacuum_delay_point() in that callback. Author: Melanie Plageman Discussion: https://postgr.es/m/CAAKRu_Yf3gvXGcCnqqfoq0Q8LX8UM-e-qbm_B1LeZh60f8WhWA%40mail.gmail.com
2024-03-11  Confine vacuum skip logic to lazy_scan_skip()  (Heikki Linnakangas)
Rename lazy_scan_skip() to heap_vac_scan_next_block() and move more code into the function, so that the caller doesn't need to know about ranges or skipping anymore. heap_vac_scan_next_block() returns the next block to process, and the logic for determining that block is all within the function. This makes the skipping logic easier to understand, as it's all in the same function, and makes the calling code easier to understand as it's less cluttered. The state variables needed to manage the skipping logic are moved to LVRelState. heap_vac_scan_next_block() now manages its own VM buffer separately from the caller's vmbuffer variable. The caller's vmbuffer holds the VM page for the current block it is processing, while heap_vac_scan_next_block() keeps a pin on the VM page for the next unskippable block. Most of the time they are the same, so we hold two pins on the same buffer, but it's more convenient to manage them separately. For readability inside heap_vac_scan_next_block(), move the logic of finding the next unskippable block to a separate function, and add some comments. This refactoring will also help future patches to switch to using a streaming read interface, and eventually AIO (https://postgr.es/m/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com) Author: Melanie Plageman, Heikki Linnakangas Reviewed-by: Andres Freund (older version) Discussion: https://postgr.es/m/CAAKRu_Yf3gvXGcCnqqfoq0Q8LX8UM-e-qbm_B1LeZh60f8WhWA%40mail.gmail.com
2024-03-11  Set all_visible_according_to_vm correctly with DISABLE_PAGE_SKIPPING  (Heikki Linnakangas)
It's important for 'all_visible_according_to_vm' to correctly reflect whether the VM bit is set or not, even when we are not trusting the VM to skip pages, because contrary to what the comment said, lazy_scan_prune() relies on it. If it's incorrectly set to 'false', when the VM bit is in fact set, lazy_scan_prune() will try to set the VM bit again and dirty the page unnecessarily. As a result, if you used DISABLE_PAGE_SKIPPING, all heap pages were dirtied, even if there were no changes. We would also fail to clear any VM bits that were set incorrectly. This was broken in commit 980ae17310, so backpatch to v16. Backpatch-through: 16 Reviewed-by: Melanie Plageman, Peter Geoghegan Discussion: https://www.postgresql.org/message-id/3df2b582-dc1c-46b6-99b6-38eddd1b2784@iki.fi
2024-03-04  Remove unused #include's from backend .c files  (Peter Eisentraut)
as determined by include-what-you-use (IWYU) While IWYU also suggests to *add* a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
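The rmgr.c annotations mentioned above are ordinary IWYU pragma comments; for example, an include that IWYU would otherwise flag can be pinned like this (the header named here is only an example):

    #include "access/xlog_internal.h"   /* IWYU pragma: keep */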
2024-03-04  Use MyBackendType in more places to check what process this is  (Heikki Linnakangas)
Remove IsBackgroundWorker, IsAutoVacuumLauncherProcess(), IsAutoVacuumWorkerProcess(), and IsLogicalSlotSyncWorker() in favor of new Am*Process() macros that use MyBackendType. For consistency with the existing Am*Process() macros. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/f3ecd4cb-85ee-4e54-8278-5fabfb3a4ed0@iki.fi
2024-02-23  Introduce a new smgr bulk loading facility.  (Heikki Linnakangas)
The new facility makes it easier to optimize bulk loading, as the logic for buffering, WAL-logging, and syncing the relation only needs to be implemented once. It's also less error-prone: We have had a number of bugs in how a relation is fsync'd - or not - at the end of a bulk loading operation. By centralizing that logic to one place, we only need to write it correctly once. The new facility is faster for small relations: Instead of calling smgrimmedsync(), we register the fsync to happen at next checkpoint, which avoids the fsync latency. That can make a big difference if you are e.g. restoring a schema-only dump with lots of relations. It is also slightly more efficient with large relations, as the WAL logging is performed multiple pages at a time. That avoids some WAL header overhead. The sorted GiST index build did that already; this moves the buffering to the new facility. The changes to the pageinspect GiST test need an explanation: Before this patch, the sorted GiST index build set the LSN on every page to the special GistBuildLSN value, not the LSN of the WAL record, even though they were WAL-logged. There was no particular need for it; it just happened naturally when we wrote out the pages before WAL-logging them. Now we WAL-log the pages first, like in B-tree build, so the pages are stamped with the record's real LSN. When the build is not WAL-logged, we still use GistBuildLSN. To make the test output predictable, use an unlogged index. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/30e8f366-58b3-b239-c521-422122dd5150%40iki.fi
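A hedged sketch of the facility's calling pattern, using the bulk_write.h names added by this commit (smgr_bulk_start_rel / smgr_bulk_get_buf / smgr_bulk_write / smgr_bulk_finish); exact types and argument details should be treated as approximate:

    #include "postgres.h"
    #include "storage/bufpage.h"
    #include "storage/bulk_write.h"
    #include "utils/rel.h"

    static void
    bulk_build_main_fork(Relation rel, BlockNumber nblocks)
    {
        BulkWriteState *bulkstate = smgr_bulk_start_rel(rel, MAIN_FORKNUM);

        for (BlockNumber blkno = 0; blkno < nblocks; blkno++)
        {
            BulkWriteBuffer buf = smgr_bulk_get_buf(bulkstate);

            PageInit((Page) buf, BLCKSZ, 0);
            /* ... fill the page ... */
            smgr_bulk_write(bulkstate, blkno, buf, true /* page_std */ );
        }

        /* WAL-logs the pages as needed and registers the fsync to happen at
         * the next checkpoint rather than calling smgrimmedsync(). */
        smgr_bulk_finish(bulkstate);
    }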
2024-02-12  Remove unnecessary smgropen() calls  (Heikki Linnakangas)
Now that RelationCreateStorage() returns the SmgrRelation (since commit 5c1560606dc), use that. Author: Japin Li Discussion: https://www.postgresql.org/message-id/ME3P282MB316600FA62F6605477F26F6AB6742@ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM
2024-01-26  Combine FSM updates for prune and no-prune cases.  (Robert Haas)
lazy_scan_prune() and lazy_scan_noprune() update the freespace map with identical conditions; combine them. This consolidation is easier now that cb970240f13df2b63f0410f81f452179a2b78d6f moved visibility map updates into lazy_scan_prune(). While combining the FSM updates, simplify the logic for calling lazy_scan_new_or_empty() and lazy_scan_noprune(). Also update a few comments in this part of the code to make them, hopefully, clearer. Melanie Plageman and Robert Haas Discussion: https://postgr.es/m/CA%2BTgmoaLTvipm%3Dxx4rJLr07m908PCu%3DQH3uCjD1UOn8YaEuO2g%40mail.gmail.com
2024-01-18  Remove LVPagePruneState.  (Robert Haas)
Commit cb970240f13df2b63f0410f81f452179a2b78d6f moved some code from lazy_scan_heap() to lazy_scan_prune(), and now some things that used to need to be passed back and forth are completely local to lazy_scan_prune(). Hence, this struct is mostly obsolete. The only thing that still needs to be passed back to the caller is has_lpdead_items, and that's also passed back by lazy_scan_noprune(), so do it the same way in both cases. Melanie Plageman, reviewed and slightly revised by me. Discussion: http://postgr.es/m/CAAKRu_aM=OL85AOr-80wBsCr=vLVzhnaavqkVPRkFBtD0zsuLQ@mail.gmail.com
2024-01-18  Move VM update code from lazy_scan_heap() to lazy_scan_prune().  (Robert Haas)
Most of the output parameters of lazy_scan_prune() were being used to update the VM in lazy_scan_heap(). Moving that code into lazy_scan_prune() simplifies lazy_scan_heap() and requires less communication between the two. This change permits some further code simplification, but that is left for a separate commit. Melanie Plageman, reviewed by me. Discussion: http://postgr.es/m/CAAKRu_aM=OL85AOr-80wBsCr=vLVzhnaavqkVPRkFBtD0zsuLQ@mail.gmail.com
2024-01-18  Optimize vacuuming of relations with no indexes.  (Robert Haas)
If there are no indexes on a relation, items can be marked LP_UNUSED instead of LP_DEAD when pruning. This significantly reduces WAL volume, since we no longer need to emit one WAL record for pruning and a second to change the LP_DEAD line pointers thus created to LP_UNUSED. Melanie Plageman, reviewed by Andres Freund, Peter Geoghegan, and me Discussion: https://postgr.es/m/CAAKRu_bgvb_k0gKOXWzNKWHt560R0smrGe3E8zewKPs8fiMKkw%40mail.gmail.com
2024-01-16  Be more consistent about whether to update the FSM while vacuuming.  (Robert Haas)
Previously, when lazy_scan_noprune() was called and returned true, we would update the FSM immediately if the relation had no indexes or if the page contained no dead items. On the other hand, when lazy_scan_prune() was called, we would update the FSM if either of those things was true or if index vacuuming was disabled. Eliminate that behavioral difference by considering vacrel->do_index_vacuuming in both cases. Also, make lazy_scan_heap() responsible for deciding whether to update the FSM, instead of doing it inside lazy_scan_noprune(). This is more consistent with the lazy_scan_prune() case. lazy_scan_noprune() still needs an output parameter for whether there are LP_DEAD items on the page, but the real decision-making now happens in the caller. Patch by me, reviewed by Peter Geoghegan and Melanie Plageman. Discussion: http://postgr.es/m/CA+TgmoaOzvN1TcHd9iej=PR3fY40En1USxzOnXSR2CxCLaRM0g@mail.gmail.com
2024-01-11  Remove hastup from LVPagePruneState.  (Robert Haas)
Instead, just have lazy_scan_prune() and lazy_scan_noprune() update LVRelState->nonempty_pages directly. This makes the two functions more similar and also makes lazy_scan_noprune() need one fewer output parameter. Melanie Plageman, reviewed by Andres Freund, Michael Paquier, and me Discussion: http://postgr.es/m/CAAKRu_btji_wQdg=ok-5E4v_bGVxKYnnFFe7RA6Frc1EcOwtSg@mail.gmail.com
2024-01-03  Update copyright for 2024  (Bruce Momjian)
Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
2023-11-30  Apply quotes more consistently to GUC names in logs  (Michael Paquier)
Quotes are applied to GUCs in a very inconsistent way across the code base, with a mix of double quotes or no quotes used. This commit removes double quotes around all the GUC names that are obviously referred to as parameters with non-English words (use of underscore, mixed case, etc). This is the result of a discussion with Álvaro Herrera, Nathan Bossart, Laurenz Albe, Peter Eisentraut, Tom Lane and Daniel Gustafsson. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Pv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w@mail.gmail.com
2023-11-28  Fix assertions with RI triggers in heap_update and heap_delete.  (Heikki Linnakangas)
If the tuple being updated is not visible to the crosscheck snapshot, we return TM_Updated but the assertions would not hold in that case. Move them to before the cross-check. Fixes bug #17893. Backpatch to all supported versions. Author: Alexander Lakhin Backpatch-through: 12 Discussion: https://www.postgresql.org/message-id/17893-35847009eec517b5%40postgresql.org
2023-11-17  Release lock on heap buffer before vacuuming FSM  (Andres Freund)
When there are no indexes on a table, we vacuum each heap block after pruning it and then update the freespace map. Periodically, we also vacuum the freespace map. This was done while unnecessarily holding a lock on the heap page. Release the lock before calling FreeSpaceMapVacuumRange() and, while we're at it, ensure the range includes the heap block we just vacuumed. There are no known deadlocks or other similar issues, therefore don't backpatch. It's certainly not good to do all this work under a lock, but it's not frequently reached, making it not worth the risk of backpatching. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAAKRu_YiL%3D44GvGnt1dpYouDSSoV7wzxVoXs8m3p311rp-TVQQ%40mail.gmail.com
2023-11-10  Ensure we use the correct spelling of "ensure"  (David Rowley)
We seem to have accidentally used "insure" in a few places. Correct that. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Pv0biqrhA3pMhu40aDsj343mTsD75khKnHsLqR8P04f=Q@mail.gmail.com Backpatch-through: 12, oldest supported version
2023-10-26  Add trailing commas to enum definitions  (Peter Eisentraut)
Since C99, there can be a trailing comma after the last value in an enum definition. A lot of new code has been introducing this style on the fly. Some new patches are now taking an inconsistent approach to this. Some add the last comma on the fly if they add a new last value, some are trying to preserve the existing style in each place, some are even dropping the last comma if there was one. We could nudge this all in a consistent direction if we just add the trailing commas everywhere once. I omitted a few places where there was a fixed "last" value that will always stay last. I also skipped the header files of libpq and ecpg, in case people want to use those with older compilers. There were also a small number of cases where the enum type wasn't used anywhere (but the enum values were), which ended up confusing pgindent a bit, so I left those alone. Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
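The style being standardized is simply a trailing comma after the final enumerator, as in this illustrative enum:

    typedef enum MyThingState
    {
        MY_THING_IDLE,
        MY_THING_RUNNING,
        MY_THING_DONE,      /* trailing comma: adding a value later is a one-line diff */
    } MyThingState;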
2023-10-16  Fix comment from commit 22655aa231.  (Thomas Munro)
Per automated complaint from BF animal koel this needed to be re-indented, but there was also a typo. Back-patch to 16.
2023-10-13  Fix bulk table extension when copying into multiple partitions  (Andres Freund)
When COPYing into a partitioned table that does not permit the use of table_multi_insert(), we could error out with ERROR: could not read block NN in file "base/...": read only 0 of 8192 bytes because BulkInsertState->next_free was not reset between partitions. This problem occurred only when not able to use table_multi_insert(), as a dedicated BulkInsertState for each partition is used in that case. The bug was introduced in 00d1e02be24, but it was hard to hit at that point, as commonly bulk relation extension is not used when not using table_multi_insert(). It became more likely after 82a4edabd27, which expanded the use of bulk extension. To fix the bug, reset the bulk relation extension state in BulkInsertState in ReleaseBulkInsertStatePin(). That was added (in b1ecb9b3fcf) to tackle a very similar issue. Obviously the name is not quite correct, but there might be external callers, and bulk insert state needs to be reset precisely in the situations in which ReleaseBulkInsertStatePin() already needed to be called. Medium term the better fix likely is to disallow reusing BulkInsertState across relations. Add a test that, without the fix, reproduces #18130 in most configurations. The test also catches the problem fixed in b1ecb9b3fcf when run with small shared_buffers. Reported-by: Ivan Kolombet <enderstd@gmail.com> Analyzed-by: Tom Lane <tgl@sss.pgh.pa.us> Analyzed-by: Andres Freund <andres@anarazel.de> Bug: #18130 Discussion: https://postgr.es/m/18130-7a86a7356a75209d%40postgresql.org Discussion: https://postgr.es/m/257696.1695670946%40sss.pgh.pa.us Backpatch: 16-
2023-10-02  Remove retry loop in heap_page_prune().  (Robert Haas)
The retry loop is needed because heap_page_prune() calls HeapTupleSatisfiesVacuum() and then lazy_scan_prune() does the same thing again, and they might get different answers due to concurrent clog updates. But this patch makes heap_page_prune() return the HeapTupleSatisfiesVacuum() results that it computed back to the caller, which allows lazy_scan_prune() to avoid needing to recompute those values in the first place. That's nice both because it eliminates the need for a retry loop and also because it's cheaper. Melanie Plageman, reviewed by David Geier, Andres Freund, and me. Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com
2023-09-28  Return data from heap_page_prune via a struct.  (Robert Haas)
Previously, one of the values in the struct was returned as the return value, and another was returned via an output parameter. In preparation for returning more stuff, consolidate both values into a struct returned via an output parameter. Melanie Plageman, reviewed by Andres Freund and by me. Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com