summaryrefslogtreecommitdiff
path: root/src/include
AgeCommit message (Collapse)Author
2024-01-13Make attstattarget nullablePeter Eisentraut
This changes the pg_attribute field attstattarget into a nullable field in the variable-length part of the row. If no value is set by the user for attstattarget, it is now null instead of previously -1. This saves space in pg_attribute and tuple descriptors for most practical scenarios. (ATTRIBUTE_FIXED_PART_SIZE is reduced from 108 to 104.) Also, null is the semantically more correct value. The ANALYZE code internally continues to represent the default statistics target by -1, so that that code can avoid having to deal with null values. But that is now contained to the ANALYZE code. Only the DDL code deals with attstattarget possibly null. For system columns, the field is now always null. The ANALYZE code skips system columns anyway. To set a column's statistics target to the default value, the new command form ALTER TABLE ... SET STATISTICS DEFAULT can be used. (SET STATISTICS -1 still works.) Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/flat/4da8d211-d54d-44b9-9847-f2a9f1184c76@eisentraut.org
2024-01-12Refactor code checking for file existenceMichael Paquier
jit.c and dfgr.c had a copy of the same code to check if a file exists or not, with a twist: jit.c did not check for EACCES when failing the stat() call for the path whose existence is tested. This refactored routine will be used by an upcoming patch. Reviewed-by: Ashutosh Bapat Discussion: https://postgr.es/m/ZTiV8tn_MIb_H2rE@paquier.xyz
2024-01-11Add new function pg_get_wal_summarizer_state().Robert Haas
This makes it possible to access information about the progress of WAL summarization from SQL. The previously-added functions pg_available_wal_summaries() and pg_wal_summary_contents() only examine on-disk state, but this function exposes information from the server's shared memory. Discussion: http://postgr.es/m/CA+Tgmobvqqj-DW9F7uUzT-cQqs6wcVb-Xhs=w=hzJnXSE-kRGw@mail.gmail.com
2024-01-09Cross-check lists of predefined LWLocks.Nathan Bossart
Both lwlocknames.txt and wait_event_names.txt contain a list of all the predefined LWLocks, i.e., those with predefined positions within MainLWLockArray. It is easy to miss one or the other, especially since the list in wait_event_names.txt omits the "Lock" suffix from all the LWLock wait events. This commit adds a cross- check of these lists to the script that generates lwlocknames.h. If the lists do not match exactly, building will fail. Suggested-by: Robert Haas Reviewed-by: Robert Haas, Michael Paquier, Bertrand Drouvot Discussion: https://postgr.es/m/20240102173120.GA1061678%40nathanxps13
2024-01-09Fix misuse of RelOptInfo.unique_for_rels cache by SJEAlexander Korotkov
When SJE uses RelOptInfo.unique_for_rels cache, it passes filtered quals to innerrel_is_unique_ext(). That might lead to an invalid match to cache entries made by previous non self-join checking calls. Add UniqueRelInfo.self_join flag to prevent such cases. Also, fix that SJE should require a strict match of outerrelids to make sure UniqueRelInfo.extra_clauses are valid. Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/4788f781-31bd-9796-d7d6-588a751c8787%40gmail.com
2024-01-08Make dblink interruptible, via new libpqsrv APIs.Noah Misch
This replaces dblink's blocking libpq calls, allowing cancellation and allowing DROP DATABASE (of a database not involved in the query). Apart from explicit dblink_cancel_query() calls, dblink still doesn't cancel the remote side. The replacement for the blocking calls consists of new, general-purpose query execution wrappers in the libpqsrv facility. Out-of-tree extensions should adopt these. Use them in postgres_fdw, replacing a local implementation from which the libpqsrv implementation derives. This is a bug fix for dblink. Code inspection identified the bug at least thirteen years ago, but user complaints have not appeared. Hence, no back-patch for now. Discussion: https://postgr.es/m/20231122012945.74@rfd.leadboat.com
2024-01-08Remove excess #include "utils/wait_event.h".Noah Misch
This simplifies copying libpq/libpq-be-fe-helpers.h into extensions, because some supported PostgreSQL versions lack the other header. Discussion: https://postgr.es/m/20231122012945.74@rfd.leadboat.com
2024-01-08Fix missing word in comment.Noah Misch
2024-01-08Allow examine_simple_variable() to work on INSERT RETURNING Vars.Tom Lane
Since commit 599b33b94, this function assumed that every RTE_RELATION RangeTblEntry would have an associated RelOptInfo. But that's not so: we only build RelOptInfos for relations that are scanned by the query. In particular the target of an INSERT won't have one, so that Vars appearing in an INSERT ... RETURNING list will not have an associated RelOptInfo. This apparently wasn't a problem before commit f7816aec2 taught examine_simple_variable() to drill down into CTEs containing INSERT RETURNING, but it is now. To fix, add a fallback code path that gets the userid to use directly from the RTEPermissionInfo associated with the RTE. (Sadly, we must have two code paths, because not every RTE has a RTEPermissionInfo either.) Per report from Alexander Lakhin. No back-patch, since the case is apparently unreachable before f7816aec2. Discussion: https://postgr.es/m/608a4886-6c60-0f9e-97d5-591256bd4150@gmail.com
2024-01-04Teach estimate_array_length() to use statistics where available.Tom Lane
If we have DECHIST statistics about the argument expression, use the average number of distinct elements as the array length estimate. (It'd be better to use the average total number of elements, but that is not currently calculated by compute_array_stats(), and it's unclear that it'd be worth extra effort to get.) To do this, we have to change the signature of estimate_array_length to pass the "root" pointer. While at it, also change its result type to "double". That's probably not really necessary, but it avoids any risk of overflow of the value extracted from DECHIST. All existing callers are going to use the result in a "double" calculation anyway. Paul Jungwirth, reviewed by Jian He and myself Discussion: https://postgr.es/m/CA+renyUnM2d+SmrxKpDuAdpiq6FOM=FByvi6aS6yi__qyf6j9A@mail.gmail.com
2024-01-04Add macros for looping through a List without a ListCell.Nathan Bossart
Many foreach loops only use the ListCell pointer to retrieve the content of the cell, like so: ListCell *lc; foreach(lc, mylist) { int myint = lfirst_int(lc); ... } This commit adds a few convenience macros that automatically declare the loop variable and retrieve the current cell's contents. This allows us to rewrite the previous loop like this: foreach_int(myint, mylist) { ... } This commit also adjusts a few existing loops in order to add coverage for the new/adjusted macros. There is presently no plan to bulk update all foreach loops, as that could introduce a significant amount of back-patching pain. Instead, these macros are primarily intended for use in new code. Author: Jelte Fennema-Nio Reviewed-by: David Rowley, Alvaro Herrera, Vignesh C, Tom Lane Discussion: https://postgr.es/m/CAGECzQSwXKnxGwW1_Q5JE%2B8Ja20kyAbhBHO04vVrQsLcDciwXA%40mail.gmail.com
2024-01-04ALTER TABLE command to change generation expressionPeter Eisentraut
This adds a new ALTER TABLE subcommand ALTER COLUMN ... SET EXPRESSION that changes the generation expression of a generated column. The syntax is not standard but was adapted from other SQL implementations. This command causes a table rewrite, using the usual ALTER TABLE mechanisms. The implementation is similar to and makes use of some of the infrastructure of the SET DATA TYPE subcommand (for example, rebuilding constraints and indexes afterwards). The new command requires a new pass in AlterTablePass, and the ADD COLUMN pass had to be moved earlier so that combinations of ADD COLUMN and SET EXPRESSION can work. Author: Amul Sul <sulamul@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b94yyJeGA-5M951_Lr+KfZokOp-2kXicpmEhi5FXhBeTog@mail.gmail.com
2024-01-04Track conflict_reason in pg_replication_slots.Amit Kapila
This patch changes the existing 'conflicting' field to 'conflict_reason' in pg_replication_slots. This new field indicates the reason for the logical slot's conflict with recovery. It is always NULL for physical slots, as well as for logical slots which are not invalidated. The non-NULL values indicate that the slot is marked as invalidated. Possible values are: wal_removed = required WAL has been removed. rows_removed = required rows have been removed. wal_level_insufficient = the primary doesn't have a wal_level sufficient to perform logical decoding. The existing users of 'conflicting' column can get the same answer by using 'conflict_reason' IS NOT NULL. Author: Shveta Malik Reviewed-by: Amit Kapila, Bertrand Drouvot, Michael Paquier Discussion: https://postgr.es/m/ZYOE8IguqTbp-seF@paquier.xyz
2024-01-03Update copyright for 2024Bruce Momjian
Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
2024-01-03Second attempt at organizing jsonpath operators and methodsPeter Eisentraut
Second attempt at 283a95da923. Since we can't reorder the enum values of JsonPathItemType, instead reorder the switch cases where they are used to generally follow the order of the enum values, for better maintainability.
2024-01-03Revert "Reorganise jsonpath operators and methods"Peter Eisentraut
This reverts commit 283a95da923605c1cc148155db2d865d0801b419. The reordering of JsonPathItemType affects the binary on-disk compatibility of the jsonpath type, so we must not change it. Revert for now and consider.
2024-01-03Reorganise jsonpath operators and methodsPeter Eisentraut
Various jsonpath operators and methods add various keywords, switch cases, and documentation entries in some order. However, they are not consistent; reorder them for better maintainability or readability. Author: Jeevan Chalke <jeevan.chalke@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/CAM2+6=XjTyqrrqHAOj80r0wVQxJSxc0iyib9bPC55uFO9VKatg@mail.gmail.com
2024-01-03Add numeric_int8_opt_error() to optionally suppress errorsPeter Eisentraut
This matches the existing numeric_int4_opt_error() (see commit 16d489b0fe). It will be used by a future JSON-related patch, which wants to report errors in its own way and thus does not want the internal functions to throw any error. Author: Jeevan Chalke <jeevan.chalke@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/CAM2+6=XjTyqrrqHAOj80r0wVQxJSxc0iyib9bPC55uFO9VKatg@mail.gmail.com
2024-01-02gist: fix typo "split(t)ed" -> "split"Robert Haas
Dagfinn Ilmari Mannsåker, reviewed by Shubham Khanna. Discussion: http://postgr.es/m/87le9fmi01.fsf@wibble.ilmari.org
2024-01-02Fix typos in comments and in one isolation test.Robert Haas
Dagfinn Ilmari Mannsåker, reviewed by Shubham Khanna. Some subtractions by me. Discussion: http://postgr.es/m/87le9fmi01.fsf@wibble.ilmari.org
2024-01-02Fix typos in simplehash.hPeter Eisentraut
Author: Richard Guo <guofenglinux@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/18252-d46d27900a277d87@postgresql.org
2024-01-02Allow upgrades to preserve the full subscription's state.Amit Kapila
This feature will allow us to replicate the changes on subscriber nodes after the upgrade. Previously, only the subscription metadata information was preserved. Without the list of relations and their state, it's not possible to re-enable the subscriptions without missing some records as the list of relations can only be refreshed after enabling the subscription (and therefore starting the apply worker). Even if we added a way to refresh the subscription while enabling a publication, we still wouldn't know which relations are new on the publication side, and therefore should be fully synced, and which shouldn't. To preserve the subscription relations, this patch teaches pg_dump to restore the content of pg_subscription_rel from the old cluster by using binary_upgrade_add_sub_rel_state SQL function. This is supported only in binary upgrade mode. The subscription's replication origin is needed to ensure that we don't replicate anything twice. To preserve the replication origins, this patch teaches pg_dump to update the replication origin along with creating a subscription by using binary_upgrade_replorigin_advance SQL function to restore the underlying replication origin remote LSN. This is supported only in binary upgrade mode. pg_upgrade will check that all the subscription relations are in 'i' (init) or in 'r' (ready) state and will error out if that's not the case, logging the reason for the failure. This helps to avoid the risk of any dangling slot or origin after the upgrade. Author: Vignesh C, Julien Rouhaud, Shlok Kyal Reviewed-by: Peter Smith, Masahiko Sawada, Michael Paquier, Amit Kapila, Hayato Kuroda Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud
2023-12-30Minor cleanup of the BRIN parallel build codeTomas Vondra
Commit b437571714 added support for parallel builds for BRIN indexes, using code similar to BTREE parallel builds, and also a new tuplesort variant. This commit simplifies the new code in two ways: * The "spool" grouping tuplesort and the heap/index is not necessary. The heap/index are available as separate arguments, causing confusion. So remove the spool, and use the tuplesort directly. * The new tuplesort variant does not need the heap/index, as it sorts simply by the range block number, without accessing the tuple data. So simplify that too. Initial report and patch by Ranier Vilela, further cleanup by me. Author: Ranier Vilela Discussion: https://postgr.es/m/CAEudQAqD7f2i4iyEaAz-5o-bf6zXVX-AkNUBm-YjUXEemaEh6A%40mail.gmail.com
2023-12-30Add GUC backtrace_on_internal_errorPeter Eisentraut
When enabled (default off), this logs a backtrace anytime elog() or an equivalent ereport() for internal errors is called. This is not well covered by the existing backtrace_functions, because there are many equally-worded low-level errors in many functions. And if you find out where the error is, then you need to manually rewrite the elog() to ereport() to attach the errbacktrace(), which is annoying. Having a backtrace automatically on every elog() call could be very helpful during development for various kinds of common errors from palloc, syscache, node support, etc. Discussion: https://www.postgresql.org/message-id/flat/ba76c6bc-f03f-4285-bf16-47759cfcab9e@eisentraut.org
2023-12-29Make all Perl warnings fatalPeter Eisentraut
There are a lot of Perl scripts in the tree, mostly code generation and TAP tests. Occasionally, these scripts produce warnings. These are probably always mistakes on the developer side (true positives). Typical examples are warnings from genbki.pl or related when you make a mess in the catalog files during development, or warnings from tests when they massage a config file that looks different on different hosts, or mistakes during merges (e.g., duplicate subroutine definitions), or just mistakes that weren't noticed because there is a lot of output in a verbose build. This changes all warnings into fatal errors, by replacing use warnings; by use warnings FATAL => 'all'; in all Perl files. Discussion: https://www.postgresql.org/message-id/flat/06f899fd-1826-05ab-42d6-adeb1fd5e200%40eisentraut.org
2023-12-27Improve the implementation of information_schema._pg_expandarray().Tom Lane
This function was originally coded with a handmade expansion of the array subscripts. We can do it a little faster and far more legibly today, by using unnest() WITH ORDINALITY. While at it, let's apply the rowcount estimation support that exists for the underlying unnest() function: reduce the default ROWS estimate to 100 and attach array_unnest_support. I'm not sure that array_unnest_support can do anything useful today with the call sites that exist in information_schema, but it can't hurt, and the existing default rowcount of 1000 is surely much too high for any of these cases. The psql.sql regression script is using _pg_expandarray() as a test case for \sf+. While we could keep doing so, the new one-line function body makes a poor test case for \sf+ row-numbering, so switch it to print another information_schema function. Discussion: https://postgr.es/m/1424303.1703355485@sss.pgh.pa.us
2023-12-27Improvements and fixes for e0b1ee17dcAlexander Korotkov
e0b1ee17dc introduced optimization for matching B-tree scan keys required for the directional scan. However, it incorrectly assumed that all keys required for opposite direction scan are satisfied by _bt_first(). It has been illustrated that with multiple scan keys over the same column, a lesser one (according to the scan direction) could win leaving the other one unsatisfied. Instead of relying on _bt_first() this commit introduces code that memorizes whether there was at least one match on the page. If that's true we know that keys required for opposite-direction scan are satisfied as soon as corresponding values are not NULLs. Also, this commit simplifies the description for the optimization of keys required for the current direction scan. Now the flag used for this is named continuescanPrechecked and means exactly that *continuescan flag is known to be true for the last item on the page. Reported-by: Peter Geoghegan Discussion: https://postgr.es/m/CAH2-Wzn0LeLcb1PdBnK0xisz8NpHkxRrMr3NWJ%2BKOK-WZ%2BQtTQ%40mail.gmail.com Reviewed-by: Pavel Borisov
2023-12-27Remove BTScanOpaqueData.firstPageAlexander Korotkov
It's not necessary to keep the firstPage flag as a field of BTScanOpaqueData. This commit makes it an argument of the _bt_readpage() function. We can easily distinguish first-time and repeated calls (within the scan) of this function. Reported-by: Peter Geoghegan Discussion: https://postgr.es/m/CAH2-Wzk4SOsw%2BtHuTFiz8U9Jqj-R77rYPkhWKODCBb1mdHACXA%40mail.gmail.com Reviewed-by: Pavel Borisov
2023-12-27REALLOCATE_BITMAPSETS manual compile-time optionAlexander Korotkov
This option forces each bitmapset modification to reallocate bitmapset. This is useful for debugging hangling pointers to bitmapset's. Discussion: https://postgr.es/m/CAMbWs4_wJthNtYBL%2BSsebpgF-5L2r5zFFk6xYbS0A78GKOTFHw%40mail.gmail.com Reviewed-by: Richard Guo, Andres Freund, Ashutosh Bapat, Andrei Lepikhov
2023-12-25Enhance checkpointer restartpoint statisticsAlexander Korotkov
Bhis commit introduces enhancements to the pg_stat_checkpointer view by adding three new columns: restartpoints_timed, restartpoints_req, and restartpoints_done. These additions aim to improve the visibility and monitoring of restartpoint processes on replicas. Previously, it was challenging to differentiate between successful and failed restartpoint requests. This limitation arises because restartpoints on replicas are dependent on checkpoint records from the primary, and cannot occur more frequently than these checkpoints. The new columns allow for clear distinction and tracking of restartpoint requests, their triggers, and successful completions. This enhancement aids database administrators and developers in better understanding and diagnosing issues related to restartpoint behavior, particularly in scenarios where restartpoint requests may fail. System catalog is changed. Catversion is bumped. Discussion: https://postgr.es/m/99b2ccd1-a77a-962a-0837-191cdf56c2b9%40inbox.ru Author: Anton A. Melnikov Reviewed-by: Kyotaro Horiguchi, Alexander Korotkov
2023-12-21Fix numerous typos in incremental backup commits.Robert Haas
Apparently, spell check would have been a really good idea. Alexander Lakhin, with a few additions as per an off-list report from Andres Freund. Discussion: http://postgr.es/m/f08f7c60-1ad3-0b57-d580-54b11f07cddf@gmail.com
2023-12-20Add support for incremental backup.Robert Haas
To take an incremental backup, you use the new replication command UPLOAD_MANIFEST to upload the manifest for the prior backup. This prior backup could either be a full backup or another incremental backup. You then use BASE_BACKUP with the INCREMENTAL option to take the backup. pg_basebackup now has an --incremental=PATH_TO_MANIFEST option to trigger this behavior. An incremental backup is like a regular full backup except that some relation files are replaced with files with names like INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains additional lines identifying it as an incremental backup. The new pg_combinebackup tool can be used to reconstruct a data directory from a full backup and a series of incremental backups. Patch by me. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter Eisentraut, and Álvaro Herrera. Thanks especially to Jakub for incredibly helpful and extensive testing. Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
2023-12-20Add a new WAL summarizer process.Robert Haas
When active, this process writes WAL summary files to $PGDATA/pg_wal/summaries. Each summary file contains information for a certain range of LSNs on a certain TLI. For each relation, it stores a "limit block" which is 0 if a relation is created or destroyed within a certain range of WAL records, or otherwise the shortest length to which the relation was truncated during that range of WAL records, or otherwise InvalidBlockNumber. In addition, it stores a list of blocks which have been modified during that range of WAL records, but excluding blocks which were removed by truncation after they were modified and never subsequently modified again. In other words, it tells us which blocks need to copied in case of an incremental backup covering that range of WAL records. But this doesn't yet add the capability to actually perform an incremental backup; the next patch will do that. A new parameter summarize_wal enables or disables this new background process. The background process also automatically deletes summary files that are older than wal_summarize_keep_time, if that parameter has a non-zero value and the summarizer is configured to run. Patch by me, with some design help from Dilip Kumar and Andres Freund. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter Eisentraut, and Álvaro Herrera. Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
2023-12-19Move src/bin/pg_verifybackup/parse_manifest.c into src/common.Robert Haas
This makes it possible for the code to be easily reused by other client-side tools, and/or by the server. Patch by me. Review of this patch in particular by at least Peter Eisentraut; reviewers for the patch series in general include Dilip Kumar, Andres Fruend, David Steele, Álvaro Herrera, and Jakub Wartak. Discussion: http://postgr.es/m/CA+TgmoZ6UGZVnSy5iak6s6+AXu_DewXovDjhLs3-su6nmU_x_g@mail.gmail.com
2023-12-19Prevent integer overflow when forming tuple width estimates.Tom Lane
It's at least theoretically possible to overflow int32 when adding up column width estimates to make a row width estimate. (The bug example isn't terribly convincing as a real use-case, but perhaps wide joins would provide a more plausible route to trouble.) This'd lead to assertion failures or silly planner behavior. To forestall it, make the relevant functions compute their running sums in int64 arithmetic and then clamp to int32 range at the end. We can reasonably assume that MaxAllocSize is a hard limit on actual tuple width, so clamping to that is simply a correction for dubious input values, and there's no need to go as far as widening width variables to int64 everywhere. Per bug #18247 from RekGRpth. There've been no reports of this issue arising in practical cases, so I feel no need to back-patch. Richard Guo and Tom Lane Discussion: https://postgr.es/m/18247-11ac477f02954422@postgresql.org
2023-12-19Update comment for Cardinality typedefPeter Eisentraut
Author: Richard Guo <guofenglinux@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4-Zd7Yy80RL1NdskLLo-wz6QoqsbC5TKs%3D3yZxG3BT_aA%40mail.gmail.com
2023-12-19Simplify newNode() by removing special casesHeikki Linnakangas
- Remove MemoryContextAllocZeroAligned(). It was supposed to be a faster version of MemoryContextAllocZero(), but modern compilers turn the MemSetLoop() into a call to memset() anyway, making it more or less identical to MemoryContextAllocZero(). That was the only user of MemSetTest, MemSetLoop, so remove those too, as well as palloc0fast(). - Convert newNode() to a static inline function. When this was originally originally written, it was written as a macro because testing showed that gcc didn't inline the size check as we intended. Modern compiler versions do, and now that it just calls palloc0() there is no size-check to inline anyway. One nice effect is that the palloc0() takes one less argument than MemoryContextAllocZeroAligned(), which saves a few instructions in the callers of newNode(). Reviewed-by: Peter Eisentraut, Tom Lane, John Naylor, Thomas Munro Discussion: https://www.postgresql.org/message-id/b51f1fa7-7e6a-4ecc-936d-90a8a1659e7c@iki.fi
2023-12-18compute_bitmap_pages' loop_count parameter should be double not int.Tom Lane
The value was double in the original implementation of this logic. Commit da08a6598 pulled it out into a subroutine, but carelessly declared the parameter as int when it should have been double. On most platforms, the only ill effect would be to clamp the value to be not more than INT_MAX, which would seldom be exceeded and probably wouldn't change the estimates too much anyway. Nonetheless, it's wrong and can cause complaints from ubsan. While here, improve the comments and parameter names. This is an ABI change in a globally exposed subroutine, so back-patching would create some risk of breaking extensions. The value of the fix doesn't seem high enough to warrant taking that risk, so fix in HEAD only. Per report from Alexander Lakhin. Discussion: https://postgr.es/m/f5e15fe1-202d-1936-f47c-f0c69a936b72@gmail.com
2023-12-18Optimize pg_atomic_exchange_u32 and pg_atomic_exchange_u64.Nathan Bossart
Presently, all platforms implement atomic exchanges by performing an atomic compare-and-swap in a loop until it succeeds. This can be especially expensive when there is contention on the atomic variable. This commit optimizes atomic exchanges on many platforms by using compiler intrinsics, which should compile into something much less expensive than a compare-and-swap loop. Since these intrinsics have been available for some time, the inline assembly implementations are omitted. Suggested-by: Andres Freund Reviewed-by: Andres Freund Discussion: https://postgr.es/m/20231129212905.GA1258737%40nathanxps13
2023-12-18Provide vectored variants of smgrread() and smgrwrite().Thomas Munro
smgrreadv() and smgrwritev() and their md.c implementations call FileReadV() and FileWriteV(). A range of disk blocks beginning at 'blocknum' and extending for 'nblocks' can be scattered to or gathered from multiple buffers with a single system call. The traditional smgrread() and smgrwrite() functions are implemented in terms of the new functions. Later commits will introduce calls with nblocks > 1, but the following behavioral changes can be seen already: * After a short transfer we'll now retry until we eventually read 0 bytes (= EOF) or get ENOSPC, EDQUOT, EFBIG etc, where previously we would infer the reason. Retrying is consistent with xlog.c's treatment of large WAL writes, and arguably also xlog.c and fd.c's treatment of EINTR. Arbitrary short returns for larger transfers have been observed on several OSes, and might in theory also happen for transient reasons with our own pg_p*v() fallback code. * After unexpected EOF or -1, the error thrown now talks about a range even for the single block case, eg "blocks 42..42". Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
2023-12-16Refactor pgstat_prepare_io_time() with an input argument instead of a GUCMichael Paquier
Originally, this routine relied on track_io_timing to check if a time interval for an I/O operation stored in pg_stat_io should be initialized or not. However, the addition of WAL statistics to pg_stat_io requires that the initialization happens when track_wal_io_timing is enabled, which is dependent on the code path where the I/O operation happens. Author: Nazir Bilal Yavuz Discussion: https://postgr.es/m/CAN55FZ3AiQ+ZMxUuXnBpd0Rrh1YhwJ5FudkHg=JU0P+-W8T4Vg@mail.gmail.com
2023-12-16Remove useless LIMIT_OPTION_DEFAULT value from LimitOptionAlvaro Herrera
During the development that led to commit 357889eb17bb, for a time we had the value LIMIT_OPTION_DEFAULT, which was mostly but not completely removed later on, before commit. Complete the removal now. Author: Zhang Mingli <avamingli@gmail.com> Discussion: https://postgr.es/m/59d61a1a-3858-475a-964f-24468c97cc67@Spark
2023-12-16Provide multi-block smgrprefetch().Thomas Munro
Previously smgrprefetch() could issue POSIX_FADV_WILLNEED advice for a single block at a time. Add an nblocks argument so that we can do the same for a range of blocks. This usually produces a single system call, but might need to loop if it crosses a segment boundary. Initially it is only called with nblocks == 1, but proposed patches will make wider calls. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> (earlier version) Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
2023-12-15Fix bugs in manipulation of large objects.Tom Lane
In v16 and up (since commit afbfc0298), large object ownership checking has been broken because object_ownercheck() didn't take care of the discrepancy between our object-address representation of large objects (classId == LargeObjectRelationId) and the catalog where their ownership info is actually stored (LargeObjectMetadataRelationId). This resulted in failures such as "unrecognized class ID: 2613" when trying to update blob properties as a non-superuser. Poking around for related bugs, I found that AlterObjectOwner_internal would pass the wrong classId to the PostAlterHook in the no-op code path where the large object already has the desired owner. Also, recordExtObjInitPriv checked for the wrong classId; that bug is only latent because the stanza is dead code anyway, but as long as we're carrying it around it should be less wrong. These bugs are quite old. In HEAD, we can reduce the scope for future bugs of this ilk by changing AlterObjectOwner_internal's API to let the translation happen inside that function, rather than requiring callers to know about it. A more bulletproof fix, perhaps, would be to start using LargeObjectMetadataRelationId as the dependency and object-address classId for blobs. However that has substantial risk of breaking third-party code; even within our own code, it'd create hassles for pg_dump which would have to cope with a version-dependent representation. For now, keep the status quo. Discussion: https://postgr.es/m/2650449.1702497209@sss.pgh.pa.us
2023-12-12Provide vectored variants of FileRead() and FileWrite().Thomas Munro
FileReadV() and FileWriteV() adapt pg_preadv() and pg_pwritev() for fd.c's virtual file descriptors. The simple FileRead() and FileWrite() functions are now implemented in terms of the vectored functions, to avoid code duplication, and they are converted back to the corresponding simple system calls further down (commit 15c9ac36). Later work will make more interesting multi-iovec calls. The traditional behavior of reporting a "fake" ENOSPC error is simplified. It's now always set for non-failing writes, for the benefit of callers that expect to log a meaningful "%m" if they determine that the write was short. (Perhaps we should consider getting rid of that expectation one day.) Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
2023-12-12Provide helper for retrying partial vectored I/O.Thomas Munro
compute_remaining_iovec() is a re-usable routine for retrying after pg_readv() or pg_writev() reports a short transfer. This will gain new users in a later commit, but can already replace the open-coded equivalent code in the existing pg_pwritev_with_retry() function. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
2023-12-12Define unconstify() and unvolatize() for C++.Thomas Munro
These two macros wouldn't work if used in an inline function definition in a header seen by g++, because __builtin_types_compatible_p is only available in C. Redirect to standard C++ const_cast (which also adds/removes volatile despite its name). Per cpluspluscheck failure in a development branch. Suggested-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CA%2BhUKGK3OXFjkOyZiw-DgL2bUqk9by1uGuCnViJX786W%2BfyDSw%40mail.gmail.com
2023-12-11Simplify productions for FORMAT JSON [ ENCODING name ]Alvaro Herrera
This removes the production json_encoding_clause_opt, instead merging it into json_format_clause. Also remove the auxiliary makeJsonEncoding() function. Reviewed-by: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/202312071841.u2gueb5dsrbk%40alvherre.pgsql
2023-12-11Remove trace_recovery_messagesMichael Paquier
This GUC was intended as a debugging help in the 9.0 area when hot standby and streaming replication were being developped, able to offer more information at LOG level rather than DEBUGn. There are more tools available these days that are able to offer rather equivalent information, like pg_waldump introduced in 9.3. It is not obvious how this facility is useful these days, so let's remove it. Author: Bharath Rupireddy Discussion: https://postgr.es/m/ZXEXEAUVFrvpquSd@paquier.xyz
2023-12-10Remove some unnecessary includes of "access/xlog_internal.h"Peter Eisentraut
There were a few places where access/xlog_internal.h was apparently included unnecessarily. In some of those places, a more specific header file (that somehow came in via access/xlog_internal.h) can be used instead. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/flat/a56a6eec-eb14-471b-9570-3cac23603964%40eisentraut.org