summaryrefslogtreecommitdiff
path: root/src/backend/utils
AgeCommit message (Collapse)Author
2025-03-25Generalize index support in network support functionPeter Eisentraut
The network (inet) support functions currently only supported a hardcoded btree operator family. With the generalized compare type facility, we can generalize this to support any operator family from any index type that supports the required operators. Author: Mark Dilger <mark.dilger@enterprisedb.com> Co-authored-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-03-24Add ExecCopySlotMinimalTupleExtra().Jeff Davis
Allows an "extra" argument that allocates extra memory at the end of the MinimalTuple. This is important for callers that need to store additional data, but do not want to perform an additional allocation. Suggested-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAApHDvppeqw2pNM-+ahBOJwq2QmC0hOAGsmCpC89QVmEoOvsdg@mail.gmail.com
2025-03-24Redefine max_files_per_process to control additionally opened filesAndres Freund
Until now max_files_per_process=N limited each backend to open N files in total (minus a safety factor), even if there were already more files opened in postmaster and inherited by backends. Change max_files_per_process to control how many additional files each process is allowed to open. The main motivation for this is the patch to add io_method=io_uring, which needs to open one file for each backend. Without this patch, even if RLIMIT_NOFILE is high enough, postmaster will fail in set_max_safe_fds() if started with a high max_connections. The cause of the failure is that, until now, set_max_safe_fds() subtracted the already open files from max_files_per_process. Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/w6uiicyou7hzq47mbyejubtcyb2rngkkf45fk4q7inue5kfbeo@bbfad3qyubvs Discussion: https://postgr.es/m/CAGECzQQh6VSy3KG4pN1d=h9J=D1rStFCMR+t7yh_Kwj-g87aLQ@mail.gmail.com
2025-03-24Detect and Log multiple_unique_conflicts type conflict.Amit Kapila
Introduce a new conflict type, multiple_unique_conflicts, to handle cases where an incoming row during logical replication violates multiple UNIQUE constraints. Previously, the apply worker detected and reported only the first encountered key conflict (insert_exists/update_exists), causing repeated failures as each constraint violation needs to be handled one by one making the process slow and error-prone. With this patch, the apply worker checks all unique constraints upfront once the first key conflict is detected and reports multiple_unique_conflicts if multiple violations exist. This allows users to resolve all conflicts at once by deleting all conflicting tuples rather than dealing with them individually or skipping the transaction. In the future, this will also allow us to specify different resolution handlers for such a conflict type. Add the stats for this conflict type in pg_stat_subscription_stats. Author: Nisha Moond <nisha.moond412@gmail.com> Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Discussion: https://postgr.es/m/CABdArM7FW-_dnthGkg2s0fy1HhUB8C3ELA0gZX1kkbs1ZZoV3Q@mail.gmail.com
2025-03-24Allow plugins to set a 64-bit plan identifier in PlannedStmtMichael Paquier
This field can be optionally set in a PlannedStmt through the planner hook, giving extensions the possibility to assign an identifier related to a computed plan. The backend is changed to report it in the backend entry of a process running (including the extended query protocol), with semantics and APIs to set or get it similar to what is used for the existing query ID (introduced in the backend via 4f0b0966c8). The plan ID is reset at the same timing as the query ID. Currently, this information is not added to the system view pg_stat_activity; extensions can access it through PgBackendStatus. Some patches have been proposed to provide some features in the planning area, where a plan identifier is used as a key to know the plan involved (for statistics, plan storage and manipulations, etc.), and the point of this commit is to provide an anchor in the backend that extensions can rely on for future work. The reset of the plan identifier is controlled by core and follows the same pattern as the query identifier added in 4f0b0966c8. The contents of this commit are extracted from a larger set proposed originally by Lukas Fittl, that Sami Imseih has proposed as an independent change, with a few tweaks sprinkled by me. Author: Lukas Fittl <lukas@fittl.com> Author: Sami Imseih <samimseih@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAP53Pkyow59ajFMHGpmb1BK9WHDypaWtUsS_5DoYUEfsa_Hktg@mail.gmail.com Discussion: https://postgr.es/m/CAA5RZ0vyWd4r35uUBUmhngv8XqeiJUkJDDKkLf5LCoWxv-t_pw@mail.gmail.com
2025-03-21Add GUC option to control maximum active replication origins.Masahiko Sawada
This commit introduces a new GUC option max_active_replication_origins to control the maximum number of active replication origins. Previously, this was controlled by 'max_replication_slots'. Having a separate GUC option provides better flexibility for setting up subscribers, as they may not require replication slots (for cascading replication) but always require replication origins. Author: Euler Taveira <euler@eulerto.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: vignesh C <vignesh21@gmail.com> Discussion: https://postgr.es/m/b81db436-8262-4575-b7c4-bc0c1551000b@app.fastmail.com
2025-03-20Add vacuum_truncate configuration parameter.Nathan Bossart
This new parameter works just like the storage parameter of the same name: if set to true (which is the default), autovacuum and VACUUM attempt to truncate any empty pages at the end of the table. It is primarily intended to help users avoid locking issues on hot standbys. The setting can be overridden with the storage parameter or VACUUM's TRUNCATE option. Since there's presently no way to determine whether a Boolean storage parameter is explicitly set or has just picked up the default value, this commit also introduces an isset_offset member to relopt_parse_elt. Suggested-by: Will Storey <will@summercat.com> Author: Nathan Bossart <nathandbossart@gmail.com> Co-authored-by: Gurjeet Singh <gurjeet@singh.im> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Robert Treat <rob@xzilla.net> Discussion: https://postgr.es/m/Z2DE4lDX4tHqNGZt%40dev.null
2025-03-19extension_control_pathPeter Eisentraut
The new GUC extension_control_path specifies a path to look for extension control files. The default value is $system, which looks in the compiled-in location, as before. The path search uses the same code and works in the same way as dynamic_library_path. Some use cases of this are: (1) testing extensions during package builds, (2) installing extensions outside security-restricted containers like Python.app (on macOS), (3) adding extensions to PostgreSQL running in a Kubernetes environment using operators such as CloudNativePG without having to rebuild the base image for each new extension. There is also a tweak in Makefile.global so that it is possible to install extensions using PGXS into an different directory than the default, using 'make install prefix=/else/where'. This previously only worked when specifying the subdirectories, like 'make install datadir=/else/where/share pkglibdir=/else/where/lib', for purely implementation reasons. (Of course, without the path feature, installing elsewhere was rarely useful.) Author: Peter Eisentraut <peter@eisentraut.org> Co-authored-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: David E. Wheeler <david@justatheory.com> Reviewed-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com> Reviewed-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com> Reviewed-by: Niccolò Fei <niccolo.fei@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E7C7BFFB-8857-48D4-A71F-88B359FADCFD@justatheory.com
2025-03-19Increase io_combine_limit range to 1MB.Thomas Munro
The default of 128kB is unchanged, but the upper limit is changed from 32 blocks to 128 blocks, unless the operating system's IOV_MAX is too low. Some other RDBMSes seem to cap their multi-block buffer pool I/O around this number, and it seems useful to allow experimentation. The concrete change is to our definition of PG_IOV_MAX, which provides the maximum for io_combine_limit and io_max_combine_limit. It also affects a couple of other places that work with arrays of struct iovec or smaller objects on the stack, so we still don't want to use the system IOV_MAX directly without a clamp: it is not under our control and likely to be 1024. 128 seems acceptable for our current usage. For Windows, we can't use real scatter/gather yet, so we continue to define our own IOV_MAX value of 16 and emulate preadv()/pwritev() with loops. Someone would need to research the trade-offs of raising that number. NB if trying to see this working: you might temporarily need to hack BAS_BULKREAD to be bigger, since otherwise the obvious way of "a very big SELECT" is limited by that for now. Suggested-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CA%2BhUKG%2B2T9p-%2BzM6Eeou-RAJjTML6eit1qn26f9twznX59qtCA%40mail.gmail.com
2025-03-19Introduce io_max_combine_limit.Thomas Munro
The existing io_combine_limit can be changed by users. The new io_max_combine_limit is fixed at server startup time, and functions as a silent clamp on the user setting. That in itself is probably quite useful, but the primary motivation is: aio_init.c allocates shared memory for all asynchronous IOs including some per-block data, and we didn't want to waste memory you'd never used by assuming they could be up to PG_IOV_MAX. This commit already halves the size of 'AioHandleIov' and 'AioHandleData'. A follow-up commit can now expand PG_IOV_MAX without affecting that. Since our GUC system doesn't support dependencies or cross-checks between GUCs, the user-settable one now assigns a "raw" value to io_combine_limit_guc, and the lower of io_combine_limit_guc and io_max_combine_limit is maintained in io_combine_limit. Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Discussion: https://postgr.es/m/CA%2BhUKG%2B2T9p-%2BzM6Eeou-RAJjTML6eit1qn26f9twznX59qtCA%40mail.gmail.com
2025-03-19Fix copy-paste error related to the autovacuum launcher in pgstat_io.cMichael Paquier
Autovacuum launchers perform no WAL IO reads, but pgstat_tracks_io_op() was tracking them as an allowed combination for the "init" and "normal" contexts. This caused the "read", "read_bytes" and "read_time" attributes of pg_stat_io to show zeros for the autovacuum launcher rather than NULL. NULL means that a combination of IO object, IO context and IO operation has no meaning for a backend type. Zero is the same as telling that a combination is relevant, and that WAL reads are possible in an autovacuum launcher, but it is not relevant. Copy-pasto introduced in a051e71e28a1. Author: Ranier Vilela <ranier.vf@gmail.com> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/CAEudQAopEMAPiUqE7BvDV+x2fUPmKmb9RrsaoDR+hhQzLKg4PQ@mail.gmail.com
2025-03-19Optimize check for pending backend IO statsMichael Paquier
This commit changes the backend stats code so as we rely on a single boolean rather than a repeated check based on pg_memory_is_all_zeros() in the code, making it cheaper should PgStat_PendingIO get bigger in size. The frequency of backend stats reports is not a bottleneck, but there is no reason to not make that cheaper, and the logic is simple as the only entry points updating backend IO stats are pgstat_count_backend_io_op() and pgstat_count_backend_io_op_time(). Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/Z8WYf1jyy4MwOveQ@ip-10-97-1-34.eu-west-3.compute.internal
2025-03-18Introduce squashing of constant lists in query jumblingÁlvaro Herrera
pg_stat_statements produces multiple entries for queries like SELECT something FROM table WHERE col IN (1, 2, 3, ...) depending on the number of parameters, because every element of ArrayExpr is individually jumbled. Most of the time that's undesirable, especially if the list becomes too large. Fix this by introducing a new GUC query_id_squash_values which modifies the node jumbling code to only consider the first and last element of a list of constants, rather than each list element individually. This affects both the query_id generated by query jumbling, as well as pg_stat_statements query normalization so that it suppresses printing of the individual elements of such a list. The default value is off, meaning the previous behavior is maintained. Author: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Sergey Dudoladov (mysterious, off-list) Reviewed-by: David Geier <geidav.pg@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Sutou Kouhei <kou@clear-code.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Marcos Pegoraro <marcos@f10.com.br> Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Tested-by: Yasuo Honda <yasuo.honda@gmail.com> Tested-by: Sergei Kornilov <sk@zsrv.org> Tested-by: Maciek Sakrejda <m.sakrejda@gmail.com> Tested-by: Chengxi Sun <sunchengxi@highgo.com> Tested-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Discussion: https://postgr.es/m/CA+q6zcWtUbT_Sxj0V6HY6EZ89uv5wuG5aefpe_9n0Jr3VwntFg@mail.gmail.com
2025-03-18aio: Add io_method=workerAndres Freund
The previous commit introduced the infrastructure to start io_workers. This commit actually makes the workers execute IOs. IO workers consume IOs from a shared memory submission queue, run traditional synchronous system calls, and perform the shared completion handling immediately. Client code submits most requests by pushing IOs into the submission queue, and waits (if necessary) using condition variables. Some IOs cannot be performed in another process due to lack of infrastructure for reopening the file, and must processed synchronously by the client code when submitted. For now the default io_method is changed to "worker". We should re-evaluate that around beta1, we might want to be careful and set the default to "sync" for 18. Reviewed-by: Noah Misch <noah@leadboat.com> Co-authored-by: Thomas Munro <thomas.munro@gmail.com> Co-authored-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-18aio: Infrastructure for io_method=workerAndres Freund
This commit contains the basic, system-wide, infrastructure for io_method=worker. It does not yet actually execute IO, this commit just provides the infrastructure for running IO workers, kept separate for easier review. The number of IO workers can be adjusted with a PGC_SIGHUP GUC. Eventually we'd like to make the number of workers dynamically scale up/down based on the current "IO load". To allow the number of IO workers to be increased without a restart, we need to reserve PGPROC entries for the workers unconditionally. This has been judged to be worth the cost. If it turns out to be problematic, we can introduce a PGC_POSTMASTER GUC to control the maximum number. As io workers might be needed during shutdown, e.g. for AIO during the shutdown checkpoint, a new PMState phase is added. IO workers are shut down after the shutdown checkpoint has been performed and walsender/archiver have shut down, but before the checkpointer itself shuts down. See also 87a6690cc69. Updates PGSTAT_FILE_FORMAT_ID due to the addition of a new BackendType. Reviewed-by: Noah Misch <noah@leadboat.com> Co-authored-by: Thomas Munro <thomas.munro@gmail.com> Co-authored-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-18Add X25519 to the default set of curvesDaniel Gustafsson
Since many clients default to the X25519 curve in the TLS handshake, the fact that the server by defualt doesn't support it cause an extra roundtrip for each TLS connection. By adding multiple curves, which is supported since 3d1ef3a15c3eb68da, we can reduce the risk of extra roundtrips. Author: Daniel Gustafsson <daniel@yesql.se> Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com> Reported-by: Andres Freund <andres@anarazel.de> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Discussion: https://postgr.es/m/20240616234612.6cslu7nqexquvwj7@awork3.anarazel.de
2025-03-18Increase default maintenance_io_concurrency to 16Melanie Plageman
Since its introduction in fc34b0d9de27a, the default maintenance_io_concurrency has been larger than the default effective_io_concurrency. maintenance_io_concurrency primarily controlled prefetching done on behalf of the whole system, for operations like recovery. Therefore it makes sense for it to have a value equal to or greater than effective_io_concurrency, which controls I/O concurrency for reading a relation in a bitmap heap scan. ff79b5b2ab increased effective_io_concurrency to 16, so we'll increase maintenance_io_concurrency as well. For now, though, we'll keep the defaults of effective_io_concurrency and maintenance_io_concurrency equal to one another (16). On fast, high IOPs systems, significantly higher values of maintenance_io_concurrency are observably beneficial [1]. However, such values would flood low IOPs systems and increase overall system I/O latency. It is worth mentioning that since 9256822608f and c3e775e608f, maintenance_io_concurrency also controls the I/O concurrency of each vacuum worker. Since many autovacuum workers may be simultaneously issuing I/Os, we want to keep maintenance_io_concurrency appropriately conservative. [1] https://postgr.es/m/c5d52837-6256-0556-ac8c-d6d3d558820a%40enterprisedb.com Suggested-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Discussion: https://postgr.es/m/CAKZiRmxdHQaU%2B2Zpe6d%3Dx%3D0vigJ1sfWwwVYLJAf%3Dud_wQ_VcUw%40mail.gmail.com
2025-03-18Add some opfamily support functions to lsyscache.cPeter Eisentraut
Add get_opfamily_method() and get_opfamily_member_for_cmptype() in lsyscache.c. No callers yet, but we'll add some soon. This is part of generalizing some parts of the code away from having btree hardcoded and use CompareType instead. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-03-17aio: Add core asynchronous I/O infrastructureAndres Freund
The main motivations to use AIO in PostgreSQL are: a) Reduce the time spent waiting for IO by issuing IO sufficiently early. In a few places we have approximated this using posix_fadvise() based prefetching, but that is fairly limited (no completion feedback, double the syscalls, only works with buffered IO, only works on some OSs). b) Allow to use Direct-I/O (DIO). DIO can offload most of the work for IO to hardware and thus increase throughput / decrease CPU utilization, as well as reduce latency. While we have gained the ability to configure DIO in d4e71df6, it is not yet usable for real world workloads, as every IO is executed synchronously. For portability, the new AIO infrastructure allows to implement AIO using different methods. The choice of the AIO method is controlled by the new io_method GUC. As of this commit, the only implemented method is "sync", i.e. AIO is not actually executed asynchronously. The "sync" method exists to allow to bypass most of the new code initially. Subsequent commits will introduce additional IO methods, including a cross-platform method implemented using worker processes and a linux specific method using io_uring. To allow different parts of postgres to use AIO, the core AIO infrastructure does not need to know what kind of files it is operating on. The necessary behavioral differences for different files are abstracted as "AIO Targets". One example target would be smgr. For boring portability reasons, all targets currently need to be added to an array in aio_target.c. This commit does not implement any AIO targets, just the infrastructure for them. The smgr target will be added in a later commit. Completion (and other events) of IOs for one type of file (i.e. one AIO target) need to be reacted to differently, based on the IO operation and the callsite. This is made possible by callbacks that can be registered on IOs. E.g. an smgr read into a local buffer does not need to update the corresponding BufferDesc (as there is none), but a read into shared buffers does. This commit does not contain any callbacks, they will be added in subsequent commits. For now the AIO infrastructure only understands READV and WRITEV operations, but it is expected that more operations will be added. E.g. fsync/fdatasync, flush_range and network operations like send/recv. As of this commit, nothing uses the AIO infrastructure. Later commits will add an smgr target, md.c and bufmgr.c callbacks and then finally use AIO for read_stream.c IO, which, in one fell swoop, will convert all read stream users to AIO. The goal is to use AIO in many more places. There are patches to use AIO for checkpointer and bgwriter that are reasonably close to being ready. There also are prototypes to use it for WAL, relation extension, backend writes and many more. Those prototypes were important to ensure the design of the AIO subsystem is not too limiting (e.g. WAL writes need to happen in critical sections, which influenced a lot of the design). A future commit will add an AIO README explaining the AIO architecture and how to use the AIO subsystem. The README is added later, as it references details only added in later commits. Many many more people than the folks named below have contributed with feedback, work on semi-independent patches etc. E.g. various folks have contributed patches to use the read stream infrastructure (added by Thomas in b5a9b18cd0b) in more places. Similarly, a *lot* of folks have contributed to the CI infrastructure, which I had started to work on to make adding AIO feasible. Some of the work by contributors has gone into the "v1" prototype of AIO, which heavily influenced the current design of the AIO subsystem. None of the code from that directly survives, but without the prototype, the current version of the AIO infrastructure would not exist. Similarly, the reviewers below have not necessarily looked at the current design or the whole infrastructure, but have provided very valuable input. I am to blame for problems, not they. Author: Andres Freund <andres@anarazel.de> Co-authored-by: Thomas Munro <thomas.munro@gmail.com> Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Co-authored-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Antonin Houska <ah@cybertec.at> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-17aio: Basic subsystem initializationAndres Freund
This commit just does the minimal wiring up of the AIO subsystem, added in the next commit, to the rest of the system. The next commit contains more details about motivation and architecture. This commit is kept separate to make it easier to review, separating the changes across the tree, from the implementation of the new subsystem. We discussed squashing this commit with the main commit before merging AIO, but there has been a mild preference for keeping it separate. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
2025-03-17Revert "Add redo LSN to pgstats files"Michael Paquier
This reverts commit b860848232aa, that was added as a prerequisite for the support of pgstats data flush across checkpoints, linking a pgstats file to a specific checkpoint redo LSN. As reported, this is proving to be currently problematic when going through a pg_upgrade, that does direct manipulations of the control file in the new cluster. The LSN stored in the pgstats file is not able to cope with any changes done in the control file by pg_upgrade yet, causing the pgstats file to be discarded when starting the new cluster after overriding its redo LSN (one is a `pg_resetwal -l` where the new cluster's start LSN is bumped by a hardcoded value of 8 segments, see copy_xact_xlog_xid). The least painful path going forward is likely going to be a refactor of the pgstats code so as it is possible to read and write some of its data with some routines in src/common/, so as pg_upgrade or pg_resetwal are able to update its data. The main point is that we are going to need a LSN in the stats file should we make it written at checkpoint time and not only as part of a shutdown sequence. It is too late to dive into these details for v18, so let's revert the change, and let's try to figure out all the details in the next release cycle. The pgstats file is currently only written as part of a shutdown sequence, and its contents are still lost on crash, same as older releases. Bump PGSTAT_FILE_FORMAT_ID. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/2563883.1741826489@sss.pgh.pa.us
2025-03-14Add GUC option to log lock acquisition failures.Fujii Masao
This commit introduces a new GUC, log_lock_failure, which controls whether a detailed log message is produced when a lock acquisition fails. Currently, it only supports logging lock failures caused by SELECT ... NOWAIT. The log message includes information about all processes holding or waiting for the lock that couldn't be acquired, helping users analyze and diagnose the causes of lock failures. Currently, this option does not log failures from SELECT ... SKIP LOCKED, as that could generate excessive log messages if many locks are skipped, causing unnecessary noise. This mechanism can be extended in the future to support for logging lock failures from other commands, such as LOCK TABLE ... NOWAIT. Author: Yuki Seino <seinoyu@oss.nttdata.com> Co-authored-by: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Discussion: https://postgr.es/m/411280a186cc26ef7034e0f2dfe54131@oss.nttdata.com
2025-03-14Simplify and generalize PrepareSortSupportFromIndexRel()Peter Eisentraut
PrepareSortSupportFromIndexRel() was accepting btree strategy numbers purely for the purpose of comparing it later against btree strategies to determine if the sort direction was forward or reverse. Change that. Instead, pass a bool directly, to indicate the same without an unfortunate assumption that a strategy number refers specifically to a btree strategy. (This is similar in spirit to commits 0d2aa4d4937 and c594f1ad2ba.) (This could arguably be simplfied further by having the callers fill in ssup_reverse directly. But this way, it preserves consistency by having all PrepareSortSupport*() variants be responsible for filling in ssup_reverse.) Moreover, remove the hardcoded check against BTREE_AM_OID, and check against amcanorder instead, which is the actual requirement. Co-authored-by: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-03-13Fix ARRAY_SUBLINK and ARRAY[] for int2vector and oidvector input.Tom Lane
If the given input_type yields valid results from both get_element_type and get_array_type, initArrayResultAny believed the former and treated the input as an array type. However this is inconsistent with what get_promoted_array_type does, leading to situations where the output of an ARRAY() subquery is labeled with the wrong type: it's labeled as oidvector[] but is really a 2-D array of OID. That at least results in strange output, and can result in crashes if further processing such as unnest() is applied. AFAIK this is only possible with the int2vector and oidvector types, which are special-cased to be treated mostly as true arrays even though they aren't quite. Fix by switching the logic to match get_promoted_array_type by testing get_array_type not get_element_type, and remove an Assert thereby made pointless. (We need not introduce a symmetrical check for get_element_type in the other if-branch, because initArrayResultArr will check it.) This restores the behavior that existed before bac27394a introduced initArrayResultAny: the output really is int2vector[] or oidvector[]. Comparable confusion exists when an input of an ARRAY[] construct is int2vector or oidvector: transformArrayExpr decides it's dealing with a multidimensional array constructor, and we end up with something that's a multidimensional OID array but is alleged to be of type oidvector. I have not found a crashing case here, but it's easy to demonstrate totally-wrong results. Adjust that code so that what you get is an oidvector[] instead, for consistency with ARRAY() subqueries. (This change also makes these types work like domains-over-arrays in this context, which seems correct.) Bug: #18840 Reported-by: yang lei <ylshiyu@126.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/18840-fbc9505f066e50d6@postgresql.org Backpatch-through: 13
2025-03-13Add reverse(bytea).Nathan Bossart
This commit introduces a function for reversing the order of the bytes in binary strings. Bumps catversion. Author: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://postgr.es/m/CAJ7c6TMe0QVRuNssUArbMi0bJJK32%2BzNA3at5m3osrBQ25MHuw%40mail.gmail.com
2025-03-13pg_noreturn to replace pg_attribute_noreturn()Peter Eisentraut
We want to support a "noreturn" decoration on more compilers besides just GCC-compatible ones, but for that we need to move the decoration in front of the function declaration instead of either behind it or wherever, which is the current style afforded by GCC-style attributes. Also rename the macro to "pg_noreturn" to be similar to the C11 standard "noreturn". pg_noreturn is now supported on all compilers that support C11 (using _Noreturn), as well as GCC-compatible ones (using __attribute__, as before), as well as MSVC (using __declspec). (When PostgreSQL requires C11, the latter two variants can be dropped.) Now, all supported compilers effectively support pg_noreturn, so the extra code for !HAVE_PG_ATTRIBUTE_NORETURN can be dropped. This also fixes a possible problem if third-party code includes stdnoreturn.h, because then the current definition of #define pg_attribute_noreturn() __attribute__((noreturn)) would cause an error. Note that the C standard does not support a noreturn attribute on function pointer types. So we have to drop these here. There are only two instances at this time, so it's not a big loss. In one case, we can make up for it by adding the pg_noreturn to a wrapper function and adding a pg_unreachable(), in the other case, the latter was already done before. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/pxr5b3z7jmkpenssra5zroxi7qzzp6eswuggokw64axmdixpnk@zbwxuq7gbbcw
2025-03-13Avoid invalidating all RelationSyncCache entries on publication rename.Amit Kapila
On Publication rename, we need to only invalidate the RelationSyncCache entries corresponding to relations that are part of the publication being renamed. As part of this patch, we introduce a new invalidation message to invalidate the cache maintained by the logical decoding output plugin. We can't use existing relcache invalidation for this purpose, as that would unnecessarily cause relcache invalidations in other backends. This will improve performance by building fewer relation cache entries during logical replication. Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/OSCPR01MB14966C09AA201EFFA706576A7F5C92@OSCPR01MB14966.jpnprd01.prod.outlook.com
2025-03-13Fix copy-paste error in datum_to_jsonb_internal()Amit Langote
Commit 3c152a27b06 mistakenly repeated JSONTYPE_JSON in a condition, omitting JSONTYPE_CAST. As a result, datum_to_jsonb_internal() failed to reject inputs that were casts (e.g., from an enum to json as in the example below) when used as keys in JSON constructors. This led to a crash in cases like: SELECT JSON_OBJECT('happy'::mood: '123'::jsonb); where 'happy'::mood is implicitly cast to json. The missing check meant such casted values weren’t properly rejected as invalid (non-scalar) JSON keys. Reported-by: Maciek Sakrejda <maciek@pganalyze.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Maciek Sakrejda <maciek@pganalyze.com> Discussion: https://postgr.es/m/CADXhmgTJtJZK9A3Na_ry+Xrq-ghjcejBRhcRMzWZvbd__QdgJA@mail.gmail.com Backpatch-through: 17
2025-03-12Remove count_one_bits() in acl.c.Nathan Bossart
The only caller, select_best_grantor(), can instead use pg_popcount64(). This isn't performance-critical code, but we might as well use the centralized implementation. While at it, add some test coverage for this part of select_best_grantor(). Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/Z9GtL7Nm6hsYyJnF%40nathan
2025-03-12Increase default effective_io_concurrency to 16Melanie Plageman
The default effective_io_concurrency has been 1 since it was introduced in b7b8f0b6096d2ab6e. Referencing the associated discussion [1], it seems 1 was chosen as a conservative value that seemed unlikely to cause regressions. Experimentation on high latency cloud storage as well as fast, local nvme storage (see Discussion link) shows that even slightly higher values improve query timings substantially. 1 actually performs worse than 0 [2]. With effective_io_concurrency 1, we are not prefetching enough to avoid I/O stalls, but we are issuing extra syscalls. The new default is 16, which should be more appropriate for common hardware while still avoiding flooding low IOPs devices with I/O requests. [1] https://www.postgresql.org/message-id/flat/FDDBA24E-FF4D-4654-BA75-692B3BA71B97%40enterprisedb.com [2] https://www.postgresql.org/message-id/CAAKRu_Zv08Cic%3DqdCfzrQabpEXGrd9Z9UOW5svEVkCM6%3DFXA9g%40mail.gmail.com Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAAKRu_Z%2BJa-mwXebOoOERMMUMvJeRhzTjad4dSThxG0JLXESxw%40mail.gmail.com
2025-03-12Add connection establishment duration loggingMelanie Plageman
Add log_connections option 'setup_durations' which logs durations of several key parts of connection establishment and backend setup. For an incoming connection, starting from when the postmaster gets a socket from accept() and ending when the forked child backend is first ready for query, there are multiple steps that could each take longer than expected due to external factors. This logging provides visibility into authentication and fork duration as well as the end-to-end connection establishment and backend initialization time. To make this portable, the timings captured in the postmaster (socket creation time, fork initiation time) are passed through the BackendStartupData. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: Guillaume Lelarge <guillaume.lelarge@dalibo.com> Discussion: https://postgr.es/m/flat/CAAKRu_b_smAHK0ZjrnL5GRxnAVWujEXQWpLXYzGbmpcZd3nLYw%40mail.gmail.com
2025-03-12Modularize log_connections outputMelanie Plageman
Convert the boolean log_connections GUC into a list GUC comprised of the connection aspects to log. This gives users more control over the volume and kind of connection logging. The current log_connections options are 'receipt', 'authentication', and 'authorization'. The empty string disables all connection logging. 'all' enables all available connection logging. For backwards compatibility, the most common values for the log_connections boolean are still supported (on, off, 1, 0, true, false, yes, no). Note that previously supported substrings of on, off, true, false, yes, and no are no longer supported. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/flat/CAAKRu_b_smAHK0ZjrnL5GRxnAVWujEXQWpLXYzGbmpcZd3nLYw%40mail.gmail.com
2025-03-12Remove initialization from PendingBackendStatsMichael Paquier
9a8dd2c5a6d has added an initialization to PendingBackendStats, which has been causing compilation warnings in the buildfarm. This code does not strictly require it as PendingBackendStats is always initialized with memset(0), so let's remove it. Per report from multiple buildfarm members, like ayu and batfish, via Tom Lane. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/1870853.1741749264@sss.pgh.pa.us
2025-03-11Improve snapmgr.c commentHeikki Linnakangas
Add more details on the different kinds of snapshots, how to use them, and how the active snapshot stack works. Discussion: https://www.postgresql.org/message-id/7c56f180-b9e1-481e-8c1d-efa63de3ecbb@iki.fi
2025-03-11Assert that a snapshot is active or registered before it's usedHeikki Linnakangas
The comment in GetTransactionSnapshot() said that you "should call RegisterSnapshot or PushActiveSnapshot on the returned snap if it is to be used very long". That felt too unclear to me. Make the comment more strongly worded. To enforce that rule and to catch potential bugs where a snapshot might get invalidated while it's still in use, add an assertion to HeapTupleSatisfiesMVCC() to check that the snapshot is registered or pushed to active stack. No new bugs were found by this, but it seems like good future-proofing. It's not a great place for the check; HeapTupleSatisfiesMVCC() is in fact safe to call with an unregistered snapshot, and the assertion won't catch other unsafe uses. But it goes a long way in practice. Fix a few cases that were playing fast and loose with that and just assumed that the snapshot cannot be invalidated during a scan. Those assumptions were not wrong, but they're not performance critical, so let's drop the excuses and just register the snapshot. These were false positives found by the new assertion. Discussion: https://www.postgresql.org/message-id/7c56f180-b9e1-481e-8c1d-efa63de3ecbb@iki.fi
2025-03-11Improve EXPLAIN's display of window functions.Tom Lane
Up to now we just punted on showing the window definitions used in a plan, with window function calls represented as "OVER (?)". To improve that, show the window definition implemented by each WindowAgg plan node, and reference their window names in OVER. For nameless window clauses generated by "OVER (...)", assign unique names w1, w2, etc. In passing, re-order the properties shown for a WindowAgg node so that the Run Condition (if any) appears after the Window property and before the Filter (if any). This seems more sensible since the Run Condition is associated with the Window and acts before the Filter. Thanks to David G. Johnston and Álvaro Herrera for design suggestions. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/144530.1741469955@sss.pgh.pa.us
2025-03-11Show index search count in EXPLAIN ANALYZE, take 2.Peter Geoghegan
Expose the count of index searches/index descents in EXPLAIN ANALYZE's output for index scan/index-only scan/bitmap index scan nodes. This information is particularly useful with scans that use ScalarArrayOp quals, where the number of index searches can be unpredictable due to implementation details that interact with physical index characteristics (at least with nbtree SAOP scans, since Postgres 17 commit 5bf748b8). The information shown also provides useful context when EXPLAIN ANALYZE runs a plan with an index scan node that successfully applied the skip scan optimization (set to be added to nbtree by an upcoming patch). The instrumentation works by teaching all index AMs to increment a new nsearches counter whenever a new index search begins. The counter is incremented at exactly the same point that index AMs already increment the pg_stat_*_indexes.idx_scan counter (we're counting the same event, but at the scan level rather than the relation level). Parallel queries have workers copy their local counter struct into shared memory when an index scan node ends -- even when it isn't a parallel aware scan node. An earlier version of this patch that only worked with parallel aware scans became commit 5ead85fb (though that was quickly reverted by commit d00107cd following "debug_parallel_query=regress" buildfarm failures). Our approach doesn't match the approach used when tracking other index scan related costs (e.g., "Rows Removed by Filter:"). It is comparable to the approach used in similar cases involving costs that are only readily accessible inside an access method, not from the executor proper (e.g., "Heap Blocks:" output for a Bitmap Heap Scan, which was recently enhanced to show per-worker costs by commit 5a1e6df3, using essentially the same scheme as the one used here). It is necessary for index AMs to have direct responsibility for maintaining the new counter, since the counter might need to be incremented multiple times per amgettuple call (or per amgetbitmap call). But it is also necessary for the executor proper to manage the shared memory now used to transfer each worker's counter struct to the leader. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Tomas Vondra <tomas@vondra.me> Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
2025-03-11Add WAL data to backend statisticsMichael Paquier
This commit adds per-backend WAL statistics, providing the same information as pg_stat_wal, except that it is now possible to know how much WAL activity is happening in each backend rather than an overall aggregate of all the activity. Like pg_stat_wal, the implementation relies on pgWalUsage, tracking the difference of activity between two reports to pgstats. This data can be retrieved with a new system function called pg_stat_get_backend_wal(), that returns one tuple based on the PID provided in input. Like pg_stat_get_backend_io(), this is useful when joined with pg_stat_activity to get a live picture of the WAL generated for each running backend, showing how the activity is [un]balanced. pgstat_flush_backend() gains a new flag value, able to control the flush of the WAL stats. This commit relies mostly on the infrastructure provided by 9aea73fc61d4, that has introduced backend statistics. Bump catalog version. A bump of PGSTAT_FILE_FORMAT_ID is not required, as backend stats do not persist on disk. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
2025-03-10Use extended stats for precise estimation of bucket size in hash joinAlexander Korotkov
Recognizing the real-life complexity where columns in the table often have functional dependencies, PostgreSQL's estimation of the number of distinct values over a set of columns can be underestimated (or much rarely, overestimated) when dealing with multi-clause JOIN. In the case of hash join, it can end up with a small number of predicted hash buckets and, as a result, picking non-optimal merge join. To improve the situation, we introduce one additional stage of bucket size estimation - having two or more join clauses estimator lookup for extended statistics and use it for multicolumn estimation. Clauses are grouped into lists, each containing expressions referencing the same relation. The result of the multicolumn estimation made over such a list is combined with others according to the caller's logic. Clauses that are not estimated are returned to the caller for further estimation. Discussion: https://postgr.es/m/52257607-57f6-850d-399a-ec33a654457b%40postgrespro.ru Author: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Andy Fan <zhihui.fan1213@gmail.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-03-08Make parallel nbtree index scans use an LWLock.Peter Geoghegan
Teach parallel nbtree index scans to use an LWLock (not a spinlock) to protect the scan's shared descriptor state. Preparation for an upcoming patch that will add skip scan optimizations to nbtree. That patch will create the need to occasionally allocate memory while the scan descriptor is locked, while copying datums that were serialized by another backend. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
2025-03-08Make amcanorder independent of amconsistentorderingPeter Eisentraut
Follow-up to commit af4002b381d: Make amconsistentordering not depend on amcanorder. Although they are related, they are independent properties. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org
2025-03-08Improve check for detection of pending data in backend statisticsMichael Paquier
The callback pgstat_backend_have_pending_cb() is used as a way for pg_stat_report() to detect if there is any pending data for backend statistics. It did not include a check based on pgstat_tracks_backend_bktype(), that discards processes whose backend types do not support backend statistics. The logic is not a problem on HEAD, as processes that do not support backend statistics cannot touch PendingBackendStats, so the callback would always report that there is no pending data in this case. However, we would run into trouble once backend statistics include portions of pending stats that are not always zeroed, like pgWalUsage. There is no reason for pgstat_backend_have_pending_cb() to not check for pgstat_tracks_backend_bktype(), anyway, and this pattern is safer in the long run, so let's update the code to do so. While on it, this commit adds a proper initialization to PendingBackendStats. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/Z8l6EMM4ImVoWRkg@ip-10-97-1-34.eu-west-3.compute.internal
2025-03-07Improve possible performance regressionPeter Eisentraut
Commit ce62f2f2a0a introduced calls to GetIndexAmRoutineByAmId() in lsyscache.c functions. This call is a bit more expensive than a simple syscache lookup. So rearrange the nesting so that we call that one last and do the cheaper checks first. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org
2025-03-07Rename amcancrosscomparePeter Eisentraut
After more discussion about commit ce62f2f2a0a, rename the index AM property amcancrosscompare to two separate properties amconsistentequality and amconsistentordering. Also improve the documentation and update some comments that were previously missed. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org
2025-03-07Allow casting between bytea and integer types.Dean Rasheed
This allows smallint, integer, and bigint values to be cast to and from bytea. The bytea value is the two's complement representation of the integer, with the most significant byte first. For example: 1234::bytea -> \x000004d2 (-1234)::bytea -> \xfffffb2e Author: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: Joel Jacobson <joel@compiler.org> Reviewed-by: Yugo Nagata <nagata@sraoss.co.jp> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/CAJ7c6TPtOp6%2BkFX5QX3fH1SVr7v65uHr-7yEJ%3DGMGQi5uhGtcA%40mail.gmail.com
2025-03-06Split WaitEventSet functions to separate source fileHeikki Linnakangas
latch.c now only contains the Latch related functions, which build on the WaitEventSet abstraction. Most of the platform-dependent stuff is now in waiteventset.c. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/8a507fb6-df28-49d3-81a5-ede180d7f0fb@iki.fi
2025-03-05Allow json{b}_strip_nulls to remove null array elementsAndrew Dunstan
An additional paramater ("strip_in_arrays") is added to these functions. It defaults to false. If true, then null array elements are removed as well as null valued object fields. JSON that just consists of a single null is not affected. Author: Florents Tselai <florents.tselai@gmail.com> Discussion: https://postgr.es/m/4BCECCD5-4F40-4313-9E98-9E16BEB0B01D@gmail.com
2025-03-05Fix some gaps in pg_stat_io with WAL receiver and WAL summarizerMichael Paquier
The WAL receiver and WAL summarizer processes gain each one a call to pgstat_report_wal(), to make sure that they report their WAL statistics to pgstats, gathering data for pg_stat_io. In the WAL receiver, the stats reports are timed with status updates sent to the primary, that depend on wal_receiver_status_interval and wal_receiver_timeout. This is a conservative choice, but perhaps we could be more aggressive with the frequency of the stats reports. An interesting historical fact is that the WAL receiver does writes and syncs of WAL, but it has never reported its statistics to pgstats in pg_stat_wal. In the WAL summarizer, the stats reports are done each time the process waits for WAL. While on it, pg_stat_io is adjusted so as these two processes do not report any rows when IOObject is not WAL, making the view easier to use with less rows. Two tests are added in TAP, checking statistics for the WAL summarizer and the WAL receiver. Status updates in the WAL receiver are currently possible in the recovery test 001_stream_rep.pl. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/Z8UKZyVSHUUQJHNb@paquier.xyz
2025-03-04Make FP_LOCK_SLOTS_PER_BACKEND look like a functionTomas Vondra
The FP_LOCK_SLOTS_PER_BACKEND macro looks like a constant, but it depends on the max_locks_per_transaction GUC, and thus can change. This is non-obvious and confusing, so make it look more like a function by renaming it to FastPathLockSlotsPerBackend(). While at it, use the macro when initializing fast-path shared memory, instead of using the formula. Reported-by: Andres Freund Discussion: https://postgr.es/m/ffiwtzc6vedo6wb4gbwelon5nefqg675t5c7an2ta7pcz646cg%40qwmkdb3l4ett
2025-03-04Split pgstat_bestart() into three different routinesMichael Paquier
pgstat_bestart(), used post-authentication to set up a backend entry in the PgBackendStatus array, so as its data becomes visible in pg_stat_activity and related catalogs, has its logic divided into three routines with this commit, called in order at different steps of the backend initialization: * pgstat_bestart_initial() sets up the backend entry with a minimal amount of information, reporting it with a new BackendState called STATE_STARTING while waiting for backend initialization and client authentication to complete. The main benefit that this offers is observability, so as it is possible to monitor the backend activity during authentication. This step happens earlier than in the logic prior to this commit. pgstat_beinit() happens earlier as well, before authentication. * pgstat_bestart_security() reports the SSL/GSS status of the connection, once authentication completes. Auxiliary processes, for example, do not need to call this step, hence it is optional. This step is called after performing authentication, same as previously. * pgstat_bestart_final() reports the user and database IDs, takes the entry out of STATE_STARTING, and reports its application_name. This is called as the last step of the three, once authentication completes. An injection point is added, with a test checking that the "starting" phase of a backend entry is visible in pg_stat_activity. Some follow-up patches are planned to take advantage of this refactoring with more information provided in backend entries during authentication (LDAP hanging was a problem for the author, initially). Author: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAOYmi+=60deN20WDyCoHCiecgivJxr=98s7s7-C8SkXwrCfHXg@mail.gmail.com