summaryrefslogtreecommitdiff
path: root/src/backend/utils
AgeCommit message (Collapse)Author
15 hoursAdd information about "generation" when dropping twice pgstats entryHEADorigin/masterorigin/HEADmasterMichael Paquier
Dropping twice a pgstats entry should not happen, and the error report generated was missing the "generation" counter (tracking when an entry is reused) that has been added in 818119afccd3. Like d92573adcb02, backpatch down to v15 where this information is useful to have, to gather more information from instances where the problem shows up. A report has shown that this error path has been reached on a standby based on 17.3, for a relation stats entry and an OID close to wraparound. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/CAN4RuQvYth942J2+FcLmJKgdpq6fE5eqyFvb_PuskxF2eL=Wzg@mail.gmail.com Backpatch-through: 15
24 hoursExtend int128.h to support more numeric code.Dean Rasheed
This adds a few more functions to int128.h, allowing more of numeric.c to use 128-bit integers on all platforms. Specifically, int64_div_fast_to_numeric() and the following aggregate functions can now use 128-bit integers for improved performance on all platforms, rather than just platforms with native support for int128: - SUM(int8) - AVG(int8) - STDDEV_POP(int2 or int4) - STDDEV_SAMP(int2 or int4) - VAR_POP(int2 or int4) - VAR_SAMP(int2 or int4) In addition to improved performance on platforms lacking native 128-bit integer support, this significantly simplifies this numeric code by allowing a lot of conditionally compiled code to be deleted. A couple of numeric functions (div_var_int64() and sqrt_var()) still contain conditionally compiled 128-bit integer code that only works on platforms with native 128-bit integer support. Making those work more generally would require rolling our own higher precision 128-bit division, which isn't supported for now. Author: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/CAEZATCWgBMc9ZwKMYqQpaQz2X6gaamYRB+RnMsUNcdMcL2Mj_w@mail.gmail.com
37 hoursFix incorrect Datum conversion in timestamptz_trunc_internal()Michael Paquier
The code used a PG_RETURN_TIMESTAMPTZ() where the return type is TimestampTz and not a Datum. On 64-bit systems, there is no effect since this just ends up casting 64-bit integers back and forth. On 32-bit systems, timestamptz is pass-by-reference. PG_RETURN_TIMESTAMPTZ() allocates new memory and returns the address, meaning that the caller could interpret this as a timestamp value. The effect is using "date_trunc(..., 'infinity'::timestamptz) will return random values (instead of the correct return value 'infinity'). Bug introduced in commit d85ce012f99f. Author: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/2d320b6f-b4af-4fbc-9eec-5d0fa15d187b@eisentraut.org Discussion: https://postgr.es/m/4bf60a84-2862-4a53-acd5-8eddf134a60e@eisentraut.org Backpatch-through: 18
2 daysRemove INT64_HEX_FORMAT and UINT64_HEX_FORMATPeter Eisentraut
These were introduced (commit efdc7d74753) at the same time as we were moving to using the standard inttypes.h format macros (commit a0ed19e0a9e). It doesn't seem useful to keep a new already-deprecated interface like this with only a few users, so remove the new symbols again and have the callers use PRIx64. (Also, INT64_HEX_FORMAT was kind of a misnomer, since hex formats all use unsigned types.) Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/0ac47b5d-e5ab-4cac-98a7-bdee0e2831e4%40eisentraut.org
3 daysFix varatt versus Datum type confusionsPeter Eisentraut
Macros like VARDATA() and VARSIZE() should be thought of as taking values of type pointer to struct varlena or some other related struct. The way they are implemented, you can pass anything to it and it will cast it right. But this is in principle incorrect. To fix, add the required DatumGetPointer() calls. Or in a couple of cases, remove superfluous PointerGetDatum() calls. It is planned in a subsequent patch to change macros like VARDATA() and VARSIZE() to inline functions, which will enforce stricter typing. This is in preparation for that. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/928ea48f-77c6-417b-897c-621ef16685a6%40eisentraut.org
3 daysFix various hash function usesPeter Eisentraut
These instances were using Datum-returning functions where a lower-level function returning uint32 would be more appropriate. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/8246d7ff-f4b7-4363-913e-827dadfeb145%40eisentraut.org
4 daysDetect and report update_deleted conflicts.Amit Kapila
This enhancement builds upon the infrastructure introduced in commit 228c370868, which enables the preservation of deleted tuples and their origin information on the subscriber. This capability is crucial for handling concurrent transactions replicated from remote nodes. The update introduces support for detecting update_deleted conflicts during the application of update operations on the subscriber. When an update operation fails to locate the target row-typically because it has been concurrently deleted-we perform an additional table scan. This scan uses the SnapshotAny mechanism and we do this additional scan only when the retain_dead_tuples option is enabled for the relevant subscription. The goal of this scan is to locate the most recently deleted tuple-matching the old column values from the remote update-that has not yet been removed by VACUUM and is still visible according to our slot (i.e., its deletion is not older than conflict-detection-slot's xmin). If such a tuple is found, the system reports an update_deleted conflict, including the origin and transaction details responsible for the deletion. This provides a groundwork for more robust and accurate conflict resolution process, preventing unexpected behavior by correctly identifying cases where a remote update clashes with a deletion from another origin. Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
5 daysTake a little more care in set_backtrace().Tom Lane
Coverity complained that the "errtrace" string is leaked if we return early because backtrace_symbols fails. Another criticism that could be leveled at this is that not providing any hint of what happened is user-unfriendly. Fix that. The odds of a leak here are small, and typically it wouldn't matter anyway since the leak will be in ErrorContext which will soon get reset. So I'm not feeling a need to back-patch.
6 daysSilence complaints about leaks in PlanCacheComputeResultDesc.Tom Lane
CompleteCachedPlan intentionally doesn't worry about small leaks from PlanCacheComputeResultDesc. However, Valgrind knows nothing of engineering tradeoffs and complains anyway. Silence it by doing things the hard way if USE_VALGRIND. I don't really love this patch, because it makes the handling of plansource->resultDesc different from the handling of query dependencies and search_path just above, which likewise are willing to accept small leaks into the cached plan's context. However, those cases aren't provoking Valgrind complaints. (Perhaps in a CLOBBER_CACHE_ALWAYS build, they would?) For the moment, this makes the src/pl/plpgsql tests leak-free according to Valgrind. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
6 daysSuppress complaints about leaks in TS dictionary loading.Tom Lane
Like the situation with function cache loading, text search dictionary loading functions tend to leak some cruft into the dictionary's long-lived cache context. To judge by the examples in the core regression tests, not very many bytes are at stake. Moreover, I don't see a way to prevent such leaks without changing the API for TS template initialization functions: right now they do not have to worry about making sure that their results are long-lived. Hence, I think we should install a suppression rule rather than trying to fix this completely. However, I did grab some low-hanging fruit: several places were leaking the result of get_tsearch_config_filename. This seems worth doing mostly because they are inconsistent with other dictionaries that were freeing it already. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
6 daysSilence Valgrind leakage complaints in more-or-less-hackish ways.Tom Lane
These changes don't actually fix any leaks. They just make sure that Valgrind will find pointers to data structures that remain allocated at process exit, and thus not falsely complain about leaks. In particular, we are trying to avoid situations where there is no pointer to the beginning of an allocated block (except possibly within the block itself, which Valgrind won't count). * Because dynahash.c never frees hashtable storage except by deleting the whole hashtable context, it doesn't bother to track the individual blocks of elements allocated by element_alloc(). This results in "possibly lost" complaints from Valgrind except when the first element of each block is actively in use. (Otherwise it'll be on a freelist, but very likely only reachable via "interior pointers" within element blocks, which doesn't satisfy Valgrind.) To fix, if we're building with USE_VALGRIND, expend an extra pointer's worth of space in each element block so that we can chain them all together from the HTAB header. Skip this in shared hashtables though: Valgrind doesn't track those, and we'd need additional locking to make it safe to manipulate a shared chain. While here, update a comment obsoleted by 9c911ec06. * Put the dlist_node fields of catctup and catclist structs first. This ensures that the dlist pointers point to the starts of these palloc blocks, and thus that Valgrind won't consider them "possibly lost". * The postmaster's PMChild structs and the autovac launcher's avl_dbase structs also have the dlist_node-is-not-first problem, but putting it first still wouldn't silence the warning because we bulk-allocate those structs in an array, so that Valgrind sees a single allocation. Commonly the first array element will be pointed to only from some later element, so that the reference would be an interior pointer even if it pointed to the array start. (This is the same issue as for dynahash elements.) Since these are pretty simple data structures, I don't feel too bad about faking out Valgrind by just keeping a static pointer to the array start. (This is all quite hacky, and it's not hard to imagine usages where we'd need some other idea in order to have reasonable leak tracking of structures that are only accessible via dlist_node lists. But these changes seem to be enough to silence this class of leakage complaints for the moment.) * Free a couple of data structures manually near the end of an autovacuum worker's run when USE_VALGRIND, and ensure that the final vac_update_datfrozenxid() call is done in a non-permanent context. This doesn't have any real effect on the process's total memory consumption, since we're going to exit as soon as that last transaction is done. But it does pacify Valgrind. * Valgrind complains about the postmaster's socket-files and lock-files lists being leaked, which we can silence by just not nulling out the static pointers to them. * Valgrind seems not to consider the global "environ" variable as a valid root pointer; so when we allocate a new environment array, it claims that data is leaked. To fix that, keep our own statically-allocated copy of the pointer, similarly to the previous item. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
6 daysFix assorted pretty-trivial memory leaks in the backend.Tom Lane
In the current system architecture, none of these are worth obsessing over; most are once-per-process leaks. However, Valgrind complains about all of them, and if we get to using threads rather than processes for backend sessions, it will become more interesting to avoid per-session leaks. * Fix leaks in StartupXLOG() and ShutdownWalRecovery(). * Fix leakage of pq_mq_handle in a parallel worker. While at it, move mq_putmessage's "Assert(pq_mq_handle != NULL)" to someplace where it's not trivially useless. * Fix leak in logicalrep_worker_detach(). * Don't leak the startup-packet buffer in ProcessStartupPacket(). * Fix leak in evtcache.c's DecodeTextArrayToBitmapset(). If the presented array is toasted, this neglected to free the detoasted copy, which was then leaked into EventTriggerCacheContext. * I'm distressed by the amount of code that BuildEventTriggerCache is willing to run while switched into a long-lived cache context. Although the detoasted array is the only leak that Valgrind reports, let's tighten things up while we're here. (DecodeTextArrayToBitmapset is still run in the cache context, so doing this doesn't remove the need for the detoast fix. But it reduces the surface area for other leaks.) * load_domaintype_info() intentionally leaked some intermediate cruft into the long-lived DomainConstraintCache's memory context, reasoning that the amount of leakage will typically not be much so it's not worth doing a copyObject() of the final tree to avoid that. But Valgrind knows nothing of engineering tradeoffs and complains anyway. On the whole, the copyObject doesn't cost that much and this is surely not a performance-critical code path, so let's do it the clean way. * MarkGUCPrefixReserved didn't bother to clean up removed placeholder GUCs at all, which shows up as a leak in one regression test. It seems appropriate for it to do as much cleanup as define_custom_variable does when replacing placeholders, so factor that code out into a helper function. define_custom_variable's logic was one brick shy of a load too: it forgot to free the separate allocation for the placeholder's name. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
6 daysFix MemoryContextAllocAligned's interaction with Valgrind.Tom Lane
Arrange that only the "aligned chunk" part of the allocated space is included in a Valgrind vchunk. This suppresses complaints about that vchunk being possibly lost because PG is retaining only pointers to the aligned chunk. Also make sure that trailing wasted space is marked NOACCESS. As a tiny performance improvement, arrange that MCXT_ALLOC_ZERO zeroes only the returned "aligned chunk", not the wasted padding space. In passing, fix GetLocalBufferStorage to use MemoryContextAllocAligned instead of rolling its own implementation, which was equally broken according to Valgrind. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
6 daysImprove our support for Valgrind's leak tracking.Tom Lane
When determining whether an allocated chunk is still reachable, Valgrind will consider only pointers within what it believes to be allocated chunks. Normally, all of a block obtained from malloc() would be considered "allocated" --- but it turns out that if we use VALGRIND_MEMPOOL_ALLOC to designate sub-section(s) of a malloc'ed block as allocated, all the rest of that malloc'ed block is ignored. This leads to lots of false positives of course. In particular, in any multi-malloc-block context, all but the primary block were reported as leaked. We also had a problem with context "ident" strings, which were reported as leaked unless there was some other pointer to them besides the one in the context header. To fix, we need to use VALGRIND_MEMPOOL_ALLOC to designate a context's management structs (the context struct itself and any per-block headers) as allocated chunks. That forces moving the VALGRIND_CREATE_MEMPOOL/VALGRIND_DESTROY_MEMPOOL calls into the per-context-type code, so that the pool identifier can be made as soon as we've allocated the initial block, but otherwise it's fairly straightforward. Note that in Valgrind's eyes there is no distinction between these allocations and the allocations that the mmgr modules hand out to user code. That's fine for now, but perhaps someday we'll want to do better yet. When reading this patch, it's helpful to start with the comments added at the head of mcxt.c. Author: Andres Freund <andres@anarazel.de> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us Discussion: https://postgr.es/m/20210317181531.7oggpqevzz6bka3g@alap3.anarazel.de
7 daysAllow resetting unknown custom GUCs with reserved prefixes.Nathan Bossart
Currently, ALTER DATABASE/ROLE/SYSTEM RESET [ALL] with an unknown custom GUC with a prefix reserved by MarkGUCPrefixReserved() errors (unless a superuser runs a RESET ALL variant). This is problematic for cases such as an extension library upgrade that removes a GUC. To fix, simply make sure the relevant code paths explicitly allow it. Note that we require superuser or privileges on the parameter to reset it. This is perhaps a bit more restrictive than is necessary, but it's not clear whether further relaxing the requirements is safe. Oversight in commit 88103567cb. The ALTER SYSTEM fix is dependent on commit 2d870b4aef, which first appeared in v17. Unfortunately, back-patching that commit would introduce ABI breakage, and while that breakage seems unlikely to bother anyone, it doesn't seem worth the risk. Hence, the ALTER SYSTEM part of this commit is omitted on v15 and v16. Reported-by: Mert Alev <mert@futo.org> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Discussion: https://postgr.es/m/18964-ba09dea8c98fccd6%40postgresql.org Backpatch-through: 15
9 daysRename CachedPlanType to PlannedStmtOrigin for PlannedStmtMichael Paquier
Commit 719dcf3c42 introduced a field called CachedPlanType in PlannedStmt to allow extensions to determine whether a cached plan is generic or custom. After discussion, the concepts that we want to track are a bit wider than initially anticipated, as it is closer to knowing from which "source" or "origin" a PlannedStmt has been generated or retrieved. Custom and generic cached plans are a subset of that. Based on the state of HEAD, we have been able to define two more origins: - "standard", for the case where PlannedStmt is generated in standard_planner(), the most common case. - "internal", for the fake PlannedStmt generated internally by some query patterns. This could be tuned in the future depending on what is needed. This looks like a good starting point, at least. The default value is called "UNKNOWN", provided as fallback value. This value is not used in the core code, the idea is to let extensions building their own PlannedStmts know about this new field. Author: Michael Paquier <michael@paquier.xyz> Co-authored-by: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/aILaHupXbIGgF2wJ@paquier.xyz
10 daysUpdate commentPeter Eisentraut
The code being referred to was moved to a different function in commit eb8312a22a8, so update the comment accordingly.
10 daysRemove unnecessary complication around xmlParseBalancedChunkMemory.Tom Lane
When I prepared 71c0921b6 et al yesterday, I was thinking that the logic involving explicitly freeing the node_list output was still needed to dodge leakage bugs in libxml2. But I was misremembering: we introduced that only because with early 2.13.x releases we could not trust xmlParseBalancedChunkMemory's result code, so we had to look to see if a node list was returned or not. There's no reason to believe that xmlParseBalancedChunkMemory will fail to clean up the node list when required, so simplify. (This essentially completes reverting all the non-cosmetic changes in 6082b3d5d.) Reported-by: Jim Jones <jim.jones@uni-muenster.de> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/997668.1753802857@sss.pgh.pa.us Backpatch-through: 13
11 daysAvoid regression in the size of XML input that we will accept.Tom Lane
This mostly reverts commit 6082b3d5d, "Use xmlParseInNodeContext not xmlParseBalancedChunkMemory". It turns out that xmlParseInNodeContext will reject text chunks exceeding 10MB, while (in most libxml2 versions) xmlParseBalancedChunkMemory will not. The bleeding-edge libxml2 bug that we needed to work around a year ago is presumably no longer a factor, and the argument that xmlParseBalancedChunkMemory is semi-deprecated is not enough to justify a functionality regression. Hence, go back to doing it the old way. Reported-by: Michael Paquier <michael@paquier.xyz> Author: Michael Paquier <michael@paquier.xyz> Co-authored-by: Erik Wienhold <ewie@ewie.name> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/aIGknLuc8b8ega2X@paquier.xyz Backpatch-through: 13
12 daysFix performance regression with flush of pending fixed-numbered statsMichael Paquier
The callback added in fc415edf8ca8 used to check if there is any pending data to flush for fixed-numbered statistics, done by looping across all the builtin and custom stats kinds with a call to have_fixed_pending_cb, is proving to able to show in workloads that do not report any stats (read-only, no function calls, no WAL, no IO, etc). The code used in v17 was cheaper than that what HEAD has introduced, relying on three boolean checks for WAL, SLRU and IO stats. This commit switches the code to use a more efficient approach than fc415edf8ca8, with a single boolean flag that can be switched to "true" by any fixed-numbered stats kinds to force pgstat_report_stat() to go through one round of reports. The flag is reset by pgstat_report_stat() once a full round of reports is done. The flag being false means that fixed-numbered stats kinds saw no activity, and that there is no pending data to flush. ac000fca743e took one step in improving the performance by reducing the number of stats kinds that the backend can hold. This commit takes a more drastic step by bringing back the code efficiency to what it was before v18 with a cheap check at the beginning of pgstat_report_stat() for its fast-exit path. The callback have_static_pending_cb is removed as an effect of all that. Reported-by: Andres Freund <andres@anarazel.de> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/eb224uegsga2hgq7dfq3ps5cduhpqej7ir2hjxzzozjthrekx5@dysei6buqthe Backpatch-through: 18
14 daysCreate infrastructure to reliably prevent leakage of PGresults.Tom Lane
Commit 232d8caea fixed a case where postgres_fdw could lose track of a PGresult object, resulting in a process-lifespan memory leak. But I have little faith that there aren't other potential PGresult leakages, now or in future, in the backend modules that use libpq. Therefore, this patch proposes infrastructure that makes all PGresults returned from libpq act as though they are palloc'd in the CurrentMemoryContext (with the option to relocate them to another context later). This should greatly reduce the risk of careless leaks, and it also permits removal of a bunch of code that attempted to prevent such leaks via PG_TRY blocks. This patch adds infrastructure that wraps each PGresult in a "libpqsrv_PGresult" that provides a memory context reset callback to PQclear the PGresult. Code using this abstraction is inherently memory-safe to the same extent as we are accustomed to in most backend code. Furthermore, we add some macros that automatically redirect calls of the libpq functions concerned with PGresults to use this infrastructure, so that almost no source-code changes are needed to wheel this infrastructure into place in all the backend code that uses libpq. Perhaps in future we could create similar infrastructure for PGconn objects, but there seems less need for that. This patch just creates the infrastructure and makes relevant code use it, including reverting 232d8caea in favor of this mechanism. A good deal of follow-on simplification is possible now that we don't have to be so cautious about freeing PGresults, but I'll put that in a separate patch. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Discussion: https://postgr.es/m/2976982.1748049023@sss.pgh.pa.us
2025-07-25Fix dynahash's HASH_FIXED_SIZE ("isfixed") option.Tom Lane
This flag was effectively a no-op in EXEC_BACKEND (ie, Windows) builds, because it was kept in the process-local HTAB struct, and it could only ever become set in the postmaster's copy. The simplest fix is to move it to the shared HASHHDR struct. We could keep a copy in HTAB as well, as we do with keysize and some other fields, but the "too much contention" argument doesn't seem to apply here: we only examine isfixed during element_alloc(), which had better not get hit very often for a shared hashtable. This oversight dates to 7c797e719 which invented the option. But back-patching doesn't seem appropriate given the lack of field complaints. If there is anyone running an affected workload on Windows, they might be unhappy about the behavior changing in a minor release. Author: Aidar Imamov <a.imamov@postgrespro.ru> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/4d0cb35ff01c5c74d2b9a582ecb73823@postgrespro.ru
2025-07-24Introduce field tracking cached plan type in PlannedStmtMichael Paquier
PlannedStmt gains a new field, called CachedPlanType, able to track if a given plan tree originates from the cache and if we are dealing with a generic or custom cached plan. This field can be used for monitoring or statistical purposes, in the executor hooks, for example, based on the planned statement attached to a QueryDesc. A patch is under discussion for pg_stat_statements to provide an equivalent of the counters in pg_prepared_statements for custom and generic plans, to provide a more global view of such data, as this data is now restricted to the current session. The concept introduced in this commit is useful on its own, and has been extracted from a larger patch by the same author. Author: Sami Imseih <samimseih@gmail.com> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAA5RZ0uFw8Y9GCFvafhC=OA8NnMqVZyzXPfv_EePOt+iv1T-qQ@mail.gmail.com
2025-07-23Cross-check lists of built-in LWLock tranches.Nathan Bossart
lwlock.c, lwlock.h, and wait_event_names.txt each contain a list of built-in LWLock tranches. It is easy to miss one or the other when adding or removing tranches, and discrepancies have adverse effects (e.g., breaking JOINs between pg_stat_activity and pg_wait_events). This commit moves the lists of built-in tranches in lwlock.{c,h} to lwlocklist.h and adds a cross-check to the script that generates lwlocknames.h. If the lists do not match exactly, building will fail. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/aHpOgwuFQfcFMZ/B%40ip-10-97-1-34.eu-west-3.compute.internal
2025-07-23Preserve conflict-relevant data during logical replication.Amit Kapila
Logical replication requires reliable conflict detection to maintain data consistency across nodes. To achieve this, we must prevent premature removal of tuples deleted by other origins and their associated commit_ts data by VACUUM, which could otherwise lead to incorrect conflict reporting and resolution. This patch introduces a mechanism to retain deleted tuples on the subscriber during the application of concurrent transactions from remote nodes. Retaining these tuples allows us to correctly ignore concurrent updates to the same tuple. Without this, an UPDATE might be misinterpreted as an INSERT during resolutions due to the absence of the original tuple. Additionally, we ensure that origin metadata is not prematurely removed by vacuum freeze, which is essential for detecting update_origin_differs and delete_origin_differs conflicts. To support this, a new replication slot named pg_conflict_detection is created and maintained by the launcher on the subscriber. Each apply worker tracks its own non-removable transaction ID, which the launcher aggregates to determine the appropriate xmin for the slot, thereby retaining necessary tuples. Conflict information retention (deleted tuples and commit_ts) can be enabled per subscription via the retain_conflict_info option. This is disabled by default to avoid unnecessary overhead for configurations that do not require conflict resolution or logging. During upgrades, if any subscription on the old cluster has retain_conflict_info enabled, a conflict detection slot will be created to protect relevant tuples from deletion when the new cluster starts. This is a foundational work to correctly detect update_deleted conflict which will be done in a follow-up patch. Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
2025-07-19Mostly-cosmetic adjustments to estimate_multivariate_bucketsize().Tom Lane
The only practical effect of these changes is to avoid a useless list_copy() operation when there is a single hashclause. That's never going to make any noticeable performance difference, but the code is arguably clearer this way, especially if we take the opportunity to add some comments so that readers don't have to reverse-engineer the usage of these local variables. Also add some braces for better/more consistent style. Author: Tender Wang <tndrwang@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAHewXNnHBOO9NEa=NBDYOrwZL4oHu2NOcTYvqyNyWEswo8f5OQ@mail.gmail.com
2025-07-18Speed up byteain by not parsing traditional-style input twice.Tom Lane
Instead of laboriously computing the exact output length, use strlen to get an upper bound cheaply. (This is still O(N) of course, but the constant factor is a lot less.) This will typically result in overallocating the output datum, but that's of little concern since it's a short-lived allocation in just about all use-cases. A simple microbenchmark showed about 40% speedup for long input strings. While here, make some cosmetic cleanups and add a test case that covers the double-backslash code path in byteain and byteaout. Author: Steven Niu <niushiji@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Stepan Neretin <slpmcf@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/ca315729-140b-426e-81a6-6cd5cfe7ecc5@gmail.com
2025-07-16Force LC_COLLATE to C in postmaster.Jeff Davis
Avoid dependence on setlocale(). strcoll(), etc., are not called directly; all collation-sensitive calls should go through pg_locale.c and use the appropriate provider. By setting LC_COLLATE to C, we avoid accidentally depending on libc behavior when using a different provider. No behavior change in the backend, but it's possible that some extensions will be affected. Such extensions should be updated to use the pg_locale_t APIs. Discussion: https://postgr.es/m/9875f7f9-50f1-4b5d-86fc-ee8b03e8c162@eisentraut.org Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
2025-07-15Clarify the ra != rb case in compareJsonbContainers().Tom Lane
It's impossible to reach this case with either ra or rb being WJB_DONE, because our earlier checks that the structure and length of the inputs match should guarantee that we reach their ends simultaneously. However, the comment completely fails to explain this, and the Asserts don't cover it either. The comment is pretty obscure anyway, so rewrite it, and extend the Asserts to reject WJB_DONE. This is only cosmetic, so no need for back-patch. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/0c623e8a204187b87b4736792398eaf1@postgrespro.ru
2025-07-15Silence uninitialized-value warnings in compareJsonbContainers().Tom Lane
Because not every path through JsonbIteratorNext() sets val->type, some compilers complain that compareJsonbContainers() is comparing possibly-uninitialized values. The paths that don't set it return WJB_DONE, WJB_END_ARRAY, or WJB_END_OBJECT, so it's clear by manual inspection that the "(ra == rb)" code path is safe, and indeed we aren't seeing warnings about that. But the (ra != rb) case is much less obviously safe. In Assert-enabled builds it seems that the asserts rejecting WJB_END_ARRAY and WJB_END_OBJECT persuade gcc 15.x not to warn, which makes little sense because it's impossible to believe that the compiler can prove of its own accord that ra/rb aren't WJB_DONE here. (In fact they never will be, so the code isn't wrong, but why is there no warning?) Without Asserts, the appearance of warnings is quite unsurprising. We discussed fixing this by converting those two Asserts into pg_assume, but that seems not very satisfactory when it's so unclear why the compiler is or isn't warning: the warning could easily reappear with some other compiler version. Let's fix it in a less magical, more future-proof way by changing JsonbIteratorNext() so that it always does set val->type. The cost of that should be pretty negligible, and it makes the function's API spec less squishy. Reported-by: Erik Rijkers <er@xs4all.nl> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/988bf1bc-3f1f-99f3-bf98-222f1cd9dc5e@xs4all.nl Discussion: https://postgr.es/m/0c623e8a204187b87b4736792398eaf1@postgrespro.ru Backpatch-through: 13
2025-07-12Replace float8 with int in date2isoweek() and date2isoyear().Tom Lane
The values of the "result" variables in these functions are always integers; using a float8 variable accomplishes nothing except to incur useless conversions to and from float. While that wastes a few nanoseconds, these functions aren't all that time-critical. But it seems worth fixing to remove possible reader confusion. Also, in the case of date2isoyear(), "result" is a very poorly chosen variable name because it is *not* the function's result. Rename it to "week", and do the same in date2isoweek() for consistency. Since this is mostly cosmetic, there seems little need for back-patch. Author: Sergey Fukanchik <s.fukanchik@postgrespro.ru> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/6323a-68726500-1-7def9d00@137821581
2025-07-11Fix inconsistent quoting of role names in ACLs.Tom Lane
getid() and putid(), which parse and deparse role names within ACL input/output, applied isalnum() to see if a character within a role name requires quoting. They did this even for non-ASCII characters, which is problematic because the results would depend on encoding, locale, and perhaps even platform. So it's possible that putid() could elect not to quote some string that, later in some other environment, getid() will decide is not a valid identifier, causing dump/reload or similar failures. To fix this in a way that won't risk interoperability problems with unpatched versions, make getid() treat any non-ASCII as a legitimate identifier character (hence not requiring quotes), while making putid() treat any non-ASCII as requiring quoting. We could remove the resulting excess quoting once we feel that no unpatched servers remain in the wild, but that'll be years. A lesser problem is that getid() did the wrong thing with an input consisting of just two double quotes (""). That has to represent an empty string, but getid() read it as a single double quote instead. The case cannot arise in the normal course of events, since we don't allow empty-string role names. But let's fix it while we're here. Although we've not heard field reports of problems with non-ASCII role names, there's clearly a hazard there, so back-patch to all supported versions. Reported-by: Peter Eisentraut <peter@eisentraut.org> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/3792884.1751492172@sss.pgh.pa.us Backpatch-through: 13
2025-07-11Fix the handling of two GUCs during upgrade.Amit Kapila
Previously, the check_hook functions for max_slot_wal_keep_size and idle_replication_slot_timeout would incorrectly raise an ERROR for values set in postgresql.conf during upgrade, even though those values were not actively used in the upgrade process. To prevent logical slot invalidation during upgrade, we used to set special values for these GUCs. Now, instead of relying on those values, we directly prevent WAL removal and logical slot invalidation caused by max_slot_wal_keep_size and idle_replication_slot_timeout. Note: PostgreSQL 17 does not include the idle_replication_slot_timeout GUC, so related changes were not backported. BUG #18979 Reported-by: jorsol <jorsol@gmail.com> Author: Dilip Kumar <dilipbalaut@gmail.com> Reviewed by: vignesh C <vignesh21@gmail.com> Reviewed by: Alvaro Herrera <alvherre@alvh.no-ip.org> Backpatch-through: 17, where it was introduced Discussion: https://postgr.es/m/219561.1751826409@sss.pgh.pa.us Discussion: https://postgr.es/m/18979-a1b7fdbb7cd181c6@postgresql.org
2025-07-11Change unit of idle_replication_slot_timeout to seconds.Fujii Masao
Previously, the idle_replication_slot_timeout parameter used minutes as its unit, based on the assumption that values would typically exceed one minute in production environments. However, this caused unexpected behavior: specifying a value below 30 seconds would round down to 0, effectively disabling the timeout. This could be surprising to users. To allow finer-grained control and avoid such confusion, this commit changes the unit of idle_replication_slot_timeout to seconds. Larger values can still be specified easily using standard time suffixes, for example, '24h' for 24 hours. Back-patch to v18 where idle_replication_slot_timeout was added. Reported-by: Gunnar Morling <gunnar.morling@googlemail.com> Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: David G. Johnston <david.g.johnston@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CADGJaX_0+FTguWpNSpgVWYQP_7MhoO0D8=cp4XozSQgaZ40Odw@mail.gmail.com Backpatch-through: 18
2025-07-09Change wchar2char() and char2wchar() to accept a locale_t.Jeff Davis
These are libc-specific functions, so should require a locale_t rather than a pg_locale_t (which could use another provider). Discussion: https://postgr.es/m/a8666c391dfcabe79868d95f7160eac533ace718.camel%40j-davis.com
2025-07-08Fix low-probability memory leak in XMLSERIALIZE(... INDENT).Tom Lane
xmltotext_with_options() did not consider the possibility that pg_xml_init() could fail --- most likely due to OOM. If that happened, the already-parsed xmlDoc structure would be leaked. Oversight in commit 483bdb2af. Bug: #18981 Author: Dmitry Kovalenko <d.kovalenko@postgrespro.ru> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/18981-9bc3c80f107ae925@postgresql.org Backpatch-through: 16
2025-07-07Standardize LSN formatting by zero paddingÁlvaro Herrera
This commit standardizes the output format for LSNs to ensure consistent representation across various tools and messages. Previously, LSNs were inconsistently printed as `%X/%X` in some contexts, while others used zero-padding. This often led to confusion when comparing. To address this, the LSN format is now uniformly set to `%X/%08X`, ensuring the lower 32-bit part is always zero-padded to eight hexadecimal digits. Author: Japin Li <japinli@hotmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/ME0P300MB0445CA53CA0E4B8C1879AF84B641A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
2025-07-07Integrate FullTransactionIds deeper into two-phase codeMichael Paquier
This refactoring is a follow-up of the work done in 5a1dfde8334b, that has switched 2PC file names to use FullTransactionIds when written on disk. This will help with the integration of a follow-up solution related to the handling of two-phase files during recovery, to address older defects while reading these from disk after a crash. This change is useful in itself as it reduces the need to build the file names from epoch numbers and TransactionIds, because we can use directly FullTransactionIds from which the 2PC file names are guessed. So this avoids a lot of back-and-forth between the FullTransactionIds retrieved from the file names and how these are passed around in the internal 2PC logic. Note that the core of the change is the use of a FullTransactionId instead of a TransactionId in GlobalTransactionData, that tracks 2PC file information in shared memory. The change in TwoPhaseCallback makes this commit unfit for stable branches. Noah has contributed a good chunk of this patch. I have spent some time on it as well while working on the issues with two-phase state files and recovery. Author: Noah Misch <noah@leadboat.com> Co-Authored-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/Z5sd5O9JO7NYNK-C@paquier.xyz Discussion: https://postgr.es/m/20250116205254.65.nmisch@google.com
2025-07-03Break out xxx2yyy_opt_overflow APIs for more datetime conversions.Tom Lane
Previous commits invented timestamp2timestamptz_opt_overflow, date2timestamp_opt_overflow, and date2timestamptz_opt_overflow functions to perform non-error-throwing conversions between datetime types. This patch completes the set by adding timestamp2date_opt_overflow, timestamptz2date_opt_overflow, and timestamptz2timestamp_opt_overflow. In addition, adjust timestamp2timestamptz_opt_overflow so that it doesn't throw error if timestamp2tm fails, but treats that as an overflow case. The situation probably can't arise except with an invalid timestamp value, and I can't think of a way that that would happen except data corruption. However, it's pretty silly to have a function whose entire reason for existence is to not throw errors for out-of-range inputs nonetheless throw an error for out-of-range input. The new APIs are not used in this patch, but will be needed in upcoming btree_gin changes. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> Discussion: https://postgr.es/m/262624.1738460652@sss.pgh.pa.us
2025-07-03Add InjectionPointList() to retrieve list of injection pointsMichael Paquier
This routine has come as a useful piece to be able to know the list of injection points currently attached in a system. One area would be to use it in a set-returning function, or just let out-of-core code play with it. This hides the internals of the shared memory array lookup holding the information about the injection points (point name, library and function name), allocating the result in a palloc'd List consumable by the caller. Reviewed-by: Jeff Davis <pgsql@j-davis.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Rahila Syed <rahilasyed90@gmail.com> Discussion: https://postgr.es/m/Z_xYkA21KyLEHvWR@paquier.xyz Discussion: https://postgr.es/m/aBG2rPwl3GE7m1-Q@paquier.xyz
2025-07-02Add GetNamedDSA() and GetNamedDSHash().Nathan Bossart
Presently, the dynamic shared memory (DSM) registry only provides GetNamedDSMSegment(), which allocates a fixed-size segment. To use the DSM registry for more sophisticated things like dynamic shared memory areas (DSAs) or a hash table backed by a DSA (dshash), users need to create a DSM segment that stores various handles and LWLock tranche IDs and to write fairly complicated initialization code. Furthermore, there is likely little variation in this initialization code between libraries. This commit introduces functions that simplify allocating a DSA or dshash within the DSM registry. These functions are very similar to GetNamedDSMSegment(). Notable differences include the lack of an initialization callback parameter and the prohibition of calling the functions more than once for a given entry in each backend (which should be trivially avoidable in most circumstances). While at it, this commit bumps the maximum DSM registry entry name length from 63 bytes to 127 bytes. Also note that even though one could presumably detach/destroy the DSAs and dshashes created in the registry, such use-cases are not yet well-supported, if for no other reason than the associated DSM registry entries cannot be removed. Adding such support is left as a future exercise. The test_dsm_registry test module contains tests for the new functions and also serves as a complete usage example. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Florents Tselai <florents.tselai@gmail.com> Reviewed-by: Rahila Syed <rahilasyed90@gmail.com> Discussion: https://postgr.es/m/aEC8HGy2tRQjZg_8%40nathan
2025-07-02Allow width_bucket()'s "operand" input to be NaN.Tom Lane
The array-based variant of width_bucket() has always accepted NaN inputs, treating them as equal but larger than any non-NaN, as we do in ordinary comparisons. But up to now, the four-argument variants threw errors for a NaN operand. This is inconsistent and unnecessary, since we can perfectly well regard NaN as falling after the last bucket. We do still throw error for NaN or infinity histogram-bound inputs, since there's no way to compute sensible bucket boundaries. Arguably this is a bug fix, but given the lack of field complaints I'm content to fix it in master. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/2822872.1750540911@sss.pgh.pa.us
2025-07-02Move code for the bytea data type from varlena.c to new bytea.cMichael Paquier
This commit moves all the routines related to the bytea data type into its own new file, called bytea.c, clearing some of the bloat in varlena.c. This includes the routines for: - Input, output, receive and send - Comparison - Casts to integer types - bytea-specific functions The internals of the routines moved here are unchanged, with one exception. This comes with a twist in bytea_string_agg_transfn(), where the call to makeStringAggState() is replaced by the internals of this routine, still located in varlena.c. This simplifies the move to the new file by not having to expose makeStringAggState(). Author: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CAJ7c6TMPVPJ5DL447zDz5ydctB8OmuviURtSwd=PHCRFEPDEAQ@mail.gmail.com
2025-07-01Remove provider field from pg_locale_t.Jeff Davis
The behavior of pg_locale_t is specified by methods, so a separate provider field is no longer necessary. Reviewed-by: Andreas Karlsson <andreas@proxel.se> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/2830211e1b6e6a2e26d845780b03e125281ea17b.camel%40j-davis.com
2025-07-01Control ctype behavior internally with a method table.Jeff Davis
Previously, pattern matching and case mapping behavior branched based on the provider. Refactor to use a method table, which is less error-prone. This is also a step toward multiple provider versions, which we may want to support in the future. Reviewed-by: Andreas Karlsson <andreas@proxel.se> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/2830211e1b6e6a2e26d845780b03e125281ea17b.camel%40j-davis.com
2025-07-01Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.Jeff Davis
Avoids unnecessary dependence on setlocale(). No behavior change. This commit reverts e1458f2f1b, which reverted some changes unintentionally committed before the branch for 19. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/a8666c391dfcabe79868d95f7160eac533ace718.camel@j-davis.com Discussion: https://postgr.es/m/7efaaa645aa5df3771bb47b9c35df27e08f3520e.camel@j-davis.com
2025-07-01Improve error handling of libxml2 calls in xml.cMichael Paquier
This commit fixes some defects in the backend's xml.c, found upon inspection of the internals of libxml2: - xmlEncodeSpecialChars() can fail on malloc(), returning NULL back to the caller. xmltext() assumed that this could never happen. Like other code paths, a TRY/CATCH block is added there, covering also the fact that cstring_to_text_with_len() could fail a memory allocation, where the backend would miss to free the buffer allocated by xmlEncodeSpecialChars(). - Some libxml2 routines called in xmlelement() can return NULL, like xmlAddChildList() or xmlTextWriterStartElement(). Dedicated errors are added for them. - xml_xmlnodetoxmltype() missed that xmlXPathCastNodeToString() can fail on an allocation failure. In this case, the call can just be moved to the existing TRY/CATCH block. All these code paths would cause the server to crash. As this is unlikely a problem in practice, no backpatch is done. Jim and I have caught these defects, not sure who has scored the most. The contrib module xml2/ has similar defects, which will be addressed in a separate change. Reported-by: Jim Jones <jim.jones@uni-muenster.de> Reviewed-by: Jim Jones <jim.jones@uni-muenster.de> Discussion: https://postgr.es/m/aEEingzOta_S_Nu7@paquier.xyz
2025-06-30Add new OID alias type regdatabase.Nathan Bossart
This provides a convenient way to look up a database's OID. For example, the query SELECT * FROM pg_shdepend WHERE dbid = (SELECT oid FROM pg_database WHERE datname = current_database()); can now be simplified to SELECT * FROM pg_shdepend WHERE dbid = current_database()::regdatabase; Like the regrole type, regdatabase has cluster-wide scope, so we disallow regdatabase constants from appearing in stored expressions. Bumps catversion. Author: Ian Lawrence Barwick <barwick@gmail.com> Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/aBpjJhyHpM2LYcG0%40nathan
2025-06-30Remove unused #include's in src/backend/utils/adt/*Peter Eisentraut
Author: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAJ7c6TOowVbR-0NEvvDm6a_mag18krR0XJ2FKrc9DHXj7hFRtQ%40mail.gmail.com
2025-06-28Message style improvementsPeter Eisentraut