summaryrefslogtreecommitdiff
path: root/src/backend/utils/adt
AgeCommit message (Collapse)Author
22 hoursExtend int128.h to support more numeric code.Dean Rasheed
This adds a few more functions to int128.h, allowing more of numeric.c to use 128-bit integers on all platforms. Specifically, int64_div_fast_to_numeric() and the following aggregate functions can now use 128-bit integers for improved performance on all platforms, rather than just platforms with native support for int128: - SUM(int8) - AVG(int8) - STDDEV_POP(int2 or int4) - STDDEV_SAMP(int2 or int4) - VAR_POP(int2 or int4) - VAR_SAMP(int2 or int4) In addition to improved performance on platforms lacking native 128-bit integer support, this significantly simplifies this numeric code by allowing a lot of conditionally compiled code to be deleted. A couple of numeric functions (div_var_int64() and sqrt_var()) still contain conditionally compiled 128-bit integer code that only works on platforms with native 128-bit integer support. Making those work more generally would require rolling our own higher precision 128-bit division, which isn't supported for now. Author: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/CAEZATCWgBMc9ZwKMYqQpaQz2X6gaamYRB+RnMsUNcdMcL2Mj_w@mail.gmail.com
35 hoursFix incorrect Datum conversion in timestamptz_trunc_internal()Michael Paquier
The code used a PG_RETURN_TIMESTAMPTZ() where the return type is TimestampTz and not a Datum. On 64-bit systems, there is no effect since this just ends up casting 64-bit integers back and forth. On 32-bit systems, timestamptz is pass-by-reference. PG_RETURN_TIMESTAMPTZ() allocates new memory and returns the address, meaning that the caller could interpret this as a timestamp value. The effect is using "date_trunc(..., 'infinity'::timestamptz) will return random values (instead of the correct return value 'infinity'). Bug introduced in commit d85ce012f99f. Author: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/2d320b6f-b4af-4fbc-9eec-5d0fa15d187b@eisentraut.org Discussion: https://postgr.es/m/4bf60a84-2862-4a53-acd5-8eddf134a60e@eisentraut.org Backpatch-through: 18
3 daysFix varatt versus Datum type confusionsPeter Eisentraut
Macros like VARDATA() and VARSIZE() should be thought of as taking values of type pointer to struct varlena or some other related struct. The way they are implemented, you can pass anything to it and it will cast it right. But this is in principle incorrect. To fix, add the required DatumGetPointer() calls. Or in a couple of cases, remove superfluous PointerGetDatum() calls. It is planned in a subsequent patch to change macros like VARDATA() and VARSIZE() to inline functions, which will enforce stricter typing. This is in preparation for that. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/928ea48f-77c6-417b-897c-621ef16685a6%40eisentraut.org
3 daysFix various hash function usesPeter Eisentraut
These instances were using Datum-returning functions where a lower-level function returning uint32 would be more appropriate. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/8246d7ff-f4b7-4363-913e-827dadfeb145%40eisentraut.org
4 daysDetect and report update_deleted conflicts.Amit Kapila
This enhancement builds upon the infrastructure introduced in commit 228c370868, which enables the preservation of deleted tuples and their origin information on the subscriber. This capability is crucial for handling concurrent transactions replicated from remote nodes. The update introduces support for detecting update_deleted conflicts during the application of update operations on the subscriber. When an update operation fails to locate the target row-typically because it has been concurrently deleted-we perform an additional table scan. This scan uses the SnapshotAny mechanism and we do this additional scan only when the retain_dead_tuples option is enabled for the relevant subscription. The goal of this scan is to locate the most recently deleted tuple-matching the old column values from the remote update-that has not yet been removed by VACUUM and is still visible according to our slot (i.e., its deletion is not older than conflict-detection-slot's xmin). If such a tuple is found, the system reports an update_deleted conflict, including the origin and transaction details responsible for the deletion. This provides a groundwork for more robust and accurate conflict resolution process, preventing unexpected behavior by correctly identifying cases where a remote update clashes with a deletion from another origin. Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
10 daysUpdate commentPeter Eisentraut
The code being referred to was moved to a different function in commit eb8312a22a8, so update the comment accordingly.
10 daysRemove unnecessary complication around xmlParseBalancedChunkMemory.Tom Lane
When I prepared 71c0921b6 et al yesterday, I was thinking that the logic involving explicitly freeing the node_list output was still needed to dodge leakage bugs in libxml2. But I was misremembering: we introduced that only because with early 2.13.x releases we could not trust xmlParseBalancedChunkMemory's result code, so we had to look to see if a node list was returned or not. There's no reason to believe that xmlParseBalancedChunkMemory will fail to clean up the node list when required, so simplify. (This essentially completes reverting all the non-cosmetic changes in 6082b3d5d.) Reported-by: Jim Jones <jim.jones@uni-muenster.de> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/997668.1753802857@sss.pgh.pa.us Backpatch-through: 13
11 daysAvoid regression in the size of XML input that we will accept.Tom Lane
This mostly reverts commit 6082b3d5d, "Use xmlParseInNodeContext not xmlParseBalancedChunkMemory". It turns out that xmlParseInNodeContext will reject text chunks exceeding 10MB, while (in most libxml2 versions) xmlParseBalancedChunkMemory will not. The bleeding-edge libxml2 bug that we needed to work around a year ago is presumably no longer a factor, and the argument that xmlParseBalancedChunkMemory is semi-deprecated is not enough to justify a functionality regression. Hence, go back to doing it the old way. Reported-by: Michael Paquier <michael@paquier.xyz> Author: Michael Paquier <michael@paquier.xyz> Co-authored-by: Erik Wienhold <ewie@ewie.name> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/aIGknLuc8b8ega2X@paquier.xyz Backpatch-through: 13
2025-07-23Preserve conflict-relevant data during logical replication.Amit Kapila
Logical replication requires reliable conflict detection to maintain data consistency across nodes. To achieve this, we must prevent premature removal of tuples deleted by other origins and their associated commit_ts data by VACUUM, which could otherwise lead to incorrect conflict reporting and resolution. This patch introduces a mechanism to retain deleted tuples on the subscriber during the application of concurrent transactions from remote nodes. Retaining these tuples allows us to correctly ignore concurrent updates to the same tuple. Without this, an UPDATE might be misinterpreted as an INSERT during resolutions due to the absence of the original tuple. Additionally, we ensure that origin metadata is not prematurely removed by vacuum freeze, which is essential for detecting update_origin_differs and delete_origin_differs conflicts. To support this, a new replication slot named pg_conflict_detection is created and maintained by the launcher on the subscriber. Each apply worker tracks its own non-removable transaction ID, which the launcher aggregates to determine the appropriate xmin for the slot, thereby retaining necessary tuples. Conflict information retention (deleted tuples and commit_ts) can be enabled per subscription via the retain_conflict_info option. This is disabled by default to avoid unnecessary overhead for configurations that do not require conflict resolution or logging. During upgrades, if any subscription on the old cluster has retain_conflict_info enabled, a conflict detection slot will be created to protect relevant tuples from deletion when the new cluster starts. This is a foundational work to correctly detect update_deleted conflict which will be done in a follow-up patch. Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
2025-07-19Mostly-cosmetic adjustments to estimate_multivariate_bucketsize().Tom Lane
The only practical effect of these changes is to avoid a useless list_copy() operation when there is a single hashclause. That's never going to make any noticeable performance difference, but the code is arguably clearer this way, especially if we take the opportunity to add some comments so that readers don't have to reverse-engineer the usage of these local variables. Also add some braces for better/more consistent style. Author: Tender Wang <tndrwang@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAHewXNnHBOO9NEa=NBDYOrwZL4oHu2NOcTYvqyNyWEswo8f5OQ@mail.gmail.com
2025-07-18Speed up byteain by not parsing traditional-style input twice.Tom Lane
Instead of laboriously computing the exact output length, use strlen to get an upper bound cheaply. (This is still O(N) of course, but the constant factor is a lot less.) This will typically result in overallocating the output datum, but that's of little concern since it's a short-lived allocation in just about all use-cases. A simple microbenchmark showed about 40% speedup for long input strings. While here, make some cosmetic cleanups and add a test case that covers the double-backslash code path in byteain and byteaout. Author: Steven Niu <niushiji@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Stepan Neretin <slpmcf@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/ca315729-140b-426e-81a6-6cd5cfe7ecc5@gmail.com
2025-07-15Clarify the ra != rb case in compareJsonbContainers().Tom Lane
It's impossible to reach this case with either ra or rb being WJB_DONE, because our earlier checks that the structure and length of the inputs match should guarantee that we reach their ends simultaneously. However, the comment completely fails to explain this, and the Asserts don't cover it either. The comment is pretty obscure anyway, so rewrite it, and extend the Asserts to reject WJB_DONE. This is only cosmetic, so no need for back-patch. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/0c623e8a204187b87b4736792398eaf1@postgrespro.ru
2025-07-15Silence uninitialized-value warnings in compareJsonbContainers().Tom Lane
Because not every path through JsonbIteratorNext() sets val->type, some compilers complain that compareJsonbContainers() is comparing possibly-uninitialized values. The paths that don't set it return WJB_DONE, WJB_END_ARRAY, or WJB_END_OBJECT, so it's clear by manual inspection that the "(ra == rb)" code path is safe, and indeed we aren't seeing warnings about that. But the (ra != rb) case is much less obviously safe. In Assert-enabled builds it seems that the asserts rejecting WJB_END_ARRAY and WJB_END_OBJECT persuade gcc 15.x not to warn, which makes little sense because it's impossible to believe that the compiler can prove of its own accord that ra/rb aren't WJB_DONE here. (In fact they never will be, so the code isn't wrong, but why is there no warning?) Without Asserts, the appearance of warnings is quite unsurprising. We discussed fixing this by converting those two Asserts into pg_assume, but that seems not very satisfactory when it's so unclear why the compiler is or isn't warning: the warning could easily reappear with some other compiler version. Let's fix it in a less magical, more future-proof way by changing JsonbIteratorNext() so that it always does set val->type. The cost of that should be pretty negligible, and it makes the function's API spec less squishy. Reported-by: Erik Rijkers <er@xs4all.nl> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/988bf1bc-3f1f-99f3-bf98-222f1cd9dc5e@xs4all.nl Discussion: https://postgr.es/m/0c623e8a204187b87b4736792398eaf1@postgrespro.ru Backpatch-through: 13
2025-07-12Replace float8 with int in date2isoweek() and date2isoyear().Tom Lane
The values of the "result" variables in these functions are always integers; using a float8 variable accomplishes nothing except to incur useless conversions to and from float. While that wastes a few nanoseconds, these functions aren't all that time-critical. But it seems worth fixing to remove possible reader confusion. Also, in the case of date2isoyear(), "result" is a very poorly chosen variable name because it is *not* the function's result. Rename it to "week", and do the same in date2isoweek() for consistency. Since this is mostly cosmetic, there seems little need for back-patch. Author: Sergey Fukanchik <s.fukanchik@postgrespro.ru> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/6323a-68726500-1-7def9d00@137821581
2025-07-11Fix inconsistent quoting of role names in ACLs.Tom Lane
getid() and putid(), which parse and deparse role names within ACL input/output, applied isalnum() to see if a character within a role name requires quoting. They did this even for non-ASCII characters, which is problematic because the results would depend on encoding, locale, and perhaps even platform. So it's possible that putid() could elect not to quote some string that, later in some other environment, getid() will decide is not a valid identifier, causing dump/reload or similar failures. To fix this in a way that won't risk interoperability problems with unpatched versions, make getid() treat any non-ASCII as a legitimate identifier character (hence not requiring quotes), while making putid() treat any non-ASCII as requiring quoting. We could remove the resulting excess quoting once we feel that no unpatched servers remain in the wild, but that'll be years. A lesser problem is that getid() did the wrong thing with an input consisting of just two double quotes (""). That has to represent an empty string, but getid() read it as a single double quote instead. The case cannot arise in the normal course of events, since we don't allow empty-string role names. But let's fix it while we're here. Although we've not heard field reports of problems with non-ASCII role names, there's clearly a hazard there, so back-patch to all supported versions. Reported-by: Peter Eisentraut <peter@eisentraut.org> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/3792884.1751492172@sss.pgh.pa.us Backpatch-through: 13
2025-07-09Change wchar2char() and char2wchar() to accept a locale_t.Jeff Davis
These are libc-specific functions, so should require a locale_t rather than a pg_locale_t (which could use another provider). Discussion: https://postgr.es/m/a8666c391dfcabe79868d95f7160eac533ace718.camel%40j-davis.com
2025-07-08Fix low-probability memory leak in XMLSERIALIZE(... INDENT).Tom Lane
xmltotext_with_options() did not consider the possibility that pg_xml_init() could fail --- most likely due to OOM. If that happened, the already-parsed xmlDoc structure would be leaked. Oversight in commit 483bdb2af. Bug: #18981 Author: Dmitry Kovalenko <d.kovalenko@postgrespro.ru> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/18981-9bc3c80f107ae925@postgresql.org Backpatch-through: 16
2025-07-07Standardize LSN formatting by zero paddingÁlvaro Herrera
This commit standardizes the output format for LSNs to ensure consistent representation across various tools and messages. Previously, LSNs were inconsistently printed as `%X/%X` in some contexts, while others used zero-padding. This often led to confusion when comparing. To address this, the LSN format is now uniformly set to `%X/%08X`, ensuring the lower 32-bit part is always zero-padded to eight hexadecimal digits. Author: Japin Li <japinli@hotmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/ME0P300MB0445CA53CA0E4B8C1879AF84B641A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
2025-07-03Break out xxx2yyy_opt_overflow APIs for more datetime conversions.Tom Lane
Previous commits invented timestamp2timestamptz_opt_overflow, date2timestamp_opt_overflow, and date2timestamptz_opt_overflow functions to perform non-error-throwing conversions between datetime types. This patch completes the set by adding timestamp2date_opt_overflow, timestamptz2date_opt_overflow, and timestamptz2timestamp_opt_overflow. In addition, adjust timestamp2timestamptz_opt_overflow so that it doesn't throw error if timestamp2tm fails, but treats that as an overflow case. The situation probably can't arise except with an invalid timestamp value, and I can't think of a way that that would happen except data corruption. However, it's pretty silly to have a function whose entire reason for existence is to not throw errors for out-of-range inputs nonetheless throw an error for out-of-range input. The new APIs are not used in this patch, but will be needed in upcoming btree_gin changes. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> Discussion: https://postgr.es/m/262624.1738460652@sss.pgh.pa.us
2025-07-02Allow width_bucket()'s "operand" input to be NaN.Tom Lane
The array-based variant of width_bucket() has always accepted NaN inputs, treating them as equal but larger than any non-NaN, as we do in ordinary comparisons. But up to now, the four-argument variants threw errors for a NaN operand. This is inconsistent and unnecessary, since we can perfectly well regard NaN as falling after the last bucket. We do still throw error for NaN or infinity histogram-bound inputs, since there's no way to compute sensible bucket boundaries. Arguably this is a bug fix, but given the lack of field complaints I'm content to fix it in master. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/2822872.1750540911@sss.pgh.pa.us
2025-07-02Move code for the bytea data type from varlena.c to new bytea.cMichael Paquier
This commit moves all the routines related to the bytea data type into its own new file, called bytea.c, clearing some of the bloat in varlena.c. This includes the routines for: - Input, output, receive and send - Comparison - Casts to integer types - bytea-specific functions The internals of the routines moved here are unchanged, with one exception. This comes with a twist in bytea_string_agg_transfn(), where the call to makeStringAggState() is replaced by the internals of this routine, still located in varlena.c. This simplifies the move to the new file by not having to expose makeStringAggState(). Author: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CAJ7c6TMPVPJ5DL447zDz5ydctB8OmuviURtSwd=PHCRFEPDEAQ@mail.gmail.com
2025-07-01Remove provider field from pg_locale_t.Jeff Davis
The behavior of pg_locale_t is specified by methods, so a separate provider field is no longer necessary. Reviewed-by: Andreas Karlsson <andreas@proxel.se> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/2830211e1b6e6a2e26d845780b03e125281ea17b.camel%40j-davis.com
2025-07-01Control ctype behavior internally with a method table.Jeff Davis
Previously, pattern matching and case mapping behavior branched based on the provider. Refactor to use a method table, which is less error-prone. This is also a step toward multiple provider versions, which we may want to support in the future. Reviewed-by: Andreas Karlsson <andreas@proxel.se> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/2830211e1b6e6a2e26d845780b03e125281ea17b.camel%40j-davis.com
2025-07-01Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.Jeff Davis
Avoids unnecessary dependence on setlocale(). No behavior change. This commit reverts e1458f2f1b, which reverted some changes unintentionally committed before the branch for 19. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/a8666c391dfcabe79868d95f7160eac533ace718.camel@j-davis.com Discussion: https://postgr.es/m/7efaaa645aa5df3771bb47b9c35df27e08f3520e.camel@j-davis.com
2025-07-01Improve error handling of libxml2 calls in xml.cMichael Paquier
This commit fixes some defects in the backend's xml.c, found upon inspection of the internals of libxml2: - xmlEncodeSpecialChars() can fail on malloc(), returning NULL back to the caller. xmltext() assumed that this could never happen. Like other code paths, a TRY/CATCH block is added there, covering also the fact that cstring_to_text_with_len() could fail a memory allocation, where the backend would miss to free the buffer allocated by xmlEncodeSpecialChars(). - Some libxml2 routines called in xmlelement() can return NULL, like xmlAddChildList() or xmlTextWriterStartElement(). Dedicated errors are added for them. - xml_xmlnodetoxmltype() missed that xmlXPathCastNodeToString() can fail on an allocation failure. In this case, the call can just be moved to the existing TRY/CATCH block. All these code paths would cause the server to crash. As this is unlikely a problem in practice, no backpatch is done. Jim and I have caught these defects, not sure who has scored the most. The contrib module xml2/ has similar defects, which will be addressed in a separate change. Reported-by: Jim Jones <jim.jones@uni-muenster.de> Reviewed-by: Jim Jones <jim.jones@uni-muenster.de> Discussion: https://postgr.es/m/aEEingzOta_S_Nu7@paquier.xyz
2025-06-30Add new OID alias type regdatabase.Nathan Bossart
This provides a convenient way to look up a database's OID. For example, the query SELECT * FROM pg_shdepend WHERE dbid = (SELECT oid FROM pg_database WHERE datname = current_database()); can now be simplified to SELECT * FROM pg_shdepend WHERE dbid = current_database()::regdatabase; Like the regrole type, regdatabase has cluster-wide scope, so we disallow regdatabase constants from appearing in stored expressions. Bumps catversion. Author: Ian Lawrence Barwick <barwick@gmail.com> Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/aBpjJhyHpM2LYcG0%40nathan
2025-06-30Remove unused #include's in src/backend/utils/adt/*Peter Eisentraut
Author: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAJ7c6TOowVbR-0NEvvDm6a_mag18krR0XJ2FKrc9DHXj7hFRtQ%40mail.gmail.com
2025-06-28Message style improvementsPeter Eisentraut
2025-06-21Doc: improve documentation about width_bucket().Tom Lane
Specify whether the bucket bounds are inclusive or exclusive, and improve some other vague language. Explain the behavior that occurs when the "low" bound is greater than the "high" bound. Make width_bucket_numeric's comment more like that for width_bucket_float8, in particular noting that infinite bounds are rejected (since they became possible in v14). Reported-by: Ben Peachey Higdon <bpeacheyhigdon@gmail.com> Author: Robert Treat <rob@xzilla.net> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/2BD74F86-5B89-4AC1-8F13-23CED3546AC1@gmail.com Backpatch-through: 13
2025-06-15Sync typedefs.list with the buildfarm.Tom Lane
Our maintenance of typedefs.list has been a little haphazard (and apparently we can't alphabetize worth a darn). Replace the file with the authoritative list from our buildfarm, and run pgindent using that. I also updated the additions/exclusions lists in pgindent where necessary to keep pgindent from messing things up significantly. Notably, now that regex_t and some related names are macros not real typedefs, we have to whitelist them explicitly. The exclusions list has also drifted noticeably, presumably due to changes of system headers on the buildfarm animals that contribute to the list. Unlike in prior years, I've not manually added typedef names that are missing from the buildfarm's list because they are not used to declare any variables or fields. So there are a few places where the typedef declaration itself is formatted worse than before, e.g. typedef enum IoMethod. I could preserve the names that were manually added to the list previously, but I'd really prefer to find a less manual way of dealing with these cases. A quick grep finds about 75 such symbols, most of which have never gotten any special treatment. Per discussion among pgsql-release, doing this now seems appropriate even though we're still a week or two away from making the v18 branch.
2025-06-11Revert a few small patches that were intended for version 19.Jeff Davis
- 4c787a24e7e220a60022e47c1776f22f72902899 - 78bd364ee39ca70a8f9cb8719282389866a08e14 - 7a6880fadc177873d5663961ec3a02d67e34dcbe - 8898082a5d3e94eef073f0e08124137e096e78ef Suggested-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CA+TgmoZ=J=PVNZUNKaxULu+KUVSt3Y-aJ1DZ9Y3Co6mu0z62jA@mail.gmail.com Discussion: https://postgr.es/m/60e8c6d0a6c08e67f15dbbe9e53df0119c710065.camel@j-davis.com
2025-06-10inet_net_pton.c: use pg_ascii_tolower() rather than tolower().Jeff Davis
Avoid dependence on setlocale(). No behavior change. Discussion: https://postgr.es/m/9875f7f9-50f1-4b5d-86fc-ee8b03e8c162@eisentraut.org Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
2025-06-03Fix incorrect format placeholdersPeter Eisentraut
2025-05-30Change internal queryid type from uint64 to int64David Rowley
uint64 was perhaps chosen in cff440d36 as the type was uint32 prior to that widening work. Having this as uint64 doesn't make much sense and just adds the overhead of having to remember that we always output this in its signed form. Let's remove that overhead. The signed form output is seemingly required since we have no way to represent the full range of uint64 in an SQL type. We use BIGINT in places like pg_stat_statements, which maps directly to int64. The release notes "Source Code" section may want to mention this adjustment as some extensions may wish to adjust their code. Author: David Rowley <dgrowleyml@gmail.com> Suggested-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/50cb0c8b-994b-48f9-a1c4-13039eb3536b@eisentraut.org
2025-05-28Tighten parsing of datetime input.Tom Lane
ParseFraction only expects to deal with fields that contain a decimal point and digit(s). However it's possible in some edge cases for it to be passed input that doesn't look like that. In particular the input could look like a valid floating-point number, such as ".123e6". strtod() will happily eat that, possibly producing a result that is not within the expected range 0..1, which can result in integer overflow in the callers. That doesn't have any security consequences, but it's still not very desirable. Fix by checking that the input has the expected form. Similarly, DecodeNumberField only expects to deal with fields that contain a decimal point and digit(s), but it's sometimes abused to parse strings that might not look like that. This could result in failure to reject bogus input, yielding silly results. Again, fix by rejecting input that doesn't look as-expected. That decision also means that we can affirmatively answer the very old comment questioning whether we couldn't save some duplicative code by using ParseFractionalSecond here. While these changes should only reject input that nobody would consider valid, it still doesn't seem like a change to make in stable branches. Apply to HEAD only. Reported-by: Evgeniy Gorbanev <gorbanev.es@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1328335.1748371099@sss.pgh.pa.us
2025-05-28Fix conversion of SIMILAR TO regexes for character classesMichael Paquier
The code that translates SIMILAR TO pattern matching expressions to POSIX-style regular expressions did not consider that square brackets can be nested. For example, in an expression like [[:alpha:]%_], the logic replaced the placeholders '_' and '%' but it should not. This commit fixes the conversion logic by tracking the nesting level of square brackets marking character class areas, while considering that in expressions like []] or [^]] the first closing square bracket is a regular character. Multiple tests are added to show how the conversions should or should not apply applied while in a character class area, with specific cases added for all the characters converted outside character classes like an opening parenthesis '(', dollar sign '$', etc. Author: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/16ab039d1af455652bdf4173402ddda145f2c73b.camel@cybertec.at Backpatch-through: 13
2025-05-23Revert function to get memory context stats for processesDaniel Gustafsson
Due to concerns raised about the approach, and memory leaks found in sensitive contexts the functionality is reverted. This reverts commits 45e7e8ca9, f8c115a6c, d2a1ed172, 55ef7abf8 and 042a66291 for v18 with an intent to revisit this patch for v19. Discussion: https://postgr.es/m/594293.1747708165@sss.pgh.pa.us
2025-05-22Fix memory leak in XMLSERIALIZE(... INDENT).Tom Lane
xmltotext_with_options sometimes tries to replace the existing root node of a libxml2 document. In that case xmlDocSetRootElement will unlink and return the old root node; if we fail to free it, it's leaked for the remainder of the session. The amount of memory at stake is not large, a couple hundred bytes per occurrence, but that could still become annoying in heavy usage. Our only other xmlDocSetRootElement call is not at risk because it's working on a just-created document, but let's modify that code too to make it clear that it's dependent on that. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Jim Jones <jim.jones@uni-muenster.de> Discussion: https://postgr.es/m/1358967.1747858817@sss.pgh.pa.us Backpatch-through: 16
2025-05-19Fix deparsing FETCH FIRST <expr> ROWS WITH TIESHeikki Linnakangas
In the grammar, <expr> is a c_expr, which accepts only a limited set of integer literals and simple expressions without parens. The deparsing logic didn't quite match the grammar rule, and failed to use parens e.g. for "5::bigint". To fix, always surround the expression with parens. Would be nice to omit the parens in simple cases, but unfortunately it's non-trivial to detect such simple cases. Even if the expression is a simple literal 123 in the original query, after parse analysis it becomes a FuncExpr with COERCE_IMPLICIT_CAST rather than a simple Const. Reported-by: yonghao lee Backpatch-through: 13 Discussion: https://www.postgresql.org/message-id/18929-077d6b7093b176e2@postgresql.org
2025-05-11Fix comment of tsquerysend()Álvaro Herrera
The comment describes the order in which fields are sent, and it had one of the fields in the wrong place. This has been wrong since e6dbcb72fafa (2008), so backpatch all the way back. Author: Emre Hasegeli <emre@hasegeli.com> Discussion: https://postgr.es/m/CAE2gYzzf38bR_R=izhpMxAmqHXKeM5ajkmukh4mNs_oXfxcMCA@mail.gmail.com
2025-05-11Sort includes in alphabetical orderÁlvaro Herrera
Added by commit 042a66291b04, no backpatch needed.
2025-04-30Fix broken indentationDavid Rowley
I forgot to run pgindent in d8555e522. Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com> Discussion: https://postgr.es/m/156083c9-eac0-418d-9667-92dec4d6d6cd@oss.nttdata.com
2025-04-30Fix a couple of comment typosDavid Rowley
Author: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAEG8a3+MRwDKc4YSFKKPKq7Y+vMufVC5u94wM5KZPB2CbgCxnQ@mail.gmail.com
2025-04-23Properly prepare varinfos in estimate_multivariate_bucketsize()Alexander Korotkov
To estimate with extended statistics, we need to clear the varnullingrels field in the expression, and duplicates are not allowed in the GroupVarInfo list. We might re-use add_unique_group_var(), but we don't do so for two reasons. 1) We must keep the origin_rinfos list ordered exactly the same way as varinfos. 2) add_unique_group_var() is designed for estimate_num_groups(), where a larger number of groups is worse. While estimating the number of hash buckets, we have the opposite: a lesser number of groups is worse. Therefore, we don't have to remove "known equal" vars: the removed var may valuably contribute to the multivariate statistics to grow the number of groups. This commit adds custom code to estimate_multivariate_bucketsize() to initialize varinfos properly. Reported-by: Robins Tharakan <tharakan@gmail.com> Discussion: https://postgr.es/m/18885-da51324078588253%40postgresql.org Author: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-04-21Fix INITCAP() word boundaries for PG_UNICODE_FAST.Jeff Davis
Word boundaries are based on whether a character is alphanumeric or not. For the PG_UNICODE_FAST collation, alphanumeric includes non-ASCII digits; whereas for the PG_C_UTF8 collation, it only includes digits 0-9. Pass down the right information from the pg_locale_t into initcap_wbnext to differentiate the behavior. Reported-by: Noah Misch <noah@leadboat.com> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/20250417135841.33.nmisch@google.com
2025-04-21Fix a few more duplicate words in commentsDavid Rowley
Similar to 84fd3bc14 but these ones were found using a regex that can span multiple lines. Author: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAApHDvrMcr8XD107H3NV=WHgyBcu=sx5+7=WArr-n_cWUqdFXQ@mail.gmail.com
2025-04-21Fix a few duplicate words in commentsDavid Rowley
These are all new to v18 Author: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAApHDvrMcr8XD107H3NV=WHgyBcu=sx5+7=WArr-n_cWUqdFXQ@mail.gmail.com
2025-04-19Fix typos and grammar in the codeMichael Paquier
The large majority of these have been introduced by recent commits done in the v18 development cycle. Author: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/9a7763ab-5252-429d-a943-b28941e0e28b@gmail.com
2025-04-17Assert lack of hazardous buffer locks before possible catalog read.Noah Misch
Commit 0bada39c83a150079567a6e97b1a25a198f30ea3 fixed a bug of this kind, which existed in all branches for six days before detection. While the probability of reaching the trouble was low, the disruption was extreme. No new backends could start, and service restoration needed an immediate shutdown. Hence, add this to catch the next bug like it. The new check in RelationIdGetRelation() suffices to make autovacuum detect the bug in commit 243e9b40f1b2dd09d6e5bf91ebf6e822a2cd3704 that led to commit 0bada39. This also checks in a number of similar places. It replaces each Assert(IsTransactionState()) that pertained to a conditional catalog read. No back-patch for now, but a back-patch of commit 243e9b4 should back-patch this, too. A back-patch could omit the src/test/regress changes, since back branches won't gain new index columns. Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/20250410191830.0e.nmisch@google.com Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com
2025-04-17Improve comments for estimate_multivariate_ndistinct()David Rowley
estimate_multivariate_ndistinct() is coded to assume the caller handles passing it a list of GroupVarInfos with unique 'var' fields over the entire list. 6bb6a62f3 added code which didn't ensure this and that could result in estimate_multivariate_ndistinct() erroring out with: ERROR: corrupt MVNDistinct entry This occurred because estimate_multivariate_ndistinct() first searches for a set of stats that match to at least two of the given GroupVarInfos and then later assumes that the MVNDistinctItem.items array of the best matching stats will have an entry for those two columns. If the GroupVarInfos List contained a duplicate entry then the same column could be matched to twice and that could trick the code into thinking we have >= 2 columns matched in cases where only a single distinct column has been matched. This could result in a failure to find the correct MVNDistinctItem in the stats as the array containing those never contains an item for single columns. Here we make it more clear that the function needs a distinct set of GroupVarInfos and also tidy up a few other comments to make things a bit easier to follow. Author: David Rowley <drowleyml@gmail.com> Discussion: https://postgr.es/m/CAApHDvocZCUhM9W9mJ39d6oQz7ePKoqFnao_347mvC-A7QatcQ@mail.gmail.com