summaryrefslogtreecommitdiff
path: root/contrib
AgeCommit message (Collapse)Author
9 hoursbufmgr: Allow some buffer state modifications while holding header lockAndres Freund
Until now BufferDesc.state was not allowed to be modified while the buffer header spinlock was held. This meant that operations like unpinning buffers needed to use a CAS loop, waiting for the buffer header spinlock to be released before updating. The benefit of that restriction is that it allowed us to unlock the buffer header spinlock with just a write barrier and an unlocked write (instead of a full atomic operation). That was important to avoid regressions in 48354581a49c. However, since then the hottest buffer header spinlock uses have been replaced with atomic operations (in particular, the most common use of PinBuffer_Locked(), in GetVictimBuffer() (formerly in BufferAlloc()), has been removed in 5e899859287). This change will allow, in a subsequent commit, to release buffer pins with a single atomic-sub operation. This previously was not possible while such operations were not allowed while the buffer header spinlock was held, as an atomic-sub would not have allowed a race-free check for the buffer header lock being held. Using atomic-sub to unpin buffers is a nice scalability win, however it is not the primary motivation for this change (although it would be sufficient). The primary motivation is that we would like to merge the buffer content lock into BufferDesc.state, which will result in more frequent changes of the state variable, which in some situations can cause a performance regression, due to an increased CAS failure rate when unpinning buffers. The regression entirely vanishes when using atomic-sub. Naively implementing this would require putting CAS loops in every place modifying the buffer state while holding the buffer header lock. To avoid that, introduce UnlockBufHdrExt(), which can set/add flags as well as the refcount, together with releasing the lock. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
11 hoursUse XLogRecPtrIsValid() in various placesÁlvaro Herrera
Now that commit 06edbed47862 has introduced XLogRecPtrIsValid(), we can use that instead of: - XLogRecPtrIsInvalid() - direct comparisons with InvalidXLogRecPtr - direct comparisons with literal 0 This makes the code more consistent. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/aQB7EvGqrbZXrMlg@ip-10-97-1-34.eu-west-3.compute.internal
28 hourspostgres_fdw: Add more test coverage for EvalPlanQual testing.Etsuro Fujita
postgres_fdw supports EvalPlanQual testing by using the infrastructure provided by the core with the RecheckForeignScan callback routine (cf. commits 5fc4c26db and 385f337c9), but there has been no test coverage for that, except that recent commit 12609fbac, which fixed an issue in commit 385f337c9, added a test case to exercise only a code path added by that commit to the core infrastructure. So let's add test cases to exercise other code paths as well at this time. Like commit 12609fbac, back-patch to all supported branches. Reported-by: Masahiko Sawada <sawada.mshk@gmail.com> Author: Etsuro Fujita <etsuro.fujita@gmail.com> Discussion: https://postgr.es/m/CAPmGK15%2B6H%3DkDA%3D-y3Y28OAPY7fbAdyMosVofZZ%2BNc769epVTQ%40mail.gmail.com Backpatch-through: 13
29 hoursUse stack allocated StringInfoDatas, where possibleDavid Rowley
Various places that were using StringInfo but didn't need that StringInfo to exist beyond the scope of the function were using makeStringInfo(), which allocates both a StringInfoData and the buffer it uses as two separate allocations. It's more efficient for these cases to use a StringInfoData on the stack and initialize it with initStringInfo(), which only allocates the string buffer. This also simplifies the cleanup, in a few cases. Author: Mats Kindahl <mats.kindahl@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/4379aac8-26f1-42f2-a356-ff0e886228d3@gmail.com
6 daysFix contrib/ltree's subpath() with negative offset.Tom Lane
subpath(ltree,offset,len) now correctly errors when given an offset less than -n, where n is the number of labels in the given ltree. There was a duplicate block of code that allowed an offset as low as -2n. The documentation says no such thing, so this must have been a copy-and-paste error in the original ltree patch. While here, avoid redundant calculation of "end" and write LTREE_MAX_LEVELS rather than its hard-coded value. Author: Marcus Gartner <m.a.gartner@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAAUGV_SvBO9gWYbaejb9nhe-mS9FkNP4QADNTdM3wdRhvLobwA@mail.gmail.com
8 daysFix some confusing uses of constPeter Eisentraut
There are a few places where we have typedef struct FooData { ... } FooData; typedef FooData *Foo; and then function declarations with bar(const Foo x) which isn't incorrect but probably meant bar(const FooData *x) meaning that the thing x points to is immutable, not x itself. This patch makes those changes where appropriate. In one case (execGrouping.c), the thing being pointed to was not immutable, so in that case remove the const altogether, to avoid further confusion. Co-authored-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/CAEoWx2m2E0xE8Kvbkv31ULh_E%2B5zph-WA_bEdv3UR9CLhw%2B3vg%40mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAEoWx2kTDz%3Db6T2xHX78vy_B_osDeCC5dcTCi9eG0vXHp5QpdQ%40mail.gmail.com
9 dayspg_stat_statements: Fix handling of duplicate constant locationsÁlvaro Herrera
Two or more constants can have the same location. We handled this correctly for non squashed constants, but failed to do it if squashed (resulting in out-of-bounds memory access), because the code structure became broken by commit 0f65f3eec478: we failed to update 'last_loc' correctly when skipping these squashed constants. The simplest fix seems to be to get rid of 'last_loc' altogether -- in hindsight, it's quite pointless. Also, when ignoring a constant because of this, make sure to fulfill fill_in_constant_lengths's duty of setting its length to -1. Lastly, we can use == instead of <= because the locations have been sorted beforehand, so the < case cannot arise. Co-authored-by: Sami Imseih <samimseih@gmail.com> Co-authored-by: Dmitry Dolgov <9erthalion6@gmail.com> Reported-by: Konstantin Knizhnik <knizhnik@garret.ru> Backpatch-through: 18 Discussion: https://www.postgresql.org/message-id/2b91e358-0d99-43f7-be44-d2d4dbce37b3%40garret.ru
2025-10-23Update expected output for contrib/sepgsql's regression tests.Tom Lane
Commit 65281391a caused some additional error context lines to appear in the output of one test case. That's fine, but we missed updating the expected output. Do it now. While here, add some missing test-output subdirectories to contrib/sepgsql/.gitignore, so that we don't get git warnings after running the tests. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1613232.1761255361@sss.pgh.pa.us Backpatch-through: 18
2025-10-22Use CompactAttribute more often, when possibleDavid Rowley
5983a4cff added CompactAttribute for storing commonly used fields from FormData_pg_attribute. 5983a4cff didn't go to the trouble of adjusting every location where we can use CompactAttribute rather than FormData_pg_attribute, so here we change the remaining ones. There are some locations where I've left the code using FormData_pg_attribute. These are mostly in the ALTER TABLE code. Using CompactAttribute here seems more risky as often the TupleDesc is being changed and those changes may not have been flushed to the CompactAttribute yet. I've also left record_recv(), record_send(), record_cmp(), record_eq() and record_image_eq() alone as it's not clear to me that accessing the CompactAttribute is a win here due to the FormData_pg_attribute still having to be accessed for most cases. Switching the relevant parts to use CompactAttribute would result in having to access both for common cases. Careful benchmarking may reveal that something can be done to make this better, but in absence of that, the safer option is to leave these alone. In ReorderBufferToastReplace(), there was a check to skip attnums < 0 while looping over the TupleDesc. Doing this is redundant since TupleDescs don't store < 0 attnums. Removing that code allows us to move to using CompactAttribute. The change in validateDomainCheckConstraint() just moves fetching the FormData_pg_attribute into the ERROR path, which is cold due to calling errstart_cold() and results in code being moved out of the common path. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAApHDvrMy90o1Lgkt31F82tcSuwRFHq3vyGewSRN=-QuSEEvyQ@mail.gmail.com
2025-10-20Support COPY TO for partitioned tables.Masahiko Sawada
Previously, COPY TO command didn't support directly specifying partitioned tables so users had to use COPY (SELECT ...) TO variant. This commit adds direct COPY TO support for partitioned tables, improving both usability and performance. Performance tests show it's faster than the COPY (SELECT ...) TO variant as it avoids the overheads of query processing and sending results to the COPY TO command. When used with partitioned tables, COPY TO copies the same rows as SELECT * FROM table. Row-level security policies of the partitioned table are applied in the same way as when executing COPY TO on a plain table. Author: jian he <jian.universality@gmail.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Melih Mutlu <m.melihmutlu@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CACJufxEZt%2BG19Ors3bQUq-42-61__C%3Dy5k2wk%3DsHEFRusu7%3DiQ%40mail.gmail.com
2025-10-18Allow role created by new test to log in on Windows.Tom Lane
We must tell init about each role name we plan to connect as, else SSPI auth fails. Similar to previous patches such as 14793f471, 973542866. Oversight in 208927e65, per buildfarm member drongo. (Although that was back-patched to v13, the test script only exists in v16 and up.)
2025-10-17Fix privilege checks for pg_prewarm() on indexes.Nathan Bossart
pg_prewarm() currently checks for SELECT privileges on the target relation. However, indexes do not have access rights of their own, so a role may be denied permission to prewarm an index despite having the SELECT privilege on its parent table. This commit fixes this by locking the parent table before the index (to avoid deadlocks) and checking for SELECT on the parent table. Note that the code is largely borrowed from amcheck_lock_relation_and_check(). An obvious downside of this change is the extra AccessShareLock on the parent table during prewarming, but that isn't expected to cause too much trouble in practice. Author: Ayush Vatsa <ayushvatsa1810@gmail.com> Co-authored-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://postgr.es/m/CACX%2BKaMz2ZoOojh0nQ6QNBYx8Ak1Dkoko%3DD4FSb80BYW%2Bo8CHQ%40mail.gmail.com Backpatch-through: 13
2025-10-17Improve TAP tests by replacing ok() with better Test::More functionsMichael Paquier
The TAP tests whose ok() calls are changed in this commit were relying on perl operators, rather than equivalents available in Test::More. For example, rather than the following: ok($data =~ qr/expr/m, "expr matching"); ok($data !~ qr/expr/m, "expr not matching"); The new test code uses this equivalent: like($data, qr/expr/m, "expr matching"); unlike($data, qr/expr/m, "expr not matching"); A huge benefit of the new formulation is that it is possible to know about the values we are checking if a failure happens, making debugging easier, should the test runs happen in the buildfarm, in the CI or locally. This change leads to more test code overall as perltidy likes to make the code pretty the way it is in this commit. Author: Sadhuprasad Patro <b.sadhu@gmail.com> Discussion: https://postgr.es/m/CAFF0-CHhwNx_Cv2uy7tKjODUbeOgPrJpW4Rpf1jqB16_1bU2sg@mail.gmail.com
2025-10-15Fix EvalPlanQual handling of foreign/custom joins in ExecScanFetch.Etsuro Fujita
If inside an EPQ recheck, ExecScanFetch would run the recheck method function for foreign/custom joins even if they aren't descendant nodes in the EPQ recheck plan tree, which is problematic at least in the foreign-join case, because such a foreign join isn't guaranteed to have an alternative local-join plan required for running the recheck method function; in the postgres_fdw case this could lead to a segmentation fault or an assert failure in an assert-enabled build when running the recheck method function. Even if inside an EPQ recheck, any scan nodes that aren't descendant ones in the EPQ recheck plan tree should be normally processed by using the access method function; fix by modifying ExecScanFetch so that if inside an EPQ recheck, it runs the recheck method function for foreign/custom joins that are descendant nodes in the EPQ recheck plan tree as before and runs the access method function for foreign/custom joins that aren't. This fix also adds to postgres_fdw an isolation test for an EPQ recheck that caused issues stated above. Oversight in commit 385f337c9. Reported-by: Kristian Lejao <kristianlejao@gmail.com> Author: Masahiko Sawada <sawada.mshk@gmail.com> Co-authored-by: Etsuro Fujita <etsuro.fujita@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com> Discussion: https://postgr.es/m/CAD21AoBpo6Gx55FBOW+9s5X=nUw3Xpq64v35fpDEKsTERnc4TQ@mail.gmail.com Backpatch-through: 13
2025-10-14dblink: Avoid locking relation before privilege check.Nathan Bossart
The present coding of dblink's get_rel_from_relname() predates the introduction of RangeVarGetRelidExtended(), which provides a way to check permissions before locking the relation. This commit adjusts get_rel_from_relname() to use that function. Reviewed-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://postgr.es/m/aOgmi6avE6qMw_6t%40nathan
2025-10-08Add mem_exceeded_count column to pg_stat_replication_slots.Masahiko Sawada
This commit introduces a new column mem_exceeded_count to the pg_stat_replication_slots view. This counter tracks how often the memory used by logical decoding exceeds the logical_decoding_work_mem limit. The new statistic helps users determine whether exceeding the logical_decoding_work_mem limit is a rare occurrences or a frequent issue, information that wasn't available through existing statistics. Bumps catversion. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/978D21E8-9D3B-40EA-A4B1-F87BABE7868C@yesql.se
2025-10-08Add ExplainState argument to pg_plan_query() and planner().Robert Haas
This allows extensions to have access to any data they've stored in the ExplainState during planning. Unfortunately, it won't help with EXPLAIN EXECUTE is used, but since that case is less common, this still seems like an improvement. Since planner() has quite a few arguments now, also add some documentation of those arguments and the return value. Author: Robert Haas <rhaas@postgresql.org> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
2025-10-08Implement Eager AggregationRichard Guo
Eager aggregation is a query optimization technique that partially pushes aggregation past a join, and finalizes it once all the relations are joined. Eager aggregation may reduce the number of input rows to the join and thus could result in a better overall plan. In the current planner architecture, the separation between the scan/join planning phase and the post-scan/join phase means that aggregation steps are not visible when constructing the join tree, limiting the planner's ability to exploit aggregation-aware optimizations. To implement eager aggregation, we collect information about aggregate functions in the targetlist and HAVING clause, along with grouping expressions from the GROUP BY clause, and store it in the PlannerInfo node. During the scan/join planning phase, this information is used to evaluate each base or join relation to determine whether eager aggregation can be applied. If applicable, we create a separate RelOptInfo, referred to as a grouped relation, to represent the partially-aggregated version of the relation and generate grouped paths for it. Grouped relation paths can be generated in two ways. The first method involves adding sorted and hashed partial aggregation paths on top of the non-grouped paths. To limit planning time, we only consider the cheapest or suitably-sorted non-grouped paths in this step. Alternatively, grouped paths can be generated by joining a grouped relation with a non-grouped relation. Joining two grouped relations is currently not supported. To further limit planning time, we currently adopt a strategy where partial aggregation is pushed only to the lowest feasible level in the join tree where it provides a significant reduction in row count. This strategy also helps ensure that all grouped paths for the same grouped relation produce the same set of rows, which is important to support a fundamental assumption of the planner. For the partial aggregation that is pushed down to a non-aggregated relation, we need to consider all expressions from this relation that are involved in upper join clauses and include them in the grouping keys, using compatible operators. This is essential to ensure that an aggregated row from the partial aggregation matches the other side of the join if and only if each row in the partial group does. This ensures that all rows within the same partial group share the same "destiny", which is crucial for maintaining correctness. One restriction is that we cannot push partial aggregation down to a relation that is in the nullable side of an outer join, because the NULL-extended rows produced by the outer join would not be available when we perform the partial aggregation, while with a non-eager-aggregation plan these rows are available for the top-level aggregation. Pushing partial aggregation in this case may result in the rows being grouped differently than expected, or produce incorrect values from the aggregate functions. If we have generated a grouped relation for the topmost join relation, we finalize its paths at the end. The final paths will compete in the usual way with paths built from regular planning. The patch was originally proposed by Antonin Houska in 2017. This commit reworks various important aspects and rewrites most of the current code. However, the original patch and reviews were very useful. Author: Richard Guo <guofenglinux@gmail.com> Author: Antonin Houska <ah@cybertec.at> (in an older version) Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> (in an older version) Reviewed-by: Andy Fan <zhihuifan1213@163.com> (in an older version) Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> (in an older version) Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
2025-10-07Assign each subquery a unique name prior to planning it.Robert Haas
Previously, subqueries were given names only after they were planned, which makes it difficult to use information from a previous execution of the query to guide future planning. If, for example, you knew something about how you want "InitPlan 2" to be planned, you won't know whether the subquery you're currently planning will end up being "InitPlan 2" until after you've finished planning it, by which point it's too late to use the information that you had. To fix this, assign each subplan a unique name before we begin planning it. To improve consistency, use textual names for all subplans, rather than, as we did previously, a mix of numbers (such as "InitPlan 1") and names (such as "CTE foo"), and make sure that the same name is never assigned more than once. We adopt the somewhat arbitrary convention of using the type of sublink to set the plan name; for example, a query that previously had two expression sublinks shown as InitPlan 2 and InitPlan 1 will now end up named expr_1 and expr_2. Because names are assigned before rather than after planning, some of the regression test outputs show the numerical part of the name switching positions: what was previously SubPlan 2 was actually the first one encountered, but we finished planning it later. We assign names even to subqueries that aren't shown as such within the EXPLAIN output. These include subqueries that are a FROM clause item or a branch of a set operation, rather than something that will be turned into an InitPlan or SubPlan. The purpose of this is to make sure that, below the topmost query level, there's always a name for each subquery that is stable from one planning cycle to the next (assuming no changes to the query or the database schema). Author: Robert Haas <rhaas@postgresql.org> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: http://postgr.es/m/3641043.1758751399@sss.pgh.pa.us
2025-10-05Don't include access/htup_details.h in executor/tuptable.hÁlvaro Herrera
This is not at all needed; I suspect it was a simple mistake in commit 5408e233f066. It causes htup_details.h to bleed into a huge number of places via execnodes.h. Remove it and fix fallout. Discussion: https://postgr.es/m/202510021240.ptc2zl5cvwen@alvherre.pgsql
2025-10-02pgstattuple: Improve reports generated for indexes (hash, gist, btree)Michael Paquier
pgstattuple checks the state of the pages retrieved for gist and hash using some check functions from each index AM, respectively gistcheckpage() and _hash_checkpage(). When these are called, they would fail when bumping on data that is found as incorrect (like opaque area size not matching, or empty pages), contrary to btree that simply discards these cases and continues to aggregate data. Zero pages can happen after a crash, with these AMs being able to do an internal cleanup when these are seen. Also, sporadic failures are annoying when doing for example a large-scale diagnostic query based on pgstattuple with a join of pg_class, as it forces one to use tricks like quals to discard hash or gist indexes, or use a PL wrapper able to catch errors. This commit changes the reports generated for btree, gist and hash to be more user-friendly; - When seeing an empty page, report it as free space. This new rule applies to gist and hash, and already applied to btree. - For btree, a check based on the size of BTPageOpaqueData is added. - For gist indexes, gistcheckpage() is not called anymore, replaced by a check based on the size of GISTPageOpaqueData. - For hash indexes, instead of _hash_getbuf_with_strategy(), use a direct call to ReadBufferExtended(), coupled with a check based on HashPageOpaqueData. The opaque area size check was already used. - Pages that do not match these criterias are discarded from the stats reports generated. There have been a couple of bug reports over the years that complained about the current behavior for hash and gist, as being not that useful, with nothing being done about it. Hence this change is backpatched down to v13. Reported-by: Noah Misch <noah@leadboat.com> Author: Nitin Motiani <nitinmotiani@google.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Discussion: https://postgr.es/m/CAH5HC95gT1J3dRYK4qEnaywG8RqjbwDdt04wuj8p39R=HukayA@mail.gmail.com Backpatch-through: 13
2025-09-30Revert "Make some use of anonymous unions [pgcrypto]"Peter Eisentraut
This reverts commit efcd5199d8cb8e5098f79b38d0c46004e69d1a46. I rebased my patch series incorrectly. This patch contained unrelated parts from another patch, which made the overall build fail. Revert for now and reconsider.
2025-09-30Make some use of anonymous unions [pgcrypto]Peter Eisentraut
Make some use of anonymous unions, which are allowed as of C11, as examples and encouragement for future code, and to test compilers. This commit changes some structures in pgcrypto. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/f00a9968-388e-4f8c-b5ef-5102e962d997%40eisentraut.org
2025-09-30Make some use of anonymous unions [reorderbuffer xact_time]Peter Eisentraut
Make some use of anonymous unions, which are allowed as of C11, as examples and encouragement for future code, and to test compilers. This commit changes the ReorderBufferTXN struct. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/f00a9968-388e-4f8c-b5ef-5102e962d997%40eisentraut.org
2025-09-30Do a tiny bit of header file maintenanceÁlvaro Herrera
Stop including utils/relcache.h in access/genam.h, and stop including htup_details.h in nodes/tidbitmap.h. Both these files (genam.h and tidbitmap.h) are widely used in other header files, so it's in our best interest that they remain as lean as reasonable. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/202509291356.o5t6ny2hoa3q@alvherre.pgsql
2025-09-23Keep track of what RTIs a Result node is scanning.Robert Haas
Result nodes now include an RTI set, which is only non-NULL when they have no subplan, and is taken from the relid set of the RelOptInfo that the Result is generating. ExplainPreScanNode now takes notice of these RTIs, which means that a few things get schema-qualified in the regression tests that previously did not. This makes the output more consistent between cases where some part of the plan tree is replaced by a Result node and those where this does not happen. Likewise, pg_overexplain's EXPLAIN (RANGE_TABLE) now displays the RTIs stored in a Result node just as it already does for other RTI-bearing node types. Result nodes also now include a result_reason, which tells us something about why the Result node was inserted. Using that information, EXPLAIN now emits, where relevant, a "Replaces" line describing the origin of a Result node. The purpose of these changes is to allow code that inspects a Plan tree to understand the origin of Result nodes that appear therein. Discussion: http://postgr.es/m/CA+TgmoYeUZePZWLsSO+1FAN7UPePT_RMEZBKkqYBJVCF1s60=w@mail.gmail.com Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
2025-09-22Fix various incorrect filename referencesDavid Rowley
Author: Chao Li <li.evan.chao@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAEoWx2=hOBCPm-Z=F15twr_23XjHeoXSbifP5GdEdtWona97wQ@mail.gmail.com
2025-09-20Track the maximum possible frequency of non-MCE array elements.Tom Lane
The lossy-counting algorithm that ANALYZE uses to identify most-common array elements has a notion of cutoff frequency: elements with frequency greater than that are guaranteed to be collected, elements with smaller frequencies are not. In cases where we find fewer MCEs than the stats target would permit us to store, the cutoff frequency provides valuable additional information, to wit that there are no non-MCEs with frequency greater than that. What the selectivity estimation functions actually use the "minfreq" entry for is as a ceiling on the possible frequency of non-MCEs, so using the cutoff rather than the lowest stored MCE frequency provides a tighter bound and more accurate estimates. Therefore, instead of redundantly storing the minimum observed MCE frequency, store the cutoff frequency when there are fewer tracked values than we want. (When there are more, then of course we cannot assert that no non-stored elements are above the cutoff frequency, since we're throwing away some that are; so we still use the minimum stored frequency in that case.) Notably, this works even when none of the values are common enough to be called MCEs. In such cases we previously stored nothing in the STATISTIC_KIND_MCELEM pg_statistic slot, which resulted in the selectivity functions falling back to default estimates. So in that case we want to construct a STATISTIC_KIND_MCELEM entry that contains no "values" but does have "numbers", to wit the three extra numbers that the MCELEM entry type defines. A small obstacle is that update_attstats() has traditionally stored a null, not an empty array, when passed zero "values" for a slot. That gives rise to an MCELEM entry that get_attstatsslot() will spit up on. The least risky solution seems to be to adjust update_attstats() so that it will emit a non-null (but possibly empty) array when the passed stavalues array pointer isn't NULL, rather than conditioning that on numvalues > 0. In other existing cases I don't believe that that changes anything. For consistency, handle the stanumbers array the same way. In passing, improve the comments in routines that use STATISTIC_KIND_MCELEM data. Particularly, explain why we use minfreq / 2 not minfreq as the estimate for non-MCE values. Thanks to Matt Long for the suggestion that we could apply this idea even when there are more than zero MCEs. Reported-by: Mark Frost <FROSTMAR@uk.ibm.com> Reported-by: Matt Long <matt@mattlong.org> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/PH3PPF1C905D6E6F24A5C1A1A1D8345B593E16FA@PH3PPF1C905D6E6.namprd15.prod.outlook.com
2025-09-19Add optional pid parameter to pg_replication_origin_session_setup().Amit Kapila
Commit 216a784829c introduced parallel apply workers, allowing multiple processes to share a replication origin. To support this, replorigin_session_setup() was extended to accept a pid argument identifying the process using the origin. This commit exposes that capability through the SQL interface function pg_replication_origin_session_setup() by adding an optional pid parameter. This enables multiple processes to coordinate replication using the same origin when using SQL-level replication functions. This change allows the non-builtin logical replication solutions to implement parallel apply for large transactions. Additionally, an existing internal error was made user-facing, as it can now be triggered via the exposed SQL API. Author: Doruk Yilmaz <doruk@mixrank.com> Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Discussion: https://postgr.es/m/CAMPB6wfe4zLjJL8jiZV5kjjpwBM2=rTRme0UCL7Ra4L8MTVdOg@mail.gmail.com Discussion: https://postgr.es/m/CAE2gYzyTSNvHY1+iWUwykaLETSuAZsCWyryokjP6rG46ZvRgQA@mail.gmail.com
2025-09-16Add regression expected-files for older OpenSSL in FIPS mode.Tom Lane
Cover contrib/pgcrypto, per buildfarm. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/443709.1757876535@sss.pgh.pa.us Backpatch-through: 17
2025-09-16Provide more-specific error details/hints for function lookup failures.Tom Lane
Up to now we've contented ourselves with a one-size-fits-all error hint when we fail to find any match to a function or procedure call. That was mostly okay in the beginning, but it was never great, and since the introduction of named arguments it's really not adequate. We at least ought to distinguish "function name doesn't exist" from "function name exists, but not with those argument names". And the rules for named-argument matching are arcane enough that some more detail seems warranted if we match the argument names but the call still doesn't work. This patch creates a framework for dealing with these problems: FuncnameGetCandidates and related code will now pass back a bitmask of flags showing how far the match succeeded. This allows a considerable amount of granularity in the reports. The set-bits-in-a-bitmask approach means that when there are multiple candidate functions, the report will reflect the match(es) that got the furthest, which seems correct. Also, we can avoid mentioning "maybe add casts" unless failure to match argument types is actually the issue. Extend the same return-a-bitmask approach to OpernameGetCandidates. The issues around argument names don't apply to operator syntax, but it still seems worth distinguishing between "there is no operator of that name" and "we couldn't match the argument types". While at it, adjust these messages and related ones to more strictly separate "detail" from "hint", following our message style guidelines' distinction between those. Reported-by: Dominique Devienne <ddevienne@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/1756041.1754616558@sss.pgh.pa.us
2025-09-12Allow redeclaration of typedef yyscan_tPeter Eisentraut
This is allowed in C11, so we don't need the workaround guards against it anymore. This effectively reverts commit 382092a0cd2 that put these guards in place. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/10d32190-f31b-40a5-b177-11db55597355@eisentraut.org
2025-09-12Remove whitespace in comment of pg_stat_statements.cMichael Paquier
Introduced by 6b4d23feef6e, spotted while reading this area of the code.
2025-09-08Don't generate fake "*SELECT*" or "*SELECT* %d" subquery aliases.Robert Haas
rte->alias should point only to a user-written alias, but in these cases that principle was violated. Fixing this causes some regression test output changes: wherever rte->alias previously had a value and is now NULL, rte->eref is now set to a generated name rather than to rte->alias; and the scheme used to generate eref names differs from what we were doing for aliases. The upshot is that instead of "*SELECT*" or "*SELECT* %d", EXPLAIN will now emit "unnamed_subquery" or "unnamed_subquery_%d". But that's a reasonable descriptor, and we were already producing that in yet other cases, so this seems not too objectionable. Author: Tom Lane <tgl@sss.pgh.pa.us> Co-authored-by: Robert Haas <rhaas@postgresql.org> Discussion: https://postgr.es/m/CA+TgmoYSYmDA2GvanzPMci084n+mVucv0bJ0HPbs6uhmMN6HMg@mail.gmail.com
2025-09-05bufmgr: Remove freelist, always use clock-sweepAndres Freund
This set of changes removes the list of available buffers and instead simply uses the clock-sweep algorithm to find and return an available buffer. This also removes the have_free_buffer() function and simply caps the pg_autoprewarm process to at most NBuffers. While on the surface this appears to be removing an optimization it is in fact eliminating code that induces overhead in the form of synchronization that is problematic for multi-core systems. The main reason for removing the freelist, however, is not the moderate improvement in scalability, but that having the freelist would require dedicated complexity in several upcoming patches. As we have not been able to find a case benefiting from the freelist... Author: Greg Burd <greg@burd.me> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/70C6A5B5-2A20-4D0B-BC73-EB09DD62D61C@getmailspring.com
2025-09-03Move dynamically-allocated LWLock tranche names to shared memory.Nathan Bossart
There are two ways for shared libraries to allocate their own LWLock tranches. One way is to call RequestNamedLWLockTranche() in a shmem_request_hook, which requires the library to be loaded via shared_preload_libraries. The other way is to call LWLockNewTrancheId(), which is not subject to the same restrictions. However, LWLockNewTrancheId() does require each backend to store the tranche's name in backend-local memory via LWLockRegisterTranche(). This API is a little cumbersome and leads to things like unhelpful pg_stat_activity.wait_event values in backends that haven't loaded the library. This commit moves these LWLock tranche names to shared memory, thus eliminating the need for each backend to call LWLockRegisterTranche(). Instead, the tranche name must be provided to LWLockNewTrancheId(), which immediately makes the name available to all backends. Since the tranche name array is append-only, lookups can ordinarily avoid locking as long as their local copy of the LWLock counter is greater than the requested tranche ID. One downside of this approach is that we now have a hard limit on both the length of tranche names (NAMEDATALEN-1 bytes) and the number of dynamically-allocated tranches (256). Besides a limit of NAMEDATALEN-1 bytes for tranche names registered via RequestNamedLWLockTranche(), no such limits previously existed. We could avoid these new limits by using dynamic shared memory, but the complexity involved didn't seem worth it. We briefly considered making the tranche limit user-configurable but ultimately decided against that, too. Since there is still a lot of time left in the v19 development cycle, it's possible we will revisit this choice. Author: Sami Imseih <samimseih@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Rahila Syed <rahilasyed90@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAA5RZ0vvED3naph8My8Szv6DL4AxOVK3eTPS0qXsaKi%3DbVdW2A%40mail.gmail.com
2025-09-01postgres_fdw: Use psql variables for connection parametersMichael Paquier
Several statements need to reference the current connection's current database name and current port value. Until now, this has been accomplished by creating dynamic SQL statements inside of a DO block, which is not as easy to parse. It also takes away some of the granularity of any error messages that might occur, making debugging harder. By capturing the connection-specific settings into psql variables, it becomes possible to write simpler SQL statements for the FDW objects. This eliminates most of DO blocks used in this test, making it a bit more readable and shorter. Author: Author: Corey Huinker <corey.huinker@gmail.com> Discussion: https://postgr.es/m/CADkLM=cpUiJ3QF7aUthTvaVMmgQcm7QqZBRMDLhBRTR+gJX-Og@mail.gmail.com
2025-08-29Remove unneeded casts of BufferGetPage() resultPeter Eisentraut
BufferGetPage() already returns type Page, so casting it to Page doesn't achieve anything. A sizable number of call sites does this casting; remove that. This was already done inconsistently in the code in the first import in 1996 (but didn't exist in the pre-1995 code), and it was then apparently just copied around. Author: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/CALdSSPgFhc5=vLqHdk-zCcnztC0zEY3EU_Q6a9vPEaw7FkE9Vw@mail.gmail.com
2025-08-28Avoid including commands/dbcommands.h in so many placesÁlvaro Herrera
This has been done historically because of get_database_name (which since commit cb98e6fb8fd4 belongs in lsyscache.c/h, so let's move it there) and get_database_oid (which is in the right place, but whose declaration should appear in pg_database.h rather than dbcommands.h). Clean this up. Also, xlogreader.h and stringinfo.h are no longer needed by dbcommands.h since commit f1fd515b393a, so remove them. Author: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/202508191031.5ipojyuaswzt@alvherre.pgsql
2025-08-26Put "excludeOnly" GIN scan keys at the end of the scankey array.Tom Lane
Commit 4b754d6c1 introduced the concept of an excludeOnly scan key, which cannot select matching index entries but can reject non-matching tuples, for example a tsquery such as '!term'. There are poorly-documented assumptions that such scan keys do not appear as the first scan key. ginNewScanKey did nothing to ensure that, however, with the result that certain GIN index searches could go into an infinite loop while apparently-equivalent queries with the clauses in a different order were fine. Fix by teaching ginNewScanKey to place all excludeOnly scan keys after all not-excludeOnly ones. So far as we know at present, it might be sufficient to avoid the case where the very first scan key is excludeOnly; but I'm not very convinced that there aren't other dependencies on the ordering. Bug: #19031 Reported-by: Tim Wood <washwithcare@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/19031-0638148643d25548@postgresql.org Backpatch-through: 13
2025-08-22Change dynahash.c and hsearch.h to use int64 instead of longMichael Paquier
This code was relying on "long", which is signed 8 bytes everywhere except on Windows where it is 4 bytes, that could potentially expose it to overflows, even if the current uses in the code are fine as far as I know. This code is now able to rely on the same sizeof() variable everywhere, with int64. long was used for sizes, partition counts and entry counts. Some callers of the dynahash.c routines used long declarations, that can be cleaned up to use int64 instead. There was one shortcut based on SIZEOF_LONG, that can be removed. long is entirely removed from dynahash.c and hsearch.h. Similar work was done in b1e5c9fa9ac4. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/aKQYp-bKTRtRauZ6@paquier.xyz
2025-08-19Add CHECK_FOR_INTERRUPTS in contrib/pg_buffercache functions.Masahiko Sawada
This commit adds CHECK_FOR_INTERRUPTS to loops iterating over shared buffers in several pg_buffercache functions, allowing them to be interrupted during long-running operations. Backpatch to all supported versions. Add CHECK_FOR_INTERRUPTS to the loop in pg_buffercache_pages() in all supported branches, and to pg_buffercache_summary() and pg_buffercache_usage_counts() in version 16 and newer. Author: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> Discussion: https://postgr.es/m/CAHg+QDcejeLx7WunFT3DX6XKh1KshvGKa8F5au8xVhqVvvQPRw@mail.gmail.com Backpatch-through: 13
2025-08-14Avoid including tableam.h and xlogreader.h in nbtree.hÁlvaro Herrera
Doing that seems rather random and unnecessary. This commit removes those and fixes fallout, which is pretty minimal. We do need to add a forward declaration of struct TM_IndexDeleteOp (whose full definition appears in tableam.h) so that _bt_delitems_delete_check()'s declaration can use it. Author: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/202508051109.lzk3lcuzsaxo@alvherre.pgsql
2025-08-13Grab the low-hanging fruit from forcing USE_FLOAT8_BYVAL to true.Tom Lane
Remove conditionally-compiled code for the other case. Replace uses of FLOAT8PASSBYVAL with constant "true", mainly because it was quite confusing in cases where the type we were dealing with wasn't float8. I left the associated pg_control and Pg_magic_struct fields in place. Perhaps we should get rid of them, but it would save little, so it doesn't seem worth thinking hard about the compatibility implications. I just labeled them "vestigial" in places where that seemed helpful. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/1749799.1752797397@sss.pgh.pa.us
2025-08-13postgres_fdw: Fix tests with ANALYZE and remote samplingMichael Paquier
The tests fixed in this commit were changing the sampling setting of a foreign server, but then were analyzing a local table instead of a foreign table, meaning that the test was not running for its original purpose. This commit changes the ANALYZE commands to analyze the foreign table, and changes the foreign table definition to point to a valid remote table. Attempting to analyze the foreign table "analyze_ftable" would have failed before this commit, because "analyze_rtable1" is not defined on the remote side. Issue introduced by 8ad51b5f446b. Author: Corey Huinker <corey.huinker@gmail.com> Discussion: https://postgr.es/m/CADkLM=cpUiJ3QF7aUthTvaVMmgQcm7QqZBRMDLhBRTR+gJX-Og@mail.gmail.com Backpatch-through: 16
2025-08-08Mop-up for Datum conversion cleanups.Tom Lane
Fix a couple more places where an explicit Datum conversion is needed (not clear how we missed these in ff89e182d and previous commits). Replace the minority usage "(Datum) NULL" with "(Datum) 0". The former depends on the assumption that Datum is the same width as Pointer, the latter doesn't. Anyway consistency is a good thing. This is, I believe, the last of the notational mop-up needed before we can consider changing Datum to uint64 everywhere. It's also important cleanup for more aggressive ideas such as making Datum a struct. Discussion: https://postgr.es/m/1749799.1752797397@sss.pgh.pa.us Discussion: https://postgr.es/m/8246d7ff-f4b7-4363-913e-827dadfeb145@eisentraut.org
2025-08-08Add missing Datum conversionsPeter Eisentraut
Add various missing conversions from and to Datum. The previous code mostly relied on implicit conversions or its own explicit casts instead of using the correct DatumGet*() or *GetDatum() functions. We think these omissions are harmless. Some actual bugs that were discovered during this process have been committed separately (80c758a2e1d, fd2ab03fea2). Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/8246d7ff-f4b7-4363-913e-827dadfeb145%40eisentraut.org
2025-08-08postgres_fdw and dblink should check if backend has MyProcPortPeter Eisentraut
before checking ->has_scram_keys. MyProcPort is NULL in background workers. So this could crash for example if a background worker accessed a suitable configured foreign table. Author: Alexander Pyhalov <a.pyhalov@postgrespro.ru> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/27b29a35-9b96-46a9-bc1a-914140869dac%40gmail.com
2025-08-08Fix incorrect lack of Datum conversion in _int_matchsel()Peter Eisentraut
The code used return (Selectivity) 0.0; where PG_RETURN_FLOAT8(0.0); would be correct. On 64-bit systems, these are pretty much equivalent, but on 32-bit systems, PG_RETURN_FLOAT8() correctly produces a pointer, but the old wrong code would return a null pointer, possibly leading to a crash elsewhere. We think this code is actually not reachable because bqarr_in won't accept an empty query, and there is no other function that will create query_int values. But better be safe and not let such incorrect code lie around. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/8246d7ff-f4b7-4363-913e-827dadfeb145%40eisentraut.org
2025-08-08Disallow collecting transition tuples from child foreign tables.Etsuro Fujita
Commit 9e6104c66 disallowed transition tables on foreign tables, but failed to account for cases where a foreign table is a child table of a partitioned/inherited table on which transition tables exist, leading to incorrect transition tuples collected from such foreign tables for queries on the parent table triggering transition capture. This occurred not only for inherited UPDATE/DELETE but for partitioned INSERT later supported by commit 3d956d956, which should have handled it at least for the INSERT case, but didn't. To fix, modify ExecAR*Triggers to throw an error if the given relation is a foreign table requesting transition capture. Also, this commit fixes make_modifytable so that in case of an inherited UPDATE/DELETE triggering transition capture, FDWs choose normal operations to modify child foreign tables, not DirectModify; which is needed because they would otherwise skip the calls to ExecAR*Triggers at execution, causing unexpected behavior. Author: Etsuro Fujita <etsuro.fujita@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CAPmGK14QJYikKzBDCe3jMbpGENnQ7popFmbEgm-XTNuk55oyHg%40mail.gmail.com Backpatch-through: 13