user/sven/postgresql.git

Age	Commit message (Collapse)	Author
2025-12-14	Update typedefs.list to match what the buildfarm currently reports.	Tom Lane
	The current list from the buildfarm includes quite a few typedef names that it used to miss. The reason is a bit obscure, but it seems likely to have something to do with our recent increased use of palloc_object and palloc_array. In any case, this makes the relevant struct declarations be much more nicely formatted, so I'll take it. Install the current list and re-run pgindent to update affected code. Syncing with the current list also removes some obsolete typedef names and fixes some alphabetization errors. Discussion: https://postgr.es/m/1681301.1765742268@sss.pgh.pa.us
2025-12-11	Use palloc_object() and palloc_array(), the last change	Michael Paquier
	This is the last batch of changes that have been suggested by the author, this part covering the non-trivial changes. Some of the changes suggested have been discarded as they seem to lead to more instructions generated, leaving the parts that can be qualified as in-place replacements. Similar work has been done in 1b105f9472bd, 0c3c5c3b06a3 and 31d3847a37be. Author: David Geier <geidav.pg@gmail.com> Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
2025-12-10	Use palloc_object() and palloc_array() in backend code	Michael Paquier
	The idea is to encourage more the use of these new routines across the tree, as these offer stronger type safety guarantees than palloc(). This batch of changes includes most of the trivial changes suggested by the author for src/backend/. A total of 334 files are updated here. Among these files, 48 of them have their build change slightly; these are caused by line number changes as the new allocation formulas are simpler, shaving around 100 lines of code in total. Similar work has been done in 0c3c5c3b06a3 and 31d3847a37be. Author: David Geier <geidav.pg@gmail.com> Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
2025-12-02	Fix ON CONFLICT with REINDEX CONCURRENTLY and partitions	Álvaro Herrera
	When planning queries with ON CONFLICT on partitioned tables, the indexes to consider as arbiters for each partition are determined based on those found in the parent table. However, it's possible for an index on a partition to be reindexed, and in that case, the auxiliary indexes created on the partition must be considered as arbiters as well; failing to do that may result in spurious "duplicate key" errors given sufficient bad luck. We fix that in this commit by matching every index that doesn't have a parent to each initially-determined arbiter index. Every unparented matching index is considered an additional arbiter index. Closely related to the fixes in bc32a12e0db2 and 2bc7e886fc1b, and for identical reasons, not backpatched (for now) even though it's a longstanding issue. Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CANtu0ojXmqjmEzp-=aJSxjsdE76iAsRgHBoK0QtYHimb_mEfsg@mail.gmail.com
2025-12-02	Remove useless casting to same type	Peter Eisentraut
	This removes some casts where the input already has the same type as the type specified by the cast. Their presence could cause risks of hiding actual type mismatches in the future or silently discarding qualifiers. It also improves readability. Same kind of idea as 7f798aca1d5 and ef8fe693606. (This does not change all such instances, but only those hand-picked by the author.) Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/aSQy2JawavlVlEB0%40ip-10-97-1-34.eu-west-3.compute.internal
2025-11-27	Add parallelism support for TID Range Scans	David Rowley
	In v14, bb437f995 added support for scanning for ranges of TIDs using a dedicated executor node for the purpose. Here, we allow these scans to be parallelized. The range of blocks to scan is divvied up similarly to how a Parallel Seq Scans does that, where 'chunks' of blocks are allocated to each worker and the size of those chunks is slowly reduced down to 1 block per worker by the time we're nearing the end of the scan. Doing that means workers finish at roughly the same time. Allowing TID Range Scans to be parallelized removes the dilemma from the planner as to whether a Parallel Seq Scan will cost less than a non-parallel TID Range Scan due to the CPU concurrency of the Seq Scan (disk costs are not divided by the number of workers). It was possible the planner could choose the Parallel Seq Scan which would result in reading additional blocks during execution than the TID Scan would have. Allowing Parallel TID Range Scans removes the trade-off the planner makes when choosing between reduced CPU costs due to parallelism vs additional I/O from the Parallel Seq Scan due to it scanning blocks from outside of the required TID range. There is also, of course, the traditional parallelism performance benefits to be gained as well, which likely doesn't need to be explained here. Author: Cary Huang <cary.huang@highgo.ca> Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Steven Niu <niushiji@gmail.com> Discussion: https://postgr.es/m/18f2c002a24.11bc2ab825151706.3749144144619388582@highgo.ca
2025-11-25	Improve test case stability	Álvaro Herrera
	Given unlucky timing, some of the new tests added by commit bc32a12e0db2 can fail spuriously. We haven't seen such failures yet in buildfarm, but allegedly we can prevent them with this tweak. While at it, remove an unused injection point I (Álvaro) added. Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com> Discussion: https://postgr.es/m/CADzfLwUc=jtSUEaQCtyt8zTeOJ-gHZ8=w_KJsVjDOYSLqaY9Lg@mail.gmail.com Discussion: https://postgr.es/m/CADzfLwV5oQq-Vg_VmG_o4SdL6yHjDoNO4T4pMtgJLzYGmYf74g@mail.gmail.com
2025-11-24	Fix infer_arbiter_index during concurrent index operations	Álvaro Herrera
	Previously, we would only consider indexes marked indisvalid as usable for INSERT ON CONFLICT. But that's problematic during CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY, because concurrent transactions would end up with inconsistents lists of inferred indexes, leading to deadlocks and spurious errors about unique key violations (because two transactions are operating on different indexes for the speculative insertion tokens). Change this function to return indexes even if invalid. This fixes the spurious errors and deadlocks. Because such indexes might not be complete, we still need uniqueness to be verified in a different way. We do that by requiring that at least one index marked valid is part of the set of indexes returned. It is that index that is going to help ensure that the inserted tuple is indeed unique. This does not fix similar problems occurring with partitioned tables or with named constraints. These problems will be fixed in follow-up commits. We have no user report of this problem, even though it exists in all branches. Because of that and given that the fix is somewhat tricky, I decided not to backpatch for now. Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CANtu0ogv+6wqRzPK241jik4U95s1pW3MCZ3rX5ZqbFdUysz7Qw@mail.gmail.com
2025-11-19	Fix typo in nodeHash.c	Richard Guo
	Replace "overlow" with "overflow". Author: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/CAHewXNnzFjAjYLTkP78HE2PQ17MjBqFdQQg+0X6Wo7YMUb68xA@mail.gmail.com
2025-11-18	Fix typo	Álvaro Herrera

2025-11-16	Fix Assert failure in EXPLAIN ANALYZE MERGE with a concurrent update.	Dean Rasheed
	When instrumenting a MERGE command containing both WHEN NOT MATCHED BY SOURCE and WHEN NOT MATCHED BY TARGET actions using EXPLAIN ANALYZE, a concurrent update of the target relation could lead to an Assert failure in show_modifytable_info(). In a non-assert build, this would lead to an incorrect value for "skipped" tuples in the EXPLAIN output, rather than a crash. This could happen if the concurrent update caused a matched row to no longer match, in which case ExecMerge() treats the single originally matched row as a pair of not matched rows, and potentially executes 2 not-matched actions for the single source row. This could then lead to a state where the number of rows processed by the ModifyTable node exceeds the number of rows produced by its source node, causing "skipped_path" in show_modifytable_info() to be negative, triggering the Assert. Fix this in ExecMergeMatched() by incrementing the instrumentation tuple count on the source node whenever a concurrent update of this kind is detected, if both kinds of merge actions exist, so that the number of source rows matches the number of actions potentially executed, and the "skipped" tuple count is correct. Back-patch to v17, where support for WHEN NOT MATCHED BY SOURCE actions was introduced. Bug: #19111 Reported-by: Dilip Kumar <dilipbalaut@gmail.com> Author: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Discussion: https://postgr.es/m/19111-5b06624513d301b3@postgresql.org Backpatch-through: 17
2025-11-10	Fix typos in nodeWindowAgg comments	Daniel Gustafsson
	One of them submitted by the author, with another one other spotted during review so this fixes both. Author: Tender Wang <tndrwang@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CAHewXN=eNx2oJ_hzxJrkSvy-1A5Qf45SM8pxERWXE+6RoZyFrw@mail.gmail.com
2025-11-06	Update obsolete comment in ExecScanReScan().	Etsuro Fujita
	Commit 27cc7cd2b removed the epqScanDone flag from the EState struct, and instead added an equivalent flag named relsubs_done to the EPQState struct; but it failed to update this comment. Author: Etsuro Fujita <etsuro.fujita@gmail.com> Discussion: https://postgr.es/m/CAPmGK152zJ3fU5avDT5udfL0namrDeVfMTL3dxdOXw28SOrycg%40mail.gmail.com Backpatch-through: 13
2025-11-02	Change "long" numGroups fields to be Cardinality (i.e., double).	Tom Lane
	We've been nibbling away at removing uses of "long" for a long time, since its width is platform-dependent. Here's one more: change the remaining "long" fields in Plan nodes to Cardinality, since the three surviving examples all represent group-count estimates. The upstream planner code was converted to Cardinality some time ago; for example the corresponding fields in Path nodes are type Cardinality, as are the arguments of the make_foo_path functions. Downstream in the executor, it turns out that these all feed to the table-size argument of BuildTupleHashTable. Change that to "double" as well, and fix it so that it safely clamps out-of-range values to the uint32 limit of simplehash.h, as was not being done before. Essentially, this is removing all the artificial datatype-dependent limitations on these values from upstream processing, and applying just one clamp at the moment where we're forced to do so by the datatype choices of simplehash.h. Also, remove BuildTupleHashTable's misguided attempt to enforce work_mem/hash_mem_limit. It doesn't have enough information (particularly not the expected tuple width) to do that accurately, and it has no real business second-guessing the caller's choice. For all these plan types, it's really the planner's responsibility to not choose a hashed implementation if the hashtable is expected to exceed hash_mem_limit. The previous patch improved the accuracy of those estimates, and even if BuildTupleHashTable had more information it should arrive at the same conclusions. Reported-by: Jeff Janes <jeff.janes@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
2025-11-02	Improve planner's estimates of tuple hash table sizes.	Tom Lane
	For several types of plan nodes that use TupleHashTables, the planner estimated the expected size of the table as basically numEntries * (MAXALIGN(dataWidth) + MAXALIGN(SizeofHeapTupleHeader)). This is pretty far off, especially for small data widths, because it doesn't account for the overhead of the simplehash.h hash table nor for any per-tuple "additional space" the plan node may request. Jeff Janes noted a case where the estimate was off by about a factor of three, even though the obvious hazards such as inaccurate estimates of numEntries or dataWidth didn't apply. To improve matters, create functions provided by the relevant executor modules that can estimate the required sizes with reasonable accuracy. (We're still not accounting for effects like allocator padding, but this at least gets the first-order effects correct.) I added functions that can estimate the tuple table sizes for nodeSetOp and nodeSubplan; these rely on an estimator for TupleHashTables in general, and that in turn relies on one for simplehash.h hash tables. That feels like kind of a lot of mechanism, but if we take any short-cuts we're violating modularity boundaries. The other places that use TupleHashTables are nodeAgg, which took pains to get its numbers right already, and nodeRecursiveunion. I did not try to improve the situation for nodeRecursiveunion because there's nothing to improve: we are not making an estimate of the hash table size, and it wouldn't help us to do so because we have no non-hashed alternative implementation. On top of that, our estimate of the number of entries to be hashed in that module is so suspect that we'd likely often choose the wrong implementation if we did have two ways to do it. Reported-by: Jeff Janes <jeff.janes@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
2025-10-31	Mark function arguments of type "Datum " as "const Datum " where possible	Peter Eisentraut
	Several functions in the codebase accept "Datum " parameters but do not modify the pointed-to data. These have been updated to take "const Datum " instead, improving type safety and making the interfaces clearer about their intent. This change helps the compiler catch accidental modifications and better documents immutability of arguments. Most of "Datum " parameters have a pairing "bool isnull" parameter, they are constified as well. No functional behavior is changed by this patch. Author: Chao Li <lic@highgo.com> Discussion: https://www.postgresql.org/message-id/flat/CAEoWx2msfT0knvzUa72ZBwu9LR_RLY4on85w2a9YpE-o2By5HQ@mail.gmail.com
2025-10-30	Use BumpContext contexts in TupleHashTables, and do some code cleanup.	Tom Lane
	For all extant uses of TupleHashTables, execGrouping.c itself does nothing with the "tablecxt" except to allocate new hash entries in it, and the callers do nothing with it except to reset the whole context. So this is an ideal use-case for a BumpContext, and the hash tables are frequently big enough for the savings to be significant. (Commit cc721c459 already taught nodeAgg.c this idea, but neglected the other callers of BuildTupleHashTable.) While at it, let's clean up some ill-advised leftovers from rebasing TupleHashTables on simplehash.h: * Many comments and variable names were based on the idea that the tablecxt holds the whole TupleHashTable, whereas now it only holds the hashed tuples (plus any caller-defined "additional storage"). Rename to names like tuplescxt and tuplesContext, and adjust the comments. Also adjust the memory context names to be like "<Foo> hashed tuples". * Make ResetTupleHashTable() reset the tuplescxt rather than relying on the caller to do so; that was fairly bizarre and seems like a recipe for leaks. This is less efficient in the case where nodeAgg.c uses the same tuplescxt for several different hashtables, but only microscopically so because mcxt.c will short-circuit the extra resets via its isReset flag. I judge the extra safety and intellectual cleanliness well worth those few cycles. * Remove the long-obsolete "allow_jit" check added by ac88807f9; instead, just Assert that metacxt and tuplescxt are different. We need that anyway for this definition of ResetTupleHashTable() to be safe. There is a side issue of the extent to which this change invalidates the planner's estimates of hashtable memory consumption. However, those estimates are already pretty bad, so improving them seems like it can be a separate project. This change is useful to do first to establish consistent executor behavior that the planner can expect. A loose end not addressed here is that the "entrysize" calculation in BuildTupleHashTable seems wrong: "sizeof(TupleHashEntryData) + additionalsize" corresponds neither to the size of the simplehash entries nor to the total space needed per tuple. It's questionable why BuildTupleHashTable is second-guessing its caller's nbuckets choice at all, since the original source of the number should have had more information. But that all seems wrapped up with the planner's estimation logic, so let's leave it for the planned followup patch. Reported-by: Jeff Janes <jeff.janes@gmail.com> Reported-by: David Rowley <dgrowleyml@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com Discussion: https://postgr.es/m/2268409.1761512111@sss.pgh.pa.us
2025-10-30	Mark ItemPointer arguments as const throughout	Peter Eisentraut
	This is a follow up 991295f. I searched over src/ and made all ItemPointer arguments as const as much as possible. Note: We cut out from the original patch the pieces that would have created incompatibilities in the index or table AM APIs. Those could be considered separately. Author: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/CAEoWx2nBaypg16Z5ciHuKw66pk850RFWw9ACS2DqqJ_AkKeRsw%40mail.gmail.com
2025-10-30	Fix some confusing uses of const	Peter Eisentraut
	There are a few places where we have typedef struct FooData { ... } FooData; typedef FooData Foo; and then function declarations with bar(const Foo x) which isn't incorrect but probably meant bar(const FooData x) meaning that the thing x points to is immutable, not x itself. This patch makes those changes where appropriate. In one case (execGrouping.c), the thing being pointed to was not immutable, so in that case remove the const altogether, to avoid further confusion. Co-authored-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/CAEoWx2m2E0xE8Kvbkv31ULh_E%2B5zph-WA_bEdv3UR9CLhw%2B3vg%40mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAEoWx2kTDz%3Db6T2xHX78vy_B_osDeCC5dcTCi9eG0vXHp5QpdQ%40mail.gmail.com
2025-10-28	Add wal_fpi_bytes to pg_stat_wal and pg_stat_get_backend_wal()	Michael Paquier
	This new counter, called "wal_fpi_bytes", tracks the total amount in bytes of full page images (FPIs) generated in WAL. This data becomes available globally via pg_stat_wal, and for backend statistics via pg_stat_get_backend_wal(). Previously, this information could only be retrieved with pg_waldump or pg_walinspect, which may not be available depending on the environment, and are expensive to execute. It offers hints about how much FPIs impact the WAL generated, which could be a large percentage for some workloads, as well as the effects of wal_compression or page holes. Bump catalog version. Bump PGSTAT_FILE_FORMAT_ID, due to the addition of wal_fpi_bytes in PgStat_WalCounters. Author: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAOzEurQtZEAfg6P0kU3Wa-f9BWQOi0RzJEMPN56wNTOmJLmfaQ@mail.gmail.com
2025-10-27	Fix a couple of comments.	Nathan Bossart
	These were discovered while reviewing Aleksander Alekseev's proposed changes to pgindent. Oversights in commits 393e0d2314 and 25a30bbd42. Discussion: https://postgr.es/m/aP-H6kSsGOxaB21k%40nathan
2025-10-26	Fix incorrect logic for caching ResultRelInfos for triggers	David Rowley
	When dealing with ResultRelInfos for partitions, there are cases where there are mixed requirements for the ri_RootResultRelInfo. There are cases when the partition itself requires a NULL ri_RootResultRelInfo and in the same query, the same partition may require a ResultRelInfo with its parent set in ri_RootResultRelInfo. This could cause the column mapping between the partitioned table and the partition not to be done which could result in crashes if the column attnums didn't match exactly. The fix is simple. We now check that the ri_RootResultRelInfo matches what the caller passed to ExecGetTriggerResultRel() and only return a cached ResultRelInfo when the ri_RootResultRelInfo matches what the caller wants, otherwise we'll make a new one. Author: David Rowley <dgrowleyml@gmail.com> Author: Amit Langote <amitlangote09@gmail.com> Reported-by: Dmitry Fomin <fomin.list@gmail.com> Discussion: https://postgr.es/m/7DCE78D7-0520-4207-822B-92F60AEA14B4@gmail.com Backpatch-through: 15
2025-10-23	Introduce "REFRESH SEQUENCES" for subscriptions.	Amit Kapila
	This patch adds support for a new SQL command: ALTER SUBSCRIPTION ... REFRESH SEQUENCES This command updates the sequence entries present in the pg_subscription_rel catalog table with the INIT state to trigger resynchronization. In addition to the new command, the following subscription commands have been enhanced to automatically refresh sequence mappings: ALTER SUBSCRIPTION ... REFRESH PUBLICATION ALTER SUBSCRIPTION ... ADD PUBLICATION ALTER SUBSCRIPTION ... DROP PUBLICATION ALTER SUBSCRIPTION ... SET PUBLICATION These commands will perform the following actions: Add newly published sequences that are not yet part of the subscription. Remove sequences that are no longer included in the publication. This ensures that sequence replication remains aligned with the current state of the publication on the publisher side. Note that the actual synchronization of sequence data/values will be handled in a subsequent patch that introduces a dedicated sequence sync worker. Author: Vignesh C <vignesh21@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com> Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com> Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
2025-10-23	Fix coding style with "else".	Tatsuo Ishii
	The "else" code block having single statement with comments on a separate line should have been surrounded by braces. Reported-by: Chao Li <lic@highgo.com> Suggested-by: David Rowley <dgrowleyml@gmail.com> Author: Tatsuo Ishii <ishii@postgresql.org> Discussion: https://postgr.es/m/20251020.125847.997839131426057290.ishii%40postgresql.org
2025-10-22	Fix multi WinGetFuncArgInFrame/Partition calls with IGNORE NULLS.	Tatsuo Ishii
	Previously it was mistakenly assumed that there's only one window function argument which needs to be processed by WinGetFuncArgInFrame or WinGetFuncArgInPartition when IGNORE NULLS option is specified. To eliminate the limitation, WindowObject->notnull_info is modified from "uint8 " to "uint8 " so that WindowObject->notnull_info could store pointers to "uint8 " which holds NOT NULL info corresponding to each window function argument. Moreover, WindowObject->num_notnull_info is changed from "int" to "int64 *" so that WindowObject->num_notnull_info could store the number of NOT NULL info corresponding to each function argument. Memories for these data structures will be allocated when WinGetFuncArgInFrame or WinGetFuncArgInPartition is called. Thus no memory except the pointers is allocated for function arguments which do not call these functions Also fix the set mark position logic in WinGetFuncArgInPartition to not raise a "cannot fetch row before WindowObject's mark position" error in IGNORE NULLS case. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Author: Tatsuo Ishii <ishii@postgresql.org> Discussion: https://postgr.es/m/2952409.1760023154%40sss.pgh.pa.us
2025-10-19	Fix Coverity issue reported in commit 2273fa32bce.	Tatsuo Ishii
	Coverity complains that the return value from gettuple_eval_partition (stored in variable "datum") in a do..while loop in WinGetFuncArgInPartition is overwritten when exiting the while loop. This commit tries to fix the issue by changing the gettuple_eval_partition call to: (void) gettuple_eval_partition() explicitly stating that we discard the return value. We are just interested in whether we are inside or outside of partition, NULL or NOT NULL here. Also enhance some comments for easier code reading. Reported-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/aPCOabSE4VfJLaky%40paquier.xyz
2025-10-18	Fix reset of incorrect hash iterator in GROUPING SETS queries	David Rowley
	This fixes an unlikely issue when fetching GROUPING SET results from their internally stored hash tables. It was possible in rare cases that the hash iterator would be set up incorrectly which could result in a crash. This was introduced in 4d143509c, so backpatch to v18. Many thanks to Yuri Zamyatin for reporting and helping to debug this issue. Bug: #19078 Reported-by: Yuri Zamyatin <yuri@yrz.am> Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://postgr.es/m/19078-dfd62f840a2c0766@postgresql.org Backpatch-through: 18
2025-10-17	Fix hashjoin memory balancing logic	Tomas Vondra
	Commit a1b4f289beec improved the hashjoin sizing to also consider the memory used by BufFiles for batches. The code however had multiple issues, making it ineffective or not working as expected in some cases. * The amount of memory needed by buffers was calculated using uint32, so it would overflow for nbatch >= 262144. If this happened the loop would exit prematurely and the memory usage would not be reduced. The nbatch overflow is fixed by reworking the condition to not use a multiplication at all, so there's no risk of overflow. An explicit cast was added to a similar calculation in ExecHashIncreaseBatchSize. * The loop adjusting the nbatch value used hash_table_bytes to calculate the old/new size, but then updated only space_allowed. The consequence is the total memory usage was not reduced, but all the memory saved by reducing the number of batches was used for the internal hash table. This was fixed by using only space_allowed. This is also more correct, because hash_table_bytes does not account for skew buckets. * The code was also doubling multiple parameters (e.g. the number of buckets for hash table), but was missing overflow protections. The loop now checks for overflow, and terminates if needed. It'd be possible to cap the value and continue the loop, but it's not worth the complexity. And the overflow implies the in-memory hash table is already very large anyway. While at it, rework the comment explaining how the memory balancing works, to make it more concise and easier to understand. The initial nbatch overflow issue was reported by Vaibhav Jain. The other issues were noticed by me and Melanie Plageman. Fix by me, with a lot of review and feedback by Melanie. Backpatch to 18, where the hashjoin memory balancing was introduced. Reported-by: Vaibhav Jain <jainva@google.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Backpatch-through: 18 Discussion: https://postgr.es/m/CABa-Az174YvfFq7rLS+VNKaQyg7inA2exvPWmPWqnEn6Ditr_Q@mail.gmail.com
2025-10-16	Fix EPQ crash from missing partition directory in EState	Amit Langote
	EvalPlanQualStart() failed to propagate es_partition_directory into the child EState used for EPQ rechecks. When execution time partition pruning ran during the EPQ scan, executor code dereferenced a NULL partition directory and crashed. Previously, propagating es_partition_directory into the EPQ EState was unnecessary because CreatePartitionPruneState(), which sets it on demand, also initialized the exec-pruning context. After commit d47cbf474, CreatePartitionPruneState() now initializes only the init- time pruning context, leaving exec-pruning context initialization to ExecInitNode(). Since EvalPlanQualStart() runs only ExecInitNode() and not CreatePartitionPruneState(), it can encounter a NULL es_partition_directory. Other executor fields initialized during CreatePartitionPruneState() are already copied into the child EState thanks to commit 8741e48e5d, but es_partition_directory was missed. Fix by borrowing the parent estate's es_partition_directory in EvalPlanQualStart(), and by clearing that field in EvalPlanQualEnd() so the parent remains responsible for freeing the directory. Add an isolation test permutation that triggers EPQ with execution- time partition pruning, the case that reproduces this crash. Bug: #19078 Reported-by: Yuri Zamyatin <yuri@yrz.am> Diagnosed-by: David Rowley <dgrowleyml@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Co-authored-by: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/19078-dfd62f840a2c0766@postgresql.org Backpatch-through: 18
2025-10-14	Use ereport rather than elog in WinCheckAndInitializeNullTreatment.	Tatsuo Ishii
	Previously WinCheckAndInitializeNullTreatment() used elog() to emit an error message. ereport() should be used instead because it's a user-facing error. Also use existing get_func_name() to get a function's name, rather than own implementation. Moreover add an assertion to validate winobj parameter, just like other window function API. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Author: Tatsuo Ishii <ishii@postgresql.org> Reviewed-by: Chao Li <lic@highgo.com> Discussion: https://postgr.es/m/2952409.1760023154%40sss.pgh.pa.us
2025-10-09	Avoid uninitialized-variable warnings from older compilers.	Tom Lane
	Some of the buildfarm is still unhappy with WinGetFuncArgInPartition even after 2273fa32b. While it seems to be just very old compilers, we can suppress the warnings and arguably make the code more readable by not initializing these variables till closer to where they are used. While at it, make a couple of cosmetic comment improvements.
2025-10-08	Fix Coverity issues reported in commit 25a30bbd423.	Tatsuo Ishii
	Fix several issues pointed out by Coverity (reported by Tome Lane). - In row_is_in_frame(), return value of window_gettupleslot() was not checked. - WinGetFuncArgInPartition() tried to derefference "isout" pointer even if it could be NULL in some places. Besides the issues, I also fixed a compiler warning reported by Álvaro Herrera. Moreover, in WinGetFuncArgInPartition refactor the do...while loop so that the codes inside the loop simpler. Also simplify the case when abs_pos < 0. Author: Tatsuo Ishii <ishii@postgresql.org> Reviewed-by: Paul Ramsey <pramsey@cleverelephant.ca> Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Reported-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/1686755.1759679957%40sss.pgh.pa.us Discussion: https://postgr.es/m/202510051612.gw67jlc2iqpw%40alvherre.pgsql
2025-10-05	Don't include access/htup_details.h in executor/tuptable.h	Álvaro Herrera
	This is not at all needed; I suspect it was a simple mistake in commit 5408e233f066. It causes htup_details.h to bleed into a huge number of places via execnodes.h. Remove it and fix fallout. Discussion: https://postgr.es/m/202510021240.ptc2zl5cvwen@alvherre.pgsql
2025-10-03	Add IGNORE NULLS/RESPECT NULLS option to Window functions.	Tatsuo Ishii
	Add IGNORE NULLS/RESPECT NULLS option (null treatment clause) to lead, lag, first_value, last_value and nth_value window functions. If unspecified, the default is RESPECT NULLS which includes NULL values in any result calculation. IGNORE NULLS ignores NULL values. Built-in window functions are modified to call new API WinCheckAndInitializeNullTreatment() to indicate whether they accept IGNORE NULLS/RESPECT NULLS option or not (the API can be called by user defined window functions as well). If WinGetFuncArgInPartition's allowNullTreatment argument is true and IGNORE NULLS option is given, WinGetFuncArgInPartition() or WinGetFuncArgInFrame() will return evaluated function's argument expression on specified non NULL row (if it exists) in the partition or the frame. When IGNORE NULLS option is given, window functions need to visit and evaluate same rows over and over again to look for non null rows. To mitigate the issue, 2-bit not null information array is created while executing window functions to remember whether the row has been already evaluated to NULL or NOT NULL. If already evaluated, we could skip the evaluation work, thus we could get better performance. Author: Oliver Ford <ojford@gmail.com> Co-authored-by: Tatsuo Ishii <ishii@postgresql.org> Reviewed-by: Krasiyan Andreev <krasiyan@gmail.com> Reviewed-by: Andrew Gierth <andrew@tao11.riddles.org.uk> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Fetter <david@fetter.org> Reviewed-by: Vik Fearing <vik@postgresfriends.org> Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com> Reviewed-by: Chao Li <lic@highgo.com> Discussion: https://postgr.es/m/flat/CAGMVOdsbtRwE_4+v8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A@mail.gmail.com
2025-09-19	Fix EPQ crash from missing partition pruning state in EState	Amit Langote
	Commit bb3ec16e14 moved partition pruning metadata into PlannedStmt. At executor startup this metadata is used to initialize the EState fields es_part_prune_infos, es_part_prune_states, and es_part_prune_results. EvalPlanQualStart() failed to copy those fields into the child EState, causing NULL dereference when Append ran partition pruning during a recheck. This can occur with DELETE or UPDATE on partitioned tables that use runtime pruning, e.g. with generic plans. Fix by copying all partition pruning state into the EPQ estate. Add an isolation test that reproduces the crash with concurrent UPDATE and DELETE on a partitioned table, where the DELETE session hits the crash during its EPQ recheck after the UPDATE commits. Bug: #19056 Reported-by: Fei Changhong <feichanghong@qq.com> Diagnozed-by: Fei Changhong <feichanghong@qq.com> Author: David Rowley <dgrowleyml@gmail.com> Co-authored-by: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/19056-a677cef9b54d76a0%40postgresql.org
2025-09-17	Add missing EPQ recheck for TID Range Scan	David Rowley
	The EvalPlanQual recheck for TID Range Scan wasn't rechecking the TID qual still passed after following update chains. This could result in tuples being updated or deleted by plans using TID Range Scans where the ctid of the new (updated) tuple no longer matches the clause of the scan. This isn't desired behavior, and isn't consistent with what would happen if the chosen plan had used an Index or Seq Scan, and that could lead to hard to predict behavior for scans that contain TID quals and other quals as the planner has freedom to choose TID Range or some other non-TID scan method for such queries, and the chosen plan could change at any moment. Here we fix this by properly implementing the recheck function for TID Range Scans. Backpatch to 14, where TID Range Scans were added Reported-by: Sophie Alpert <pg@sophiebits.com> Author: Sophie Alpert <pg@sophiebits.com> Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/4a6268ff-3340-453a-9bf5-c98d51a6f729@app.fastmail.com Backpatch-through: 14
2025-09-17	Add missing EPQ recheck for TID Scan	David Rowley
	The EvalPlanQual recheck for TID Scan wasn't rechecking the TID qual still passed after following update chains. This could result in tuples being updated or deleted by plans using TID Scans where the ctid of the new (updated) tuple no longer matches the clause of the scan. This isn't desired behavior, and isn't consistent with what would happen if the chosen plan had used an Index or Seq Scan, and that could lead to hard to predict behavior for scans that contain TID quals and other quals as the planner has freedom to choose TID or some other scan method for such queries, and the chosen plan could change at any moment. Here we fix this by properly implementing the recheck function for TID Scans. Backpatch to 13, oldest supported version Reported-by: Sophie Alpert <pg@sophiebits.com> Author: Sophie Alpert <pg@sophiebits.com> Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/4a6268ff-3340-453a-9bf5-c98d51a6f729@app.fastmail.com Backpatch-through: 13
2025-09-10	Eliminate duplicative hashtempcxt in nodeSubplan.c.	Tom Lane
	Instead of building a separate memory context that's used just for running hash functions, make the hash functions run in the per-tuple context of the node's innerecontext. This saves a little space at runtime, and it avoids needing to reset two contexts instead of one inside buildSubPlanHash's main loop. This largely reverts commit 133924e13. That's safe to do now because bf6c614a2 decoupled the evaluation context used by TupleHashTableMatch from that used for hash function evaluation, so that there's no longer a risk of resetting the innerecontext too soon. Per discussion of bug #19040, although this is not directly a fix for that. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Haiyang Li <mohen.lhy@alibaba-inc.com> Reviewed-by: Fei Changhong <feichanghong@qq.com> Discussion: https://postgr.es/m/19040-c9b6073ef814f48c@postgresql.org
2025-09-10	Fix memory leakage in nodeSubplan.c.	Tom Lane
	If the hash functions used for hashing tuples leaked any memory, we failed to clean that up, resulting in query-lifespan memory leakage in queries using hashed subplans. One way that could happen is if the values being hashed require de-toasting, since most of our hash functions don't trouble to clean up de-toasted inputs. Prior to commit bf6c614a2, this leakage was largely masked because TupleHashTableMatch would reset hashtable->tempcxt (via execTuplesMatch). But it doesn't do that anymore, and that's not really the right place for this anyway: doing it there could reset the tempcxt many times per hash lookup, or not at all. Instead put reset calls into ExecHashSubPlan and buildSubPlanHash. Along the way to that, rearrange ExecHashSubPlan so that there's just one place to call MemoryContextReset instead of several. This amounts to accepting the de-facto API spec that the caller of the TupleHashTable routines is responsible for resetting the tempcxt adequately often. Although the other callers seem to get this right, it was not documented anywhere, so add a comment about it. Bug: #19040 Reported-by: Haiyang Li <mohen.lhy@alibaba-inc.com> Author: Haiyang Li <mohen.lhy@alibaba-inc.com> Reviewed-by: Fei Changhong <feichanghong@qq.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/19040-c9b6073ef814f48c@postgresql.org Backpatch-through: 13
2025-09-10	Replace callers of dynahash.h's my_log() by equivalent in pg_bitutils.h	Michael Paquier
	All the calls replaced by this commit use 4-byte integers for their variables used in input of my_log2(). Hence, the limit against too-large inputs does not really apply. Thresholds are also applied, as of: - In nodeAgg.c, the number of partitions is limited by HASHAGG_MAX_PARTITIONS. - In nodeHash.c, ExecChooseHashTableSize() caps its maximum number of buckets based on HashJoinTuple and palloc() allocation limit. - In worker.c, the number of subxacts tracked by ApplySubXactData uses uint32, making pg_ceil_log2_64() safe to use directly. Several approaches have been discussed, like an integration with thresholds in pg_bitutils.h, but it was found confusing. This uses Dean's idea, which gives a simpler result than what I came up with to be able to remove dynahash.h. dynahash.h will be removed in a follow-up commit, removing some duplication with the ceil log2 routines. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/CAEZATCUJPQD_7sC-wErak2CQGNa6bj2hY-mr8wsBki=kX7f2_A@mail.gmail.com
2025-09-08	Don't generate fake "SELECT" or "SELECT %d" subquery aliases.	Robert Haas
	rte->alias should point only to a user-written alias, but in these cases that principle was violated. Fixing this causes some regression test output changes: wherever rte->alias previously had a value and is now NULL, rte->eref is now set to a generated name rather than to rte->alias; and the scheme used to generate eref names differs from what we were doing for aliases. The upshot is that instead of "SELECT" or "SELECT %d", EXPLAIN will now emit "unnamed_subquery" or "unnamed_subquery_%d". But that's a reasonable descriptor, and we were already producing that in yet other cases, so this seems not too objectionable. Author: Tom Lane <tgl@sss.pgh.pa.us> Co-authored-by: Robert Haas <rhaas@postgresql.org> Discussion: https://postgr.es/m/CA+TgmoYSYmDA2GvanzPMci084n+mVucv0bJ0HPbs6uhmMN6HMg@mail.gmail.com
2025-09-05	Fix concurrent update issue with MERGE.	Dean Rasheed
	When executing a MERGE UPDATE action, if there is more than one concurrent update of the target row, the lock-and-retry code would sometimes incorrectly identify the latest version of the target tuple, leading to incorrect results. This was caused by using the ctid field from the TM_FailureData returned by table_tuple_lock() in a case where the result was TM_Ok, which is unsafe because the TM_FailureData struct is not guaranteed to be fully populated in that case. Instead, it should use the tupleid passed to (and updated by) table_tuple_lock(). To reduce the chances of similar errors in the future, improve the commentary for table_tuple_lock() and TM_FailureData to make it clearer that table_tuple_lock() updates the tid passed to it, and most fields of TM_FailureData should not be relied on in non-failure cases. An exception to this is the "traversed" field, which is set in both success and failure cases. Reported-by: Dmitry <dsy.075@yandex.ru> Author: Yugo Nagata <nagata@sraoss.co.jp> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/1570d30e-2b95-4239-b9c3-f7bf2f2f8556@yandex.ru Backpatch-through: 15
2025-09-04	Fix replica identity check for MERGE.	Dean Rasheed
	When executing a MERGE, check that the target relation supports all actions mentioned in the MERGE command. Specifically, check that it has a REPLICA IDENTITY if it publishes updates or deletes and the MERGE command contains update or delete actions. Failing to do this can silently break replication. Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Tested-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/OS3PR01MB57180C87E43A679A730482DF94B62@OS3PR01MB5718.jpnprd01.prod.outlook.com Backpatch-through: 15
2025-09-04	Fix replica identity check for INSERT ON CONFLICT DO UPDATE.	Dean Rasheed
	If an INSERT has an ON CONFLICT DO UPDATE clause, the executor must check that the target relation supports UPDATE as well as INSERT. In particular, it must check that the target relation has a REPLICA IDENTITY if it publishes updates. Formerly, it was not doing this check, which could lead to silently breaking replication. Fix by adding such a check to CheckValidResultRel(), which requires adding a new onConflictAction argument. In back-branches, preserve ABI compatibility by introducing a wrapper function with the original signature. Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Tested-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/OS3PR01MB57180C87E43A679A730482DF94B62@OS3PR01MB5718.jpnprd01.prod.outlook.com Backpatch-through: 13
2025-08-25	Message wording improvements	Peter Eisentraut
	Use "row" instead of "tuple" for user-facing information for logical replication conflicts.
2025-08-20	Fix re-execution of a failed SQLFunctionCache entry.	Tom Lane
	If we error out during execution of a SQL-language function, we will often leave behind non-null pointers in its SQLFunctionCache's cplan and eslist fields. This is problematic if the SQLFunctionCache is re-used, because those pointers will point at resources that were released during error cleanup. This problem escaped detection so far because ordinarily we won't re-use an FmgrInfo+SQLFunctionCache struct after a query error. However, in the rather improbable case that someone implements an opclass support function in SQL language, there will be long-lived FmgrInfos for it in the relcache, and then the problem is reachable after the function throws an error. To fix, add a flag to SQLFunctionCache that tracks whether execution escapes out of fmgr_sql, and clear out the relevant fields during init_sql_fcache if so. (This is going to need more thought if we ever try to share FMgrInfos across threads; but it's very far from being the only problem such a project will encounter, since many functions regard fn_extra as being query-local state.) This broke at commit 0313c5dc6; before that we did not try to re-use SQLFunctionCache state across calls. Hence, back-patch to v18. Bug: #19026 Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/19026-90aed5e71d0c8af3@postgresql.org Backpatch-through: 18
2025-08-16	Fix typos in comments.	Masahiko Sawada
	Oversight in commit fd5a1a0c3e56. Author: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/CAHewXNmTT3M_w4NngG=6G3mdT3iJ6DdncTqV9YnGXBPHW8XYtA@mail.gmail.com
2025-08-11	Reduce ExecSeqScan* code size using pg_assume()	Andres Freund
	fb9f955025f optimized code generation by using specialized variants of ExecSeqScan* for [not] having a qual, projection etc. This allowed the compiler to optimize the code out the code for qual / projection. However, as observed by David Rowley at the time, the compiler couldn't prove the opposite, i.e. that the qual etc are present. By using pg_assume(), introduced in d65eb5b1b84, we can tell the compiler that the relevant variables are non-null. This reduces the code size to a surprising degree and seems to lead to a small but reproducible performance gain. Reviewed-by: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CA+HiwqFk-MbwhfX_kucxzL8zLmjEt9MMcHi2YF=DyhPrSjsBEA@mail.gmail.com
2025-08-11	Fix security checks in selectivity estimation functions.	Dean Rasheed
	Commit e2d4ef8de86 (the fix for CVE-2017-7484) added security checks to the selectivity estimation functions to prevent them from running user-supplied operators on data obtained from pg_statistic if the user lacks privileges to select from the underlying table. In cases involving inheritance/partitioning, those checks were originally performed against the child RTE (which for plain inheritance might actually refer to the parent table). Commit 553d2ec2710 then extended that to also check the parent RTE, allowing access if the user had permissions on either the parent or the child. It turns out, however, that doing any checks using the child RTE is incorrect, since securityQuals is set to NULL when creating an RTE for an inheritance child (whether it refers to the parent table or the child table), and therefore such checks do not correctly account for any RLS policies or security barrier views. Therefore, do the security checks using only the parent RTE. This is consistent with how RLS policies are applied, and the executor's ACL checks, both of which use only the parent table's permissions/policies. Similar checks are performed in the extended stats code, so update that in the same way, centralizing all the checks in a new function. In addition, note that these checks by themselves are insufficient to ensure that the user has access to the table's data because, in a query that goes via a view, they only check that the view owner has permissions on the underlying table, not that the current user has permissions on the view itself. In the selectivity estimation functions, there is no easy way to navigate from underlying tables to views, so add permissions checks for all views mentioned in the query to the planner startup code. If the user lacks permissions on a view, a permissions error will now be reported at planner-startup, and the selectivity estimation functions will not be run. Checking view permissions at planner-startup in this way is a little ugly, since the same checks will be repeated at executor-startup. Longer-term, it might be better to move all the permissions checks from the executor to the planner so that permissions errors can be reported sooner, instead of creating a plan that won't ever be run. However, such a change seems too far-reaching to be back-patched. Back-patch to all supported versions. In v13, there is the added complication that UPDATEs and DELETEs on inherited target tables are planned using inheritance_planner(), which plans each inheritance child table separately, so that the selectivity estimation functions do not know that they are dealing with a child table accessed via its parent. Handle that by checking access permissions on the top parent table at planner-startup, in the same way as we do for views. Any securityQuals on the top parent table are moved down to the child tables by inheritance_planner(), so they continue to be checked by the selectivity estimation functions. Author: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Noah Misch <noah@leadboat.com> Backpatch-through: 13 Security: CVE-2025-8713
2025-08-08	Mop-up for Datum conversion cleanups.	Tom Lane
	Fix a couple more places where an explicit Datum conversion is needed (not clear how we missed these in ff89e182d and previous commits). Replace the minority usage "(Datum) NULL" with "(Datum) 0". The former depends on the assumption that Datum is the same width as Pointer, the latter doesn't. Anyway consistency is a good thing. This is, I believe, the last of the notational mop-up needed before we can consider changing Datum to uint64 everywhere. It's also important cleanup for more aggressive ideas such as making Datum a struct. Discussion: https://postgr.es/m/1749799.1752797397@sss.pgh.pa.us Discussion: https://postgr.es/m/8246d7ff-f4b7-4363-913e-827dadfeb145@eisentraut.org