summaryrefslogtreecommitdiff
path: root/src/backend/executor/execPartition.c
AgeCommit message (Collapse)Author
2024-06-27Fix thinkos in commentsAlvaro Herrera
The first one was noticed by Tender Wang and introduced with 8aba9322511f; the other one was newly introduced with dbca3469ebf8.
2024-06-26Fix partition pruning setup during DETACH CONCURRENTLYAlvaro Herrera
When detaching partition in concurrent mode, it's possible for partition descriptors to not match the set that was recently seen when the plan was made, causing an assertion failure or (in production builds) failure to construct a working plan. The case that was reported involves prepared statements, but I think it may be possible to hit this bug without that too. The problem is that CreatePartitionPruneState is constructing a PartitionPruneState under the assumption that new partitions can be added, but never removed, but it turns out that this isn't true: a prepared statement gets replanned when the DETACH CONCURRENTLY session sends out its invalidation message, but if the invalidation message arrives after ExecInitAppend started, we would build a partition descriptor without the partition, and then CreatePartitionPruneState would refuse to work with it. CreatePartitionPruneState already contains code to deal with the new descriptor having more partitions than before (and behaving for the extra partitions as if they had been pruned), but doesn't have code to deal with less partitions than before, and it is naïve about the case where the number of partitions is the same. We could simply add that a new stanza for less partitions than before, and in simple testing it works to do that; but it's possible to press the test scripts even further and hit the case where one partition is added and a partition is removed quickly enough that we see the same number of partitions, but they don't actually match, causing hangs during execution. To cope with both these problems, we now memcmp() the arrays of partition OIDs, and do a more elaborate mapping (relying on the fact that both OID arrays are in partition-bounds order) if they're not identical. This fix was already pushed in backbranches earlier. Reported-by: yajun Hu <1026592243@qq.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/18377-e0324601cfebdfe5@postgresql.org
2024-06-24Revert "Fix partition pruning setup during DETACH CONCURRENTLY"Alvaro Herrera
This reverts commit 27162a64b386; this branch is in code freeze due to a nearing release. We can commit again after the release is out. Discussion: https://postgr.es/m/1158256.1719239648@sss.pgh.pa.us
2024-06-24Fix partition pruning setup during DETACH CONCURRENTLYAlvaro Herrera
When detaching partition in concurrent mode, it's possible for partition descriptors to not match the set that was recently seen when the plan was made, causing an assertion failure or (in production builds) failure to construct a working plan. The case that was reported involves prepared statements, but I think it may be possible to hit this bug without that too. The problem is that CreatePartitionPruneState is constructing a PartitionPruneState under the assumption that new partitions can be added, but never removed, but it turns out that this isn't true: a prepared statement gets replanned when the DETACH CONCURRENTLY session sends out its invalidation message, but if the invalidation message arrives after ExecInitAppend started, we would build a partition descriptor without the partition, and then CreatePartitionPruneState would refuse to work with it. CreatePartitionPruneState already contains code to deal with the new descriptor having more partitions than before (and behaving for the extra partitions as if they had been pruned), but doesn't have code to deal with less partitions than before, and it is naïve about the case where the number of partitions is the same. We could simply add that a new stanza for less partitions than before, and in simple testing it works to do that; but it's possible to press the test scripts even further and hit the case where one partition is added and a partition is removed quickly enough that we see the same number of partitions, but they don't actually match, causing hangs during execution. To cope with both these problems, we now memcmp() the arrays of partition OIDs, and do a more elaborate mapping (relying on the fact that both OID arrays are in partition-bounds order) if they're not identical. Backpatch to 14, where DETACH CONCURRENTLY appeared. Reported-by: yajun Hu <1026592243@qq.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/18377-e0324601cfebdfe5@postgresql.org
2024-03-30Add support for MERGE ... WHEN NOT MATCHED BY SOURCE.Dean Rasheed
This allows MERGE commands to include WHEN NOT MATCHED BY SOURCE actions, which operate on rows that exist in the target relation, but not in the data source. These actions can execute UPDATE, DELETE, or DO NOTHING sub-commands. This is in contrast to already-supported WHEN NOT MATCHED actions, which operate on rows that exist in the data source, but not in the target relation. To make this distinction clearer, such actions may now be written as WHEN NOT MATCHED BY TARGET. Writing WHEN NOT MATCHED without specifying BY SOURCE or BY TARGET is equivalent to writing WHEN NOT MATCHED BY TARGET. Dean Rasheed, reviewed by Alvaro Herrera, Ted Yu and Vik Fearing. Discussion: https://postgr.es/m/CAEZATCWqnKGc57Y_JanUBHQXNKcXd7r=0R4NEZUVwP+syRkWbA@mail.gmail.com
2024-03-17Add RETURNING support to MERGE.Dean Rasheed
This allows a RETURNING clause to be appended to a MERGE query, to return values based on each row inserted, updated, or deleted. As with plain INSERT, UPDATE, and DELETE commands, the returned values are based on the new contents of the target table for INSERT and UPDATE actions, and on its old contents for DELETE actions. Values from the source relation may also be returned. As with INSERT/UPDATE/DELETE, the output of MERGE ... RETURNING may be used as the source relation for other operations such as WITH queries and COPY commands. Additionally, a special function merge_action() is provided, which returns 'INSERT', 'UPDATE', or 'DELETE', depending on the action executed for each row. The merge_action() function can be used anywhere in the RETURNING list, including in arbitrary expressions and subqueries, but it is an error to use it anywhere outside of a MERGE query's RETURNING list. Dean Rasheed, reviewed by Isaac Morland, Vik Fearing, Alvaro Herrera, Gurjeet Singh, Jian He, Jeff Davis, Merlin Moncure, Peter Eisentraut, and Wolfgang Walther. Discussion: http://postgr.es/m/CAEZATCWePEGQR5LBn-vD6SfeLZafzEm2Qy_L_Oky2=qw2w3Pzg@mail.gmail.com
2024-03-04Remove unused #include's from backend .c filesPeter Eisentraut
as determined by include-what-you-use (IWYU) While IWYU also suggests to *add* a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
2024-02-29Support MERGE into updatable views.Dean Rasheed
This allows the target relation of MERGE to be an auto-updatable or trigger-updatable view, and includes support for WITH CHECK OPTION, security barrier views, and security invoker views. A trigger-updatable view must have INSTEAD OF triggers for every type of action (INSERT, UPDATE, and DELETE) mentioned in the MERGE command. An auto-updatable view must not have any INSTEAD OF triggers. Mixing auto-update and trigger-update actions (i.e., having a partial set of INSTEAD OF triggers) is not supported. Rule-updatable views are also not supported, since there is no rewriter support for non-SELECT rules with MERGE operations. Dean Rasheed, reviewed by Jian He and Alvaro Herrera. Discussion: https://postgr.es/m/CAEZATCVcB1g0nmxuEc-A+gGB0HnfcGQNGYH7gS=7rq0u0zOBXA@mail.gmail.com
2024-01-03Update copyright for 2024Bruce Momjian
Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
2023-10-13Fix runtime partition pruning for HASH partitioned tablesDavid Rowley
This could only affect HASH partitioned tables with at least 2 partition key columns. If partition pruning was delayed until execution and the query contained an IS NULL qual on one of the partitioned keys, and some subsequent partitioned key was being compared to a non-Const, then this could result in a crash due to the incorrect keyno being used to calculate the stateidx for the expression evaluation code. Here we fix this by properly skipping partitioned keys which have a nullkey set. Effectively, this must be the same as what's going on inside perform_pruning_base_step(). Sergei Glukhov also provided a patch, but that's not what's being used here. Reported-by: Sergei Glukhov Reviewed-by: tender wang, Sergei Glukhov Discussion: https://postgr.es/m/d05b26fa-af54-27e1-f693-6c31590802fa@postgrespro.ru Backpatch-through: 11, where runtime partition pruning was added.
2023-05-04Revert "Move PartitionPruneInfo out of plan nodes into PlannedStmt"Alvaro Herrera
This reverts commit ec386948948c and its fixup 589bb816499e. This change was intended to support query planning avoiding acquisition of locks on partitions that were going to be pruned; however, the overall project took a different direction at [1] and this bit is no longer needed. Put things back the way they were as agreed in [2], to avoid unnecessary complexity. Discussion: [1] https://postgr.es/m/4191508.1674157166@sss.pgh.pa.us Discussion: [2] https://postgr.es/m/20230502175409.kcoirxczpdha26wt@alvherre.pgsql
2023-03-24Invent GENERIC_PLAN option for EXPLAIN.Tom Lane
This provides a very simple way to see the generic plan for a parameterized query. Without this, it's necessary to define a prepared statement and temporarily change plan_cache_mode, which is a bit tedious. One thing that's a bit of a hack perhaps is that we disable execution-time partition pruning when the GENERIC_PLAN option is given. That's because the pruning code may attempt to fetch the value of one of the parameters, which would fail. Laurenz Albe, reviewed by Julien Rouhaud, Christoph Berg, Michel Pelletier, Jim Jones, and myself Discussion: https://postgr.es/m/0a29b954b10b57f0d135fe12aa0909bd41883eb0.camel@cybertec.at
2023-02-22Fix Assert failure for MERGE into a partitioned table with RLS.Dean Rasheed
In ExecInitPartitionInfo(), the Assert when building the WITH CHECK OPTION list for the new partition assumed that the command would be an INSERT or UPDATE, but it can also be a MERGE. This can be triggered by a MERGE into a partitioned table with RLS checks to enforce. Fix, and back-patch to v15, where MERGE was introduced. Discussion: https://postgr.es/m/CAEZATCWWFtQmW67F3XTyMU5Am10Oxa_b8oe0x%2BNu5Mo%2BCdRErg%40mail.gmail.com
2023-01-02Update copyright for 2023Bruce Momjian
Backpatch-through: 11
2022-12-20Allow batching of inserts during cross-partition updates.Etsuro Fujita
Commit 927f453a9 disallowed batching added by commit b663a4136 to be used for the inserts performed as part of cross-partition updates of partitioned tables, mainly because the previous code in nodeModifyTable.c couldn't handle pending inserts into foreign-table partitions that are also UPDATE target partitions. But we don't have such a limitation anymore (cf. commit ffbb7e65a), so let's allow for this by removing from execPartition.c the restriction added by commit 927f453a9 that batching is only allowed if the query command type is CMD_INSERT. In postgres_fdw, since commit 86dc90056 changed it to effectively disable cross-partition updates in the case where a foreign-table partition chosen to insert rows into is also an UPDATE target partition, allow batching in the case where a foreign-table partition chosen to do so is *not* also an UPDATE target partition. This is enabled by the "batch_size" option added by commit b663a4136, which is disabled by default. This patch also adjusts the test case added by commit 927f453a9 to confirm that the inserts performed as part of a cross-partition update of a partitioned table indeed uses batching. Amit Langote, reviewed and/or tested by Georgios Kokolatos, Zhihong Yu, Bharath Rupireddy, Hou Zhijie, Vignesh C, and me. Discussion: http://postgr.es/m/CA%2BHiwqH1Lz1yJmPs%3DaD-pzd_HLLynLHvq5iYeT9mB0bBV7oJ6w%40mail.gmail.com
2022-12-08Remove new structure member from ResultRelInfo.Etsuro Fujita
In commit ffbb7e65a, I added a ModifyTableState member to ResultRelInfo to save the owning ModifyTableState for use by nodeModifyTable.c when performing batch inserts, but as pointed out by Tom Lane, that changed the array stride of es_result_relations, and that would break any previously-compiled extension code that accesses that array. Fix by removing that member from ResultRelInfo and instead adding a List member at the end of EState to save such ModifyTableStates. Per report from Tom Lane. Back-patch to v14, like the previous commit; I chose to apply the patch to HEAD as well, to make back-patching easy. Discussion: http://postgr.es/m/4065383.1669395453%40sss.pgh.pa.us
2022-12-02Generalize ri_RootToPartitionMap to use for non-partition childrenAlvaro Herrera
ri_RootToPartitionMap is currently only initialized for tuple routing target partitions, though a future commit will need the ability to use it even for the non-partition child tables, so make adjustments to the decouple it from the partitioning code. Also, make it lazily initialized via ExecGetRootToChildMap(), making that function its preferred access path. Existing third-party code accessing it directly should no longer do so; consequently, it's been renamed to ri_RootToChildMap, which also makes it consistent with ri_ChildToRootMap. ExecGetRootToChildMap() houses the logic of setting the map appropriately depending on whether a given child relation is partition or not. To support this, also add a separate entry point for TupleConversionMap creation that receives an AttrMap. No new code here, just split an existing function in two. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CA+HiwqEYUhDXSK5BTvG_xk=eaAEJCD4GS3C6uH7ybBvv+Z_Tmg@mail.gmail.com
2022-12-01Move PartitioPruneInfo out of plan nodes into PlannedStmtAlvaro Herrera
The planner will now add a given PartitioPruneInfo to PlannedStmt.partPruneInfos instead of directly to the Append/MergeAppend plan node. What gets set instead in the latter is an index field which points to the list element of PlannedStmt.partPruneInfos containing the PartitioPruneInfo belonging to the plan node. A later commit will make AcquireExecutorLocks() do the initial partition pruning to determine a minimal set of partitions to be locked when validating a plan tree and it will need to consult the PartitioPruneInfos referenced therein to do so. It would be better for the PartitioPruneInfos to be accessible directly than requiring a walk of the plan tree to find them, which is easier when it can be done by simply iterating over PlannedStmt.partPruneInfos. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
2022-11-29Add 'missing_ok' argument to build_attrmap_by_nameAlvaro Herrera
When it's given as true, return a 0 in the position of the missing column rather than raising an error. This is currently unused, but it allows us to reimplement column permission checking in a subsequent commit. It seems worth breaking into a separate commit because it affects unrelated code. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CA+HiwqFfiai=qBxPDTjaio_ZcaqUKh+FC=prESrB8ogZgFNNNQ@mail.gmail.com
2022-11-25Fix handling of pending inserts in nodeModifyTable.c.Etsuro Fujita
Commit b663a4136, which allowed FDWs to INSERT rows in bulk, added to nodeModifyTable.c code to flush pending inserts to the foreign-table result relation(s) before completing processing of the ModifyTable node, but the code failed to take into account the case where the INSERT query has modifying CTEs, leading to incorrect results. Also, that commit failed to flush pending inserts before firing BEFORE ROW triggers so that rows are visible to such triggers. In that commit we scanned through EState's es_tuple_routing_result_relations or es_opened_result_relations list to find the foreign-table result relations to which pending inserts are flushed, but that would be inefficient in some cases. So to fix, 1) add a List member to EState to record the insert-pending result relations, and 2) modify nodeModifyTable.c so that it adds the foreign-table result relation to the list in ExecInsert() if appropriate, and flushes pending inserts properly using the list where needed. While here, fix a copy-and-pasteo in a comment in ExecBatchInsert(), which was added by that commit. Back-patch to v14 where that commit appeared. Discussion: https://postgr.es/m/CAPmGK16qutyCmyJJzgQOhfBq%3DNoGDqTB6O0QBZTihrbqre%2BoxA%40mail.gmail.com
2022-09-20Remove various duplicated wordsDavid Rowley
Author: Justin Pryzby Discussion: https://postgr.es/m/20220919111000.GW31833@telsasoft.com
2022-08-26More -Wshadow=compatible-local warning fixesDavid Rowley
In a similar effort to f01592f91, here we're targetting fixing the warnings where we've deemed the shadowing variable to serve a close enough purpose to the shadowed variable just to reuse the shadowed version and not declare the shadowing variable at all. By my count, this takes the warning count from 106 down to 71. Author: Justin Pryzby Discussion: https://postgr.es/m/20220825020839.GT2342@telsasoft.com
2022-08-17Avoid using list_length() to test for empty list.Tom Lane
The standard way to check for list emptiness is to compare the List pointer to NIL; our list code goes out of its way to ensure that that is the only representation of an empty list. (An acceptable alternative is a plain boolean test for non-null pointer, but explicit mention of NIL is usually preferable.) Various places didn't get that memo and expressed the condition with list_length(), which might not be so bad except that there were such a variety of ways to check it exactly: equal to zero, less than or equal to zero, less than one, yadda yadda. In the name of code readability, let's standardize all those spellings as "list == NIL" or "list != NIL". (There's probably some microscopic efficiency gain too, though few of these look to be at all performance-critical.) A very small number of cases were left as-is because they seemed more consistent with other adjacent list_length tests that way. Peter Smith, with bikeshedding from a number of us Discussion: https://postgr.es/m/CAHut+PtQYe+ENX5KrONMfugf0q6NHg4hR5dAhqEXEc2eefFeig@mail.gmail.com
2022-08-02Have ExecFindPartition cache the last found partitionDavid Rowley
Here we add code which detects when ExecFindPartition() continually finds the same partition and add a caching layer to improve partition lookup performance for such cases. Both RANGE and LIST partitioned tables traditionally require a binary search for the set of Datums that a partition needs to be found for. This binary search is commonly visible in profiles when bulk loading into a partitioned table. Here we aim to reduce the overhead of bulk-loading into partitioned tables for cases where many consecutive tuples belong to the same partition and make the performance of this operation closer to what it is with a traditional non-partitioned table. When we find the same partition 16 times in a row, the next search will result in us simply just checking if the current set of values belongs to the last found partition. For LIST partitioning we record the index into the PartitionBoundInfo's datum array. This allows us to check if the current Datum is the same as the Datum that was last looked up. This means if any given LIST partition supports storing multiple different Datum values, then the caching only works when we find the same value as we did the last time. For RANGE partitioning we simply check if the given Datums are in the same range as the previously found partition. We store the details of the cached partition in PartitionDesc (i.e. relcache) so that the cached values are maintained over multiple statements. No caching is done for HASH partitions. The majority of the cost in HASH partition lookups are in the hashing function(s), which would also have to be executed if we were to try to do caching for HASH partitioned tables. Since most of the cost is already incurred, we just don't bother. We also don't do any caching for LIST partitions when we continually find the values being looked up belong to the DEFAULT partition. We've no corresponding index in the PartitionBoundInfo's datum array for this case. We also don't cache when we find the given values match to a LIST partitioned table's NULL partition. This is so cheap that there's no point in doing any caching for this. We also don't cache for a RANGE partitioned table's DEFAULT partition. There have been a number of different patches submitted to improve partition lookups. Hou, Zhijie submitted a patch to detect when the value belonging to the partition key column(s) were constant and added code to cache the partition in that case. Amit Langote then implemented an idea suggested by me to remember the last found partition and start to check if the current values work for that partition. The final patch here was written by me and was done by taking many of the ideas I liked from the patches in the thread and redesigning other aspects. Discussion: https://postgr.es/m/OS0PR01MB571649B27E912EA6CC4EEF03942D9%40OS0PR01MB5716.jpnprd01.prod.outlook.com Author: Amit Langote, Hou Zhijie, David Rowley Reviewed-by: Amit Langote, Hou Zhijie
2022-04-12adjust_partition_colnos mustn't be called if not neededAlvaro Herrera
Add an assert to make this very explicit, as well as a code comment. The former should silence Coverity complaining about this. Introduced by 7103ebb7aae8. Reported-by: Ranier Vilela Discussion: https://postgr.es/m/CAEudQAqTTAOzXiYybab+1DQOb3ZUuK99=p_KD+yrRFhcDbd0jg@mail.gmail.com
2022-04-07Revert "Rewrite some RI code to avoid using SPI"Alvaro Herrera
This reverts commit 99392cdd78b788295e52b9f4942fa11992fd5ba9. We'd rather rewrite ri_triggers.c as a whole rather than piecemeal. Discussion: https://postgr.es/m/E1ncXX2-000mFt-Pe@gemulon.postgresql.org
2022-04-07Rewrite some RI code to avoid using SPIAlvaro Herrera
Modify the subroutines called by RI trigger functions that want to check if a given referenced value exists in the referenced relation to simply scan the foreign key constraint's unique index, instead of using SPI to execute SELECT 1 FROM referenced_relation WHERE ref_key = $1 This saves a lot of work, especially when inserting into or updating a referencing relation. This rewrite allows to fix a PK row visibility bug caused by a partition descriptor hack which requires ActiveSnapshot to be set to come up with the correct set of partitions for the RI query running under REPEATABLE READ isolation. We now set that snapshot indepedently of the snapshot to be used by the PK index scan, so the two no longer interfere. The buggy output in src/test/isolation/expected/fk-snapshot.out of the relevant test case added by commit 00cb86e75d6d has been corrected. (The bug still exists in branch 14, however, but this fix is too invasive to backpatch.) Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Li Japin <japinli@hotmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Discussion: https://postgr.es/m/CA+HiwqGkfJfYdeq5vHPh6eqPKjSbfpDDY+j-kXYFePQedtSLeg@mail.gmail.com
2022-04-05Refactor and cleanup runtime partition prune code a littleAlvaro Herrera
* Move the execution pruning initialization steps that are common between both ExecInitAppend() and ExecInitMergeAppend() into a new function ExecInitPartitionPruning() defined in execPartition.c. Those steps include creation of a PartitionPruneState to be used for all instances of pruning and determining the minimal set of child subplans that need to be initialized by performing initial pruning if needed, and finally adjusting the subplan_map arrays in the PartitionPruneState to reflect the new set of subplans remaining after initial pruning if it was indeed performed. ExecCreatePartitionPruneState() is no longer exported out of execPartition.c and has been renamed to CreatePartitionPruneState() as a local sub-routine of ExecInitPartitionPruning(). * Likewise, ExecFindInitialMatchingSubPlans() that was in charge of performing initial pruning no longer needs to be exported. In fact, since it would now have the same body as the more generally named ExecFindMatchingSubPlans(), except differing in the value of initial_prune passed to the common subroutine find_matching_subplans_recurse(), it seems better to remove it and add an initial_prune argument to ExecFindMatchingSubPlans(). * Add an ExprContext field to PartitionPruneContext to remove the implicit assumption in the runtime pruning code that the ExprContext to use to compute pruning expressions that need one can always rely on the PlanState providing it. A future patch will allow runtime pruning (at least the initial pruning steps) to be performed without the corresponding PlanState yet having been created, so this will help. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CA+HiwqEYCpEqh2LMDOp9mT+4-QoVe8HgFMKBjntEMCTZLpcCCA@mail.gmail.com
2022-03-28Add support for MERGE SQL commandAlvaro Herrera
MERGE performs actions that modify rows in the target table using a source table or query. MERGE provides a single SQL statement that can conditionally INSERT/UPDATE/DELETE rows -- a task that would otherwise require multiple PL statements. For example, MERGE INTO target AS t USING source AS s ON t.tid = s.sid WHEN MATCHED AND t.balance > s.delta THEN UPDATE SET balance = t.balance - s.delta WHEN MATCHED THEN DELETE WHEN NOT MATCHED AND s.delta > 0 THEN INSERT VALUES (s.sid, s.delta) WHEN NOT MATCHED THEN DO NOTHING; MERGE works with regular tables, partitioned tables and inheritance hierarchies, including column and row security enforcement, as well as support for row and statement triggers and transition tables therein. MERGE is optimized for OLTP and is parameterizable, though also useful for large scale ETL/ELT. MERGE is not intended to be used in preference to existing single SQL commands for INSERT, UPDATE or DELETE since there is some overhead. MERGE can be used from PL/pgSQL. MERGE does not support targetting updatable views or foreign tables, and RETURNING clauses are not allowed either. These limitations are likely fixable with sufficient effort. Rewrite rules are also not supported, but it's not clear that we'd want to support them. Author: Pavan Deolasee <pavan.deolasee@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Author: Amit Langote <amitlangote09@gmail.com> Author: Simon Riggs <simon.riggs@enterprisedb.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions) Reviewed-by: Peter Geoghegan <pg@bowt.ie> (earlier versions) Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions) Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Discussion: https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com Discussion: https://postgr.es/m/20201231134736.GA25392@alvherre.pgsql
2022-01-07Update copyright for 2022Bruce Momjian
Backpatch-through: 10
2021-07-19Use l*_node() family of functions where appropriatePeter Eisentraut
Instead of castNode(…, lfoo(…)) Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://www.postgresql.org/message-id/flat/87eecahraj.fsf@wibble.ilmari.org
2021-05-12Initial pgindent and pgperltidy run for v14.Tom Lane
Also "make reformat-dat-files". The only change worthy of note is that pgindent messed up the formatting of launcher.c's struct LogicalRepWorkerId, which led me to notice that that struct wasn't used at all anymore, so I just took it out.
2021-05-10Fix mishandling of resjunk columns in ON CONFLICT ... UPDATE tlists.Tom Lane
It's unusual to have any resjunk columns in an ON CONFLICT ... UPDATE list, but it can happen when MULTIEXPR_SUBLINK SubPlans are present. If it happens, the ON CONFLICT UPDATE code path would end up storing tuples that include the values of the extra resjunk columns. That's fairly harmless in the short run, but if new columns are added to the table then the values would become accessible, possibly leading to malfunctions if they don't match the datatypes of the new columns. This had escaped notice through a confluence of missing sanity checks, including * There's no cross-check that a tuple presented to heap_insert or heap_update matches the table rowtype. While it's difficult to check that fully at reasonable cost, we can easily add assertions that there aren't too many columns. * The output-column-assignment cases in execExprInterp.c lacked any sanity checks on the output column numbers, which seems like an oversight considering there are plenty of assertion checks on input column numbers. Add assertions there too. * We failed to apply nodeModifyTable's ExecCheckPlanOutput() to the ON CONFLICT UPDATE tlist. That wouldn't have caught this specific error, since that function is chartered to ignore resjunk columns; but it sure seems like a bad omission now that we've seen this bug. In HEAD, the right way to fix this is to make the processing of ON CONFLICT UPDATE tlists work the same as regular UPDATE tlists now do, that is don't add "SET x = x" entries, and use ExecBuildUpdateProjection to evaluate the tlist and combine it with old values of the not-set columns. This adds a little complication to ExecBuildUpdateProjection, but allows removal of a comparable amount of now-dead code from the planner. In the back branches, the most expedient solution seems to be to (a) use an output slot for the ON CONFLICT UPDATE projection that actually matches the target table, and then (b) invent a variant of ExecBuildProjectionInfo that can be told to not store values resulting from resjunk columns, so it doesn't try to store into nonexistent columns of the output slot. (We can't simply ignore the resjunk columns altogether; they have to be evaluated for MULTIEXPR_SUBLINK to work.) This works back to v10. In 9.6, projections work much differently and we can't cheaply give them such an option. The 9.6 version of this patch works by inserting a JunkFilter when it's necessary to get rid of resjunk columns. In addition, v11 and up have the reverse problem when trying to perform ON CONFLICT UPDATE on a partitioned table. Through a further oversight, adjust_partition_tlist() discarded resjunk columns when re-ordering the ON CONFLICT UPDATE tlist to match a partition. This accidentally prevented the storing-bogus-tuples problem, but at the cost that MULTIEXPR_SUBLINK cases didn't work, typically crashing if more than one row has to be updated. Fix by preserving resjunk columns in that routine. (I failed to resist the temptation to add more assertions there too, and to do some minor code beautification.) Per report from Andres Freund. Back-patch to all supported branches. Security: CVE-2021-32028
2021-04-22Fix relcache inconsistency hazard in partition detachAlvaro Herrera
During queries coming from ri_triggers.c, we need to omit partitions that are marked pending detach -- otherwise, the RI query is tricked into allowing a row into the referencing table whose corresponding row is in the detached partition. Which is bogus: once the detach operation completes, the row becomes an orphan. However, the code was not doing that in repeatable-read transactions, because relcache kept a copy of the partition descriptor that included the partition, and used it in the RI query. This commit changes the partdesc cache code to only keep descriptors that aren't dependent on a snapshot (namely: those where no detached partition exist, and those where detached partitions are included). When a partdesc-without- detached-partitions is requested, we create one afresh each time; also, those partdescs are stored in PortalContext instead of CacheMemoryContext. find_inheritance_children gets a new output *detached_exist boolean, which indicates whether any partition marked pending-detach is found. Its "include_detached" input flag is changed to "omit_detached", because that name captures desired the semantics more naturally. CreatePartitionDirectory() and RelationGetPartitionDesc() arguments are identically renamed. This was noticed because a buildfarm member that runs with relcache clobbering, which would not keep the improperly cached partdesc, broke one test, which led us to realize that the expected output of that test was bogus. This commit also corrects that expected output. Author: Amit Langote <amitlangote09@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/3269784.1617215412@sss.pgh.pa.us
2021-04-06Postpone some stuff out of ExecInitModifyTable.Tom Lane
Arrange to do some things on-demand, rather than immediately during executor startup, because there's a fair chance of never having to do them at all: * Don't open result relations' indexes until needed. * Don't initialize partition tuple routing, nor the child-to-root tuple conversion map, until needed. This wins in UPDATEs on partitioned tables when only some of the partitions will actually receive updates; with larger partition counts the savings is quite noticeable. Also, we can remove some sketchy heuristics in ExecInitModifyTable about whether to set up tuple routing. Also, remove execPartition.c's private hash table tracking which partitions were already opened by the ModifyTable node. Instead use the hash added to ModifyTable itself by commit 86dc90056. To allow lazy computation of the conversion maps, we now set ri_RootResultRelInfo in all child ResultRelInfos. We formerly set it only in some, not terribly well-defined, cases. This has user-visible side effects in that now more error messages refer to the root relation instead of some partition (and provide error data in the root's column order, too). It looks to me like this is a strict improvement in consistency, so I don't have a problem with the output changes visible in this commit. Extracted from a larger patch, which seemed to me to be too messy to push in one commit. Amit Langote, reviewed at different times by Heikki Linnakangas and myself Discussion: https://postgr.es/m/CA+HiwqG7ZruBmmih3wPsBZ4s0H2EhywrnXEduckY5Hr3fWzPWA@mail.gmail.com
2021-03-31Rework planning and execution of UPDATE and DELETE.Tom Lane
This patch makes two closely related sets of changes: 1. For UPDATE, the subplan of the ModifyTable node now only delivers the new values of the changed columns (i.e., the expressions computed in the query's SET clause) plus row identity information such as CTID. ModifyTable must re-fetch the original tuple to merge in the old values of any unchanged columns. The core advantage of this is that the changed columns are uniform across all tables of an inherited or partitioned target relation, whereas the other columns might not be. A secondary advantage, when the UPDATE involves joins, is that less data needs to pass through the plan tree. The disadvantage of course is an extra fetch of each tuple to be updated. However, that seems to be very nearly free in context; even worst-case tests don't show it to add more than a couple percent to the total query cost. At some point it might be interesting to combine the re-fetch with the tuple access that ModifyTable must do anyway to mark the old tuple dead; but that would require a good deal of refactoring and it seems it wouldn't buy all that much, so this patch doesn't attempt it. 2. For inherited UPDATE/DELETE, instead of generating a separate subplan for each target relation, we now generate a single subplan that is just exactly like a SELECT's plan, then stick ModifyTable on top of that. To let ModifyTable know which target relation a given incoming row refers to, a tableoid junk column is added to the row identity information. This gets rid of the horrid hack that was inheritance_planner(), eliminating O(N^2) planning cost and memory consumption in cases where there were many unprunable target relations. Point 2 of course requires point 1, so that there is a uniform definition of the non-junk columns to be returned by the subplan. We can't insist on uniform definition of the row identity junk columns however, if we want to keep the ability to have both plain and foreign tables in a partitioning hierarchy. Since it wouldn't scale very far to have every child table have its own row identity column, this patch includes provisions to merge similar row identity columns into one column of the subplan result. In particular, we can merge the whole-row Vars typically used as row identity by FDWs into one column by pretending they are type RECORD. (It's still okay for the actual composite Datums to be labeled with the table's rowtype OID, though.) There is more that can be done to file down residual inefficiencies in this patch, but it seems to be committable now. FDW authors should note several API changes: * The argument list for AddForeignUpdateTargets() has changed, and so has the method it must use for adding junk columns to the query. Call add_row_identity_var() instead of manipulating the parse tree directly. You might want to reconsider exactly what you're adding, too. * PlanDirectModify() must now work a little harder to find the ForeignScan plan node; if the foreign table is part of a partitioning hierarchy then the ForeignScan might not be the direct child of ModifyTable. See postgres_fdw for sample code. * To check whether a relation is a target relation, it's no longer sufficient to compare its relid to root->parse->resultRelation. Instead, check it against all_result_relids or leaf_result_relids, as appropriate. Amit Langote and Tom Lane Discussion: https://postgr.es/m/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com
2021-03-25ALTER TABLE ... DETACH PARTITION ... CONCURRENTLYAlvaro Herrera
Allow a partition be detached from its partitioned table without blocking concurrent queries, by running in two transactions and only requiring ShareUpdateExclusive in the partitioned table. Because it runs in two transactions, it cannot be used in a transaction block. This is the main reason to use dedicated syntax: so that users can choose to use the original mode if they need it. But also, it doesn't work when a default partition exists (because an exclusive lock would still need to be obtained on it, in order to change its partition constraint.) In case the second transaction is cancelled or a crash occurs, there's ALTER TABLE .. DETACH PARTITION .. FINALIZE, which executes the final steps. The main trick to make this work is the addition of column pg_inherits.inhdetachpending, initially false; can only be set true in the first part of this command. Once that is committed, concurrent transactions that use a PartitionDirectory will include or ignore partitions so marked: in optimizer they are ignored if the row is marked committed for the snapshot; in executor they are always included. As a result, and because of the way PartitionDirectory caches partition descriptors, queries that were planned before the detach will see the rows in the detached partition and queries that are planned after the detach, won't. A CHECK constraint is created that duplicates the partition constraint. This is probably not strictly necessary, and some users will prefer to remove it afterwards, but if the partition is re-attached to a partitioned table, the constraint needn't be rechecked. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20200803234854.GA24158@alvherre.pgsql
2021-02-18Fix tuple routing to initialize batching only for insertsTomas Vondra
A cross-partition update on a partitioned table is implemented as a delete followed by an insert. With foreign partitions, this was however causing issues, because the FDW and core may disagree on when to enable batching. postgres_fdw was only allowing batching for plain inserts (CMD_INSERT) while core was trying to batch the insert component of the cross-partition update. Fix by restricting core to apply batching only to plain CMD_INSERT queries. It's possible to allow batching for cross-partition updates, but that will require more extensive changes, so better to leave that for a separate patch. Author: Amit Langote Reviewed-by: Tomas Vondra, Takayuki Tsunakawa Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development
2021-02-08Fix permission checks on constraint violation errors on partitions.Heikki Linnakangas
If a cross-partition UPDATE violates a constraint on the target partition, and the columns in the new partition are in different physical order than in the parent, the error message can reveal columns that the user does not have SELECT permission on. A similar bug was fixed earlier in commit 804b6b6db4. The cause of the bug is that the callers of the ExecBuildSlotValueDescription() function got confused when constructing the list of modified columns. If the tuple was routed from a parent, we converted the tuple to the parent's format, but the list of modified columns was grabbed directly from the child's RTE entry. ExecUpdateLockMode() had a similar issue. That lead to confusion on which columns are key columns, leading to wrong tuple lock being taken on tables referenced by foreign keys, when a row is updated with INSERT ON CONFLICT UPDATE. A new isolation test is added for that corner case. With this patch, the ri_RangeTableIndex field is no longer set for partitions that don't have an entry in the range table. Previously, it was set to the RTE entry of the parent relation, but that was confusing. NOTE: This modifies the ResultRelInfo struct, replacing the ri_PartitionRoot field with ri_RootResultRelInfo. That's a bit risky to backpatch, because it breaks any extensions accessing the field. The change that ri_RangeTableIndex is not set for partitions could potentially break extensions, too. The ResultRelInfos are visible to FDWs at least, and this patch required small changes to postgres_fdw. Nevertheless, this seem like the least bad option. I don't think these fields widely used in extensions; I don't think there are FDWs out there that uses the FDW "direct update" API, other than postgres_fdw. If there is, you will get a compilation error, so hopefully it is caught quickly. Backpatch to 11, where support for both cross-partition UPDATEs, and unique indexes on partitioned tables, were added. Reviewed-by: Amit Langote Security: CVE-2021-3393
2021-01-28Fix hash partition pruning with asymmetric partition sets.Tom Lane
perform_pruning_combine_step() was not taught about the number of partition indexes used in hash partitioning; more embarrassingly, get_matching_hash_bounds() also had it wrong. These errors are masked in the common case where all the partitions have the same modulus and no partition is missing. However, with missing or unequal-size partitions, we could erroneously prune some partitions that need to be scanned, leading to silently wrong query answers. While a minimal-footprint fix for this could be to export get_partition_bound_num_indexes and make the incorrect functions use it, I'm of the opinion that that function should never have existed in the first place. It's not reasonable data structure design that PartitionBoundInfoData lacks any explicit record of the length of its indexes[] array. Perhaps that was all right when it could always be assumed equal to ndatums, but something should have been done about it as soon as that stopped being true. Putting in an explicit "nindexes" field makes both partition_bounds_equal() and partition_bounds_copy() simpler, safer, and faster than before, and removes explicit knowledge of the number-of-partition-indexes rules from some other places too. This change also makes get_hash_partition_greatest_modulus obsolete. I left that in place in case any external code uses it, but no core code does anymore. Per bug #16840 from Michał Albrycht. Back-patch to v11 where the hash partitioning code came in. (In the back branches, add the new field at the end of PartitionBoundInfoData to minimize ABI risks.) Discussion: https://postgr.es/m/16840-571a22976f829ad4@postgresql.org
2021-01-20Implement support for bulk inserts in postgres_fdwTomas Vondra
Extends the FDW API to allow batching inserts into foreign tables. That is usually much more efficient than inserting individual rows, due to high latency for each round-trip to the foreign server. It was possible to implement something similar in the regular FDW API, but it was inconvenient and there were issues with reporting the number of actually inserted rows etc. This extends the FDW API with two new functions: * GetForeignModifyBatchSize - allows the FDW picking optimal batch size * ExecForeignBatchInsert - inserts a batch of rows at once Currently, only INSERT queries support batching. Support for DELETE and UPDATE may be added in the future. This also implements batching for postgres_fdw. The batch size may be specified using "batch_size" option both at the server and table level. The initial patch version was written by me, but it was rewritten and improved in many ways by Takayuki Tsunakawa. Author: Takayuki Tsunakawa Reviewed-by: Tomas Vondra, Amit Langote Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development
2021-01-02Update copyright for 2021Bruce Momjian
Backpatch-through: 9.5
2020-12-15Improve hash_create()'s API for some added robustness.Tom Lane
Invent a new flag bit HASH_STRINGS to specify C-string hashing, which was formerly the default; and add assertions insisting that exactly one of the bits HASH_STRINGS, HASH_BLOBS, and HASH_FUNCTION be set. This is in hopes of preventing recurrences of the type of oversight fixed in commit a1b8aa1e4 (i.e., mistakenly omitting HASH_BLOBS). Also, when HASH_STRINGS is specified, insist that the keysize be more than 8 bytes. This is a heuristic, but it should catch accidental use of HASH_STRINGS for integer or pointer keys. (Nearly all existing use-cases set the keysize to NAMEDATALEN or more, so there's little reason to think this restriction should be problematic.) Tweak hash_create() to insist that the HASH_ELEM flag be set, and remove the defaults it had for keysize and entrysize. Since those defaults were undocumented and basically useless, no callers omitted HASH_ELEM anyway. Also, remove memset's zeroing the HASHCTL parameter struct from those callers that had one. This has never been really necessary, and while it wasn't a bad coding convention it was confusing that some callers did it and some did not. We might as well save a few cycles by standardizing on "not". Also improve the documentation for hash_create(). In passing, improve reinit.c's usage of a hash table by storing the key as a binary Oid rather than a string; and, since that's a temporary hash table, allocate it in CurrentMemoryContext for neatness. Discussion: https://postgr.es/m/590625.1607878171@sss.pgh.pa.us
2020-10-19Remove PartitionRoutingInfo struct.Heikki Linnakangas
The extra indirection neeeded to access its members via its enclosing ResultRelInfo seems pointless. Move all the fields from PartitionRoutingInfo to ResultRelInfo. Author: Amit Langote Reviewed-by: Alvaro Herrera Discussion: https://www.postgresql.org/message-id/CA%2BHiwqFViT47Zbr_ASBejiK7iDG8%3DQ1swQ-tjM6caRPQ67pT%3Dw%40mail.gmail.com
2020-10-19Revise child-to-root tuple conversion map management.Heikki Linnakangas
Store the tuple conversion map to convert a tuple from a child table's format to the root format in a new ri_ChildToRootMap field in ResultRelInfo. It is initialized if transition tuple capture for FOR STATEMENT triggers or INSERT tuple routing on a partitioned table is needed. Previously, ModifyTable kept the maps in the per-subplan ModifyTableState->mt_per_subplan_tupconv_maps array, or when tuple routing was used, in ResultRelInfo->ri_Partitioninfo->pi_PartitionToRootMap. The new field replaces both of those. Now that the child-to-root tuple conversion map is always available in ResultRelInfo (when needed), remove the TransitionCaptureState.tcs_map field. The callers of Exec*Trigger() functions no longer need to set or save it, which is much less confusing and bug-prone. Also, as a future optimization, this will allow us to delay creating the map for a given result relation until the relation is actually processed during execution. Author: Amit Langote Discussion: https://www.postgresql.org/message-id/CA%2BHiwqHtCWLdK-LO%3DNEsvOdHx%2B7yv4mE_zYK0i3BH7dXb-wxog%40mail.gmail.com
2020-09-16Don't fetch partition check expression during InitResultRelInfo.Tom Lane
Since there is only one place that actually needs the partition check expression, namely ExecPartitionCheck, it's better to fetch it from the relcache there. In this way we will never fetch it at all if the query never has use for it, and we still fetch it just once when we do need it. The reason for taking an interest in this is that if the relcache doesn't already have the check expression cached, fetching it requires obtaining AccessShareLock on the partition root. That means that operations that look like they should only touch the partition itself will also take a lock on the root. In particular we observed that TRUNCATE on a partition may take a lock on the partition's root, contributing to a deadlock situation in parallel pg_restore. As written, this patch does have a small cost, which is that we are microscopically reducing efficiency for the case where a partition has an empty check expression. ExecPartitionCheck will be called, and will go through the motions of setting up and checking an empty qual, where before it would not have been called at all. We could avoid that by adding a separate boolean flag to track whether there is a partition expression to test. However, this case only arises for a default partition with no siblings, which surely is not an interesting case in practice. Hence adding complexity for it does not seem like a good trade-off. Amit Langote, per a suggestion by me Discussion: https://postgr.es/m/VI1PR03MB31670CA1BD9625C3A8C5DD05EB230@VI1PR03MB3167.eurprd03.prod.outlook.com
2020-09-08Check default partitions constraints while descendingAlvaro Herrera
Partitioning tuple route code assumes that the partition chosen while descending the partition hierarchy is always the correct one. This is true except when the partition is the default partition and another partition has been added concurrently: the partition constraint changes and we don't recheck it. This can lead to tuples mistakenly being added to the default partition that should have been rejected. Fix by rechecking the default partition constraint while descending the hierarchy. An isolation test based on the reproduction steps described by Hao Wu (with tweaks for extra coverage) is included. Backpatch to 12, where this bug came in with 898e5e3290a7. Reported by: Hao Wu <hawu@vmware.com> Author: Amit Langote <amitlangote09@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CA+HiwqFqBmcSSap4sFnCBUEL_VfOMmEKaQ3gwUhyfa4c7J_-nA@mail.gmail.com Discussion: https://postgr.es/m/DM5PR0501MB3910E97A9EDFB4C775CF3D75A42F0@DM5PR0501MB3910.namprd05.prod.outlook.com
2020-08-05Fix matching of sub-partitions when a partitioned plan is stale.Tom Lane
Since we no longer require AccessExclusiveLock to add a partition, the executor may see that a partitioned table has more partitions than the planner saw. ExecCreatePartitionPruneState's code for matching up the partition lists in such cases was faulty, and would misbehave if the planner had successfully pruned any partitions from the query. (Thus, trouble would occur only if a partition addition happens concurrently with a query that uses both static and dynamic partition pruning.) This led to an Assert failure in debug builds, and probably to crashes or query misbehavior in production builds. To repair the bug, just explicitly skip zeroes in the plan's relid_map[] list. I also made some cosmetic changes to make the code more readable (IMO anyway). Also, convert the cross-checking Assert to a regular test-and-elog, since it's now apparent that this logic is more fragile than one would like. Currently, there's no way to repeatably exercise this code, except with manual use of a debugger to stop the backend between planning and execution. Hence, no test case in this patch. We oughta do something about that testability gap, but that's for another day. Amit Langote and Tom Lane, per report from Justin Pryzby. Oversight in commit 898e5e329; backpatch to v12 where that appeared. Discussion: https://postgr.es/m/20200802181131.GA27754@telsasoft.com
2020-03-23Add object names to partition integrity violations.Amit Kapila
All errors of SQLSTATE class 23 should include the name of an object associated with the error in separate fields of the error report message. We do this so that applications need not try to extract them from the possibly-localized human-readable text of the message. Reported-by: Chris Bandy Author: Chris Bandy Reviewed-by: Amit Kapila and Amit Langote Discussion: https://postgr.es/m/0aa113a3-3c7f-db48-bcd8-f9290b2269ae@gmail.com
2020-03-10Remove utils/acl.h from catalog/objectaddress.hPeter Eisentraut
The need for this was removed by 8b9e9644dc6a9bd4b7a97950e6212f63880cf18b. A number of files now need to include utils/acl.h or parser/parse_node.h explicitly where they previously got it indirectly somehow. Since parser/parse_node.h already includes nodes/parsenodes.h, the latter is then removed where the former was added. Also, remove nodes/pg_list.h from objectaddress.h, since that's included via nodes/parsenodes.h. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/7601e258-26b2-8481-36d0-dc9dca6f28f1%402ndquadrant.com