summaryrefslogtreecommitdiff
path: root/src/backend/executor/nodeAppend.c
AgeCommit message (Collapse)Author
2025-03-12Handle interrupts while waiting on Append's async subplansHeikki Linnakangas
We did not wake up on interrupts while waiting on async events on an async-capable append node. For example, if you tried to cancel the query, nothing would happen until one of the async subplans becomes readable. To fix, add WL_LATCH_SET to the WaitEventSet. Backpatch down to v14 where async Append execution was introduced. Discussion: https://www.postgresql.org/message-id/37a40570-f558-40d3-b5ea-5c2079b3b30b@iki.fi
2025-02-07Track unpruned relids to avoid processing pruned relationsAmit Langote
This commit introduces changes to track unpruned relations explicitly, making it possible for top-level plan nodes, such as ModifyTable and LockRows, to avoid processing partitions pruned during initial pruning. Scan-level nodes, such as Append and MergeAppend, already avoid the unnecessary processing by accessing partition pruning results directly via part_prune_index. In contrast, top-level nodes cannot access pruning results directly and need to determine which partitions remain unpruned. To address this, this commit introduces a new bitmapset field, es_unpruned_relids, which the executor uses to track the set of unpruned relations. This field is referenced during plan initialization to skip initializing certain nodes for pruned partitions. It is initialized with PlannedStmt.unprunableRelids, a new field that the planner populates with RT indexes of relations that cannot be pruned during runtime pruning. These include relations not subject to partition pruning and those required for execution regardless of pruning. PlannedStmt.unprunableRelids is computed during set_plan_refs() by removing the RT indexes of runtime-prunable relations, identified from PartitionPruneInfos, from the full set of relation RT indexes. ExecDoInitialPruning() then updates es_unpruned_relids by adding partitions that survive initial pruning. To support this, PartitionedRelPruneInfo and PartitionedRelPruningData now include a leafpart_rti_map[] array that maps partition indexes to their corresponding RT indexes. The former is used in set_plan_refs() when constructing unprunableRelids, while the latter is used in ExecDoInitialPruning() to convert partition indexes returned by get_matching_partitions() into RT indexes, which are then added to es_unpruned_relids. These changes make it possible for ModifyTable and LockRows nodes to process only relations that remain unpruned after initial pruning. ExecInitModifyTable() trims lists, such as resultRelations, withCheckOptionLists, returningLists, and updateColnosLists, to consider only unpruned partitions. It also creates ResultRelInfo structs only for these partitions. Similarly, child RowMarks for pruned relations are skipped. By avoiding unnecessary initialization of structures for pruned partitions, these changes improve the performance of updates and deletes on partitioned tables during initial runtime pruning. Due to ExecInitModifyTable() changes as described above, EXPLAIN on a plan for UPDATE and DELETE that uses runtime initial pruning no longer lists partitions pruned during initial pruning. Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions) Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
2025-01-31Perform runtime initial pruning outside ExecInitNode()Amit Langote
This commit builds on the prior change that moved PartitionPruneInfos out of individual plan nodes into a list in PlannedStmt, making it possible to initialize PartitionPruneStates without traversing the plan tree and perform runtime initial pruning before ExecInitNode() initializes the plan trees. These tasks are now handled in a new routine, ExecDoInitialPruning(), which is called by InitPlan() before calling ExecInitNode() on various plan trees. ExecDoInitialPruning() performs the initial pruning and saves the result -- a Bitmapset of indexes for surviving child subnodes -- in es_part_prune_results, a list in EState. PartitionPruneStates created for initial pruning are stored in es_part_prune_states, another list in EState, for later use during exec pruning. Both lists are parallel to es_part_prune_infos, which holds the PartitionPruneInfos from PlannedStmt, enabling shared indexing. PartitionPruneStates initialized in ExecDoInitialPruning() now include only the PartitionPruneContexts for initial pruning steps. Exec pruning contexts are initialized later in ExecInitPartitionExecPruning() when the parent plan node is initialized, as the exec pruning step expressions depend on the parent node's PlanState. The existing function PartitionPruneFixSubPlanMap() has been repurposed for this initialization to avoid duplicating a similar loop structure for finding PartitionedRelPruningData to initialize exec pruning contexts for. It has been renamed to InitExecPruningContexts() to reflect its new primary responsibility. The original logic to "fix subplan maps" remains intact but is now encapsulated within the renamed function. This commit removes two obsolete Asserts in partkey_datum_from_expr(). The ExprContext used for pruning expression evaluation is now independent of the parent PlanState, making these Asserts unnecessary. By centralizing pruning logic and decoupling it from the plan initialization step (ExecInitNode()), this change sets the stage for future patches that will use the result of initial pruning to save the overhead of redundant processing for pruned partitions. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
2025-01-30Move PartitionPruneInfo out of plan nodes into PlannedStmtAmit Langote
This moves PartitionPruneInfo from plan nodes to PlannedStmt, simplifying traversal by centralizing all PartitionPruneInfo structures in a single list in it, which holds all instances for the main query and its subqueries. Instead of plan nodes (Append or MergeAppend) storing PartitionPruneInfo pointers, they now reference an index in this list. A bitmapset field is added to PartitionPruneInfo to store the RT indexes corresponding to the apprelids field in Append or MergeAppend. This allows execution pruning logic to verify that it operates on the correct plan node, mainly to facilitate debugging. Duplicated code in set_append_references() and set_mergeappend_references() is refactored into a new function, register_pruneinfo(). This updates RT indexes by applying rtoffet and adds PartitionPruneInfo to the global list in PlannerGlobal. By allowing pruning to be performed without traversing the plan tree, this change lays the groundwork for runtime initial pruning to occur independently of plan tree initialization. Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> (earlier version) Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
2025-01-01Update copyright for 2025Bruce Momjian
Backpatch-through: 13
2024-12-19Use ExecGetCommonSlotOps infrastructure in more places.Tom Lane
Append, MergeAppend, and RecursiveUnion can all use the support functions added in commit 276279295. The first two can report a fixed result slot type if all their children return the same fixed slot type. That does nothing for the append step itself, but might allow optimizations in the parent plan node. RecursiveUnion can optimize tuple hash table operations in the same way as SetOp now does. Patch by me; thanks to Richard Guo and David Rowley for review. Discussion: https://postgr.es/m/1850138.1731549611@sss.pgh.pa.us
2024-03-04Remove unused #include's from backend .c filesPeter Eisentraut
as determined by include-what-you-use (IWYU) While IWYU also suggests to *add* a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
2024-01-03Update copyright for 2024Bruce Momjian
Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
2023-11-23Use ResourceOwner to track WaitEventSets.Heikki Linnakangas
A WaitEventSet holds file descriptors or event handles (on Windows). If FreeWaitEventSet is not called, those fds or handles are leaked. Use ResourceOwners to track WaitEventSets, to clean those up automatically on error. This was a live bug in async Append nodes, if a FDW's ForeignAsyncRequest function failed. (In back branches, I will apply a more localized fix for that based on PG_TRY-PG_FINALLY.) The added test doesn't check for leaking resources, so it passed even before this commit. But at least it covers the code path. In the passing, fix misleading comment on what the 'nevents' argument to WaitEventSetWait means. Report by Alexander Lakhin, analysis and suggestion for the fix by Tom Lane. Fixes bug #17828. Reviewed-by: Alexander Lakhin, Thomas Munro Discussion: https://www.postgresql.org/message-id/472235.1678387869@sss.pgh.pa.us
2023-05-04Revert "Move PartitionPruneInfo out of plan nodes into PlannedStmt"Alvaro Herrera
This reverts commit ec386948948c and its fixup 589bb816499e. This change was intended to support query planning avoiding acquisition of locks on partitions that were going to be pruned; however, the overall project took a different direction at [1] and this bit is no longer needed. Put things back the way they were as agreed in [2], to avoid unnecessary complexity. Discussion: [1] https://postgr.es/m/4191508.1674157166@sss.pgh.pa.us Discussion: [2] https://postgr.es/m/20230502175409.kcoirxczpdha26wt@alvherre.pgsql
2023-03-02Mop up some undue familiarity with the innards of Bitmapsets.Tom Lane
nodeAppend.c used non-nullness of appendstate->as_valid_subplans as a state flag to indicate whether it'd done ExecFindMatchingSubPlans (or some sufficient approximation to that). This was pretty questionable even in the beginning, since it wouldn't really work right if there are no valid subplans. It got more questionable after commit 27e1f1456 added logic that could reduce as_valid_subplans to an empty set: at that point we were depending on unspecified behavior of bms_del_members, namely that it'd not return an empty set as NULL. It's about to start doing that, which breaks this logic entirely. Hence, add a separate boolean flag to signal whether as_valid_subplans has been computed. Also fix a previously-cosmetic bug in nodeAgg.c, wherein it ignored the return value of bms_del_member instead of updating its pointer. Patch by me; thanks to Nathan Bossart and Richard Guo for review. Discussion: https://postgr.es/m/1159933.1677621588@sss.pgh.pa.us
2023-01-02Update copyright for 2023Bruce Momjian
Backpatch-through: 11
2022-12-01Move PartitioPruneInfo out of plan nodes into PlannedStmtAlvaro Herrera
The planner will now add a given PartitioPruneInfo to PlannedStmt.partPruneInfos instead of directly to the Append/MergeAppend plan node. What gets set instead in the latter is an index field which points to the list element of PlannedStmt.partPruneInfos containing the PartitioPruneInfo belonging to the plan node. A later commit will make AcquireExecutorLocks() do the initial partition pruning to determine a minimal set of partitions to be locked when validating a plan tree and it will need to consult the PartitioPruneInfos referenced therein to do so. It would be better for the PartitioPruneInfos to be accessible directly than requiring a walk of the plan tree to find them, which is easier when it can be done by simply iterating over PlannedStmt.partPruneInfos. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
2022-04-05Refactor and cleanup runtime partition prune code a littleAlvaro Herrera
* Move the execution pruning initialization steps that are common between both ExecInitAppend() and ExecInitMergeAppend() into a new function ExecInitPartitionPruning() defined in execPartition.c. Those steps include creation of a PartitionPruneState to be used for all instances of pruning and determining the minimal set of child subplans that need to be initialized by performing initial pruning if needed, and finally adjusting the subplan_map arrays in the PartitionPruneState to reflect the new set of subplans remaining after initial pruning if it was indeed performed. ExecCreatePartitionPruneState() is no longer exported out of execPartition.c and has been renamed to CreatePartitionPruneState() as a local sub-routine of ExecInitPartitionPruning(). * Likewise, ExecFindInitialMatchingSubPlans() that was in charge of performing initial pruning no longer needs to be exported. In fact, since it would now have the same body as the more generally named ExecFindMatchingSubPlans(), except differing in the value of initial_prune passed to the common subroutine find_matching_subplans_recurse(), it seems better to remove it and add an initial_prune argument to ExecFindMatchingSubPlans(). * Add an ExprContext field to PartitionPruneContext to remove the implicit assumption in the runtime pruning code that the ExprContext to use to compute pruning expressions that need one can always rely on the PlanState providing it. A future patch will allow runtime pruning (at least the initial pruning steps) to be performed without the corresponding PlanState yet having been created, so this will help. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CA+HiwqEYCpEqh2LMDOp9mT+4-QoVe8HgFMKBjntEMCTZLpcCCA@mail.gmail.com
2022-01-07Update copyright for 2022Bruce Momjian
Backpatch-through: 10
2021-08-02Fix oversight in commit 1ec7fca8592178281cd5cdada0f27a340fb813fc.Etsuro Fujita
I failed to account for the possibility that when ExecAppendAsyncEventWait() notifies multiple async-capable nodes using postgres_fdw, a preceding node might invoke process_pending_request() to process a pending asynchronous request made by a succeeding node. In that case the succeeding node should produce a tuple to return to the parent Append node from tuples fetched by process_pending_request() when notified. Repair. Per buildfarm via Michael Paquier. Back-patch to v14, like the previous commit. Thanks to Tom Lane for testing. Discussion: https://postgr.es/m/YQP0UPT8KmPiHTMs%40paquier.xyz
2021-07-30postgres_fdw: Fix handling of pending asynchronous requests.Etsuro Fujita
A pending asynchronous request is handled by process_pending_request(), which previously not only processed an in-progress remote query but performed ExecForeignScan() to produce a tuple to return to the local server asynchronously from the result of the remote query. But that led to a server crash when executing a query or led to an "InstrStartNode called twice in a row" or "InstrEndLoop called on running node" failure when doing EXPLAIN ANALYZE of it, in cases where the plan tree for it contained multiple async-capable nodes accessing the same initplan/subplan that contained multiple async-capable nodes scanning the same foreign tables as for the parent async-capable nodes, as reported by Andrey Lepikhov. The reason is that the second step in process_pending_request() invoked when executing the initplan/subplan for one of the parent async-capable nodes caused recursive execution of the initplan/subplan for another of the parent async-capable nodes. To fix, split process_pending_request() into the two steps and postpone the second step until ForeignAsyncConfigureWait() is called for each of the pending asynchronous requests. Also, in ExecAppendAsyncEventWait() we assumed that FDWs would register at least one wait event in a WaitEventSet created there when they were called from ForeignAsyncConfigureWait() in that function, but allow FDWs to register zero wait events in the WaitEventSet; modify ExecAppendAsyncEventWait() to just return in that case. Oversight in commit 27e1f1456. Back-patch to v14 where that commit went in. Andrey Lepikhov and Etsuro Fujita Discussion: https://postgr.es/m/fe5eaa19-1704-e4a4-76ee-3b9d37ade399@postgrespro.ru
2021-06-07Fix rescanning of async-aware Append nodes.Etsuro Fujita
In cases where run-time pruning isn't required, the synchronous and asynchronous subplans for an async-aware Append node determined using classify_matching_subplans() should be re-used when rescanning the node, but the previous code re-determined them using that function repeatedly each time when rescanning the node, leading to incorrect results in a normal build and an Assert failure in an Assert-enabled build as that function doesn't assume that it's called repeatedly in such cases. Fix the code as mentioned above. My oversight in commit 27e1f1456. While at it, initialize async-related pointers/variables to NULL/zero explicitly in ExecInitAppend() and ExecReScanAppend(), just to be sure. (The variables would have been set to zero before we get to the latter function, but let's do so.) Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/CAPmGK16Q4B2_KY%2BJH7rb7wQbw54AUprp7TMekGTd2T1B62yysQ%40mail.gmail.com
2021-05-12Initial pgindent and pgperltidy run for v14.Tom Lane
Also "make reformat-dat-files". The only change worthy of note is that pgindent messed up the formatting of launcher.c's struct LogicalRepWorkerId, which led me to notice that that struct wasn't used at all anymore, so I just took it out.
2021-05-12Fix EXPLAIN ANALYZE for async-capable nodes.Etsuro Fujita
EXPLAIN ANALYZE for an async-capable ForeignScan node associated with postgres_fdw is done just by using instrumentation for ExecProcNode() called from the node's callbacks, causing the following problems: 1) If the remote table to scan is empty, the node is incorrectly considered as "never executed" by the command even if the node is executed, as ExecProcNode() isn't called from the node's callbacks at all in that case. 2) The command fails to collect timings for things other than ExecProcNode() done in the node, such as creating a cursor for the node's remote query. To fix these problems, add instrumentation for async-capable nodes, and modify postgres_fdw accordingly. My oversight in commit 27e1f1456. While at it, update a comment for the AsyncRequest struct in execnodes.h and the documentation for the ForeignAsyncRequest API in fdwhandler.sgml to match the code in ExecAsyncAppendResponse() in nodeAppend.c, and fix typos in comments in nodeAppend.c. Per report from Andrey Lepikhov, though I didn't use his patch. Reviewed-by: Andrey Lepikhov Discussion: https://postgr.es/m/2eb662bb-105d-fc20-7412-2f027cc3ca72%40postgrespro.ru
2021-04-23Minor code cleanup in asynchronous execution support.Etsuro Fujita
This is cleanup for commit 27e1f1456: * ExecAppendAsyncEventWait(), which was modified a bit further by commit a8af856d3, duplicated the same nevents calculation. Simplify the code a little bit to avoid the duplication. Update comments there. * Add an assertion to ExecAppendAsyncRequest(). * Update a comment about merging the async_capable options from input relations in merge_fdw_options(), per complaint from Kyotaro Horiguchi. * Add a comment for fetch_more_data_begin(). Author: Etsuro Fujita Discussion: https://postgr.es/m/CAPmGK1637W30Wx3MnrReewhafn6F_0J76mrJGoFXFnpPq4QfvA%40mail.gmail.com
2021-04-06Adjust input value to WaitEventSetWait() in ExecAppendAsyncEventWait().Etsuro Fujita
Adjust the number of events given to WaitEventSetWait() so that it doesn't exceed the maximum number of events in the WaitEventSet given to that function (set->nevents_space) in hopes of making the buildfarm green. Per valgrind failure report from Tom Lane and the buildfarm. Author: Etsuro Fujita Discussion: https://postgr.es/m/3411577.1617289776%40sss.pgh.pa.us
2021-03-31Add support for asynchronous execution.Etsuro Fujita
This implements asynchronous execution, which runs multiple parts of a non-parallel-aware Append concurrently rather than serially to improve performance when possible. Currently, the only node type that can be run concurrently is a ForeignScan that is an immediate child of such an Append. In the case where such ForeignScans access data on different remote servers, this would run those ForeignScans concurrently, and overlap the remote operations to be performed simultaneously, so it'll improve the performance especially when the operations involve time-consuming ones such as remote join and remote aggregation. We may extend this to other node types such as joins or aggregates over ForeignScans in the future. This also adds the support for postgres_fdw, which is enabled by the table-level/server-level option "async_capable". The default is false. Robert Haas, Kyotaro Horiguchi, Thomas Munro, and myself. This commit is mostly based on the patch proposed by Robert Haas, but also uses stuff from the patch proposed by Kyotaro Horiguchi and from the patch proposed by Thomas Munro. Reviewed by Kyotaro Horiguchi, Konstantin Knizhnik, Andrey Lepikhov, Movead Li, Thomas Munro, Justin Pryzby, and others. Discussion: https://postgr.es/m/CA%2BTgmoaXQEt4tZ03FtQhnzeDEMzBck%2BLrni0UWHVVgOTnA6C1w%40mail.gmail.com Discussion: https://postgr.es/m/CA%2BhUKGLBRyu0rHrDCMC4%3DRn3252gogyp1SjOgG8SEKKZv%3DFwfQ%40mail.gmail.com Discussion: https://postgr.es/m/20200228.170650.667613673625155850.horikyota.ntt%40gmail.com
2021-01-02Update copyright for 2021Bruce Momjian
Backpatch-through: 9.5
2020-01-01Update copyrights for 2020Bruce Momjian
Backpatch-through: update all files in master, backpatch legal files through 9.4
2019-12-11Allow executor startup pruning to prune all child nodes.Tom Lane
Previously, if the startup pruning logic proved that all child nodes of an Append or MergeAppend could be pruned, we still kept one, just to keep EXPLAIN from failing. The previous commit removed the ruleutils.c limitation that required this kluge, so drop it. That results in less-confusing EXPLAIN output, as per a complaint from Yuzuko Hosoya. David Rowley Discussion: https://postgr.es/m/001001d4f44b$2a2cca50$7e865ef0$@lab.ntt.co.jp
2019-07-22Make better use of the new List implementation in a couple of placesDavid Rowley
In nodeAppend.c and nodeMergeAppend.c there were some foreach loops which looped over the list of subplans and only performed any work if the subplan index was found in a Bitmapset. With the old linked list implementation of List, this form made sense as accessing the Nth list element was O(N). However, thanks to 1cff1b95a we now have array-based lists, so accessing the Nth element has become O(1). Here we make the most of the O(1) lookups and just loop over the set members of the Bitmapset with bms_next_member(). This performs slightly better when a small number of the list items are in the Bitmapset. Micro benchmarks show that when the Bitmapset contains all or most of the list items then the new code is ever so slightly slower. In practice, the cost is so small that it's drowned out by various other things such as locking the relations belonging to each subplan, etc. The primary goal here is to leave better code examples around which benefit better from the new list implementation. Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CAKJS1f8ZcsLVgkF4wOfRyMYTcPgLFiUAOedFC+U2vK_aFZk-BA@mail.gmail.com
2019-01-02Update copyright for 2019Bruce Momjian
Backpatch-through: certain files through 9.4
2018-11-15Introduce notion of different types of slots (without implementing them).Andres Freund
Upcoming work intends to allow pluggable ways to introduce new ways of storing table data. Accessing those table access methods from the executor requires TupleTableSlots to be carry tuples in the native format of such storage methods; otherwise there'll be a significant conversion overhead. Different access methods will require different data to store tuples efficiently (just like virtual, minimal, heap already require fields in TupleTableSlot). To allow that without requiring additional pointer indirections, we want to have different structs (embedding TupleTableSlot) for different types of slots. Thus different types of slots are needed, which requires adapting creators of slots. The slot that most efficiently can represent a type of tuple in an executor node will often depend on the type of slot a child node uses. Therefore we need to track the type of slot is returned by nodes, so parent slots can create slots based on that. Relatedly, JIT compilation of tuple deforming needs to know which type of slot a certain expression refers to, so it can create an appropriate deforming function for the type of tuple in the slot. But not all nodes will only return one type of slot, e.g. an append node will potentially return different types of slots for each of its subplans. Therefore add function that allows to query the type of a node's result slot, and whether it'll always be the same type (whether it's fixed). This can be queried using ExecGetResultSlotOps(). The scan, result, inner, outer type of slots are automatically inferred from ExecInitScanTupleSlot(), ExecInitResultSlot(), left/right subtrees respectively. If that's not correct for a node, that can be overwritten using new fields in PlanState. This commit does not introduce the actually abstracted implementation of different kind of TupleTableSlots, that will be left for a followup commit. The different types of slots introduced will, for now, still use the same backing implementation. While this already partially invalidates the big comment in tuptable.h, it seems to make more sense to update it later, when the different TupleTableSlot implementations actually exist. Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
2018-11-09Don't require return slots for nodes without projection.Andres Freund
In a lot of nodes the return slot is not required. That can either be because the node doesn't do any projection (say an Append node), or because the node does perform projections but the projection is optimized away because the projection would yield an identical row. Slots aren't that small, especially for wide rows, so it's worthwhile to avoid creating them. It's not possible to just skip creating the slot - it's currently used to determine the tuple descriptor returned by ExecGetResultType(). So separate the determination of the result type from the slot creation. The work previously done internally ExecInitResultTupleSlotTL() can now also be done separately with ExecInitResultTypeTL() and ExecInitResultSlot(). That way nodes that aren't guaranteed to need a result slot, can use ExecInitResultTypeTL() to determine the result type of the node, and ExecAssignScanProjectionInfo() (via ExecConditionalAssignProjectionInfo()) determines that a result slot is needed, it is created with ExecInitResultSlot(). Besides the advantage of avoiding to create slots that then are unused, this is necessary preparation for later patches around tuple table slot abstraction. In particular separating the return descriptor and slot is a prerequisite to allow JITing of tuple deforming with knowledge of the underlying tuple format, and to avoid unnecessarily creating JITed tuple deforming for virtual slots. This commit removes a redundant argument from ExecInitResultTupleSlotTL(). While this commit touches a lot of the relevant lines anyway, it'd normally still not worthwhile to cause breakage, except that aforementioned later commits will touch *all* ExecInitResultTupleSlotTL() callers anyway (but fits worse thematically). Author: Andres Freund Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
2018-10-06Remove more redundant relation locking during executor startup.Tom Lane
We already have appropriate locks on every relation listed in the query's rangetable before we reach the executor. Take the next step in exploiting that knowledge by removing code that worries about taking locks on non-leaf result relations in a partitioned table. In particular, get rid of ExecLockNonLeafAppendTables and a stanza in InitPlan that asserts we already have locks on certain such tables. In passing, clean up some now-obsolete comments in InitPlan. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
2018-10-04Centralize executor's opening/closing of Relations for rangetable entries.Tom Lane
Create an array estate->es_relations[] paralleling the es_range_table, and store references to Relations (relcache entries) there, so that any given RT entry is opened and closed just once per executor run. Scan nodes typically still call ExecOpenScanRelation, but ExecCloseScanRelation is no more; relation closing is now done centrally in ExecEndPlan. This is slightly more complex than one would expect because of the interactions with relcache references held in ResultRelInfo nodes. The general convention is now that ResultRelInfo->ri_RelationDesc does not represent a separate relcache reference and so does not need to be explicitly closed; but there is an exception for ResultRelInfos in the es_trig_target_relations list, which are manufactured by ExecGetTriggerResultRel and have to be cleaned up by ExecCleanUpTriggerState. (That much was true all along, but these ResultRelInfos are now more different from others than they used to be.) To allow the partition pruning logic to make use of es_relations[] rather than having its own relcache references, adjust PartitionedRelPruneInfo to store an RT index rather than a relation OID. Amit Langote, reviewed by David Rowley and Jesper Pedersen, some mods by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
2018-08-01Fix run-time partition pruning for appends with multiple source rels.Tom Lane
The previous coding here supposed that if run-time partitioning applied to a particular Append/MergeAppend plan, then all child plans of that node must be members of a single partitioning hierarchy. This is totally wrong, since an Append could be formed from a UNION ALL: we could have multiple hierarchies sharing the same Append, or child plans that aren't part of any hierarchy. To fix, restructure the related plan-time and execution-time data structures so that we can have a separate list or array for each partitioning hierarchy. Also track subplans that are not part of any hierarchy, and make sure they don't get pruned. Per reports from Phil Florent and others. Back-patch to v11, since the bug originated there. David Rowley, with a lot of cosmetic adjustments by me; thanks also to Amit Langote for review. Discussion: https://postgr.es/m/HE1PR03MB17068BB27404C90B5B788BCABA7B0@HE1PR03MB1706.eurprd03.prod.outlook.com
2018-07-30Verify range bounds to bms_add_range when necessaryAlvaro Herrera
Now that the bms_add_range boundary protections are gone, some alternative ones are needed in a few places. Author: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> Discussion: https://postgr.es/m/3437ccf8-a144-55ff-1e2f-fc16b437823b@lab.ntt.co.jp
2018-07-19Fix comment.Heikki Linnakangas
This comment was copy-pasted from nodeAppend.c to nodeMergeAppend.c, but while committing 5220bb7533, I modified wrong copy of it. Spotted by David Rowley
2018-07-19Expand run-time partition pruning to work with MergeAppendHeikki Linnakangas
This expands the support for the run-time partition pruning which was added for Append in 499be013de to also allow unneeded subnodes of a MergeAppend to be removed. Author: David Rowley Discussion: https://www.postgresql.org/message-id/CAKJS1f_F_V8D7Wu-HVdnH7zCUxhoGK8XhLLtd%3DCu85qDZzXrgg%40mail.gmail.com
2018-06-13Fix up run-time partition pruning's use of relcache's partition data.Tom Lane
The previous coding saved pointers into the partitioned table's relcache entry, but then closed the relcache entry, causing those pointers to nominally become dangling. Actual trouble would be seen in the field only if a relcache flush occurred mid-query, but that's hardly out of the question. While we could fix this by copying all the data in question at query start, it seems better to just hold the relcache entry open for the whole query. While at it, improve the handling of support-function lookups: do that once per query not once per pruning test. There's still something to be desired here, in that we fail to exploit the possibility of caching data across queries in the fn_extra fields of the relcache's FmgrInfo structs, which could happen if we just used those structs in-place rather than copying them. However, combining that with the possibility of per-query lookups of cross-type comparison functions seems to require changes in the APIs of a lot of the pruning support functions, so it's too invasive to consider as part of this patch. A win would ensue only for complex partition key data types (e.g. arrays), so it may not be worth the trouble. David Rowley and Tom Lane Discussion: https://postgr.es/m/17850.1528755844@sss.pgh.pa.us
2018-06-10Improve run-time partition pruning to handle any stable expression.Tom Lane
The initial coding of the run-time-pruning feature only coped with cases where the partition key(s) are compared to Params. That is a bit silly; we can allow it to work with any non-Var-containing stable expression, as long as we take special care with expressions containing PARAM_EXEC Params. The code is hardly any longer this way, and it's considerably clearer (IMO at least). Per gripe from Pavel Stehule. David Rowley, whacked around a bit by me Discussion: https://postgr.es/m/CAFj8pRBjrufA3ocDm8o4LPGNye9Y+pm1b9kCwode4X04CULG3g@mail.gmail.com
2018-04-17Update Append's idea of first_partial_planAlvaro Herrera
It turns out that after runtime partition pruning, Append's first_partial_plan does not accurately represent partial plans to run, if any of those got pruned. This could limit participation of workers in some partial subplans, if other subplans got pruned. Fix it by keeping an index of the first valid partial subplan in the state node, determined at execnode Init time. Author: David Rowley, with cosmetic changes by me. Discussion: https://postgr.es/m/CAKJS1f8o2Yd=rOP=Et3A0FWgF+gSAOkFSU6eNhnGzTPV7nN8sQ@mail.gmail.com
2018-04-09Fix incorrect logic for choosing the next Parallel Append subplanAlvaro Herrera
In 499be013de support for pruning unneeded Append subnodes was added. The logic in that commit was not correctly checking if the next subplan was in fact a valid subplan. This could cause parallel workers processes to be given a subplan to work on which didn't require any work. Per code review following an otherwise unexplained regression failure in buildfarm member Pademelon. (We haven't been able to reproduce the failure, so this is a bit of a blind fix in terms of whether it'll actually fix it; but it is a clear bug nonetheless). In passing, also add a comment to explain what first_partial_plan means. Author: David Rowley Discussion: https://postgr.es/m/CAKJS1f_E5r05hHUVG3UmCQJ49DGKKHtN=SHybD44LdzBn+CJng@mail.gmail.com
2018-04-07Support partition pruning at execution timeAlvaro Herrera
Existing partition pruning is only able to work at plan time, for query quals that appear in the parsed query. This is good but limiting, as there can be parameters that appear later that can be usefully used to further prune partitions. This commit adds support for pruning subnodes of Append which cannot possibly contain any matching tuples, during execution, by evaluating Params to determine the minimum set of subnodes that can possibly match. We support more than just simple Params in WHERE clauses. Support additionally includes: 1. Parameterized Nested Loop Joins: The parameter from the outer side of the join can be used to determine the minimum set of inner side partitions to scan. 2. Initplans: Once an initplan has been executed we can then determine which partitions match the value from the initplan. Partition pruning is performed in two ways. When Params external to the plan are found to match the partition key we attempt to prune away unneeded Append subplans during the initialization of the executor. This allows us to bypass the initialization of non-matching subplans meaning they won't appear in the EXPLAIN or EXPLAIN ANALYZE output. For parameters whose value is only known during the actual execution then the pruning of these subplans must wait. Subplans which are eliminated during this stage of pruning are still visible in the EXPLAIN output. In order to determine if pruning has actually taken place, the EXPLAIN ANALYZE must be viewed. If a certain Append subplan was never executed due to the elimination of the partition then the execution timing area will state "(never executed)". Whereas, if, for example in the case of parameterized nested loops, the number of loops stated in the EXPLAIN ANALYZE output for certain subplans may appear lower than others due to the subplan having been scanned fewer times. This is due to the list of matching subnodes having to be evaluated whenever a parameter which was found to match the partition key changes. This commit required some additional infrastructure that permits the building of a data structure which is able to perform the translation of the matching partition IDs, as returned by get_matching_partitions, into the list index of a subpaths list, as exist in node types such as Append, MergeAppend and ModifyTable. This allows us to translate a list of clauses into a Bitmapset of all the subpath indexes which must be included to satisfy the clause list. Author: David Rowley, based on an earlier effort by Beena Emerson Reviewers: Amit Langote, Robert Haas, Amul Sul, Rajkumar Raghuwanshi, Jesper Pedersen Discussion: https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com
2018-02-28Fix assertion failure when Parallel Append is run serially.Robert Haas
Parallel-aware plan nodes must be prepared to run without parallelism if it's not possible at execution time for whatever reason. Commit ab72716778128fb63d54ac256adf7fe6820a1185, which introduced Parallel Append, overlooked this. Rajkumar Raghuwanshi reported this problem, and I included his test case in this patch. The code changes are by me. Discussion: http://postgr.es/m/CAKcux6=WqkUudLg1GLZZ7fc5ScWC1+Y9qD=pAHeqy32WoeJQvw@mail.gmail.com
2018-02-16Allow tupleslots to have a fixed tupledesc, use in executor nodes.Andres Freund
The reason for doing so is that it will allow expression evaluation to optimize based on the underlying tupledesc. In particular it will allow to JIT tuple deforming together with the expression itself. For that expression initialization needs to be moved after the relevant slots are initialized - mostly unproblematic, except in the case of nodeWorktablescan.c. After doing so there's no need for ExecAssignResultType() and ExecAssignResultTypeFromTL() anymore, as all former callers have been converted to create a slot with a fixed descriptor. When creating a slot with a fixed descriptor, tts_values/isnull can be allocated together with the main slot, reducing allocation overhead and increasing cache density a bit. Author: Andres Freund Discussion: https://postgr.es/m/20171206093717.vqdxe5icqttpxs3p@alap3.anarazel.de
2018-02-08Fix possible infinite loop with Parallel Append.Robert Haas
When the previously-chosen plan was non-partial, all pa_finished flags for partial plans are now set, and pa_next_plan has not yet been set to INVALID_SUBPLAN_INDEX, the previous code could go into an infinite loop. Report by Rajkumar Raghuwanshi. Patch by Amit Khandekar and me. Review by Kyotaro Horiguchi. Discussion: http://postgr.es/m/CAJ3gD9cf43z78qY=U=H0HvOEN341qfRO-vLpnKPSviHeWgJQ5w@mail.gmail.com
2018-01-04Code review for Parallel Append.Robert Haas
- Remove unnecessary #include mistakenly added in execnodes.h. - Fix mistake in comment in choose_next_subplan_for_leader. - Adjust row estimates in cost_append for a possibly-different parallel divisor. - Clamp row estimates in cost_append after operations that may not produce integers. Amit Kapila, with cosmetic adjustments by me. Discussion: http://postgr.es/m/CAA4eK1+qcbeai3coPpRW=GFCzFeLUsuY4T-AKHqMjxpEGZBPQg@mail.gmail.com
2018-01-02Update copyright for 2018Bruce Momjian
Backpatch-through: certain files through 9.3
2017-12-06Fix Parallel Append crash.Robert Haas
Reported by Tom Lane and the buildfarm. Amul Sul and Amit Khandekar Discussion: http://postgr.es/m/17868.1512519318@sss.pgh.pa.us Discussion: http://postgr.es/m/CAJ3gD9cJQ4d-XhmZ6BqM9rMM2KDBfpkdgOAb4+psz56uBuMQ_A@mail.gmail.com
2017-12-05Support Parallel Append plan nodes.Robert Haas
When we create an Append node, we can spread out the workers over the subplans instead of piling on to each subplan one at a time, which should typically be a bit more efficient, both because the startup cost of any plan executed entirely by one worker is paid only once and also because of reduced contention. We can also construct Append plans using a mix of partial and non-partial subplans, which may allow for parallelism in places that otherwise couldn't support it. Unfortunately, this patch doesn't handle the important case of parallelizing UNION ALL by running each branch in a separate worker; the executor infrastructure is added here, but more planner work is needed. Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and Rajkumar Raghuwanshi. Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-11-08Change TRUE/FALSE to true/falsePeter Eisentraut
The lower case spellings are C and C++ standard and are used in most parts of the PostgreSQL sources. The upper case spellings are only used in some files/modules. So standardize on the standard spellings. The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so those are left as is when using those APIs. In code comments, we use the lower-case spelling for the C concepts and keep the upper-case spelling for the SQL concepts. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-07-30Move ExecProcNode from dispatch to function pointer based model.Andres Freund
This allows us to add stack-depth checks the first time an executor node is called, and skip that overhead on following calls. Additionally it yields a nice speedup. While it'd probably have been a good idea to have that check all along, it has become more important after the new expression evaluation framework in b8d7f053c5c2bf2a7e - there's no stack depth check in common paths anymore now. We previously relied on ExecEvalExpr() being executed somewhere. We should move towards that model for further routines, but as this is required for v10, it seems better to only do the necessary (which already is quite large). Author: Andres Freund, Tom Lane Reported-By: Julien Rouhaud Discussion: https://postgr.es/m/22833.1490390175@sss.pgh.pa.us https://postgr.es/m/b0af9eaa-130c-60d0-9e4e-7a135b1e0c76@dalibo.com