summaryrefslogtreecommitdiff
path: root/src/backend/executor
AgeCommit message (Collapse)Author
2024-08-07Refactor/reword some error messages to avoid duplicatesAlvaro Herrera
Also, remove brackets around "EMPTY [ ARRAY ]". An error message is not the place to state that a keyword is optional. Backpatch to 17.
2024-07-31Evaluate arguments of correlated SubPlans in the referencing ExprStateAndres Freund
Until now we generated an ExprState for each parameter to a SubPlan and evaluated them one-by-one ExecScanSubPlan. That's sub-optimal as creating lots of small ExprStates a) makes JIT compilation more expensive b) wastes memory c) is a bit slower to execute This commit arranges to evaluate parameters to a SubPlan as part of the ExprState referencing a SubPlan, using the new EEOP_PARAM_SET expression step. We emit one EEOP_PARAM_SET for each argument to a subplan, just before the EEOP_SUBPLAN step. It likely is worth using EEOP_PARAM_SET in other places as well, e.g. for SubPlan outputs, nestloop parameters and - more ambitiously - to get rid of ExprContext->domainValue/caseValue/ecxt_agg*. But that's for later. Author: Andres Freund <andres@anarazel.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru> Discussion: https://postgr.es/m/20230225214401.346ancgjqc3zmvek@awork3.anarazel.de
2024-07-30SQL/JSON: Fix casting for integer EXISTS columns in JSON_TABLEAmit Langote
The current method of coercing the boolean result value of JsonPathExists() to the target type specified for an EXISTS column, which is to call the type's input function via json_populate_type(), leads to an error when the target type is integer, because the integer input function doesn't recognize boolean literal values as valid. Instead use the boolean-to-integer cast function for coercion in that case so that using integer or domains thereof as type for EXISTS columns works. Note that coercion for ON ERROR values TRUE and FALSE already works like that because the parser creates a cast expression including the cast function, but the coercion of the actual result value is not handled by the parser. Tests by Jian He. Reported-by: Jian He <jian.universality@gmail.com> Author: Jian He <jian.universality@gmail.com> Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com Backpatch-through: 17
2024-07-26SQL/JSON: Remove useless code in ExecInitJsonExpr()Amit Langote
The code was for adding an unconditional JUMP to the next step, which is unnecessary processing. Reported-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com Backpatch-through: 17
2024-07-26SQL/JSON: Improve error-handling of JsonBehavior expressionsAmit Langote
Instead of returning a NULL when the JsonBehavior expression value could not be coerced to the RETURNING type, throw the error message informing the user that it is the JsonBehavior expression that caused the error with the actual coercion error message shown in its DETAIL line. Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com Backpatch-through: 17
2024-07-26SQL/JSON: Fix error-handling of some JsonBehavior expressionsAmit Langote
To ensure that the errors of executing a JsonBehavior expression that is coerced in the parser are caught instead of being thrown directly, pass ErrorSaveContext to ExecInitExprRec() when initializing it. Also, add a EEOP_JSONEXPR_COERCION_FINISH step to handle the errors that are caught that way. Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com Backpatch-through: 17
2024-07-19Propagate query IDs of utility statements in functionsMichael Paquier
For utility statements defined within a function, the query tree is copied to a PlannedStmt as utility commands do not require planning. However, the query ID was missing from the information passed down. This leads to plugins relying on the query ID like pg_stat_statements to not be able to track utility statements within function calls. Tests are added to check this behavior, depending on pg_stat_statements.track. This is an old bug. Now, query IDs for utilities are compiled using their parsed trees rather than the query string since v16 (3db72ebcbe20), leading to less bloat with utilities, so backpatch down only to this version. Author: Anthonin Bonnefoy Discussion: https://postgr.es/m/CAO6_XqrGp-uwBqi3vBPLuRULKkddjC7R5QZCgsFren=8E+m2Sg@mail.gmail.com Backpatch-through: 16
2024-07-13Fix new assertion for MERGE view_name ... DO NOTHING.Noah Misch
Such queries don't expand automatically updatable views, and ModifyTable uses the wholerow attribute unconditionally. The user-visible behavior is fine, so change to more-specific assertions. Commit d5f788b41dc2cbdde6e7694c70dda54d829a5ed5 added the wrong assertion. Back-patch to v17, where commit 5f2e179bd31e5f5803005101eb12a8d7bf8db8f3 introduced MERGE view_name. Reported by Alexander Lakhin. Discussion: https://postgr.es/m/e4b40a88-c134-6926-3196-bc4501cb87a2@gmail.com
2024-07-09Show Parallel Bitmap Heap Scan worker stats in EXPLAIN ANALYZEDavid Rowley
Nodes like Memoize report the cache stats for each parallel worker, so it makes sense to show the exact and lossy pages in Parallel Bitmap Heap Scan in a similar way. Likewise, Sort shows the method and memory used for each worker. There was some discussion on whether the leader stats should include the totals for each parallel worker or not. I did some analysis on this to see what other parallel node types do and it seems only Parallel Hash does anything like this. All the rest, per what's supported by ExecParallelRetrieveInstrumentation() are consistent with each other. Author: David Geier <geidav.pg@gmail.com> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Donghang Lin <donghanglin@gmail.com> Author: Alena Rybakina <lena.ribackina@yandex.ru> Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Michael Christofides <michael@pgmustard.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Donghang Lin <donghanglin@gmail.com> Reviewed-by: Masahiro Ikeda <Masahiro.Ikeda@nttdata.com> Discussion: https://postgr.es/m/b3d80961-c2e5-38cc-6a32-61886cdf766d%40gmail.com
2024-07-08Fix right-anti-joins when the inner relation is proven uniqueRichard Guo
For an inner_unique join, we always assume that the executor will stop scanning for matches after the first match. Therefore, for a mergejoin that is inner_unique and whose mergeclauses are sufficient to identify a match, we set the skip_mark_restore flag to true, indicating that the executor need not do mark/restore calls. However, merge-right-anti-join did not get this memo and continues scanning the inner side for matches after the first match. If there are duplicates in the outer scan, we may incorrectly skip matching some inner tuples, which can lead to wrong results. Here we fix this issue by ensuring that merge-right-anti-join also advances to next outer tuple after the first match in inner_unique cases. This also saves cycles by avoiding unnecessary scanning of inner tuples after the first match. Although hash-right-anti-join does not suffer from this wrong results issue, we apply the same change to it as well, to help save cycles for the same reason. Per bug #18522 from Antti Lampinen, and bug #18526 from Feliphe Pozzer. Back-patch to v16 where right-anti-join was introduced. Author: Richard Guo Discussion: https://postgr.es/m/18522-c7a8956126afdfd0@postgresql.org
2024-07-05Support "Right Semi Join" plan shapesRichard Guo
Hash joins can support semijoin with the LHS input on the right, using the existing logic for inner join, combined with the assurance that only the first match for each inner tuple is considered, which can be achieved by leveraging the HEAP_TUPLE_HAS_MATCH flag. This can be very useful in some cases since we may now have the option to hash the smaller table instead of the larger. Merge join could likely support "Right Semi Join" too. However, the benefit of swapping inputs tends to be small here, so we do not address that in this patch. Note that this patch also modifies a test query in join.sql to ensure it continues testing as intended. With this patch the original query would result in a right-semi-join rather than semi-join, compromising its original purpose of testing the fix for neqjoinsel's behavior for semi-joins. Author: Richard Guo Reviewed-by: wenhui qiu, Alena Rybakina, Japin Li Discussion: https://postgr.es/m/CAMbWs4_X1mN=ic+SxcyymUqFx9bB8pqSLTGJ-F=MHy4PW3eRXw@mail.gmail.com
2024-07-02Use TupleDescAttr macro consistentlyDavid Rowley
A few places were directly accessing the attrs[] array. This goes against the standards set by 2cd708452. Fix that. Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com
2024-06-28SQL/JSON: Always coerce JsonExpr result at runtimeAmit Langote
Instead of looking up casts at parse time for converting the result of JsonPath* query functions to the specified or the default RETURNING type, always perform the conversion at runtime using either the target type's input function or the function json_populate_type(). There are two motivations for this change: 1. json_populate_type() coerces to types with typmod such that any string values that exceed length limit cause an error instead of silent truncation, which is necessary to be standard-conforming. 2. It was possible to end up with a cast expression that doesn't support soft handling of errors causing bugs in the of handling ON ERROR clause. JsonExpr.coercion_expr which would store the cast expression is no longer necessary, so remove. Bump catversion because stored rules change because of the above removal. Reported-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Jian He <jian.universality@gmail.com> Discussion: Discussion: https://postgr.es/m/202405271326.5a5rprki64aw%40alvherre.pgsql
2024-06-27Expand comments and add an assertion in nodeModifyTable.c.Noah Misch
Most comments concern RELKIND_VIEW. One addresses the ExecUpdate() "tupleid" parameter. A later commit will rely on these facts, but they hold already. Back-patch to v12 (all supported versions), the plan for that commit. Reviewed (in an earlier version) by Robert Haas. Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com
2024-06-27Fix thinkos in commentsAlvaro Herrera
The first one was noticed by Tender Wang and introduced with 8aba9322511f; the other one was newly introduced with dbca3469ebf8.
2024-06-26Fix partition pruning setup during DETACH CONCURRENTLYAlvaro Herrera
When detaching partition in concurrent mode, it's possible for partition descriptors to not match the set that was recently seen when the plan was made, causing an assertion failure or (in production builds) failure to construct a working plan. The case that was reported involves prepared statements, but I think it may be possible to hit this bug without that too. The problem is that CreatePartitionPruneState is constructing a PartitionPruneState under the assumption that new partitions can be added, but never removed, but it turns out that this isn't true: a prepared statement gets replanned when the DETACH CONCURRENTLY session sends out its invalidation message, but if the invalidation message arrives after ExecInitAppend started, we would build a partition descriptor without the partition, and then CreatePartitionPruneState would refuse to work with it. CreatePartitionPruneState already contains code to deal with the new descriptor having more partitions than before (and behaving for the extra partitions as if they had been pruned), but doesn't have code to deal with less partitions than before, and it is naïve about the case where the number of partitions is the same. We could simply add that a new stanza for less partitions than before, and in simple testing it works to do that; but it's possible to press the test scripts even further and hit the case where one partition is added and a partition is removed quickly enough that we see the same number of partitions, but they don't actually match, causing hangs during execution. To cope with both these problems, we now memcmp() the arrays of partition OIDs, and do a more elaborate mapping (relying on the fact that both OID arrays are in partition-bounds order) if they're not identical. This fix was already pushed in backbranches earlier. Reported-by: yajun Hu <1026592243@qq.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/18377-e0324601cfebdfe5@postgresql.org
2024-06-24Revert "Fix partition pruning setup during DETACH CONCURRENTLY"Alvaro Herrera
This reverts commit 27162a64b386; this branch is in code freeze due to a nearing release. We can commit again after the release is out. Discussion: https://postgr.es/m/1158256.1719239648@sss.pgh.pa.us
2024-06-24Fix partition pruning setup during DETACH CONCURRENTLYAlvaro Herrera
When detaching partition in concurrent mode, it's possible for partition descriptors to not match the set that was recently seen when the plan was made, causing an assertion failure or (in production builds) failure to construct a working plan. The case that was reported involves prepared statements, but I think it may be possible to hit this bug without that too. The problem is that CreatePartitionPruneState is constructing a PartitionPruneState under the assumption that new partitions can be added, but never removed, but it turns out that this isn't true: a prepared statement gets replanned when the DETACH CONCURRENTLY session sends out its invalidation message, but if the invalidation message arrives after ExecInitAppend started, we would build a partition descriptor without the partition, and then CreatePartitionPruneState would refuse to work with it. CreatePartitionPruneState already contains code to deal with the new descriptor having more partitions than before (and behaving for the extra partitions as if they had been pruned), but doesn't have code to deal with less partitions than before, and it is naïve about the case where the number of partitions is the same. We could simply add that a new stanza for less partitions than before, and in simple testing it works to do that; but it's possible to press the test scripts even further and hit the case where one partition is added and a partition is removed quickly enough that we see the same number of partitions, but they don't actually match, causing hangs during execution. To cope with both these problems, we now memcmp() the arrays of partition OIDs, and do a more elaborate mapping (relying on the fact that both OID arrays are in partition-bounds order) if they're not identical. Backpatch to 14, where DETACH CONCURRENTLY appeared. Reported-by: yajun Hu <1026592243@qq.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/18377-e0324601cfebdfe5@postgresql.org
2024-06-19SQL/JSON: Correct jsonpath variable name matchingAmit Langote
Previously, GetJsonPathVar() allowed a jsonpath expression to reference any prefix of a PASSING variable's name. For example, the following query would incorrectly work: SELECT JSON_QUERY(context_item, jsonpath '$xy' PASSING val AS xyz); The fix ensures that the length of the variable name mentioned in a jsonpath expression matches exactly with the name of the PASSING variable before comparing the strings using strncmp(). Reported-by: Alvaro Herrera (off-list) Discussion: https://postgr.es/m/CA+HiwqFGkLWMvELBH6E4SQ45qUHthgcRH6gCJL20OsYDRtFx_w@mail.gmail.com
2024-06-07Fix behavior of stable functions called from a CALL's argument list.Tom Lane
If the CALL is within an atomic context (e.g. there's an outer transaction block), _SPI_execute_plan should acquire a fresh snapshot to execute any such functions with. We failed to do that and instead passed them the Portal snapshot, which had been acquired at the start of the current SQL command. This'd lead to seeing stale values of rows modified since the start of the command. This is arguably a bug in 84f5c2908: I failed to see that "are we in non-atomic mode" needs to be defined the same way as it is further down in _SPI_execute_plan, i.e. check !_SPI_current->atomic not just options->allow_nonatomic. Alternatively the blame could be laid on plpgsql, which is unconditionally passing allow_nonatomic = true for CALL/DO even when it knows it's in an atomic context. However, fixing it in spi.c seems like a better idea since that will also fix the problem for any extensions that may have copied plpgsql's coding pattern. While here, update an obsolete comment about _SPI_execute_plan's snapshot management. Per report from Victor Yegorov. Back-patch to all supported versions. Discussion: https://postgr.es/m/CAGnEboiRe+fG2QxuBO2390F7P8e2MQ6UyBjZSL_w1Cej+E4=Vw@mail.gmail.com
2024-05-16Revert temporal primary keys and foreign keysPeter Eisentraut
This feature set did not handle empty ranges correctly, and it's now too late for PostgreSQL 17 to fix it. The following commits are reverted: 6db4598fcb8 Add stratnum GiST support function 46a0cd4cefb Add temporal PRIMARY KEY and UNIQUE constraints 86232a49a43 Fix comment on gist_stratnum_btree 030e10ff1a3 Rename pg_constraint.conwithoutoverlaps to conperiod a88c800deb6 Use daterange and YMD in without_overlaps tests instead of tsrange. 5577a71fb0c Use half-open interval notation in without_overlaps tests 34768ee3616 Add temporal FOREIGN KEY contraints 482e108cd38 Add test for REPLICA IDENTITY with a temporal key c3db1f30cba doc: clarify PERIOD and WITHOUT OVERLAPS in CREATE TABLE 144c2ce0cc7 Fix ON CONFLICT DO NOTHING/UPDATE for temporal indexes Discussion: https://www.postgresql.org/message-id/d0b64a7a-dfe4-4b84-a906-c7dedfa40a3e@eisentraut.org
2024-05-10Fix ON CONFLICT DO NOTHING/UPDATE for temporal indexesPeter Eisentraut
A PRIMARY KEY or UNIQUE constraint with WITHOUT OVERLAPS will be a GiST index, not a B-Tree, but it will still have indisunique set. The code for ON CONFLICT fails if it sees a non-btree index that has indisunique. This commit fixes that and adds some tests. But now that we can't just test indisunique, we also need some extra checks to prevent DO UPDATE from running against a WITHOUT OVERLAPS constraint (because the conflict could happen against more than one row, and we'd only update one). Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Discussion: https://www.postgresql.org/message-id/1426589a-83cb-4a89-bf40-713970c07e63@illuminatedcomputing.com
2024-05-04Fix an assortment of typosDavid Rowley
Author: Alexander Lakhin Discussion: https://postgr.es/m/ae9f2fcb-4b24-5bb0-4240-efbbbd944ca1@gmail.com
2024-05-01Ensure we allocate NAMEDATALEN bytes for names in Index Only ScansDavid Rowley
As an optimization, we store "name" columns as cstrings in btree indexes. Here we modify it so that Index Only Scans convert these cstrings back to names with NAMEDATALEN bytes rather than storing the cstring in the tuple slot, as was happening previously. Bug: #17855 Reported-by: Alexander Lakhin Reviewed-by: Alexander Lakhin, Tom Lane Discussion: https://postgr.es/m/17855-5f523e0f9769a566@postgresql.org Backpatch-through: 12, all supported versions
2024-04-23Remove some unnecessary fields from executor nodes.Tom Lane
JsonExprState.input_finfo is only assigned to, never read, and it's really fairly useless since the value can be gotten out of the adjacent input_fcinfo field. Let's remove it before someone starts to depend on it. While here, also remove TidScanState.tss_htup and AggState.combinedproj, which are referenced nowhere. Those should have been removed by the commits that caused them to become disused, but were not. I don't think a catversion bump is necessary here, since plan trees are never stored on disk. Matthias van de Meent Discussion: https://postgr.es/m/CAEze2WjsY4d0TBymLNGK4zpttUcg_YZaTjyWz2VfDUV6YH8wXQ@mail.gmail.com
2024-04-18Fix typos and duplicate wordsDaniel Gustafsson
This fixes various typos, duplicated words, and tiny bits of whitespace mainly in code comments but also in docs. Author: Daniel Gustafsson <daniel@yesql.se> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Alexander Lakhin <exclusion@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
2024-04-18SQL/JSON: Improve some error messagesAmit Langote
This improves some error messages emitted by SQL/JSON query functions by mentioning column name when available, such as when they are invoked as part of evaluating JSON_TABLE() columns. To do so, a new field column_name is added to both JsonFuncExpr and JsonExpr that is only populated when creating those nodes for transformed JSON_TABLE() columns. While at it, relevant error messages are reworded for clarity. Reported-by: Jian He <jian.universality@gmail.com> Suggested-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxG_e0QLCgaELrr2ZNz7AxPeGCNKAORe3fHtFCQLsH4J4Q@mail.gmail.com
2024-04-11Revert: Allow table AM tuple_insert() method to return the different slotAlexander Korotkov
This commit reverts c35a3fb5e0 per review by Andres Freund. Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
2024-04-11Revert: Allow locking updated tuples in tuple_update() and tuple_delete()Alexander Korotkov
This commit reverts 87985cc925 and 818861eb57 per review by Andres Freund. Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
2024-04-11Revert: Let table AM insertion methods control index insertionAlexander Korotkov
This commit reverts b1484a3f19 per review by Andres Freund. Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
2024-04-11Revert indexed and enlargable binary heap implementation.Masahiko Sawada
This reverts commit b840508644 and bcb14f4abc. These commits were made for commit 5bec1d6bc5 (Improve eviction algorithm in ReorderBuffer using max-heap for many subtransactions). However, per discussion, commit efb8acc0d0 replaced binary heap + index with pairing heap, and made these commits unnecessary. Reported-by: Jeff Davis Discussion: https://postgr.es/m/12747c15811d94efcc5cda72d6b35c80d7bf3443.camel%40j-davis.com
2024-04-07Change BitmapAdjustPrefetchIterator to accept BlockNumberTomas Vondra
BitmapAdjustPrefetchIterator() only used the blockno member of the passed in TBMIterateResult to ensure that the prefetch iterator and regular iterator stay in sync. Pass it the BlockNumber only, so that we can move away from using the TBMIterateResult outside of table AM specific code. Author: Melanie Plageman Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2024-04-07BitmapHeapScan: Use correct recheck flag for skip_fetchTomas Vondra
As of 7c70996ebf0949b142a9, BitmapPrefetch() used the recheck flag for the current block to determine whether or not it should skip prefetching the proposed prefetch block. As explained in the comment, this assumed the index AM will report the same recheck value for the future page as it did for the current page - but there's no guarantee. This only affects prefetching - if the recheck flag changes, we may prefetch blocks unecessarily and not prefetch blocks that will be needed. But we don't need to rely on that assumption - we know the recheck flag for the block we're considering prefetching, so we can use that. The impact is very limited in practice - the opclass would need to assign different recheck flags to different blocks, but none of the built-in opclasses seems to do that. Author: Melanie Plageman Reviewed-by: Tomas Vondra, Andres Freund, Tom Lane Discussion: https://postgr.es/m/1939305.1712415547%40sss.pgh.pa.us
2024-04-07BitmapHeapScan: Push skip_fetch optimization into table AMTomas Vondra
Commit 7c70996ebf0949b142 introduced an optimization to allow bitmap scans to operate like index-only scans by not fetching a block from the heap if none of the underlying data is needed and the block is marked all visible in the visibility map. With the introduction of table AMs, a FIXME was added to this code indicating that the skip_fetch logic should be pushed into the table AM-specific code, as not all table AMs may use a visibility map in the same way. This commit resolves this FIXME for the current block. The layering violation is still present in BitmapHeapScans's prefetching code, which uses the visibility map to decide whether or not to prefetch a block. However, this can be addressed independently. Author: Melanie Plageman Reviewed-by: Andres Freund, Heikki Linnakangas, Tomas Vondra, Mark Dilger Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2024-04-06BitmapHeapScan: postpone setting can_skip_fetchTomas Vondra
Set BitmapHeapScanState->can_skip_fetch in BitmapHeapNext() instead of in ExecInitBitmapHeapScan(). This is a preliminary step to pushing the skip fetch optimization into heap AM code. Author: Melanie Plageman Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2024-04-06BitmapHeapScan: begin scan after bitmap creationTomas Vondra
Change the order so that the table scan is initialized only after initializing the index scan and building the bitmap. This is mostly a cosmetic change for now, but later commits will need to pass parameters to table_beginscan_bm() that are unavailable in ExecInitBitmapHeapScan(). Author: Melanie Plageman Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2024-04-06Enhance nbtree ScalarArrayOp execution.Peter Geoghegan
Commit 9e8da0f7 taught nbtree to handle ScalarArrayOpExpr quals natively. This works by pushing down the full context (the array keys) to the nbtree index AM, enabling it to execute multiple primitive index scans that the planner treats as one continuous index scan/index path. This earlier enhancement enabled nbtree ScalarArrayOp index-only scans. It also allowed scans with ScalarArrayOp quals to return ordered results (with some notable restrictions, described further down). Take this general approach a lot further: teach nbtree SAOP index scans to decide how to execute ScalarArrayOp scans (when and where to start the next primitive index scan) based on physical index characteristics. This can be far more efficient. All SAOP scans will now reliably avoid duplicative leaf page accesses (just like any other nbtree index scan). SAOP scans whose array keys are naturally clustered together now require far fewer index descents, since we'll reliably avoid starting a new primitive scan just to get to a later offset from the same leaf page. The scan's arrays now advance using binary searches for the array element that best matches the next tuple's attribute value. Required scan key arrays (i.e. arrays from scan keys that can terminate the scan) ratchet forward in lockstep with the index scan. Non-required arrays (i.e. arrays from scan keys that can only exclude non-matching tuples) "advance" without the process ever rolling over to a higher-order array. Naturally, only required SAOP scan keys trigger skipping over leaf pages (non-required arrays cannot safely end or start primitive index scans). Consequently, even index scans of a composite index with a high-order inequality scan key (which we'll mark required) and a low-order SAOP scan key (which we won't mark required) now avoid repeating leaf page accesses -- that benefit isn't limited to simpler equality-only cases. In general, all nbtree index scans now output tuples as if they were one continuous index scan -- even scans that mix a high-order inequality with lower-order SAOP equalities reliably output tuples in index order. This allows us to remove a couple of special cases that were applied when building index paths with SAOP clauses during planning. Bugfix commit 807a40c5 taught the planner to avoid generating unsafe path keys: path keys on a multicolumn index path, with a SAOP clause on any attribute beyond the first/most significant attribute. These cases are now all safe, so we go back to generating path keys without regard for the presence of SAOP clauses (just like with any other clause type). Affected queries can now exploit scan output order in all the usual ways (e.g., certain "ORDER BY ... LIMIT n" queries can now terminate early). Also undo changes from follow-up bugfix commit a4523c5a, which taught the planner to produce alternative index paths, with path keys, but without low-order SAOP index quals (filter quals were used instead). We'll no longer generate these alternative paths, since they can no longer offer any meaningful advantages over standard index qual paths. Affected queries thereby avoid all of the disadvantages that come from using filter quals within index scan nodes. They can avoid extra heap page accesses from using filter quals to exclude non-matching tuples (index quals will never have that problem). They can also skip over irrelevant sections of the index in more cases (though only when nbtree determines that starting another primitive scan actually makes sense). There is a theoretical risk that removing restrictions on SAOP index paths from the planner will break compatibility with amcanorder-based index AMs maintained as extensions. Such an index AM could have the same limitations around ordered SAOP scans as nbtree had up until now. Adding a pro forma incompatibility item about the issue to the Postgres 17 release notes seems like a good idea. Author: Peter Geoghegan <pg@bowt.ie> Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://postgr.es/m/CAH2-Wz=ksvN_sjcnD1+Bt-WtifRA5ok48aDYnq3pkKhxgMQpcw@mail.gmail.com
2024-04-04Add basic JSON_TABLE() functionalityAmit Langote
JSON_TABLE() allows JSON data to be converted into a relational view and thus used, for example, in a FROM clause, like other tabular data. Data to show in the view is selected from a source JSON object using a JSON path expression to get a sequence of JSON objects that's called a "row pattern", which becomes the source to compute the SQL/JSON values that populate the view's output columns. Column values themselves are computed using JSON path expressions applied to each of the JSON objects comprising the "row pattern", for which the SQL/JSON query functions added in 6185c9737cf4 are used. To implement JSON_TABLE() as a table function, this augments the TableFunc and TableFuncScanState nodes that are currently used to support XMLTABLE() with some JSON_TABLE()-specific fields. Note that the JSON_TABLE() spec includes NESTED COLUMNS and PLAN clauses, which are required to provide more flexibility to extract data out of nested JSON objects, but they are not implemented here to keep this commit of manageable size. Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Author: Teodor Sigaev <teodor@sigaev.ru> Author: Oleg Bartunov <obartunov@gmail.com> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Andrew Dunstan <andrew@dunslane.net> Author: Amit Langote <amitlangote09@gmail.com> Author: Jian He <jian.universality@gmail.com> Reviewers have included (in no particular order): Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby, Álvaro Herrera, Jian He Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
2024-04-03Add functions to binaryheap for efficient key removal and update.Masahiko Sawada
Previously, binaryheap didn't support updating a key and removing a node in an efficient way. For example, in order to remove a node from the binaryheap, the caller had to pass the node's position within the array that the binaryheap internally has. Removing a node from the binaryheap is done in O(log n) but searching for the key's position is done in O(n). This commit adds a hash table to binaryheap in order to track the position of each nodes in the binaryheap. That way, by using newly added functions such as binaryheap_update_up() etc., both updating a key and removing a node can be done in O(1) on an average and O(log n) in worst case. This is known as the indexed binary heap. The caller can specify to use the indexed binaryheap by passing indexed = true. The current code does not use the new indexing logic, but it will be used by an upcoming patch. Reviewed-by: Vignesh C, Peter Smith, Hayato Kuroda, Ajin Cherian, Tomas Vondra, Shubham Khanna Discussion: https://postgr.es/m/CAD21AoDffo37RC-eUuyHJKVEr017V2YYDLyn1xF_00ofptWbkg%40mail.gmail.com
2024-03-30Let table AM insertion methods control index insertionAlexander Korotkov
Previously, the executor did index insert unconditionally after calling table AM interface methods tuple_insert() and multi_insert(). This commit introduces the new parameter insert_indexes for these two methods. Setting '*insert_indexes' to true saves the current logic. Setting it to false indicates that table AM cares about index inserts itself and doesn't want the caller to do that. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Reviewed-by: Pavel Borisov, Matthias van de Meent, Mark Dilger
2024-03-30Add support for MERGE ... WHEN NOT MATCHED BY SOURCE.Dean Rasheed
This allows MERGE commands to include WHEN NOT MATCHED BY SOURCE actions, which operate on rows that exist in the target relation, but not in the data source. These actions can execute UPDATE, DELETE, or DO NOTHING sub-commands. This is in contrast to already-supported WHEN NOT MATCHED actions, which operate on rows that exist in the data source, but not in the target relation. To make this distinction clearer, such actions may now be written as WHEN NOT MATCHED BY TARGET. Writing WHEN NOT MATCHED without specifying BY SOURCE or BY TARGET is equivalent to writing WHEN NOT MATCHED BY TARGET. Dean Rasheed, reviewed by Alvaro Herrera, Ted Yu and Vik Fearing. Discussion: https://postgr.es/m/CAEZATCWqnKGc57Y_JanUBHQXNKcXd7r=0R4NEZUVwP+syRkWbA@mail.gmail.com
2024-03-27Fix unnecessary use of moving-aggregate mode with non-moving frame.Tom Lane
When a plain aggregate is used as a window function, and the window frame start is specified as UNBOUNDED PRECEDING, the frame's head cannot move so we do not need to use moving-aggregate mode. The check for that was put into initialize_peragg(), failing to notice that ExecInitWindowAgg() calls that function before it's filled in winstate->frameOptions. Since makeNode() would have zeroed the field, this didn't provoke uninitialized-value complaints, nor would the erroneous decision have resulted in more than a little inefficiency. Still, it's wrong, so move the initialization of winstate->frameOptions earlier to make it work properly. While here, also fix a thinko in a comment. Both errors crept in in commit a9d9acbf2 which introduced the moving-aggregate mode. Spotted by Vallimaharajan G. Back-patch to all supported branches. Discussion: https://postgr.es/m/18e7f2a5167.fe36253866818.977923893562469143@zohocorp.com
2024-03-26Improve error message for tts_(virtual|minimal)_is_current_xact_tupleAlexander Korotkov
Discussion: https://postgr.es/m/CALT9ZEHNeagO5PLb4Nv9J_ZaCtp%2BArdVmbSLc0RHUzx_RPAa4w%40mail.gmail.com Author: Pavel Borisov
2024-03-26Add comments on some MinimalTupleSlots methods usageAlexander Korotkov
Discussion: https://postgr.es/m/CALT9ZEHNeagO5PLb4Nv9J_ZaCtp%2BArdVmbSLc0RHUzx_RPAa4w%40mail.gmail.com Author: Pavel Borisov
2024-03-26Allow locking updated tuples in tuple_update() and tuple_delete()Alexander Korotkov
Currently, in read committed transaction isolation mode (default), we have the following sequence of actions when tuple_update()/tuple_delete() finds the tuple updated by the concurrent transaction. 1. Attempt to update/delete tuple with tuple_update()/tuple_delete(), which returns TM_Updated. 2. Lock tuple with tuple_lock(). 3. Re-evaluate plan qual (recheck if we still need to update/delete and calculate the new tuple for update). 4. Second attempt to update/delete tuple with tuple_update()/tuple_delete(). This attempt should be successful, since the tuple was previously locked. This commit eliminates step 2 by taking the lock during the first tuple_update()/tuple_delete() call. The heap table access method saves some effort by checking the updated tuple once instead of twice. Future undo-based table access methods, which will start from the latest row version, can immediately place a lock there. Also, this commit makes tuple_update()/tuple_delete() optionally save the old tuple into the dedicated slot. That saves efforts on re-fetching tuples in certain cases. The code in nodeModifyTable.c is simplified by removing the nested switch/case. Discussion: https://postgr.es/m/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com Reviewed-by: Aleksander Alekseev, Pavel Borisov, Vignesh C, Mason Sharp Reviewed-by: Andres Freund, Chris Travers
2024-03-21Add TupleTableSlotOps.is_current_xact_tuple() methodAlexander Korotkov
This allows us to abstract how/whether table AM uses transaction identifiers. A custom table AM can use a custom slot, which may not store xmin directly, but determine the tuple belonging to the current transaction in the other way. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Reviewed-by: Matthias van de Meent, Mark Dilger, Pavel Borisov Reviewed-by: Nikita Malakhov, Japin Li
2024-03-21Allow table AM tuple_insert() method to return the different slotAlexander Korotkov
This allows table AM to return a native tuple slot even if VirtualTupleTableSlot is given as an input. Native tuple slots have knowledge about system attributes, which could be accessed in the future. table_multi_insert() method already can modify the input 'slots' array. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Reviewed-by: Matthias van de Meent, Mark Dilger, Pavel Borisov Reviewed-by: Nikita Malakhov, Japin Li
2024-03-21Add SQL/JSON query functionsAmit Langote
This introduces the following SQL/JSON functions for querying JSON data using jsonpath expressions: JSON_EXISTS(), which can be used to apply a jsonpath expression to a JSON value to check if it yields any values. JSON_QUERY(), which can be used to to apply a jsonpath expression to a JSON value to get a JSON object, an array, or a string. There are various options to control whether multi-value result uses array wrappers and whether the singleton scalar strings are quoted or not. JSON_VALUE(), which can be used to apply a jsonpath expression to a JSON value to return a single scalar value, producing an error if it multiple values are matched. Both JSON_VALUE() and JSON_QUERY() functions have options for handling EMPTY and ERROR conditions, which can be used to specify the behavior when no values are matched and when an error occurs during jsonpath evaluation, respectively. Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Author: Teodor Sigaev <teodor@sigaev.ru> Author: Oleg Bartunov <obartunov@gmail.com> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Andrew Dunstan <andrew@dunslane.net> Author: Amit Langote <amitlangote09@gmail.com> Author: Peter Eisentraut <peter@eisentraut.org> Author: Jian He <jian.universality@gmail.com> Reviewers have included (in no particular order): Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby, Álvaro Herrera, Jian He, Anton A. Melnikov, Nikita Malakhov, Peter Eisentraut, Tomas Vondra Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org Discussion: https://postgr.es/m/CA+HiwqHROpf9e644D8BRqYvaAPmgBZVup-xKMDPk-nd4EpgzHw@mail.gmail.com Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
2024-03-18Fix EXPLAIN Bitmap heap scan to count pages with no visible tuplesHeikki Linnakangas
Previously, bitmap heap scans only counted lossy and exact pages for explain when there was at least one visible tuple on the page. heapam_scan_bitmap_next_block() returned true only if there was a "valid" page with tuples to be processed. However, the lossy and exact page counters in EXPLAIN should count the number of pages represented in a lossy or non-lossy way in the constructed bitmap, regardless of whether or not the pages ultimately contained visible tuples. Backpatch to all supported versions. Author: Melanie Plageman Discussion: https://www.postgresql.org/message-id/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA@mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAAKRu_bxrXeZ2rCnY8LyeC2Ls88KpjWrQ%2BopUrXDRXdcfwFZGA@mail.gmail.com
2024-03-17Add RETURNING support to MERGE.Dean Rasheed
This allows a RETURNING clause to be appended to a MERGE query, to return values based on each row inserted, updated, or deleted. As with plain INSERT, UPDATE, and DELETE commands, the returned values are based on the new contents of the target table for INSERT and UPDATE actions, and on its old contents for DELETE actions. Values from the source relation may also be returned. As with INSERT/UPDATE/DELETE, the output of MERGE ... RETURNING may be used as the source relation for other operations such as WITH queries and COPY commands. Additionally, a special function merge_action() is provided, which returns 'INSERT', 'UPDATE', or 'DELETE', depending on the action executed for each row. The merge_action() function can be used anywhere in the RETURNING list, including in arbitrary expressions and subqueries, but it is an error to use it anywhere outside of a MERGE query's RETURNING list. Dean Rasheed, reviewed by Isaac Morland, Vik Fearing, Alvaro Herrera, Gurjeet Singh, Jian He, Jeff Davis, Merlin Moncure, Peter Eisentraut, and Wolfgang Walther. Discussion: http://postgr.es/m/CAEZATCWePEGQR5LBn-vD6SfeLZafzEm2Qy_L_Oky2=qw2w3Pzg@mail.gmail.com