Age | Commit message (Collapse) | Author |
|
Also, remove brackets around "EMPTY [ ARRAY ]". An error message is
not the place to state that a keyword is optional.
Backpatch to 17.
|
|
Until now we generated an ExprState for each parameter to a SubPlan and
evaluated them one-by-one ExecScanSubPlan. That's sub-optimal as creating lots
of small ExprStates
a) makes JIT compilation more expensive
b) wastes memory
c) is a bit slower to execute
This commit arranges to evaluate parameters to a SubPlan as part of the
ExprState referencing a SubPlan, using the new EEOP_PARAM_SET expression
step. We emit one EEOP_PARAM_SET for each argument to a subplan, just before
the EEOP_SUBPLAN step.
It likely is worth using EEOP_PARAM_SET in other places as well, e.g. for
SubPlan outputs, nestloop parameters and - more ambitiously - to get rid of
ExprContext->domainValue/caseValue/ecxt_agg*. But that's for later.
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Discussion: https://postgr.es/m/20230225214401.346ancgjqc3zmvek@awork3.anarazel.de
|
|
The current method of coercing the boolean result value of
JsonPathExists() to the target type specified for an EXISTS column,
which is to call the type's input function via json_populate_type(),
leads to an error when the target type is integer, because the
integer input function doesn't recognize boolean literal values as
valid.
Instead use the boolean-to-integer cast function for coercion in that
case so that using integer or domains thereof as type for EXISTS
columns works. Note that coercion for ON ERROR values TRUE and FALSE
already works like that because the parser creates a cast expression
including the cast function, but the coercion of the actual result
value is not handled by the parser.
Tests by Jian He.
Reported-by: Jian He <jian.universality@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com
Backpatch-through: 17
|
|
The code was for adding an unconditional JUMP to the next step,
which is unnecessary processing.
Reported-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com
Backpatch-through: 17
|
|
Instead of returning a NULL when the JsonBehavior expression value
could not be coerced to the RETURNING type, throw the error message
informing the user that it is the JsonBehavior expression that caused
the error with the actual coercion error message shown in its DETAIL
line.
Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com
Backpatch-through: 17
|
|
To ensure that the errors of executing a JsonBehavior expression that
is coerced in the parser are caught instead of being thrown directly,
pass ErrorSaveContext to ExecInitExprRec() when initializing it.
Also, add a EEOP_JSONEXPR_COERCION_FINISH step to handle the errors
that are caught that way.
Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com
Backpatch-through: 17
|
|
For utility statements defined within a function, the query tree is
copied to a PlannedStmt as utility commands do not require planning.
However, the query ID was missing from the information passed down.
This leads to plugins relying on the query ID like pg_stat_statements to
not be able to track utility statements within function calls. Tests
are added to check this behavior, depending on pg_stat_statements.track.
This is an old bug. Now, query IDs for utilities are compiled using
their parsed trees rather than the query string since v16
(3db72ebcbe20), leading to less bloat with utilities, so backpatch down
only to this version.
Author: Anthonin Bonnefoy
Discussion: https://postgr.es/m/CAO6_XqrGp-uwBqi3vBPLuRULKkddjC7R5QZCgsFren=8E+m2Sg@mail.gmail.com
Backpatch-through: 16
|
|
Such queries don't expand automatically updatable views, and ModifyTable
uses the wholerow attribute unconditionally. The user-visible behavior
is fine, so change to more-specific assertions. Commit
d5f788b41dc2cbdde6e7694c70dda54d829a5ed5 added the wrong assertion.
Back-patch to v17, where commit 5f2e179bd31e5f5803005101eb12a8d7bf8db8f3
introduced MERGE view_name.
Reported by Alexander Lakhin.
Discussion: https://postgr.es/m/e4b40a88-c134-6926-3196-bc4501cb87a2@gmail.com
|
|
Nodes like Memoize report the cache stats for each parallel worker, so it
makes sense to show the exact and lossy pages in Parallel Bitmap Heap Scan
in a similar way. Likewise, Sort shows the method and memory used for
each worker.
There was some discussion on whether the leader stats should include the
totals for each parallel worker or not. I did some analysis on this to
see what other parallel node types do and it seems only Parallel Hash does
anything like this. All the rest, per what's supported by
ExecParallelRetrieveInstrumentation() are consistent with each other.
Author: David Geier <geidav.pg@gmail.com>
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Author: Donghang Lin <donghanglin@gmail.com>
Author: Alena Rybakina <lena.ribackina@yandex.ru>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Michael Christofides <michael@pgmustard.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Donghang Lin <donghanglin@gmail.com>
Reviewed-by: Masahiro Ikeda <Masahiro.Ikeda@nttdata.com>
Discussion: https://postgr.es/m/b3d80961-c2e5-38cc-6a32-61886cdf766d%40gmail.com
|
|
For an inner_unique join, we always assume that the executor will stop
scanning for matches after the first match. Therefore, for a mergejoin
that is inner_unique and whose mergeclauses are sufficient to identify a
match, we set the skip_mark_restore flag to true, indicating that the
executor need not do mark/restore calls. However, merge-right-anti-join
did not get this memo and continues scanning the inner side for matches
after the first match. If there are duplicates in the outer scan, we
may incorrectly skip matching some inner tuples, which can lead to wrong
results.
Here we fix this issue by ensuring that merge-right-anti-join also
advances to next outer tuple after the first match in inner_unique
cases. This also saves cycles by avoiding unnecessary scanning of inner
tuples after the first match.
Although hash-right-anti-join does not suffer from this wrong results
issue, we apply the same change to it as well, to help save cycles for
the same reason.
Per bug #18522 from Antti Lampinen, and bug #18526 from Feliphe Pozzer.
Back-patch to v16 where right-anti-join was introduced.
Author: Richard Guo
Discussion: https://postgr.es/m/18522-c7a8956126afdfd0@postgresql.org
|
|
Hash joins can support semijoin with the LHS input on the right, using
the existing logic for inner join, combined with the assurance that only
the first match for each inner tuple is considered, which can be
achieved by leveraging the HEAP_TUPLE_HAS_MATCH flag. This can be very
useful in some cases since we may now have the option to hash the
smaller table instead of the larger.
Merge join could likely support "Right Semi Join" too. However, the
benefit of swapping inputs tends to be small here, so we do not address
that in this patch.
Note that this patch also modifies a test query in join.sql to ensure it
continues testing as intended. With this patch the original query would
result in a right-semi-join rather than semi-join, compromising its
original purpose of testing the fix for neqjoinsel's behavior for
semi-joins.
Author: Richard Guo
Reviewed-by: wenhui qiu, Alena Rybakina, Japin Li
Discussion: https://postgr.es/m/CAMbWs4_X1mN=ic+SxcyymUqFx9bB8pqSLTGJ-F=MHy4PW3eRXw@mail.gmail.com
|
|
A few places were directly accessing the attrs[] array. This goes
against the standards set by 2cd708452. Fix that.
Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com
|
|
Instead of looking up casts at parse time for converting the result
of JsonPath* query functions to the specified or the default
RETURNING type, always perform the conversion at runtime using either
the target type's input function or the function
json_populate_type().
There are two motivations for this change:
1. json_populate_type() coerces to types with typmod such that any
string values that exceed length limit cause an error instead of
silent truncation, which is necessary to be standard-conforming.
2. It was possible to end up with a cast expression that doesn't
support soft handling of errors causing bugs in the of handling
ON ERROR clause.
JsonExpr.coercion_expr which would store the cast expression is no
longer necessary, so remove.
Bump catversion because stored rules change because of the above
removal.
Reported-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: Discussion: https://postgr.es/m/202405271326.5a5rprki64aw%40alvherre.pgsql
|
|
Most comments concern RELKIND_VIEW. One addresses the ExecUpdate()
"tupleid" parameter. A later commit will rely on these facts, but they
hold already. Back-patch to v12 (all supported versions), the plan for
that commit.
Reviewed (in an earlier version) by Robert Haas.
Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com
|
|
The first one was noticed by Tender Wang and introduced with
8aba9322511f; the other one was newly introduced with dbca3469ebf8.
|
|
When detaching partition in concurrent mode, it's possible for partition
descriptors to not match the set that was recently seen when the plan
was made, causing an assertion failure or (in production builds) failure
to construct a working plan. The case that was reported involves
prepared statements, but I think it may be possible to hit this bug
without that too.
The problem is that CreatePartitionPruneState is constructing a
PartitionPruneState under the assumption that new partitions can be
added, but never removed, but it turns out that this isn't true: a
prepared statement gets replanned when the DETACH CONCURRENTLY session
sends out its invalidation message, but if the invalidation message
arrives after ExecInitAppend started, we would build a partition
descriptor without the partition, and then CreatePartitionPruneState
would refuse to work with it.
CreatePartitionPruneState already contains code to deal with the new
descriptor having more partitions than before (and behaving for the
extra partitions as if they had been pruned), but doesn't have code to
deal with less partitions than before, and it is naïve about the case
where the number of partitions is the same. We could simply add that a
new stanza for less partitions than before, and in simple testing it
works to do that; but it's possible to press the test scripts even
further and hit the case where one partition is added and a partition is
removed quickly enough that we see the same number of partitions, but
they don't actually match, causing hangs during execution.
To cope with both these problems, we now memcmp() the arrays of
partition OIDs, and do a more elaborate mapping (relying on the fact
that both OID arrays are in partition-bounds order) if they're not
identical.
This fix was already pushed in backbranches earlier.
Reported-by: yajun Hu <1026592243@qq.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/18377-e0324601cfebdfe5@postgresql.org
|
|
This reverts commit 27162a64b386; this branch is in code freeze due to a
nearing release. We can commit again after the release is out.
Discussion: https://postgr.es/m/1158256.1719239648@sss.pgh.pa.us
|
|
When detaching partition in concurrent mode, it's possible for partition
descriptors to not match the set that was recently seen when the plan
was made, causing an assertion failure or (in production builds) failure
to construct a working plan. The case that was reported involves
prepared statements, but I think it may be possible to hit this bug
without that too.
The problem is that CreatePartitionPruneState is constructing a
PartitionPruneState under the assumption that new partitions can be
added, but never removed, but it turns out that this isn't true: a
prepared statement gets replanned when the DETACH CONCURRENTLY session
sends out its invalidation message, but if the invalidation message
arrives after ExecInitAppend started, we would build a partition
descriptor without the partition, and then CreatePartitionPruneState
would refuse to work with it.
CreatePartitionPruneState already contains code to deal with the new
descriptor having more partitions than before (and behaving for the
extra partitions as if they had been pruned), but doesn't have code to
deal with less partitions than before, and it is naïve about the case
where the number of partitions is the same. We could simply add that a
new stanza for less partitions than before, and in simple testing it
works to do that; but it's possible to press the test scripts even
further and hit the case where one partition is added and a partition is
removed quickly enough that we see the same number of partitions, but
they don't actually match, causing hangs during execution.
To cope with both these problems, we now memcmp() the arrays of
partition OIDs, and do a more elaborate mapping (relying on the fact
that both OID arrays are in partition-bounds order) if they're not
identical.
Backpatch to 14, where DETACH CONCURRENTLY appeared.
Reported-by: yajun Hu <1026592243@qq.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/18377-e0324601cfebdfe5@postgresql.org
|
|
Previously, GetJsonPathVar() allowed a jsonpath expression to
reference any prefix of a PASSING variable's name. For example, the
following query would incorrectly work:
SELECT JSON_QUERY(context_item, jsonpath '$xy' PASSING val AS xyz);
The fix ensures that the length of the variable name mentioned in a
jsonpath expression matches exactly with the name of the PASSING
variable before comparing the strings using strncmp().
Reported-by: Alvaro Herrera (off-list)
Discussion: https://postgr.es/m/CA+HiwqFGkLWMvELBH6E4SQ45qUHthgcRH6gCJL20OsYDRtFx_w@mail.gmail.com
|
|
If the CALL is within an atomic context (e.g. there's an outer
transaction block), _SPI_execute_plan should acquire a fresh snapshot
to execute any such functions with. We failed to do that and instead
passed them the Portal snapshot, which had been acquired at the start
of the current SQL command. This'd lead to seeing stale values of
rows modified since the start of the command.
This is arguably a bug in 84f5c2908: I failed to see that "are we in
non-atomic mode" needs to be defined the same way as it is further
down in _SPI_execute_plan, i.e. check !_SPI_current->atomic not just
options->allow_nonatomic. Alternatively the blame could be laid on
plpgsql, which is unconditionally passing allow_nonatomic = true
for CALL/DO even when it knows it's in an atomic context. However,
fixing it in spi.c seems like a better idea since that will also fix
the problem for any extensions that may have copied plpgsql's coding
pattern.
While here, update an obsolete comment about _SPI_execute_plan's
snapshot management.
Per report from Victor Yegorov. Back-patch to all supported versions.
Discussion: https://postgr.es/m/CAGnEboiRe+fG2QxuBO2390F7P8e2MQ6UyBjZSL_w1Cej+E4=Vw@mail.gmail.com
|
|
This feature set did not handle empty ranges correctly, and it's now
too late for PostgreSQL 17 to fix it.
The following commits are reverted:
6db4598fcb8 Add stratnum GiST support function
46a0cd4cefb Add temporal PRIMARY KEY and UNIQUE constraints
86232a49a43 Fix comment on gist_stratnum_btree
030e10ff1a3 Rename pg_constraint.conwithoutoverlaps to conperiod
a88c800deb6 Use daterange and YMD in without_overlaps tests instead of tsrange.
5577a71fb0c Use half-open interval notation in without_overlaps tests
34768ee3616 Add temporal FOREIGN KEY contraints
482e108cd38 Add test for REPLICA IDENTITY with a temporal key
c3db1f30cba doc: clarify PERIOD and WITHOUT OVERLAPS in CREATE TABLE
144c2ce0cc7 Fix ON CONFLICT DO NOTHING/UPDATE for temporal indexes
Discussion: https://www.postgresql.org/message-id/d0b64a7a-dfe4-4b84-a906-c7dedfa40a3e@eisentraut.org
|
|
A PRIMARY KEY or UNIQUE constraint with WITHOUT OVERLAPS will be a
GiST index, not a B-Tree, but it will still have indisunique set. The
code for ON CONFLICT fails if it sees a non-btree index that has
indisunique. This commit fixes that and adds some tests. But now
that we can't just test indisunique, we also need some extra checks to
prevent DO UPDATE from running against a WITHOUT OVERLAPS constraint
(because the conflict could happen against more than one row, and we'd
only update one).
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://www.postgresql.org/message-id/1426589a-83cb-4a89-bf40-713970c07e63@illuminatedcomputing.com
|
|
Author: Alexander Lakhin
Discussion: https://postgr.es/m/ae9f2fcb-4b24-5bb0-4240-efbbbd944ca1@gmail.com
|
|
As an optimization, we store "name" columns as cstrings in btree
indexes.
Here we modify it so that Index Only Scans convert these cstrings back
to names with NAMEDATALEN bytes rather than storing the cstring in the
tuple slot, as was happening previously.
Bug: #17855
Reported-by: Alexander Lakhin
Reviewed-by: Alexander Lakhin, Tom Lane
Discussion: https://postgr.es/m/17855-5f523e0f9769a566@postgresql.org
Backpatch-through: 12, all supported versions
|
|
JsonExprState.input_finfo is only assigned to, never read, and
it's really fairly useless since the value can be gotten out of
the adjacent input_fcinfo field. Let's remove it before someone
starts to depend on it.
While here, also remove TidScanState.tss_htup and AggState.combinedproj,
which are referenced nowhere. Those should have been removed by the
commits that caused them to become disused, but were not.
I don't think a catversion bump is necessary here, since plan trees
are never stored on disk.
Matthias van de Meent
Discussion: https://postgr.es/m/CAEze2WjsY4d0TBymLNGK4zpttUcg_YZaTjyWz2VfDUV6YH8wXQ@mail.gmail.com
|
|
This fixes various typos, duplicated words, and tiny bits of whitespace
mainly in code comments but also in docs.
Author: Daniel Gustafsson <daniel@yesql.se>
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Author: Alexander Lakhin <exclusion@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
|
|
This improves some error messages emitted by SQL/JSON query functions
by mentioning column name when available, such as when they are
invoked as part of evaluating JSON_TABLE() columns. To do so, a new
field column_name is added to both JsonFuncExpr and JsonExpr that is
only populated when creating those nodes for transformed JSON_TABLE()
columns.
While at it, relevant error messages are reworded for clarity.
Reported-by: Jian He <jian.universality@gmail.com>
Suggested-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxG_e0QLCgaELrr2ZNz7AxPeGCNKAORe3fHtFCQLsH4J4Q@mail.gmail.com
|
|
This commit reverts c35a3fb5e0 per review by Andres Freund.
Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
|
|
This commit reverts 87985cc925 and 818861eb57 per review by Andres Freund.
Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
|
|
This commit reverts b1484a3f19 per review by Andres Freund.
Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
|
|
This reverts commit b840508644 and bcb14f4abc. These commits were made
for commit 5bec1d6bc5 (Improve eviction algorithm in ReorderBuffer
using max-heap for many subtransactions). However, per discussion,
commit efb8acc0d0 replaced binary heap + index with pairing heap, and
made these commits unnecessary.
Reported-by: Jeff Davis
Discussion: https://postgr.es/m/12747c15811d94efcc5cda72d6b35c80d7bf3443.camel%40j-davis.com
|
|
BitmapAdjustPrefetchIterator() only used the blockno member of the
passed in TBMIterateResult to ensure that the prefetch iterator and
regular iterator stay in sync. Pass it the BlockNumber only, so that we
can move away from using the TBMIterateResult outside of table AM
specific code.
Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
|
|
As of 7c70996ebf0949b142a9, BitmapPrefetch() used the recheck flag for
the current block to determine whether or not it should skip prefetching
the proposed prefetch block. As explained in the comment, this assumed
the index AM will report the same recheck value for the future page as
it did for the current page - but there's no guarantee.
This only affects prefetching - if the recheck flag changes, we may
prefetch blocks unecessarily and not prefetch blocks that will be
needed. But we don't need to rely on that assumption - we know the
recheck flag for the block we're considering prefetching, so we can
use that.
The impact is very limited in practice - the opclass would need to
assign different recheck flags to different blocks, but none of the
built-in opclasses seems to do that.
Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Tom Lane
Discussion: https://postgr.es/m/1939305.1712415547%40sss.pgh.pa.us
|
|
Commit 7c70996ebf0949b142 introduced an optimization to allow bitmap
scans to operate like index-only scans by not fetching a block from the
heap if none of the underlying data is needed and the block is marked
all visible in the visibility map.
With the introduction of table AMs, a FIXME was added to this code
indicating that the skip_fetch logic should be pushed into the table
AM-specific code, as not all table AMs may use a visibility map in the
same way.
This commit resolves this FIXME for the current block. The layering
violation is still present in BitmapHeapScans's prefetching code, which
uses the visibility map to decide whether or not to prefetch a block.
However, this can be addressed independently.
Author: Melanie Plageman
Reviewed-by: Andres Freund, Heikki Linnakangas, Tomas Vondra, Mark Dilger
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
|
|
Set BitmapHeapScanState->can_skip_fetch in BitmapHeapNext() instead of
in ExecInitBitmapHeapScan(). This is a preliminary step to pushing the
skip fetch optimization into heap AM code.
Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
|
|
Change the order so that the table scan is initialized only after
initializing the index scan and building the bitmap.
This is mostly a cosmetic change for now, but later commits will need
to pass parameters to table_beginscan_bm() that are unavailable in
ExecInitBitmapHeapScan().
Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
|
|
Commit 9e8da0f7 taught nbtree to handle ScalarArrayOpExpr quals
natively. This works by pushing down the full context (the array keys)
to the nbtree index AM, enabling it to execute multiple primitive index
scans that the planner treats as one continuous index scan/index path.
This earlier enhancement enabled nbtree ScalarArrayOp index-only scans.
It also allowed scans with ScalarArrayOp quals to return ordered results
(with some notable restrictions, described further down).
Take this general approach a lot further: teach nbtree SAOP index scans
to decide how to execute ScalarArrayOp scans (when and where to start
the next primitive index scan) based on physical index characteristics.
This can be far more efficient. All SAOP scans will now reliably avoid
duplicative leaf page accesses (just like any other nbtree index scan).
SAOP scans whose array keys are naturally clustered together now require
far fewer index descents, since we'll reliably avoid starting a new
primitive scan just to get to a later offset from the same leaf page.
The scan's arrays now advance using binary searches for the array
element that best matches the next tuple's attribute value. Required
scan key arrays (i.e. arrays from scan keys that can terminate the scan)
ratchet forward in lockstep with the index scan. Non-required arrays
(i.e. arrays from scan keys that can only exclude non-matching tuples)
"advance" without the process ever rolling over to a higher-order array.
Naturally, only required SAOP scan keys trigger skipping over leaf pages
(non-required arrays cannot safely end or start primitive index scans).
Consequently, even index scans of a composite index with a high-order
inequality scan key (which we'll mark required) and a low-order SAOP
scan key (which we won't mark required) now avoid repeating leaf page
accesses -- that benefit isn't limited to simpler equality-only cases.
In general, all nbtree index scans now output tuples as if they were one
continuous index scan -- even scans that mix a high-order inequality
with lower-order SAOP equalities reliably output tuples in index order.
This allows us to remove a couple of special cases that were applied
when building index paths with SAOP clauses during planning.
Bugfix commit 807a40c5 taught the planner to avoid generating unsafe
path keys: path keys on a multicolumn index path, with a SAOP clause on
any attribute beyond the first/most significant attribute. These cases
are now all safe, so we go back to generating path keys without regard
for the presence of SAOP clauses (just like with any other clause type).
Affected queries can now exploit scan output order in all the usual ways
(e.g., certain "ORDER BY ... LIMIT n" queries can now terminate early).
Also undo changes from follow-up bugfix commit a4523c5a, which taught
the planner to produce alternative index paths, with path keys, but
without low-order SAOP index quals (filter quals were used instead).
We'll no longer generate these alternative paths, since they can no
longer offer any meaningful advantages over standard index qual paths.
Affected queries thereby avoid all of the disadvantages that come from
using filter quals within index scan nodes. They can avoid extra heap
page accesses from using filter quals to exclude non-matching tuples
(index quals will never have that problem). They can also skip over
irrelevant sections of the index in more cases (though only when nbtree
determines that starting another primitive scan actually makes sense).
There is a theoretical risk that removing restrictions on SAOP index
paths from the planner will break compatibility with amcanorder-based
index AMs maintained as extensions. Such an index AM could have the
same limitations around ordered SAOP scans as nbtree had up until now.
Adding a pro forma incompatibility item about the issue to the Postgres
17 release notes seems like a good idea.
Author: Peter Geoghegan <pg@bowt.ie>
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://postgr.es/m/CAH2-Wz=ksvN_sjcnD1+Bt-WtifRA5ok48aDYnq3pkKhxgMQpcw@mail.gmail.com
|
|
JSON_TABLE() allows JSON data to be converted into a relational view
and thus used, for example, in a FROM clause, like other tabular
data. Data to show in the view is selected from a source JSON object
using a JSON path expression to get a sequence of JSON objects that's
called a "row pattern", which becomes the source to compute the
SQL/JSON values that populate the view's output columns. Column
values themselves are computed using JSON path expressions applied to
each of the JSON objects comprising the "row pattern", for which the
SQL/JSON query functions added in 6185c9737cf4 are used.
To implement JSON_TABLE() as a table function, this augments the
TableFunc and TableFuncScanState nodes that are currently used to
support XMLTABLE() with some JSON_TABLE()-specific fields.
Note that the JSON_TABLE() spec includes NESTED COLUMNS and PLAN
clauses, which are required to provide more flexibility to extract
data out of nested JSON objects, but they are not implemented here
to keep this commit of manageable size.
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Andrew Dunstan <andrew@dunslane.net>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Reviewers have included (in no particular order):
Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup,
Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson,
Justin Pryzby, Álvaro Herrera, Jian He
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
|
|
Previously, binaryheap didn't support updating a key and removing a
node in an efficient way. For example, in order to remove a node from
the binaryheap, the caller had to pass the node's position within the
array that the binaryheap internally has. Removing a node from the
binaryheap is done in O(log n) but searching for the key's position is
done in O(n).
This commit adds a hash table to binaryheap in order to track the
position of each nodes in the binaryheap. That way, by using newly
added functions such as binaryheap_update_up() etc., both updating a
key and removing a node can be done in O(1) on an average and O(log n)
in worst case. This is known as the indexed binary heap. The caller
can specify to use the indexed binaryheap by passing indexed = true.
The current code does not use the new indexing logic, but it will be
used by an upcoming patch.
Reviewed-by: Vignesh C, Peter Smith, Hayato Kuroda, Ajin Cherian,
Tomas Vondra, Shubham Khanna
Discussion: https://postgr.es/m/CAD21AoDffo37RC-eUuyHJKVEr017V2YYDLyn1xF_00ofptWbkg%40mail.gmail.com
|
|
Previously, the executor did index insert unconditionally after calling
table AM interface methods tuple_insert() and multi_insert(). This commit
introduces the new parameter insert_indexes for these two methods. Setting
'*insert_indexes' to true saves the current logic. Setting it to false
indicates that table AM cares about index inserts itself and doesn't want the
caller to do that.
Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Reviewed-by: Pavel Borisov, Matthias van de Meent, Mark Dilger
|
|
This allows MERGE commands to include WHEN NOT MATCHED BY SOURCE
actions, which operate on rows that exist in the target relation, but
not in the data source. These actions can execute UPDATE, DELETE, or
DO NOTHING sub-commands.
This is in contrast to already-supported WHEN NOT MATCHED actions,
which operate on rows that exist in the data source, but not in the
target relation. To make this distinction clearer, such actions may
now be written as WHEN NOT MATCHED BY TARGET.
Writing WHEN NOT MATCHED without specifying BY SOURCE or BY TARGET is
equivalent to writing WHEN NOT MATCHED BY TARGET.
Dean Rasheed, reviewed by Alvaro Herrera, Ted Yu and Vik Fearing.
Discussion: https://postgr.es/m/CAEZATCWqnKGc57Y_JanUBHQXNKcXd7r=0R4NEZUVwP+syRkWbA@mail.gmail.com
|
|
When a plain aggregate is used as a window function, and the window
frame start is specified as UNBOUNDED PRECEDING, the frame's head
cannot move so we do not need to use moving-aggregate mode. The check
for that was put into initialize_peragg(), failing to notice that
ExecInitWindowAgg() calls that function before it's filled in
winstate->frameOptions. Since makeNode() would have zeroed the field,
this didn't provoke uninitialized-value complaints, nor would the
erroneous decision have resulted in more than a little inefficiency.
Still, it's wrong, so move the initialization of
winstate->frameOptions earlier to make it work properly.
While here, also fix a thinko in a comment. Both errors crept in in
commit a9d9acbf2 which introduced the moving-aggregate mode.
Spotted by Vallimaharajan G. Back-patch to all supported branches.
Discussion: https://postgr.es/m/18e7f2a5167.fe36253866818.977923893562469143@zohocorp.com
|
|
Discussion: https://postgr.es/m/CALT9ZEHNeagO5PLb4Nv9J_ZaCtp%2BArdVmbSLc0RHUzx_RPAa4w%40mail.gmail.com
Author: Pavel Borisov
|
|
Discussion: https://postgr.es/m/CALT9ZEHNeagO5PLb4Nv9J_ZaCtp%2BArdVmbSLc0RHUzx_RPAa4w%40mail.gmail.com
Author: Pavel Borisov
|
|
Currently, in read committed transaction isolation mode (default), we have the
following sequence of actions when tuple_update()/tuple_delete() finds
the tuple updated by the concurrent transaction.
1. Attempt to update/delete tuple with tuple_update()/tuple_delete(), which
returns TM_Updated.
2. Lock tuple with tuple_lock().
3. Re-evaluate plan qual (recheck if we still need to update/delete and
calculate the new tuple for update).
4. Second attempt to update/delete tuple with tuple_update()/tuple_delete().
This attempt should be successful, since the tuple was previously locked.
This commit eliminates step 2 by taking the lock during the first
tuple_update()/tuple_delete() call. The heap table access method saves some
effort by checking the updated tuple once instead of twice. Future
undo-based table access methods, which will start from the latest row version,
can immediately place a lock there.
Also, this commit makes tuple_update()/tuple_delete() optionally save the old
tuple into the dedicated slot. That saves efforts on re-fetching tuples in
certain cases.
The code in nodeModifyTable.c is simplified by removing the nested switch/case.
Discussion: https://postgr.es/m/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com
Reviewed-by: Aleksander Alekseev, Pavel Borisov, Vignesh C, Mason Sharp
Reviewed-by: Andres Freund, Chris Travers
|
|
This allows us to abstract how/whether table AM uses transaction identifiers.
A custom table AM can use a custom slot, which may not store xmin directly,
but determine the tuple belonging to the current transaction in the other way.
Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Reviewed-by: Matthias van de Meent, Mark Dilger, Pavel Borisov
Reviewed-by: Nikita Malakhov, Japin Li
|
|
This allows table AM to return a native tuple slot even if
VirtualTupleTableSlot is given as an input. Native tuple slots have knowledge
about system attributes, which could be accessed in the future.
table_multi_insert() method already can modify the input 'slots' array.
Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Reviewed-by: Matthias van de Meent, Mark Dilger, Pavel Borisov
Reviewed-by: Nikita Malakhov, Japin Li
|
|
This introduces the following SQL/JSON functions for querying JSON
data using jsonpath expressions:
JSON_EXISTS(), which can be used to apply a jsonpath expression to a
JSON value to check if it yields any values.
JSON_QUERY(), which can be used to to apply a jsonpath expression to
a JSON value to get a JSON object, an array, or a string. There are
various options to control whether multi-value result uses array
wrappers and whether the singleton scalar strings are quoted or not.
JSON_VALUE(), which can be used to apply a jsonpath expression to a
JSON value to return a single scalar value, producing an error if it
multiple values are matched.
Both JSON_VALUE() and JSON_QUERY() functions have options for
handling EMPTY and ERROR conditions, which can be used to specify
the behavior when no values are matched and when an error occurs
during jsonpath evaluation, respectively.
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Andrew Dunstan <andrew@dunslane.net>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Peter Eisentraut <peter@eisentraut.org>
Author: Jian He <jian.universality@gmail.com>
Reviewers have included (in no particular order):
Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup,
Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson,
Justin Pryzby, Álvaro Herrera, Jian He, Anton A. Melnikov,
Nikita Malakhov, Peter Eisentraut, Tomas Vondra
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
Discussion: https://postgr.es/m/CA+HiwqHROpf9e644D8BRqYvaAPmgBZVup-xKMDPk-nd4EpgzHw@mail.gmail.com
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
|
|
Previously, bitmap heap scans only counted lossy and exact pages for
explain when there was at least one visible tuple on the page.
heapam_scan_bitmap_next_block() returned true only if there was a
"valid" page with tuples to be processed. However, the lossy and exact
page counters in EXPLAIN should count the number of pages represented
in a lossy or non-lossy way in the constructed bitmap, regardless of
whether or not the pages ultimately contained visible tuples.
Backpatch to all supported versions.
Author: Melanie Plageman
Discussion: https://www.postgresql.org/message-id/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAAKRu_bxrXeZ2rCnY8LyeC2Ls88KpjWrQ%2BopUrXDRXdcfwFZGA@mail.gmail.com
|
|
This allows a RETURNING clause to be appended to a MERGE query, to
return values based on each row inserted, updated, or deleted. As with
plain INSERT, UPDATE, and DELETE commands, the returned values are
based on the new contents of the target table for INSERT and UPDATE
actions, and on its old contents for DELETE actions. Values from the
source relation may also be returned.
As with INSERT/UPDATE/DELETE, the output of MERGE ... RETURNING may be
used as the source relation for other operations such as WITH queries
and COPY commands.
Additionally, a special function merge_action() is provided, which
returns 'INSERT', 'UPDATE', or 'DELETE', depending on the action
executed for each row. The merge_action() function can be used
anywhere in the RETURNING list, including in arbitrary expressions and
subqueries, but it is an error to use it anywhere outside of a MERGE
query's RETURNING list.
Dean Rasheed, reviewed by Isaac Morland, Vik Fearing, Alvaro Herrera,
Gurjeet Singh, Jian He, Jeff Davis, Merlin Moncure, Peter Eisentraut,
and Wolfgang Walther.
Discussion: http://postgr.es/m/CAEZATCWePEGQR5LBn-vD6SfeLZafzEm2Qy_L_Oky2=qw2w3Pzg@mail.gmail.com
|