Age | Commit message (Collapse) | Author |
|
If the non-recursive part of a recursive CTE ended up using
TTSOpsBufferHeapTuple as the table slot type, then a duplicate value
could cause an Assert failure in CheckOpSlotCompatibility() when
checking the hash table for the duplicate value. The expected slot type
for the deform step was TTSOpsMinimalTuple so the Assert failed when the
TTSOpsBufferHeapTuple slot was used.
This is a long-standing bug which we likely didn't notice because it
seems much more likely that the non-recursive term would have required
projection and used a TTSOpsVirtual slot, which CheckOpSlotCompatibility
is ok with.
There doesn't seem to be any harm done here other than the Assert
failure. Both TTSOpsMinimalTuple and TTSOpsBufferHeapTuple slot types
require tuple deformation, so the EEOP_*_FETCHSOME ExprState step would
have properly existed in the ExprState.
The solution is to pass NULL for the ExecBuildGroupingEqual's 'lops'
parameter. This means the ExprState's EEOP_*_FETCHSOME step won't
expect a fixed slot type. This makes CheckOpSlotCompatibility() happy as
no checking is performed when the ExprEvalStep is not expecting a fixed
slot type.
Reported-by: Richard Guo
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAMbWs4-8U9q2LAtf8+ghV11zeUReA3AmrYkxzBEv0vKnDxwkKA@mail.gmail.com
Backpatch-through: 13, all supported versions
|
|
Our parallel-mode code only works when we are executing a query
in full, so ExecutePlan must disable parallel mode when it is
asked to do partial execution. The previous logic for this
involved passing down a flag (variously named execute_once or
run_once) from callers of ExecutorRun or PortalRun. This is
overcomplicated, and unsurprisingly some of the callers didn't
get it right, since it requires keeping state that not all of
them have handy; not to mention that the requirements for it were
undocumented. That led to assertion failures in some corner
cases. The only state we really need for this is the existing
QueryDesc.already_executed flag, so let's just put all the
responsibility in ExecutePlan. (It could have been done in
ExecutorRun too, leading to a slightly shorter patch -- but if
there's ever more than one caller of ExecutePlan, it seems better
to have this logic in the subroutine than the callers.)
This makes those ExecutorRun/PortalRun parameters unnecessary.
In master it seems okay to just remove them, returning the
API for those functions to what it was before parallelism.
Such an API break is clearly not okay in stable branches,
but for them we can just leave the parameters in place after
documenting that they do nothing.
Per report from Yugo Nagata, who also reviewed and tested
this patch. Back-patch to all supported branches.
Discussion: https://postgr.es/m/20241206062549.710dc01cf91224809dd6c0e1@sraoss.co.jp
|
|
When short-circuiting WindowAgg node evaluation on the top-level
WindowAgg node using quals on monotonic window functions, because the
WindowAgg run condition can mean there's no need to evaluate subsequent
window function results in the same partition once the run condition
becomes false, it was possible that the executor would use stale results
from the previous invocation of the window function in some cases.
A fix for this was partially done by a5832722, but that commit only
fixed the issue for non-top-level WindowAgg nodes. I mistakenly thought
that the top-level WindowAgg didn't have this issue, but Jayesh's example
case clearly shows that's incorrect. At the time, I also thought that
this only affected 32-bit systems as all window functions which then
supported run conditions returned BIGINT, however, that's wrong as
ExecProject is still called and that could cause evaluation of any other
window function belonging to the same WindowAgg node, one of which may
return a byref type.
The only queries affected by this are WindowAggs with a "Run Condition"
which contains at least one window function with a byref result type,
such as lead() or lag() on a byref column. The window clause must also
contain a PARTITION BY clause (without a PARTITION BY, execution of the
WindowAgg stops immediately when the run condition becomes false and
there's no risk of using the stale results).
Reported-by: Jayesh Dehankar
Discussion: https://postgr.es/m/193261e2c4d.3dd3cd7c1842.871636075166132237@zohocorp.com
Backpatch-through: 15, where WindowAgg run conditions were added
|
|
If passed a read-write expanded object pointer, the EEOP_NULLIF
code would hand that same pointer to the equality function
and then (unless equality was reported) also return the same
pointer as its value. This is no good, because a function that
receives a read-write expanded object pointer is fully entitled
to scribble on or even delete the object, thus corrupting the
NULLIF output. (This problem is likely unobservable with the
equality functions provided in core Postgres, but it's easy to
demonstrate with one coded in plpgsql.)
To fix, make sure the pointer passed to the equality function
is read-only. We can still return the original read-write
pointer as the NULLIF result, allowing optimization of later
operations.
Per bug #18722 from Alexander Lakhin. This has been wrong
since we invented expanded objects, so back-patch to all
supported branches.
Discussion: https://postgr.es/m/18722-fd9e645448cc78b4@postgresql.org
|
|
If a CTE, subquery, sublink, security invoker view, or coercion
projection references a table with row-level security policies, we
neglected to mark the plan as potentially dependent on which role
is executing it. This could lead to later executions in the same
session returning or hiding rows that should have been hidden or
returned instead.
Reported-by: Wolfgang Walther
Reviewed-by: Noah Misch
Security: CVE-2024-10976
Backpatch-through: 12
|
|
Commit ac04aa84a put the shutoff for this into the planner, which is
not ideal because it doesn't prevent us from re-using a previously
made parallel plan. Revert the planner change and instead put the
shutoff into InitializeParallelDSM, modeling it on the existing code
there for recovering from failure to allocate a DSM segment.
However, that code path is mostly untested, and testing a bit harder
showed there's at least one bug: ExecHashJoinReInitializeDSM is not
prepared for us to have skipped doing parallel DSM setup. I also
thought the Assert in ReinitializeParallelWorkers is pretty
ill-advised, and replaced it with a silent Min() operation.
The existing test case added by ac04aa84a serves fine to test this
version of the fix, so no change needed there.
Patch by me, but thanks to Noah Misch for the core idea that we
could shut off worker creation when !INTERRUPTS_CAN_BE_PROCESSED.
Back-patch to v12, as ac04aa84a was.
Discussion: https://postgr.es/m/CAC-SaSzHUKT=vZJ8MPxYdC_URPfax+yoA1hKTcF4ROz_Q6z0_Q@mail.gmail.com
|
|
Relations opened by the executor are only closed once in
ExecCloseRangeTableRelations(), so the word "again" in the comment
for ExecGetRangeTableRelation() is misleading and unnecessary.
Discussion: https://postgr.es/m/CA+HiwqHnw-zR+u060i3jp4ky5UR0CjByRFQz50oZ05de7wUg=Q@mail.gmail.com
Backpatch-through: 12
|
|
The decision in b6e1157e7 to ignore raw_expr when evaluating a
JsonValueExpr was incorrect. While its value is not ultimately
used (since formatted_expr's value is), failing to initialize it
can lead to problems, for instance, when the expression tree in
raw_expr contains Aggref nodes, which must be initialized to
ensure the parent Agg node works correctly.
Also, optimize eval_const_expressions_mutator()'s handling of
JsonValueExpr a bit. Currently, when formatted_expr cannot be folded
into a constant, we end up processing it twice -- once directly in
eval_const_expressions_mutator() and again recursively via
ece_generic_processing(). This recursive processing is required to
handle raw_expr. To avoid the redundant processing of formatted_expr,
we now process raw_expr directly in eval_const_expressions_mutator().
Finally, update the comment of JsonValueExpr to describe the roles of
raw_expr and formatted_expr more clearly.
Bug: #18657
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Diagnosed-by: Fabio R. Sluzala <fabio3rs@gmail.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18657-1b90ccce2b16bdb8@postgresql.org
Backpatch-through: 16
|
|
After repartitioning the inner side of a hash join that would have
exceeded the allowed size, we check if all the tuples from a parent
partition moved to one child partition. That is evidence that it
contains duplicate keys and later attempts to repartition will also
fail, so we should give up trying to limit memory (for lack of a better
fallback strategy).
A thinko prevented the check from working correctly in partition 0 (the
one that is partially loaded into memory already). After
repartitioning, we should check for extreme skew if the *parent*
partition's space_exhausted flag was set, not the child partition's.
The consequence was repeated futile repartitioning until per-partition
data exceeded various limits including "ERROR: invalid DSA memory alloc
request size 1811939328", OS allocation failure, or temporary disk space
errors. (We could also do something about some of those symptoms, but
that's material for separate patches.)
This problem only became likely when PostgreSQL 16 introduced support
for Parallel Hash Right/Full Join, allowing NULL keys into the hash
table. Repartitioning always leaves NULL in partition 0, no matter how
many times you do it, because the hash value is all zero bits. That's
unlikely for other hashed values, but they might still have caused
wasted extra effort before giving up.
Back-patch to all supported releases.
Reported-by: Craig Milhiser <craig@milhiser.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/CA%2BwnhO1OfgXbmXgC4fv_uu%3DOxcDQuHvfoQ4k0DFeB0Qqd-X-rQ%40mail.gmail.com
|
|
Commit 2dc1deaea turns out to have been still a brick shy of a load,
because CALL statements executing within a plpgsql exception block
could still pass the wrong snapshot to stable functions within the
CALL's argument list. That happened because standard_ProcessUtility
forces isAtomicContext to true if IsTransactionBlock is true, which
it always will be inside a subtransaction. Then ExecuteCallStmt
would think it does not need to push a new snapshot --- but
_SPI_execute_plan didn't do so either, since it thought it was in
nonatomic mode.
The best fix for this seems to be for _SPI_execute_plan to operate
in atomic execution mode if IsSubTransaction() is true, even when the
SPI context as a whole is non-atomic. This makes _SPI_execute_plan
have the same rules about when non-atomic execution is allowed as
_SPI_commit/_SPI_rollback have about when COMMIT/ROLLBACK are allowed,
which seems appropriately symmetric. (If anyone ever tries to allow
COMMIT/ROLLBACK inside a subtransaction, this would all need to be
rethought ... but I'm unconvinced that such a thing could be logically
consistent at all.)
For further consistency, also check IsSubTransaction() in
SPI_inside_nonatomic_context. That does not matter for its
one present-day caller StartTransaction, which can't be reached
inside a subtransaction. But if any other callers ever arise,
they'd presumably want this definition.
Per bug #18656 from Alexander Alehin. Back-patch to all
supported branches, like previous fixes in this area.
Discussion: https://postgr.es/m/18656-cade1780866ef66c@postgresql.org
|
|
The previous commit fixed some ways of losing an inplace update. It
remained possible to lose one when a backend working toward a
heap_update() copied a tuple into memory just before inplace update of
that tuple. In catalogs eligible for inplace update, use LOCKTAG_TUPLE
to govern admission to the steps of copying an old tuple, modifying it,
and issuing heap_update(). This includes MERGE commands. To avoid
changing most of the pg_class DDL, don't require LOCKTAG_TUPLE when
holding a relation lock sufficient to exclude inplace updaters.
Back-patch to v12 (all supported versions). In v13 and v12, "UPDATE
pg_class" or "UPDATE pg_database" can still lose an inplace update. The
v14+ UPDATE fix needs commit 86dc90056dfdbd9d1b891718d2e5614e3e432f35,
and it wasn't worth reimplementing that fix without such infrastructure.
Reviewed by Nitin Motiani and (in earlier versions) Heikki Linnakangas.
Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com
|
|
This commit adds query ID reports for two code paths when processing
extended query protocol messages:
- When receiving a bind message, setting it to the first Query retrieved
from a cached cache.
- When receiving an execute message, setting it to the first PlannedStmt
stored in a portal.
An advantage of this method is that this is able to cover all the types
of portals handled in the extended query protocol, particularly these
two when the report done in ExecutorStart() is not enough (neither is an
addition in ExecutorRun(), actually, for the second point):
- Multiple execute messages, with multiple ExecutorRun().
- Portal with execute/fetch messages, like a query with a RETURNING
clause and a fetch size that stores the tuples in a first execute
message going though ExecutorStart() and ExecuteRun(), followed by one
or more execute messages doing only fetches from the tuplestore created
in the first message. This corresponds to the case where
execute_is_fetch is set, for example.
Note that the query ID reporting done in ExecutorStart() is still
necessary, as an EXECUTE requires it. Query ID reporting is optimistic
and more calls to pgstat_report_query_id() don't matter as the first
report takes priority except if the report is forced. The comment in
ExecutorStart() is adjusted to reflect better the reality with the
extended query protocol.
The test added in pg_stat_statements is a courtesy of Robert Haas. This
uses psql's \bind metacommand, hence this part is backpatched down to
v16.
Reported-by: Kaido Vaikla, Erik Wienhold
Author: Sami Imseih
Reviewed-by: Jian He, Andrei Lepikhov, Michael Paquier
Discussion: https://postgr.es/m/CA+427g8DiW3aZ6pOpVgkPbqK97ouBdf18VLiHFesea2jUk3XoQ@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZxtnf_jZ=VqBSyaU8hfUkkwoJCJ6ufy4LGpXaunKrjrg@mail.gmail.com
Discussion: https://postgr.es/m/1391613709.939460.1684777418070@office.mailbox.org
Backpatch-through: 14
|
|
When the ON ERROR / ON EMPTY behavior is to return NULL, returning
NULL directly from ExecEvalJsonExprPath() suffices. Therefore, there's
no need to create separate steps to check the error/empty flag or
those to evaluate the the constant NULL expression. This speeds up
common cases because the default ON ERROR / ON EMPTY behavior for
JSON_QUERY() and JSON_VALUE() is to return NULL. However, these steps
are necessary if the RETURNING type is a domain, as constraints on the
domain may need to be checked.
Reported-by: Jian He <jian.universality@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com
Backpatch-through: 17
|
|
Reverts c88ce386c4d, 5067c230b8e, and e4e27976a68, because a few BF
animals didn't like one or all of them.
|
|
When the ON ERROR / ON EMPTY behavior is to return NULL, returning
NULL directly from ExecEvalJsonExprPath() suffices. Therefore, there's
no need to create separate steps to check the error/empty flag or
those to evaluate the the constant NULL expression. This speeds up
common cases because the default ON ERROR / ON EMPTY behavior for
JSON_QUERY() and JSON_VALUE() is to return NULL. However, these steps
are necessary if the RETURNING type is a domain, as constraints on the
domain may need to be checked.
Reported-by: Jian He <jian.universality@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com
Backpatch-through: 17
|
|
If the plancache entry for the CALL statement is already stale,
it's possible for us to fetch an old procedure OID out of it,
and then fail with "cache lookup failed for function NNN".
In ordinary usage this never happens because make_callstmt_target
is called just once immediately after building the plancache
entry. It can be forced however by setting up an erroneous CALL
(that causes make_callstmt_target itself to report an error),
then dropping/recreating the target procedure, then repeating
the erroneous CALL.
To fix, use SPI_plan_get_cached_plan() to fetch the plancache's
plan, rather than assuming we can use SPI_plan_get_plan_sources().
This shouldn't add any noticeable overhead in the normal case,
and in the stale-plan case we'd have had to replan anyway a little
further down.
The other callers of SPI_plan_get_plan_sources() seem OK, because
either they don't need up-to-date plans or they know that the
query was just (re) planned. But add some commentary in hopes
of not falling into this trap again.
Per bug #18574 from Song Hongyu. Back-patch to v14 where this coding
was introduced. (Older branches have comparable code, but it's run
after any required replanning, so there's no issue.)
Discussion: https://postgr.es/m/18574-2ce7ba3249221389@postgresql.org
|
|
Also, remove brackets around "EMPTY [ ARRAY ]". An error message is
not the place to state that a keyword is optional.
Backpatch to 17.
|
|
The current method of coercing the boolean result value of
JsonPathExists() to the target type specified for an EXISTS column,
which is to call the type's input function via json_populate_type(),
leads to an error when the target type is integer, because the
integer input function doesn't recognize boolean literal values as
valid.
Instead use the boolean-to-integer cast function for coercion in that
case so that using integer or domains thereof as type for EXISTS
columns works. Note that coercion for ON ERROR values TRUE and FALSE
already works like that because the parser creates a cast expression
including the cast function, but the coercion of the actual result
value is not handled by the parser.
Tests by Jian He.
Reported-by: Jian He <jian.universality@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com
Backpatch-through: 17
|
|
The code was for adding an unconditional JUMP to the next step,
which is unnecessary processing.
Reported-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com
Backpatch-through: 17
|
|
Instead of returning a NULL when the JsonBehavior expression value
could not be coerced to the RETURNING type, throw the error message
informing the user that it is the JsonBehavior expression that caused
the error with the actual coercion error message shown in its DETAIL
line.
Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com
Backpatch-through: 17
|
|
To ensure that the errors of executing a JsonBehavior expression that
is coerced in the parser are caught instead of being thrown directly,
pass ErrorSaveContext to ExecInitExprRec() when initializing it.
Also, add a EEOP_JSONEXPR_COERCION_FINISH step to handle the errors
that are caught that way.
Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com
Backpatch-through: 17
|
|
For utility statements defined within a function, the query tree is
copied to a PlannedStmt as utility commands do not require planning.
However, the query ID was missing from the information passed down.
This leads to plugins relying on the query ID like pg_stat_statements to
not be able to track utility statements within function calls. Tests
are added to check this behavior, depending on pg_stat_statements.track.
This is an old bug. Now, query IDs for utilities are compiled using
their parsed trees rather than the query string since v16
(3db72ebcbe20), leading to less bloat with utilities, so backpatch down
only to this version.
Author: Anthonin Bonnefoy
Discussion: https://postgr.es/m/CAO6_XqrGp-uwBqi3vBPLuRULKkddjC7R5QZCgsFren=8E+m2Sg@mail.gmail.com
Backpatch-through: 16
|
|
Such queries don't expand automatically updatable views, and ModifyTable
uses the wholerow attribute unconditionally. The user-visible behavior
is fine, so change to more-specific assertions. Commit
d5f788b41dc2cbdde6e7694c70dda54d829a5ed5 added the wrong assertion.
Back-patch to v17, where commit 5f2e179bd31e5f5803005101eb12a8d7bf8db8f3
introduced MERGE view_name.
Reported by Alexander Lakhin.
Discussion: https://postgr.es/m/e4b40a88-c134-6926-3196-bc4501cb87a2@gmail.com
|
|
For an inner_unique join, we always assume that the executor will stop
scanning for matches after the first match. Therefore, for a mergejoin
that is inner_unique and whose mergeclauses are sufficient to identify a
match, we set the skip_mark_restore flag to true, indicating that the
executor need not do mark/restore calls. However, merge-right-anti-join
did not get this memo and continues scanning the inner side for matches
after the first match. If there are duplicates in the outer scan, we
may incorrectly skip matching some inner tuples, which can lead to wrong
results.
Here we fix this issue by ensuring that merge-right-anti-join also
advances to next outer tuple after the first match in inner_unique
cases. This also saves cycles by avoiding unnecessary scanning of inner
tuples after the first match.
Although hash-right-anti-join does not suffer from this wrong results
issue, we apply the same change to it as well, to help save cycles for
the same reason.
Per bug #18522 from Antti Lampinen, and bug #18526 from Feliphe Pozzer.
Back-patch to v16 where right-anti-join was introduced.
Author: Richard Guo
Discussion: https://postgr.es/m/18522-c7a8956126afdfd0@postgresql.org
|
|
Instead of looking up casts at parse time for converting the result
of JsonPath* query functions to the specified or the default
RETURNING type, always perform the conversion at runtime using either
the target type's input function or the function
json_populate_type().
There are two motivations for this change:
1. json_populate_type() coerces to types with typmod such that any
string values that exceed length limit cause an error instead of
silent truncation, which is necessary to be standard-conforming.
2. It was possible to end up with a cast expression that doesn't
support soft handling of errors causing bugs in the of handling
ON ERROR clause.
JsonExpr.coercion_expr which would store the cast expression is no
longer necessary, so remove.
Bump catversion because stored rules change because of the above
removal.
Reported-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: Discussion: https://postgr.es/m/202405271326.5a5rprki64aw%40alvherre.pgsql
|
|
Most comments concern RELKIND_VIEW. One addresses the ExecUpdate()
"tupleid" parameter. A later commit will rely on these facts, but they
hold already. Back-patch to v12 (all supported versions), the plan for
that commit.
Reviewed (in an earlier version) by Robert Haas.
Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com
|
|
The first one was noticed by Tender Wang and introduced with
8aba9322511f; the other one was newly introduced with dbca3469ebf8.
|
|
When detaching partition in concurrent mode, it's possible for partition
descriptors to not match the set that was recently seen when the plan
was made, causing an assertion failure or (in production builds) failure
to construct a working plan. The case that was reported involves
prepared statements, but I think it may be possible to hit this bug
without that too.
The problem is that CreatePartitionPruneState is constructing a
PartitionPruneState under the assumption that new partitions can be
added, but never removed, but it turns out that this isn't true: a
prepared statement gets replanned when the DETACH CONCURRENTLY session
sends out its invalidation message, but if the invalidation message
arrives after ExecInitAppend started, we would build a partition
descriptor without the partition, and then CreatePartitionPruneState
would refuse to work with it.
CreatePartitionPruneState already contains code to deal with the new
descriptor having more partitions than before (and behaving for the
extra partitions as if they had been pruned), but doesn't have code to
deal with less partitions than before, and it is naïve about the case
where the number of partitions is the same. We could simply add that a
new stanza for less partitions than before, and in simple testing it
works to do that; but it's possible to press the test scripts even
further and hit the case where one partition is added and a partition is
removed quickly enough that we see the same number of partitions, but
they don't actually match, causing hangs during execution.
To cope with both these problems, we now memcmp() the arrays of
partition OIDs, and do a more elaborate mapping (relying on the fact
that both OID arrays are in partition-bounds order) if they're not
identical.
This fix was already pushed in backbranches earlier.
Reported-by: yajun Hu <1026592243@qq.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/18377-e0324601cfebdfe5@postgresql.org
|
|
This reverts commit 27162a64b386; this branch is in code freeze due to a
nearing release. We can commit again after the release is out.
Discussion: https://postgr.es/m/1158256.1719239648@sss.pgh.pa.us
|
|
When detaching partition in concurrent mode, it's possible for partition
descriptors to not match the set that was recently seen when the plan
was made, causing an assertion failure or (in production builds) failure
to construct a working plan. The case that was reported involves
prepared statements, but I think it may be possible to hit this bug
without that too.
The problem is that CreatePartitionPruneState is constructing a
PartitionPruneState under the assumption that new partitions can be
added, but never removed, but it turns out that this isn't true: a
prepared statement gets replanned when the DETACH CONCURRENTLY session
sends out its invalidation message, but if the invalidation message
arrives after ExecInitAppend started, we would build a partition
descriptor without the partition, and then CreatePartitionPruneState
would refuse to work with it.
CreatePartitionPruneState already contains code to deal with the new
descriptor having more partitions than before (and behaving for the
extra partitions as if they had been pruned), but doesn't have code to
deal with less partitions than before, and it is naïve about the case
where the number of partitions is the same. We could simply add that a
new stanza for less partitions than before, and in simple testing it
works to do that; but it's possible to press the test scripts even
further and hit the case where one partition is added and a partition is
removed quickly enough that we see the same number of partitions, but
they don't actually match, causing hangs during execution.
To cope with both these problems, we now memcmp() the arrays of
partition OIDs, and do a more elaborate mapping (relying on the fact
that both OID arrays are in partition-bounds order) if they're not
identical.
Backpatch to 14, where DETACH CONCURRENTLY appeared.
Reported-by: yajun Hu <1026592243@qq.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/18377-e0324601cfebdfe5@postgresql.org
|
|
Previously, GetJsonPathVar() allowed a jsonpath expression to
reference any prefix of a PASSING variable's name. For example, the
following query would incorrectly work:
SELECT JSON_QUERY(context_item, jsonpath '$xy' PASSING val AS xyz);
The fix ensures that the length of the variable name mentioned in a
jsonpath expression matches exactly with the name of the PASSING
variable before comparing the strings using strncmp().
Reported-by: Alvaro Herrera (off-list)
Discussion: https://postgr.es/m/CA+HiwqFGkLWMvELBH6E4SQ45qUHthgcRH6gCJL20OsYDRtFx_w@mail.gmail.com
|
|
If the CALL is within an atomic context (e.g. there's an outer
transaction block), _SPI_execute_plan should acquire a fresh snapshot
to execute any such functions with. We failed to do that and instead
passed them the Portal snapshot, which had been acquired at the start
of the current SQL command. This'd lead to seeing stale values of
rows modified since the start of the command.
This is arguably a bug in 84f5c2908: I failed to see that "are we in
non-atomic mode" needs to be defined the same way as it is further
down in _SPI_execute_plan, i.e. check !_SPI_current->atomic not just
options->allow_nonatomic. Alternatively the blame could be laid on
plpgsql, which is unconditionally passing allow_nonatomic = true
for CALL/DO even when it knows it's in an atomic context. However,
fixing it in spi.c seems like a better idea since that will also fix
the problem for any extensions that may have copied plpgsql's coding
pattern.
While here, update an obsolete comment about _SPI_execute_plan's
snapshot management.
Per report from Victor Yegorov. Back-patch to all supported versions.
Discussion: https://postgr.es/m/CAGnEboiRe+fG2QxuBO2390F7P8e2MQ6UyBjZSL_w1Cej+E4=Vw@mail.gmail.com
|
|
This feature set did not handle empty ranges correctly, and it's now
too late for PostgreSQL 17 to fix it.
The following commits are reverted:
6db4598fcb8 Add stratnum GiST support function
46a0cd4cefb Add temporal PRIMARY KEY and UNIQUE constraints
86232a49a43 Fix comment on gist_stratnum_btree
030e10ff1a3 Rename pg_constraint.conwithoutoverlaps to conperiod
a88c800deb6 Use daterange and YMD in without_overlaps tests instead of tsrange.
5577a71fb0c Use half-open interval notation in without_overlaps tests
34768ee3616 Add temporal FOREIGN KEY contraints
482e108cd38 Add test for REPLICA IDENTITY with a temporal key
c3db1f30cba doc: clarify PERIOD and WITHOUT OVERLAPS in CREATE TABLE
144c2ce0cc7 Fix ON CONFLICT DO NOTHING/UPDATE for temporal indexes
Discussion: https://www.postgresql.org/message-id/d0b64a7a-dfe4-4b84-a906-c7dedfa40a3e@eisentraut.org
|
|
A PRIMARY KEY or UNIQUE constraint with WITHOUT OVERLAPS will be a
GiST index, not a B-Tree, but it will still have indisunique set. The
code for ON CONFLICT fails if it sees a non-btree index that has
indisunique. This commit fixes that and adds some tests. But now
that we can't just test indisunique, we also need some extra checks to
prevent DO UPDATE from running against a WITHOUT OVERLAPS constraint
(because the conflict could happen against more than one row, and we'd
only update one).
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://www.postgresql.org/message-id/1426589a-83cb-4a89-bf40-713970c07e63@illuminatedcomputing.com
|
|
Author: Alexander Lakhin
Discussion: https://postgr.es/m/ae9f2fcb-4b24-5bb0-4240-efbbbd944ca1@gmail.com
|
|
As an optimization, we store "name" columns as cstrings in btree
indexes.
Here we modify it so that Index Only Scans convert these cstrings back
to names with NAMEDATALEN bytes rather than storing the cstring in the
tuple slot, as was happening previously.
Bug: #17855
Reported-by: Alexander Lakhin
Reviewed-by: Alexander Lakhin, Tom Lane
Discussion: https://postgr.es/m/17855-5f523e0f9769a566@postgresql.org
Backpatch-through: 12, all supported versions
|
|
JsonExprState.input_finfo is only assigned to, never read, and
it's really fairly useless since the value can be gotten out of
the adjacent input_fcinfo field. Let's remove it before someone
starts to depend on it.
While here, also remove TidScanState.tss_htup and AggState.combinedproj,
which are referenced nowhere. Those should have been removed by the
commits that caused them to become disused, but were not.
I don't think a catversion bump is necessary here, since plan trees
are never stored on disk.
Matthias van de Meent
Discussion: https://postgr.es/m/CAEze2WjsY4d0TBymLNGK4zpttUcg_YZaTjyWz2VfDUV6YH8wXQ@mail.gmail.com
|
|
This fixes various typos, duplicated words, and tiny bits of whitespace
mainly in code comments but also in docs.
Author: Daniel Gustafsson <daniel@yesql.se>
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Author: Alexander Lakhin <exclusion@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
|
|
This improves some error messages emitted by SQL/JSON query functions
by mentioning column name when available, such as when they are
invoked as part of evaluating JSON_TABLE() columns. To do so, a new
field column_name is added to both JsonFuncExpr and JsonExpr that is
only populated when creating those nodes for transformed JSON_TABLE()
columns.
While at it, relevant error messages are reworded for clarity.
Reported-by: Jian He <jian.universality@gmail.com>
Suggested-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxG_e0QLCgaELrr2ZNz7AxPeGCNKAORe3fHtFCQLsH4J4Q@mail.gmail.com
|
|
This commit reverts c35a3fb5e0 per review by Andres Freund.
Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
|
|
This commit reverts 87985cc925 and 818861eb57 per review by Andres Freund.
Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
|
|
This commit reverts b1484a3f19 per review by Andres Freund.
Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
|
|
This reverts commit b840508644 and bcb14f4abc. These commits were made
for commit 5bec1d6bc5 (Improve eviction algorithm in ReorderBuffer
using max-heap for many subtransactions). However, per discussion,
commit efb8acc0d0 replaced binary heap + index with pairing heap, and
made these commits unnecessary.
Reported-by: Jeff Davis
Discussion: https://postgr.es/m/12747c15811d94efcc5cda72d6b35c80d7bf3443.camel%40j-davis.com
|
|
BitmapAdjustPrefetchIterator() only used the blockno member of the
passed in TBMIterateResult to ensure that the prefetch iterator and
regular iterator stay in sync. Pass it the BlockNumber only, so that we
can move away from using the TBMIterateResult outside of table AM
specific code.
Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
|
|
As of 7c70996ebf0949b142a9, BitmapPrefetch() used the recheck flag for
the current block to determine whether or not it should skip prefetching
the proposed prefetch block. As explained in the comment, this assumed
the index AM will report the same recheck value for the future page as
it did for the current page - but there's no guarantee.
This only affects prefetching - if the recheck flag changes, we may
prefetch blocks unecessarily and not prefetch blocks that will be
needed. But we don't need to rely on that assumption - we know the
recheck flag for the block we're considering prefetching, so we can
use that.
The impact is very limited in practice - the opclass would need to
assign different recheck flags to different blocks, but none of the
built-in opclasses seems to do that.
Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Tom Lane
Discussion: https://postgr.es/m/1939305.1712415547%40sss.pgh.pa.us
|
|
Commit 7c70996ebf0949b142 introduced an optimization to allow bitmap
scans to operate like index-only scans by not fetching a block from the
heap if none of the underlying data is needed and the block is marked
all visible in the visibility map.
With the introduction of table AMs, a FIXME was added to this code
indicating that the skip_fetch logic should be pushed into the table
AM-specific code, as not all table AMs may use a visibility map in the
same way.
This commit resolves this FIXME for the current block. The layering
violation is still present in BitmapHeapScans's prefetching code, which
uses the visibility map to decide whether or not to prefetch a block.
However, this can be addressed independently.
Author: Melanie Plageman
Reviewed-by: Andres Freund, Heikki Linnakangas, Tomas Vondra, Mark Dilger
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
|
|
Set BitmapHeapScanState->can_skip_fetch in BitmapHeapNext() instead of
in ExecInitBitmapHeapScan(). This is a preliminary step to pushing the
skip fetch optimization into heap AM code.
Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
|
|
Change the order so that the table scan is initialized only after
initializing the index scan and building the bitmap.
This is mostly a cosmetic change for now, but later commits will need
to pass parameters to table_beginscan_bm() that are unavailable in
ExecInitBitmapHeapScan().
Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
|
|
Commit 9e8da0f7 taught nbtree to handle ScalarArrayOpExpr quals
natively. This works by pushing down the full context (the array keys)
to the nbtree index AM, enabling it to execute multiple primitive index
scans that the planner treats as one continuous index scan/index path.
This earlier enhancement enabled nbtree ScalarArrayOp index-only scans.
It also allowed scans with ScalarArrayOp quals to return ordered results
(with some notable restrictions, described further down).
Take this general approach a lot further: teach nbtree SAOP index scans
to decide how to execute ScalarArrayOp scans (when and where to start
the next primitive index scan) based on physical index characteristics.
This can be far more efficient. All SAOP scans will now reliably avoid
duplicative leaf page accesses (just like any other nbtree index scan).
SAOP scans whose array keys are naturally clustered together now require
far fewer index descents, since we'll reliably avoid starting a new
primitive scan just to get to a later offset from the same leaf page.
The scan's arrays now advance using binary searches for the array
element that best matches the next tuple's attribute value. Required
scan key arrays (i.e. arrays from scan keys that can terminate the scan)
ratchet forward in lockstep with the index scan. Non-required arrays
(i.e. arrays from scan keys that can only exclude non-matching tuples)
"advance" without the process ever rolling over to a higher-order array.
Naturally, only required SAOP scan keys trigger skipping over leaf pages
(non-required arrays cannot safely end or start primitive index scans).
Consequently, even index scans of a composite index with a high-order
inequality scan key (which we'll mark required) and a low-order SAOP
scan key (which we won't mark required) now avoid repeating leaf page
accesses -- that benefit isn't limited to simpler equality-only cases.
In general, all nbtree index scans now output tuples as if they were one
continuous index scan -- even scans that mix a high-order inequality
with lower-order SAOP equalities reliably output tuples in index order.
This allows us to remove a couple of special cases that were applied
when building index paths with SAOP clauses during planning.
Bugfix commit 807a40c5 taught the planner to avoid generating unsafe
path keys: path keys on a multicolumn index path, with a SAOP clause on
any attribute beyond the first/most significant attribute. These cases
are now all safe, so we go back to generating path keys without regard
for the presence of SAOP clauses (just like with any other clause type).
Affected queries can now exploit scan output order in all the usual ways
(e.g., certain "ORDER BY ... LIMIT n" queries can now terminate early).
Also undo changes from follow-up bugfix commit a4523c5a, which taught
the planner to produce alternative index paths, with path keys, but
without low-order SAOP index quals (filter quals were used instead).
We'll no longer generate these alternative paths, since they can no
longer offer any meaningful advantages over standard index qual paths.
Affected queries thereby avoid all of the disadvantages that come from
using filter quals within index scan nodes. They can avoid extra heap
page accesses from using filter quals to exclude non-matching tuples
(index quals will never have that problem). They can also skip over
irrelevant sections of the index in more cases (though only when nbtree
determines that starting another primitive scan actually makes sense).
There is a theoretical risk that removing restrictions on SAOP index
paths from the planner will break compatibility with amcanorder-based
index AMs maintained as extensions. Such an index AM could have the
same limitations around ordered SAOP scans as nbtree had up until now.
Adding a pro forma incompatibility item about the issue to the Postgres
17 release notes seems like a good idea.
Author: Peter Geoghegan <pg@bowt.ie>
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://postgr.es/m/CAH2-Wz=ksvN_sjcnD1+Bt-WtifRA5ok48aDYnq3pkKhxgMQpcw@mail.gmail.com
|
|
JSON_TABLE() allows JSON data to be converted into a relational view
and thus used, for example, in a FROM clause, like other tabular
data. Data to show in the view is selected from a source JSON object
using a JSON path expression to get a sequence of JSON objects that's
called a "row pattern", which becomes the source to compute the
SQL/JSON values that populate the view's output columns. Column
values themselves are computed using JSON path expressions applied to
each of the JSON objects comprising the "row pattern", for which the
SQL/JSON query functions added in 6185c9737cf4 are used.
To implement JSON_TABLE() as a table function, this augments the
TableFunc and TableFuncScanState nodes that are currently used to
support XMLTABLE() with some JSON_TABLE()-specific fields.
Note that the JSON_TABLE() spec includes NESTED COLUMNS and PLAN
clauses, which are required to provide more flexibility to extract
data out of nested JSON objects, but they are not implemented here
to keep this commit of manageable size.
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Andrew Dunstan <andrew@dunslane.net>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Reviewers have included (in no particular order):
Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup,
Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson,
Justin Pryzby, Álvaro Herrera, Jian He
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
|