Age | Commit message (Collapse) | Author |
|
Use the parser's standard type coercion machinery to convert the
output column(s) of a SQL function's final SELECT or RETURNING
to the type(s) they should have according to the function's declared
result type. We'll allow any case where an assignment-level
coercion is available. Previously, we failed unless the required
coercion was a binary-compatible one (and the documentation ignored
this, falsely claiming that the types must match exactly).
Notably, the coercion now accounts for typmods, so that cases where
a SQL function is declared to return a composite type whose columns
are typmod-constrained now behave as one would expect. Arguably
this aspect is a bug fix, but the overall behavioral change here
seems too large to consider back-patching.
A nice side-effect is that functions can now be inlined in a
few cases where we previously failed to do so because of type
mismatches.
Discussion: https://postgr.es/m/18929.1574895430@sss.pgh.pa.us
|
|
Backpatch-through: update all files in master, backpatch legal files through 9.4
|
|
This follows multiple complains from Peter Geoghegan, Andres Freund and
Alvaro Herrera that this issue ought to be dug more before actually
happening, if it happens.
Discussion: https://postgr.es/m/20191226144606.GA5659@alvherre.pgsql
|
|
The following renaming is done so as source files related to index
access methods are more consistent with table access methods (the
original names used for index AMs ware too generic, and could be
confused as including features related to table AMs):
- amapi.h -> indexam.h.
- amapi.c -> indexamapi.c. Here we have an equivalent with
backend/access/table/tableamapi.c.
- amvalidate.c -> indexamvalidate.c.
- amvalidate.h -> indexamvalidate.h.
- genam.c -> indexgenam.c.
- genam.h -> indexgenam.h.
This has been discussed during the development of v12 when table AM was
worked on, but the renaming never happened.
Author: Michael Paquier
Reviewed-by: Fabien Coelho, Julien Rouhaud
Discussion: https://postgr.es/m/20191223053434.GF34339@paquier.xyz
|
|
Remove a duplicated word. Add "of" or "# of" in a couple places
for clarity and consistency. Start comments with a lower case
letter as we do elsewhere in this file.
Rafia Sabih
|
|
Similar to commits 14aec03502, 7e735035f2 and dddf4cdc33, this commit
makes the order of header file inclusion consistent in more places.
Author: Vignesh C
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
|
|
Avoid creating transiently-inconsistent slot states where possible,
by not setting TTS_FLAG_SHOULDFREE until after the slot actually has
a free'able tuple pointer, and by making sure that we reset tts_nvalid
and related derived state before we replace the tuple contents. This
would only matter if something were to examine the slot after we'd
suffered some kind of error (e.g. out of memory) while manipulating
the slot. We typically don't do that, so these changes might just be
cosmetic --- but even if so, it seems like good future-proofing.
Also remove some redundant Asserts, and add a couple for consistency.
Back-patch to v12 where all this code was rewritten.
Discussion: https://postgr.es/m/16095-c3ff2e5283b8dba5@postgresql.org
|
|
When ExecBRUpdateTriggers()'s GetTupleForTrigger() follows an EPQ
chain the former needs to run the result tuple through the junkfilter
again, and update the slot containing the new version of the tuple to
contain that new version. The input tuple may already be in the
junkfilter's output slot, which used to be OK - we don't need the
previous version anymore. Unfortunately ff11e7f4b9ae started to use
ExecCopySlot() to update newslot, and ExecCopySlot() doesn't support
copying a slot into itself, leading to a slot in a corrupt
state, which then can cause crashes or other symptoms.
Fix this by skipping the ExecCopySlot() when copying into itself.
While we could have easily made ExecCopySlot() handle that case, it
seems better to add an assert forbidding doing so instead. As the goal
of copying might be to make the contents of one slot independent from
another, it seems failure prone to handle doing so silently.
A follow-up commit will add tests for the obviously under-covered
combination of EPQ and triggers. Done as a separate commit as it might
make sense to backpatch them further than this bug.
Also remove confusion with confusing variable names for slots in
ExecBRDeleteTriggers() and ExecBRUpdateTriggers().
Bug: #16036
Reported-By: Антон Власов
Author: Andres Freund
Discussion: https://postgr.es/m/16036-28184c90d952fb7f@postgresql.org
Backpatch: 12-, where ff11e7f4b9ae was merged
|
|
In ad0bda5d24ea I changed the EvalPlanQual machinery to store
substitution tuples in slot, instead of using plain HeapTuples. The
main motivation for that was that using HeapTuples will be inefficient
for future tableams. But it turns out that that conversion was buggy
for non-locking rowmarks - the wrong tuple descriptor was used to
create the slot.
As a secondary issue 5db6df0c0 changed ExecLockRows() to begin EPQ
earlier, to allow to fetch the locked rows directly into the EPQ
slots, instead of having to copy tuples around. Unfortunately, as Tom
complained, that forces some expensive initialization to happen
earlier.
As a third issue, the test coverage for EPQ was clearly insufficient.
Fixing the first issue is unfortunately not trivial: Non-locked row
marks were fetched at the start of EPQ, and we don't have the type
information for the rowmarks available at that point. While we could
change that, it's not easy. It might be worthwhile to change that at
some point, but to fix this bug, it seems better to delay fetching
non-locking rowmarks when they're actually needed, rather than
eagerly. They're referenced at most once, and in cases where EPQ
fails, might never be referenced. Fetching them when needed also
increases locality a bit.
To be able to fetch rowmarks during execution, rather than
initialization, we need to be able to access the active EPQState, as
that contains necessary data. To do so move EPQ related data from
EState to EPQState, and, only for EStates creates as part of EPQ,
reference the associated EPQState from EState.
To fix the second issue, change EPQ initialization to allow use of
EvalPlanQualSlot() to be used before EvalPlanQualBegin() (but
obviously still requiring EvalPlanQualInit() to have been done).
As these changes made struct EState harder to understand, e.g. by
adding multiple EStates, significantly reorder the members, and add a
lot more comments.
Also add a few more EPQ tests, including one that fails for the first
issue above. More is needed.
Reported-By: yi huang
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion:
https://postgr.es/m/CAHU7rYZo_C4ULsAx_LAj8az9zqgrD8WDd4hTegDTMM1LMqrBsg@mail.gmail.com
https://postgr.es/m/24530.1562686693@sss.pgh.pa.us
Backpatch: 12-, where the EPQ changes were introduced
|
|
Most of the fmgr.h includes were obsoleted by 352a24a1f9d6f7d4abb1. A
few others can be obsoleted using the underlying struct type in an
implementation detail.
Author: Andres Freund
Discussion: https://postgr.es/m/20190803193733.g3l3x3o42uv4qj7l@alap3.anarazel.de
|
|
These aren't needed after 352a24a1f9d6. The remaining prototypes are
not defined on the SQL level.
Author: Andres Freund
Discussion: https://postgr.es/m/20190803193733.g3l3x3o42uv4qj7l@alap3.anarazel.de
|
|
Now that list_nth is O(1), there's no good reason to maintain a
separate array of RTE pointers rather than indexing into
estate->es_range_table. Deleting the array doesn't save all that
much either; but just on cleanliness grounds, it's better not to
have duplicate representations of the identical information.
Discussion: https://postgr.es/m/14960.1565384592@sss.pgh.pa.us
|
|
This addresses more issues with code comments, variable names and
unreferenced variables.
Author: Alexander Lakhin
Discussion: https://postgr.es/m/7ab243e0-116d-3e44-d120-76b3df7abefd@gmail.com
|
|
This is numbered take 7, and addresses a set of issues with code
comments, variable names and unreferenced variables.
Author: Alexander Lakhin
Discussion: https://postgr.es/m/dff75442-2468-f74f-568c-6006e141062f@gmail.com
|
|
Now that commit fec0778c8 drew a clear line between public and private
fields in SPITupleTable, it seems pretty silly that the count of valid
tuples isn't on the public side of that line. The reason why not was
that there wasn't such a count. For reasons lost in the mists of time,
spi.c preferred to keep a count of remaining free entries in the array.
But that seems pretty pointless: it's unlike the way we handle similar
code everywhere else, and it involves extra subtractions that surely
outweigh having to do a comparison rather than test-for-zero to check
for array-full.
Hence, rearrange so that this code does the expansible array logic
the same as everywhere else, with a count of valid entries alongside
the allocated array length. And document the count as public.
I looked for core-code callers where it would make sense to start
relying on tuptable->numvals rather than the separate SPI_processed
variable. Right now there don't seem to be places where it'd be
a win to do so without more code restructuring than I care to
undertake today. In principle, though, having SPITupleTables be
fully self-contained should be helpful down the line.
Discussion: https://postgr.es/m/16852.1563395722@sss.pgh.pa.us
|
|
The fields that we consider public are "tupdesc" and "vals", which
historically are in the middle of the struct. Move them to the front
(this should be perfectly safe to do in HEAD) and add comments to make
it quite clear which fields are public or not.
Also adjust spi.sgml's documentation of the struct to match.
That doc had bit-rotted somewhat, as it was missing some fields.
(Arguably we should just remove all the private fields from the docs,
but for now I refrained.)
Daniel Gustafsson, reviewed by Fabien Coelho
Discussion: https://postgr.es/m/0D19F836-B743-4340-B6A2-F148CA3DD1F0@yesql.se
|
|
This addresses a couple of issues in the code:
- Typos and inconsistencies in comments and function declarations.
- Removal of unreferenced function declarations.
- Removal of unnecessary compile flags.
- A cleanup error in regressplans.sh.
Author: Alexander Lakhin
Discussion: https://postgr.es/m/0c991fdf-2670-1997-c027-772a420c4604@gmail.com
|
|
Author: Alexander Lakhin
Discussion: https://postgr.es/m/af27d1b3-a128-9d62-46e0-88f424397f44@gmail.com
|
|
Author: Alexander Lakhin
Discussion: https://postgr.es/m/0a5419ea-1452-a4e6-72ff-545b1a5a8076@gmail.com
|
|
Discussion: https://postgr.es/m/20190612184527.GA24266@alvherre.pgsql
Reviewed-by: Michaël Paquier
|
|
There were a number of issues in the recent commits which include typos,
code and comments mismatch, leftover function declarations. Fix them.
Reported-by: Alexander Lakhin
Author: Alexander Lakhin, Amit Kapila and Amit Langote
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/ef0c0232-0c1d-3a35-63d4-0ebd06e31387@gmail.com
|
|
Author: Andrea Gelmini
Reviewed-by: Michael Paquier, Justin Pryzby
Discussion: https://postgr.es/m/20190528181718.GA39034@glet
|
|
Switch to 2.1 version of pg_bsd_indent. This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.
Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
|
|
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.
Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
|
|
Previously, gen_partprune_steps() always built executor pruning steps
using all suitable clauses, including those containing PARAM_EXEC
Params. This meant that the pruning steps were only completely safe
for executor run-time (scan start) pruning. To prune at executor
startup, we had to ignore the steps involving exec Params. But this
doesn't really work in general, since there may be logic changes
needed as well --- for example, pruning according to the last operator's
btree strategy is the wrong thing if we're not applying that operator.
The rules embodied in gen_partprune_steps() and its minions are
sufficiently complicated that tracking their incremental effects in
other logic seems quite impractical.
Short of a complete redesign, the only safe fix seems to be to run
gen_partprune_steps() twice, once to create executor startup pruning
steps and then again for run-time pruning steps. We can save a few
cycles however by noting during the first scan whether we rejected
any clauses because they involved exec Params --- if not, we don't
need to do the second scan.
In support of this, refactor the internal APIs in partprune.c to make
more use of passing information in the GeneratePruningStepsContext
struct, rather than as separate arguments.
This is, I hope, the last piece of our response to a bug report from
Alan Jackson. Back-patch to v11 where this code came in.
Discussion: https://postgr.es/m/FAD28A83-AC73-489E-A058-2681FA31D648@tvsquared.com
|
|
As reported by Tom, when ExecStoreMinimalTuple() had to perform a
conversion to store the minimal tuple in the slot, it forgot to
respect the shouldFree flag, and leaked the tuple into the current
memory context if true. Fix that by freeing the tuple in that case.
Looking at the relevant code made me (Andres) realize that not having
the shouldFree parameter to ExecForceStoreHeapTuple() was a bad
idea. Some callers had to locally implement the necessary logic, and
in one case it was missing, creating a potential per-group leak in
non-hashed aggregation.
The choice to not free the tuple in ExecComputeStoredGenerated() is
not pretty, but not introduced by this commit - I'll start a separate
discussion about it.
Reported-By: Tom Lane
Discussion: https://postgr.es/m/366.1555382816@sss.pgh.pa.us
|
|
This is an SQL-standard feature that allows creating columns that are
computed from expressions rather than assigned, similar to a view or
materialized view but on a column basis.
This implements one kind of generated column: stored (computed on
write). Another kind, virtual (computed on read), is planned for the
future, and some room is left for it.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com
|
|
Add command variants COMMIT AND CHAIN and ROLLBACK AND CHAIN, which
start new transactions with the same transaction characteristics as the
just finished one, per SQL standard.
Support for transaction chaining in PL/pgSQL is also added. This
functionality is especially useful when running COMMIT in a loop in
PL/pgSQL.
Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
Discussion: https://www.postgresql.org/message-id/flat/28536681-324b-10dc-ade8-ab46f7645a5a@2ndquadrant.com
|
|
This adds new, required, table AM callbacks for insert/delete/update
and lock_tuple. To be able to reasonably use those, the EvalPlanQual
mechanism had to be adapted, moving more logic into the AM.
Previously both delete/update/lock call-sites and the EPQ mechanism had
to have awareness of the specific tuple format to be able to fetch the
latest version of a tuple. Obviously that needs to be abstracted
away. To do so, move the logic that find the latest row version into
the AM. lock_tuple has a new flag argument,
TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last
version, rather than the current one. It'd have been possible to do
so via a separate callback as well, but finding the last version
usually also necessitates locking the newest version, making it
sensible to combine the two. This replaces the previous use of
EvalPlanQualFetch(). Additionally HeapTupleUpdated, which previously
signaled either a concurrent update or delete, is now split into two,
to avoid callers needing AM specific knowledge to differentiate.
The move of finding the latest row version into tuple_lock means that
encountering a row concurrently moved into another partition will now
raise an error about "tuple to be locked" rather than "tuple to be
updated/deleted" - which is accurate, as that always happens when
locking rows. While possible slightly less helpful for users, it seems
like an acceptable trade-off.
As part of this commit HTSU_Result has been renamed to TM_Result, and
its members been expanded to differentiated between updating and
deleting. HeapUpdateFailureData has been renamed to TM_FailureData.
The interface to speculative insertion is changed so nodeModifyTable.c
does not have to set the speculative token itself anymore. Instead
there's a version of tuple_insert, tuple_insert_speculative, that
performs the speculative insertion (without requiring a flag to signal
that fact), and the speculative insertion is either made permanent
with table_complete_speculative(succeeded = true) or aborted with
succeeded = false).
Note that multi_insert is not yet routed through tableam, nor is
COPY. Changing multi_insert requires changes to copy.c that are large
enough to better be done separately.
Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED
VIEW are also only going to be adjusted in a later commit.
Author: Andres Freund and Haribabu Kommi
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
|
|
This adds a flag "deterministic" to collations. If that is false,
such a collation disables various optimizations that assume that
strings are equal only if they are byte-wise equal. That then allows
use cases such as case-insensitive or accent-insensitive comparisons
or handling of strings with different Unicode normal forms.
This functionality is only supported with the ICU provider. At least
glibc doesn't appear to have any locales that work in a
nondeterministic way, so it's not worth supporting this for the libc
provider.
The term "deterministic comparison" in this context is from Unicode
Technical Standard #10
(https://unicode.org/reports/tr10/#Deterministic_Comparison).
This patch makes changes in three areas:
- CREATE COLLATION DDL changes and system catalog changes to support
this new flag.
- Many executor nodes and auxiliary code are extended to track
collations. Previously, this code would just throw away collation
information, because the eventually-called user-defined functions
didn't use it since they only cared about equality, which didn't
need collation information.
- String data type functions that do equality comparisons and hashing
are changed to take the (non-)deterministic flag into account. For
comparison, this just means skipping various shortcuts and tie
breakers that use byte-wise comparison. For hashing, we first need
to convert the input string to a canonical "sort key" using the ICU
analogue of strxfrm().
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
|
|
We still require AccessExclusiveLock on the partition itself, because
otherwise an insert that violates the newly-imposed partition
constraint could be in progress at the same time that we're changing
that constraint; only the lock level on the parent relation is
weakened.
To make this safe, we have to cope with (at least) three separate
problems. First, relevant DDL might commit while we're in the process
of building a PartitionDesc. If so, find_inheritance_children() might
see a new partition while the RELOID system cache still has the old
partition bound cached, and even before invalidation messages have
been queued. To fix that, if we see that the pg_class tuple seems to
be missing or to have a null relpartbound, refetch the value directly
from the table. We can't get the wrong value, because DETACH PARTITION
still requires AccessExclusiveLock throughout; if we ever want to
change that, this will need more thought. In testing, I found it quite
difficult to hit even the null-relpartbound case; the race condition
is extremely tight, but the theoretical risk is there.
Second, successive calls to RelationGetPartitionDesc might not return
the same answer. The query planner will get confused if lookup up the
PartitionDesc for a particular relation does not return a consistent
answer for the entire duration of query planning. Likewise, query
execution will get confused if the same relation seems to have a
different PartitionDesc at different times. Invent a new
PartitionDirectory concept and use it to ensure consistency. This
ensures that a single invocation of either the planner or the executor
sees the same view of the PartitionDesc from beginning to end, but it
does not guarantee that the planner and the executor see the same
view. Since this allows pointers to old PartitionDesc entries to
survive even after a relcache rebuild, also postpone removing the old
PartitionDesc entry until we're certain no one is using it.
For the most part, it seems to be OK for the planner and executor to
have different views of the PartitionDesc, because the executor will
just ignore any concurrently added partitions which were unknown at
plan time; those partitions won't be part of the inheritance
expansion, but invalidation messages will trigger replanning at some
point. Normally, this happens by the time the very next command is
executed, but if the next command acquires no locks and executes a
prepared query, it can manage not to notice until a new transaction is
started. We might want to tighten that up, but it's material for a
separate patch. There would still be a small window where a query
that started just after an ATTACH PARTITION command committed might
fail to notice its results -- but only if the command starts before
the commit has been acknowledged to the user. All in all, the warts
here around serializability seem small enough to be worth accepting
for the considerable advantage of being able to add partitions without
a full table lock.
Although in general the consequences of new partitions showing up
between planning and execution are limited to the query not noticing
the new partitions, run-time partition pruning will get confused in
that case, so that's the third problem that this patch fixes.
Run-time partition pruning assumes that indexes into the PartitionDesc
are stable between planning and execution. So, add code so that if
new partitions are added between plan time and execution time, the
indexes stored in the subplan_map[] and subpart_map[] arrays within
the plan's PartitionedRelPruneInfo get adjusted accordingly. There
does not seem to be a simple way to generalize this scheme to cope
with partitions that are removed, mostly because they could then get
added back again with different bounds, but it works OK for added
partitions.
This code does not try to ensure that every backend participating in
a parallel query sees the same view of the PartitionDesc. That
currently doesn't matter, because we never pass PartitionDesc
indexes between backends. Each backend will ignore the concurrently
added partitions which it notices, and it doesn't matter if different
backends are ignoring different sets of concurrently added partitions.
If in the future that matters, for example because we allow writes in
parallel query and want all participants to do tuple routing to the same
set of partitions, the PartitionDirectory concept could be improved to
share PartitionDescs across backends. There is a draft patch to
serialize and restore PartitionDescs on the thread where this patch
was discussed, which may be a useful place to start.
Patch by me. Thanks to Alvaro Herrera, David Rowley, Simon Riggs,
Amit Langote, and Michael Paquier for discussion, and to Alvaro
Herrera for some review.
Discussion: http://postgr.es/m/CA+Tgmobt2upbSocvvDej3yzokd7AkiT+PvgFH+a9-5VV1oJNSQ@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZE0r9-cyA-aY6f8WFEROaDLLL7Vf81kZ8MtFCkxpeQSw@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoY13KQZF-=HNTrt9UYWYx3_oYOQpu9ioNT49jGgiDpUEA@mail.gmail.com
|
|
For the upcoming pluggable table access methods it's quite
inconvenient to store tuples as HeapTuples, as that'd require
converting tuples from a their native format into HeapTuples. Instead
use slots to manage epq tuples.
To fit into that scheme, change the foreign data wrapper callback
RefetchForeignRow, to store the tuple in a slot. Insist on using the
caller provided slot, so it conveniently can be stored in the
corresponding EPQ slot. As there is no in core user of
RefetchForeignRow, that change was done blindly, but we plan to test
that soon.
To avoid duplicating that work for row locks, move row locks to just
directly use the EPQ slots - it previously temporarily stored tuples
in LockRowsState.lr_curtuples, but that doesn't seem beneficial, given
we'd possibly end up with a significant number of additional slots.
The behaviour of es_epqTupleSet[rti -1] is now checked by
es_epqTupleSlot[rti -1] != NULL, as that is distinguishable from a
slot containing an empty tuple.
Author: Andres Freund, Haribabu Kommi, Ashutosh Bapat
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
|
|
In preparation for abstracting table storage, convert trigger.c to
track tuples in slots. Which also happens to make code calling
triggers simpler.
As the calling interface for triggers themselves is not changed in
this patch, HeapTuples still are extracted from the slot at that
time. But that's handled solely inside trigger.c, not visible to
callers. It's quite likely that we'll want to revise the external
trigger interface, but that's a separate large project.
As part of this work the slots used for old/new/return tuples are
moved from EState into ResultRelInfo, as different updated tables
might need different slots. The slots are now also now created
on-demand, which is good both from an efficiency POV, but also makes
the modifying code simpler.
Author: Andres Freund, Amit Khandekar and Ashutosh Bapat
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
|
|
After the introduction of tuple table slots all table AMs need to
support returning the table oid of the tuple stored in a slot created
by said AM. It does not make sense to re-implement that in every AM,
therefore move handling of table OIDs into the TupleTableSlot
structure itself. It's possible that we, at a later date, might want
to get rid of HeapTupleData.t_tableOid entirely, but doing so before
the abstractions for table AMs are integrated turns out to be too
hard, so delay that for now.
Similarly, every AM needs to support the concept of a tuple
identifier (tid / item pointer) for its tuples. It's quite possible
that we'll generalize the exact form of a tid at a future point (to
allow for things like index organized tables), but for now many parts
of the code know about tids, so there's not much point in abstracting
tids away. Therefore also move into slot (rather than providing API to
set/get the tid associated with the tuple in a slot).
Once table AM includes insert/updating/deleting tuples, the
responsibility to set the correct tid after such an action will move
into that. After that change, code doing such modifications, should
not have to deal with HeapTuples directly anymore.
Author: Andres Freund, Haribabu Kommi and Ashutosh Bapat
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
|
|
That avoids having to care about the lifetime of the
HeapTupleHeaderData passed to ExecStore[Buffer]HeapTuple(). That
doesn't make a huge difference for a plain HeapTupleTableSlot, but for
BufferHeapTupleTableSlot it can be a significant advantage, avoiding
the need to materialize slots where it's inconvenient to provide a
HeapTupleData with appropriate lifetime to point to the on-disk tuple.
It's quite possible that we'll want to add support functions for
constructing HeapTuples using that embedded HeapTupleData, but for now
callers do so themselves.
Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
|
|
This allows to avoid an unnecessary pin/unpin cycle when storing a
tuple in an already pinned buffer into a slot, when the pin isn't
further needed at the call site.
Only a single caller for now (to ensure coverage), but upcoming
patches will increase use of the new function.
Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
|
|
This has the advantage that the comparator expression, the table's
slot, etc do not have to be rebuilt. Additionally the simplehash.h
hashtable within the tuple hashtable now keeps its previous size and
doesn't need to be reallocated. That both reduces allocator overhead,
and improves performance in cases where the input estimation was off
by a significant factor.
To avoid an API/ABI break, the new parameter is exposed via the new
BuildTupleHashTableExt(), and BuildTupleHashTable() now is a wrapper
around the former, that continues to allocate the table itself in the
tablecxt.
Using this fixes performance issues discovered in the two bugs
referenced. This commit however has not converted the callers, that's
done in a separate commit.
Bug: #15592 #15486
Reported-By: Jakub Janeček, Dmitry Marakasov
Author: Andres Freund
Discussion:
https://postgr.es/m/15486-05850f065da42931@postgresql.org
https://postgr.es/m/20190114180423.ywhdg2iagzvh43we@alap3.anarazel.de
Backpatch: 11, this is a prerequisite for other fixes
|
|
Over at patch https://commitfest.postgresql.org/21/1062/ Dmitry wants to
introduce a more generic subscription mechanism, which allows
subscripting not only arrays but also other object types such as JSONB.
That functionality is introduced in a largish invasive patch, out of
which this internal renaming patch was extracted.
Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov
Discussion: https://postgr.es/m/CA+q6zcUK4EqPAu7XRRO5CCjMwhz5zvg+rfWuLzVoxp_5sKS6=w@mail.gmail.com
|
|
The old name of this file was never a very good indication of what it
was for. Now that there's also access/relation.h, we have a potential
confusion hazard as well, so let's rename it to something more apropos.
Per discussion, "pathnodes.h" is reasonable, since a good fraction of
the file is Path node definitions.
While at it, tweak a couple of other headers that were gratuitously
importing relation.h into modules that don't need it.
Discussion: https://postgr.es/m/7719.1548688728@sss.pgh.pa.us
|
|
Before this change FunctionCallInfoData, the struct arguments etc for
V1 function calls are stored in, always had space for
FUNC_MAX_ARGS/100 arguments, storing datums and their nullness in two
arrays. For nearly every function call 100 arguments is far more than
needed, therefore wasting memory. Arg and argnull being two separate
arrays also guarantees that to access a single argument, two
cachelines have to be touched.
Change the layout so there's a single variable-length array with pairs
of value / isnull. That drastically reduces memory consumption for
most function calls (on x86-64 a two argument function now uses
64bytes, previously 936 bytes), and makes it very likely that argument
value and its nullness are on the same cacheline.
Arguments are stored in a new NullableDatum struct, which, due to
padding, needs more memory per argument than before. But as usually
far fewer arguments are stored, and individual arguments are cheaper
to access, that's still a clear win. It's likely that there's other
places where conversion to NullableDatum arrays would make sense,
e.g. TupleTableSlots, but that's for another commit.
Because the function call information is now variable-length
allocations have to take the number of arguments into account. For
heap allocations that can be done with SizeForFunctionCallInfoData(),
for on-stack allocations there's a new LOCAL_FCINFO(name, nargs) macro
that helps to allocate an appropriately sized and aligned variable.
Some places with stack allocation function call information don't know
the number of arguments at compile time, and currently variably sized
stack allocations aren't allowed in postgres. Therefore allow for
FUNC_MAX_ARGS space in these cases. They're not that common, so for
now that seems acceptable.
Because of the need to allocate FunctionCallInfo of the appropriate
size, older extensions may need to update their code. To avoid subtle
breakages, the FunctionCallInfoData struct has been renamed to
FunctionCallInfoBaseData. Most code only references FunctionCallInfo,
so that shouldn't cause much collateral damage.
This change is also a prerequisite for more efficient expression JIT
compilation (by allocating the function call information on the stack,
allowing LLVM to optimize it away); previously the size of the call
information caused problems inside LLVM's optimizer.
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20180605172952.x34m5uz6ju6enaem@alap3.anarazel.de
|
|
This is the genam.h equivalent of 4c850ecec649c (which removed
heapam.h from a lot of other headers). There's still a few header
includes of genam.h, but not from central headers anymore.
As a few headers are not indirectly included anymore, execnodes.h and
relscan.h need a few additional includes. Some of the depended on
types were replacable by using the underlying structs, but e.g. for
Snapshot in execnodes.h that'd have gotten more invasive than
reasonable in this commit.
Like the aforementioned commit 4c850ecec649c, this requires adding new
genam.h includes to a number of backend files, which likely is also
required in a few external projects.
Author: Andres Freund
Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
|
|
heapam.h previously was included in a number of widely used
headers (e.g. execnodes.h, indirectly in executor.h, ...). That's
problematic on its own, as heapam.h contains a lot of low-level
details that don't need to be exposed that widely, but becomes more
problematic with the upcoming introduction of pluggable table storage
- it seems inappropriate for heapam.h to be included that widely
afterwards.
heapam.h was largely only included in other headers to get the
HeapScanDesc typedef (which was defined in heapam.h, even though
HeapScanDescData is defined in relscan.h). The better solution here
seems to be to just use the underlying struct (forward declared where
necessary). Similar for BulkInsertState.
Another problem was that LockTupleMode was used in executor.h - parts
of the file tried to cope without heapam.h, but due to the fact that
it indirectly included it, several subsequent violations of that goal
were not not noticed. We could just reuse the approach of declaring
parameters as int, but it seems nicer to move LockTupleMode to
lockoptions.h - that's not a perfect location, but also doesn't seem
bad.
As a number of files relied on implicitly included heapam.h, a
significant number of files grew an explicit include. It's quite
probably that a few external projects will need to do the same.
Author: Andres Freund
Reviewed-By: Alvaro Herrera
Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
|
|
Backpatch-through: certain files through 9.4
|
|
These mainly help understanding the function signatures better.
|
|
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
|
|
This commit completes the work prepared in 1a0586de36, splitting the
old TupleTableSlot implementation (which could store buffer, heap,
minimal and virtual slots) into four different slot types. As
described in the aforementioned commit, this is done with the goal of
making tuple table slots extensible, to allow for pluggable table
access methods.
To achieve runtime extensibility for TupleTableSlots, operations on
slots that can differ between types of slots are performed using the
TupleTableSlotOps struct provided at slot creation time. That
includes information from the size of TupleTableSlot struct to be
allocated, initialization, deforming etc. See the struct's definition
for more detailed information about callbacks TupleTableSlotOps.
I decided to rename TTSOpsBufferTuple to TTSOpsBufferHeapTuple and
ExecCopySlotTuple to ExecCopySlotHeapTuple, as that seems more
consistent with other naming introduced in recent patches.
There's plenty optimization potential in the slot implementation, but
according to benchmarking the state after this commit has similar
performance characteristics to before this set of changes, which seems
sufficient.
There's a few changes in execReplication.c that currently need to poke
through the slot abstraction, that'll be repaired once the pluggable
storage patchset provides the necessary infrastructure.
Author: Andres Freund and Ashutosh Bapat, with changes by Amit Khandekar
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
|
|
This yields a minor speedup, which roughly balances the loss from the
upcoming introduction of callbacks to do some operations on slots.
Author: Andres Freund
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
|
|
This speeds up write operations (INSERT, UPDATE, DELETE, COPY, as well
as the future MERGE) on partitioned tables.
This changes the setup for tuple routing so that it does far less work
during the initial setup and pushes more work out to when partitions
receive tuples. PartitionDispatchData structs for sub-partitioned
tables are only created when a tuple gets routed through it. The
possibly large arrays in the PartitionTupleRouting struct have largely
been removed. The partitions[] array remains but now never contains any
NULL gaps. Previously the NULLs had to be skipped during
ExecCleanupTupleRouting(), which could add a large overhead to the
cleanup when the number of partitions was large. The partitions[] array
is allocated small to start with and only enlarged when we route tuples
to enough partitions that it runs out of space. This allows us to keep
simple single-row partition INSERTs running quickly. Redesign
The arrays in PartitionTupleRouting which stored the tuple translation maps
have now been removed. These have been moved out into a
PartitionRoutingInfo struct which is an additional field in ResultRelInfo.
The find_all_inheritors() call still remains by far the slowest part of
ExecSetupPartitionTupleRouting(). This commit just removes the other slow
parts.
In passing also rename the tuple translation maps from being ParentToChild
and ChildToParent to being RootToPartition and PartitionToRoot. The old
names mislead you into thinking that a partition of some sub-partitioned
table would translate to the rowtype of the sub-partitioned table rather
than the root partitioned table.
Authors: David Rowley and Amit Langote, heavily revised by Álvaro Herrera
Testing help from Jesper Pedersen and Kato Sho.
Discussion: https://postgr.es/m/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=GVBwvGh4a6xA@mail.gmail.com
|
|
Per MSVC complaint on buildfarm member dory.
|
|
Previously this information was computed when JIT compiling an
expression. But the information is useful for assertions in the
non-JIT case too (for assertions), therefore it makes sense to move
it.
This will, in a followup commit, allow to treat different slot types
differently. E.g. for virtual slots there's no need to generate a JIT
function to deform the slot.
Author: Andres Freund
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
|