path: root/src/include/executor
Age  Commit message  Author
2019-11-06  Minor code review for tuple slot rewrite.  (Tom Lane)
Avoid creating transiently-inconsistent slot states where possible, by not setting TTS_FLAG_SHOULDFREE until after the slot actually has a free'able tuple pointer, and by making sure that we reset tts_nvalid and related derived state before we replace the tuple contents. This would only matter if something were to examine the slot after we'd suffered some kind of error (e.g. out of memory) while manipulating the slot. We typically don't do that, so these changes might just be cosmetic --- but even if so, it seems like good future-proofing. Also remove some redundant Asserts, and add a couple for consistency. Back-patch to v12 where all this code was rewritten. Discussion: https://postgr.es/m/16095-c3ff2e5283b8dba5@postgresql.org
2019-10-04  Fix crash caused by EPQ happening with a before update trigger present.  (Andres Freund)
When ExecBRUpdateTriggers()'s GetTupleForTrigger() follows an EPQ chain the former needs to run the result tuple through the junkfilter again, and update the slot containing the new version of the tuple to contain that new version. The input tuple may already be in the junkfilter's output slot, which used to be OK - we don't need the previous version anymore. Unfortunately ff11e7f4b9ae started to use ExecCopySlot() to update newslot, and ExecCopySlot() doesn't support copying a slot into itself, leading to a slot in a corrupt state, which then can cause crashes or other symptoms. Fix this by skipping the ExecCopySlot() when copying into itself. While we could have easily made ExecCopySlot() handle that case, it seems better to add an assert forbidding doing so instead. As the goal of copying might be to make the contents of one slot independent from another, it seems failure prone to handle doing so silently. A follow-up commit will add tests for the obviously under-covered combination of EPQ and triggers. Done as a separate commit as it might make sense to backpatch them further than this bug. Also clean up confusing variable names for slots in ExecBRDeleteTriggers() and ExecBRUpdateTriggers(). Bug: #16036 Reported-By: Антон Власов Author: Andres Freund Discussion: https://postgr.es/m/16036-28184c90d952fb7f@postgresql.org Backpatch: 12-, where ff11e7f4b9ae was merged
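For illustration, a hedged sketch of the guard described above; the variable names are illustrative, not the actual ones in trigger.c:

    /* Skip the copy when the EPQ result already lives in the target slot. */
    if (epqslot_candidate != newslot)
        ExecCopySlot(newslot, epqslot_candidate);

    /* ...while ExecCopySlot() itself can start by forbidding self-copy: */
    Assert(dstslot != srcslot);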
2019-09-09  Reorder EPQ work, to fix rowmark related bugs and improve efficiency.  (Andres Freund)
In ad0bda5d24ea I changed the EvalPlanQual machinery to store substitution tuples in slot, instead of using plain HeapTuples. The main motivation for that was that using HeapTuples will be inefficient for future tableams. But it turns out that that conversion was buggy for non-locking rowmarks - the wrong tuple descriptor was used to create the slot. As a secondary issue 5db6df0c0 changed ExecLockRows() to begin EPQ earlier, to allow to fetch the locked rows directly into the EPQ slots, instead of having to copy tuples around. Unfortunately, as Tom complained, that forces some expensive initialization to happen earlier. As a third issue, the test coverage for EPQ was clearly insufficient. Fixing the first issue is unfortunately not trivial: Non-locked row marks were fetched at the start of EPQ, and we don't have the type information for the rowmarks available at that point. While we could change that, it's not easy. It might be worthwhile to change that at some point, but to fix this bug, it seems better to delay fetching non-locking rowmarks when they're actually needed, rather than eagerly. They're referenced at most once, and in cases where EPQ fails, might never be referenced. Fetching them when needed also increases locality a bit. To be able to fetch rowmarks during execution, rather than initialization, we need to be able to access the active EPQState, as that contains necessary data. To do so move EPQ related data from EState to EPQState, and, only for EStates creates as part of EPQ, reference the associated EPQState from EState. To fix the second issue, change EPQ initialization to allow use of EvalPlanQualSlot() to be used before EvalPlanQualBegin() (but obviously still requiring EvalPlanQualInit() to have been done). As these changes made struct EState harder to understand, e.g. by adding multiple EStates, significantly reorder the members, and add a lot more comments. Also add a few more EPQ tests, including one that fails for the first issue above. More is needed. Reported-By: yi huang Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/CAHU7rYZo_C4ULsAx_LAj8az9zqgrD8WDd4hTegDTMM1LMqrBsg@mail.gmail.com https://postgr.es/m/24530.1562686693@sss.pgh.pa.us Backpatch: 12-, where the EPQ changes were introduced
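A hedged sketch of the calling order this enables in a row-locking node; the argument lists follow the v12 headers as best recalled and should be treated as illustrative:

    EvalPlanQualInit(&epqstate, estate, subplan, auxrowmarks, epqParam);

    /* EvalPlanQualSlot() may now be called before EvalPlanQualBegin()... */
    TupleTableSlot *epqslot = EvalPlanQualSlot(&epqstate, relation, rti);
    /* ...so the locked row can be fetched directly into the EPQ slot. */

    EvalPlanQualBegin(&epqstate);      /* expensive setup only when a recheck is needed */
    TupleTableSlot *rechecked = EvalPlanQualNext(&epqstate);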
2019-07-01  Fix many typos and inconsistencies  (Michael Paquier)
Author: Alexander Lakhin Discussion: https://postgr.es/m/af27d1b3-a128-9d62-46e0-88f424397f44@gmail.com
2019-06-17  Fix more typos and inconsistencies in the tree  (Michael Paquier)
Author: Alexander Lakhin Discussion: https://postgr.es/m/0a5419ea-1452-a4e6-72ff-545b1a5a8076@gmail.com
2019-06-13  Fix double-word typos  (Alvaro Herrera)
Discussion: https://postgr.es/m/20190612184527.GA24266@alvherre.pgsql Reviewed-by: Michaël Paquier
2019-06-08  Fix assorted inconsistencies.  (Amit Kapila)
There were a number of issues in the recent commits which include typos, code and comments mismatch, leftover function declarations. Fix them. Reported-by: Alexander Lakhin Author: Alexander Lakhin, Amit Kapila and Amit Langote Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/ef0c0232-0c1d-3a35-63d4-0ebd06e31387@gmail.com
2019-06-03  Fix typos in various places  (Michael Paquier)
Author: Andrea Gelmini Reviewed-by: Michael Paquier, Justin Pryzby Discussion: https://postgr.es/m/20190528181718.GA39034@glet
2019-05-22  Phase 2 pgindent run for v12.  (Tom Lane)
Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is. Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
2019-05-22  Initial pgindent run for v12.  (Tom Lane)
This is still using the 2.0 version of pg_bsd_indent. I thought it would be good to commit this separately, so as to document the differences between 2.0 and 2.1 behavior. Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
2019-05-17  Restructure creation of run-time pruning steps.  (Tom Lane)
Previously, gen_partprune_steps() always built executor pruning steps using all suitable clauses, including those containing PARAM_EXEC Params. This meant that the pruning steps were only completely safe for executor run-time (scan start) pruning. To prune at executor startup, we had to ignore the steps involving exec Params. But this doesn't really work in general, since there may be logic changes needed as well --- for example, pruning according to the last operator's btree strategy is the wrong thing if we're not applying that operator. The rules embodied in gen_partprune_steps() and its minions are sufficiently complicated that tracking their incremental effects in other logic seems quite impractical. Short of a complete redesign, the only safe fix seems to be to run gen_partprune_steps() twice, once to create executor startup pruning steps and then again for run-time pruning steps. We can save a few cycles however by noting during the first scan whether we rejected any clauses because they involved exec Params --- if not, we don't need to do the second scan. In support of this, refactor the internal APIs in partprune.c to make more use of passing information in the GeneratePruningStepsContext struct, rather than as separate arguments. This is, I hope, the last piece of our response to a bug report from Alan Jackson. Back-patch to v11 where this code came in. Discussion: https://postgr.es/m/FAD28A83-AC73-489E-A058-2681FA31D648@tvsquared.com
2019-04-19  Fix two memory leaks around force-storing tuples in slots.  (Andres Freund)
As reported by Tom, when ExecStoreMinimalTuple() had to perform a conversion to store the minimal tuple in the slot, it forgot to respect the shouldFree flag, and leaked the tuple into the current memory context if true. Fix that by freeing the tuple in that case. Looking at the relevant code made me (Andres) realize that not having the shouldFree parameter to ExecForceStoreHeapTuple() was a bad idea. Some callers had to locally implement the necessary logic, and in one case it was missing, creating a potential per-group leak in non-hashed aggregation. The choice to not free the tuple in ExecComputeStoredGenerated() is not pretty, but not introduced by this commit - I'll start a separate discussion about it. Reported-By: Tom Lane Discussion: https://postgr.es/m/366.1555382816@sss.pgh.pa.us
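A hedged sketch of the pattern the new shouldFree parameter supports; the function names are real, the surrounding context is illustrative:

    bool        shouldFree;
    HeapTuple   tuple = ExecFetchSlotHeapTuple(srcslot, true, &shouldFree);

    /* The destination slot takes ownership when shouldFree is true, so the
     * caller no longer needs its own heap_freetuple() in that case. */
    ExecForceStoreHeapTuple(tuple, dstslot, shouldFree);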
2019-03-30  Generated columns  (Peter Eisentraut)
This is an SQL-standard feature that allows creating columns that are computed from expressions rather than assigned, similar to a view or materialized view but on a column basis. This implements one kind of generated column: stored (computed on write). Another kind, virtual (computed on read), is planned for the future, and some room is left for it. Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com
2019-03-24  Transaction chaining  (Peter Eisentraut)
Add command variants COMMIT AND CHAIN and ROLLBACK AND CHAIN, which start new transactions with the same transaction characteristics as the just finished one, per SQL standard. Support for transaction chaining in PL/pgSQL is also added. This functionality is especially useful when running COMMIT in a loop in PL/pgSQL. Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr> Discussion: https://www.postgresql.org/message-id/flat/28536681-324b-10dc-ade8-ab46f7645a5a@2ndquadrant.com
2019-03-23  tableam: Add tuple_{insert, delete, update, lock} and use.  (Andres Freund)
This adds new, required, table AM callbacks for insert/delete/update and lock_tuple. To be able to reasonably use those, the EvalPlanQual mechanism had to be adapted, moving more logic into the AM. Previously both delete/update/lock call-sites and the EPQ mechanism had to have awareness of the specific tuple format to be able to fetch the latest version of a tuple. Obviously that needs to be abstracted away. To do so, move the logic that finds the latest row version into the AM. lock_tuple has a new flag argument, TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last version, rather than the current one. It'd have been possible to do so via a separate callback as well, but finding the last version usually also necessitates locking the newest version, making it sensible to combine the two. This replaces the previous use of EvalPlanQualFetch(). Additionally HeapTupleUpdated, which previously signaled either a concurrent update or delete, is now split into two, to avoid callers needing AM specific knowledge to differentiate. The move of finding the latest row version into tuple_lock means that encountering a row concurrently moved into another partition will now raise an error about "tuple to be locked" rather than "tuple to be updated/deleted" - which is accurate, as that always happens when locking rows. While possibly slightly less helpful for users, it seems like an acceptable trade-off. As part of this commit HTSU_Result has been renamed to TM_Result, and its members have been expanded to differentiate between updating and deleting. HeapUpdateFailureData has been renamed to TM_FailureData. The interface to speculative insertion is changed so nodeModifyTable.c does not have to set the speculative token itself anymore. Instead there's a version of tuple_insert, tuple_insert_speculative, that performs the speculative insertion (without requiring a flag to signal that fact), and the speculative insertion is either made permanent with table_complete_speculative(succeeded = true) or aborted with table_complete_speculative(succeeded = false). Note that multi_insert is not yet routed through tableam, nor is COPY. Changing multi_insert requires changes to copy.c that are large enough to better be done separately. Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED VIEW are also only going to be adjusted in a later commit. Author: Andres Freund and Haribabu Kommi Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
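For illustration, a hedged sketch of how a call site might branch on the new TM_Result codes; the enum members are real, the helper itself is hypothetical:

    static void
    handle_modify_result(TM_Result result)
    {
        switch (result)
        {
            case TM_Ok:
                break;              /* modified/locked successfully */
            case TM_SelfModified:
                break;              /* already changed by this command */
            case TM_Updated:
                /* concurrently updated: fetch latest version, maybe run EPQ */
                break;
            case TM_Deleted:
                /* concurrently deleted: skip this row */
                break;
            default:
                elog(ERROR, "unexpected TM_Result");
        }
    }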
2019-03-22  Collations with nondeterministic comparison  (Peter Eisentraut)
This adds a flag "deterministic" to collations. If that is false, such a collation disables various optimizations that assume that strings are equal only if they are byte-wise equal. That then allows use cases such as case-insensitive or accent-insensitive comparisons or handling of strings with different Unicode normal forms. This functionality is only supported with the ICU provider. At least glibc doesn't appear to have any locales that work in a nondeterministic way, so it's not worth supporting this for the libc provider. The term "deterministic comparison" in this context is from Unicode Technical Standard #10 (https://unicode.org/reports/tr10/#Deterministic_Comparison). This patch makes changes in three areas: - CREATE COLLATION DDL changes and system catalog changes to support this new flag. - Many executor nodes and auxiliary code are extended to track collations. Previously, this code would just throw away collation information, because the eventually-called user-defined functions didn't use it since they only cared about equality, which didn't need collation information. - String data type functions that do equality comparisons and hashing are changed to take the (non-)deterministic flag into account. For comparison, this just means skipping various shortcuts and tie breakers that use byte-wise comparison. For hashing, we first need to convert the input string to a canonical "sort key" using the ICU analogue of strxfrm(). Reviewed-by: Daniel Verite <daniel@manitou-mail.org> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
2019-03-07  Allow ATTACH PARTITION with only ShareUpdateExclusiveLock.  (Robert Haas)
We still require AccessExclusiveLock on the partition itself, because otherwise an insert that violates the newly-imposed partition constraint could be in progress at the same time that we're changing that constraint; only the lock level on the parent relation is weakened. To make this safe, we have to cope with (at least) three separate problems. First, relevant DDL might commit while we're in the process of building a PartitionDesc. If so, find_inheritance_children() might see a new partition while the RELOID system cache still has the old partition bound cached, and even before invalidation messages have been queued. To fix that, if we see that the pg_class tuple seems to be missing or to have a null relpartbound, refetch the value directly from the table. We can't get the wrong value, because DETACH PARTITION still requires AccessExclusiveLock throughout; if we ever want to change that, this will need more thought. In testing, I found it quite difficult to hit even the null-relpartbound case; the race condition is extremely tight, but the theoretical risk is there. Second, successive calls to RelationGetPartitionDesc might not return the same answer. The query planner will get confused if looking up the PartitionDesc for a particular relation does not return a consistent answer for the entire duration of query planning. Likewise, query execution will get confused if the same relation seems to have a different PartitionDesc at different times. Invent a new PartitionDirectory concept and use it to ensure consistency. This ensures that a single invocation of either the planner or the executor sees the same view of the PartitionDesc from beginning to end, but it does not guarantee that the planner and the executor see the same view. Since this allows pointers to old PartitionDesc entries to survive even after a relcache rebuild, also postpone removing the old PartitionDesc entry until we're certain no one is using it. For the most part, it seems to be OK for the planner and executor to have different views of the PartitionDesc, because the executor will just ignore any concurrently added partitions which were unknown at plan time; those partitions won't be part of the inheritance expansion, but invalidation messages will trigger replanning at some point. Normally, this happens by the time the very next command is executed, but if the next command acquires no locks and executes a prepared query, it can manage not to notice until a new transaction is started. We might want to tighten that up, but it's material for a separate patch. There would still be a small window where a query that started just after an ATTACH PARTITION command committed might fail to notice its results -- but only if the command starts before the commit has been acknowledged to the user. All in all, the warts here around serializability seem small enough to be worth accepting for the considerable advantage of being able to add partitions without a full table lock. Although in general the consequences of new partitions showing up between planning and execution are limited to the query not noticing the new partitions, run-time partition pruning will get confused in that case, so that's the third problem that this patch fixes. Run-time partition pruning assumes that indexes into the PartitionDesc are stable between planning and execution.
So, add code so that if new partitions are added between plan time and execution time, the indexes stored in the subplan_map[] and subpart_map[] arrays within the plan's PartitionedRelPruneInfo get adjusted accordingly. There does not seem to be a simple way to generalize this scheme to cope with partitions that are removed, mostly because they could then get added back again with different bounds, but it works OK for added partitions. This code does not try to ensure that every backend participating in a parallel query sees the same view of the PartitionDesc. That currently doesn't matter, because we never pass PartitionDesc indexes between backends. Each backend will ignore the concurrently added partitions which it notices, and it doesn't matter if different backends are ignoring different sets of concurrently added partitions. If in the future that matters, for example because we allow writes in parallel query and want all participants to do tuple routing to the same set of partitions, the PartitionDirectory concept could be improved to share PartitionDescs across backends. There is a draft patch to serialize and restore PartitionDescs on the thread where this patch was discussed, which may be a useful place to start. Patch by me. Thanks to Alvaro Herrera, David Rowley, Simon Riggs, Amit Langote, and Michael Paquier for discussion, and to Alvaro Herrera for some review. Discussion: http://postgr.es/m/CA+Tgmobt2upbSocvvDej3yzokd7AkiT+PvgFH+a9-5VV1oJNSQ@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoZE0r9-cyA-aY6f8WFEROaDLLL7Vf81kZ8MtFCkxpeQSw@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoY13KQZF-=HNTrt9UYWYx3_oYOQpu9ioNT49jGgiDpUEA@mail.gmail.com
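A hedged sketch of the PartitionDirectory usage pattern; the exact argument lists have shifted between releases and are illustrative here:

    PartitionDirectory pdir = CreatePartitionDirectory(CurrentMemoryContext);

    /* Repeated lookups within one planner or executor invocation return the
     * same PartitionDesc, even if the relcache entry is rebuilt meanwhile. */
    PartitionDesc pdesc = PartitionDirectoryLookup(pdir, partitioned_rel);

    DestroyPartitionDirectory(pdir);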
2019-03-01  Store tuples for EvalPlanQual in slots, rather than as HeapTuples.  (Andres Freund)
For the upcoming pluggable table access methods it's quite inconvenient to store tuples as HeapTuples, as that'd require converting tuples from their native format into HeapTuples. Instead use slots to manage epq tuples. To fit into that scheme, change the foreign data wrapper callback RefetchForeignRow, to store the tuple in a slot. Insist on using the caller provided slot, so it conveniently can be stored in the corresponding EPQ slot. As there is no in core user of RefetchForeignRow, that change was done blindly, but we plan to test that soon. To avoid duplicating that work for row locks, move row locks to just directly use the EPQ slots - it previously temporarily stored tuples in LockRowsState.lr_curtuples, but that doesn't seem beneficial, given we'd possibly end up with a significant number of additional slots. The behaviour of es_epqTupleSet[rti -1] is now checked by es_epqTupleSlot[rti -1] != NULL, as that is distinguishable from a slot containing an empty tuple. Author: Andres Freund, Haribabu Kommi, Ashutosh Bapat Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-02-26  Use slots in trigger infrastructure, except for the actual invocation.  (Andres Freund)
In preparation for abstracting table storage, convert trigger.c to track tuples in slots, which also happens to make code calling triggers simpler. As the calling interface for triggers themselves is not changed in this patch, HeapTuples still are extracted from the slot at that time. But that's handled solely inside trigger.c, not visible to callers. It's quite likely that we'll want to revise the external trigger interface, but that's a separate large project. As part of this work the slots used for old/new/return tuples are moved from EState into ResultRelInfo, as different updated tables might need different slots. The slots are now also created on-demand, which is good both from an efficiency POV and because it makes the modifying code simpler. Author: Andres Freund, Amit Khandekar and Ashutosh Bapat Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-02-26  Store table oid and tuple's tid in tuple slots directly.  (Andres Freund)
After the introduction of tuple table slots all table AMs need to support returning the table oid of the tuple stored in a slot created by said AM. It does not make sense to re-implement that in every AM, therefore move handling of table OIDs into the TupleTableSlot structure itself. It's possible that we, at a later date, might want to get rid of HeapTupleData.t_tableOid entirely, but doing so before the abstractions for table AMs are integrated turns out to be too hard, so delay that for now. Similarly, every AM needs to support the concept of a tuple identifier (tid / item pointer) for its tuples. It's quite possible that we'll generalize the exact form of a tid at a future point (to allow for things like index organized tables), but for now many parts of the code know about tids, so there's not much point in abstracting tids away. Therefore also move the tid into the slot (rather than providing an API to set/get the tid associated with the tuple in a slot). Once the table AM interface includes inserting/updating/deleting tuples, the responsibility for setting the correct tid after such an action will move into the AM. After that change, code doing such modifications should not have to deal with HeapTuples directly anymore. Author: Andres Freund, Haribabu Kommi and Ashutosh Bapat Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
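For illustration, a hedged sketch of a table AM recording a fetched tuple's identity in the slot; field names as in the v12 TupleTableSlot, the surrounding context is illustrative:

    slot->tts_tableOid = RelationGetRelid(relation);
    slot->tts_tid = heapTuple->t_self;      /* item pointer of the fetched tuple */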
2019-02-26  Allow to use HeapTupleData embedded in [Buffer]HeapTupleTableSlot.  (Andres Freund)
That avoids having to care about the lifetime of the HeapTupleHeaderData passed to ExecStore[Buffer]HeapTuple(). That doesn't make a huge difference for a plain HeapTupleTableSlot, but for BufferHeapTupleTableSlot it can be a significant advantage, avoiding the need to materialize slots where it's inconvenient to provide a HeapTupleData with appropriate lifetime to point to the on-disk tuple. It's quite possible that we'll want to add support functions for constructing HeapTuples using that embedded HeapTupleData, but for now callers do so themselves. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-02-26  Add ExecStorePinnedBufferHeapTuple.  (Andres Freund)
This allows to avoid an unnecessary pin/unpin cycle when storing a tuple in an already pinned buffer into a slot, when the pin isn't further needed at the call site. Only a single caller for now (to ensure coverage), but upcoming patches will increase use of the new function. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
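A hedged sketch contrasting the two entry points; signatures as in the v12 headers, the surrounding context is illustrative:

    ExecStoreBufferHeapTuple(tuple, slot, buffer);        /* slot acquires its own pin */
    ExecStorePinnedBufferHeapTuple(tuple, slot, buffer);  /* slot takes over the caller's
                                                           * existing pin, saving a
                                                           * pin/unpin cycle */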
2019-02-09  Allow to reset execGrouping.c style tuple hashtables.  (Andres Freund)
This has the advantage that the comparator expression, the table's slot, etc do not have to be rebuilt. Additionally the simplehash.h hashtable within the tuple hashtable now keeps its previous size and doesn't need to be reallocated. That both reduces allocator overhead, and improves performance in cases where the input estimation was off by a significant factor. To avoid an API/ABI break, the new parameter is exposed via the new BuildTupleHashTableExt(), and BuildTupleHashTable() now is a wrapper around the former, that continues to allocate the table itself in the tablecxt. Using this fixes performance issues discovered in the two bugs referenced. This commit however has not converted the callers, that's done in a separate commit. Bug: #15592 #15486 Reported-By: Jakub Janeček, Dmitry Marakasov Author: Andres Freund Discussion: https://postgr.es/m/15486-05850f065da42931@postgresql.org https://postgr.es/m/20190114180423.ywhdg2iagzvh43we@alap3.anarazel.de Backpatch: 11, this is a prerequisite for other fixes
2019-02-01  Renaming for new subscripting mechanism  (Alvaro Herrera)
Over at patch https://commitfest.postgresql.org/21/1062/ Dmitry wants to introduce a more generic subscription mechanism, which allows subscripting not only arrays but also other object types such as JSONB. That functionality is introduced in a largish invasive patch, out of which this internal renaming patch was extracted. Author: Dmitry Dolgov Reviewed-by: Tom Lane, Arthur Zakirov Discussion: https://postgr.es/m/CA+q6zcUK4EqPAu7XRRO5CCjMwhz5zvg+rfWuLzVoxp_5sKS6=w@mail.gmail.com
2019-01-29  Rename nodes/relation.h to nodes/pathnodes.h.  (Tom Lane)
The old name of this file was never a very good indication of what it was for. Now that there's also access/relation.h, we have a potential confusion hazard as well, so let's rename it to something more apropos. Per discussion, "pathnodes.h" is reasonable, since a good fraction of the file is Path node definitions. While at it, tweak a couple of other headers that were gratuitously importing relation.h into modules that don't need it. Discussion: https://postgr.es/m/7719.1548688728@sss.pgh.pa.us
2019-01-26  Change function call information to be variable length.  (Andres Freund)
Before this change FunctionCallInfoData, the struct arguments etc for V1 function calls are stored in, always had space for FUNC_MAX_ARGS/100 arguments, storing datums and their nullness in two arrays. For nearly every function call 100 arguments is far more than needed, therefore wasting memory. Arg and argnull being two separate arrays also guarantees that to access a single argument, two cachelines have to be touched. Change the layout so there's a single variable-length array with pairs of value / isnull. That drastically reduces memory consumption for most function calls (on x86-64 a two argument function now uses 64bytes, previously 936 bytes), and makes it very likely that argument value and its nullness are on the same cacheline. Arguments are stored in a new NullableDatum struct, which, due to padding, needs more memory per argument than before. But as usually far fewer arguments are stored, and individual arguments are cheaper to access, that's still a clear win. It's likely that there's other places where conversion to NullableDatum arrays would make sense, e.g. TupleTableSlots, but that's for another commit. Because the function call information is now variable-length allocations have to take the number of arguments into account. For heap allocations that can be done with SizeForFunctionCallInfoData(), for on-stack allocations there's a new LOCAL_FCINFO(name, nargs) macro that helps to allocate an appropriately sized and aligned variable. Some places with stack allocation function call information don't know the number of arguments at compile time, and currently variably sized stack allocations aren't allowed in postgres. Therefore allow for FUNC_MAX_ARGS space in these cases. They're not that common, so for now that seems acceptable. Because of the need to allocate FunctionCallInfo of the appropriate size, older extensions may need to update their code. To avoid subtle breakages, the FunctionCallInfoData struct has been renamed to FunctionCallInfoBaseData. Most code only references FunctionCallInfo, so that shouldn't cause much collateral damage. This change is also a prerequisite for more efficient expression JIT compilation (by allocating the function call information on the stack, allowing LLVM to optimize it away); previously the size of the call information caused problems inside LLVM's optimizer. Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20180605172952.x34m5uz6ju6enaem@alap3.anarazel.de
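For illustration, a hedged sketch of a two-argument call using the variable-length layout; flinfo and collation setup are elided and the local variables are illustrative:

    LOCAL_FCINFO(fcinfo, 2);            /* on-stack space sized for two arguments */

    InitFunctionCallInfoData(*fcinfo, flinfo, 2, collation, NULL, NULL);
    fcinfo->args[0].value = Int32GetDatum(x);
    fcinfo->args[0].isnull = false;
    fcinfo->args[1].value = Int32GetDatum(y);
    fcinfo->args[1].isnull = false;

    result = FunctionCallInvoke(fcinfo);
    if (fcinfo->isnull)
        elog(ERROR, "function returned NULL");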
2019-01-14  Don't include genam.h from execnodes.h and relscan.h anymore.  (Andres Freund)
This is the genam.h equivalent of 4c850ecec649c (which removed heapam.h from a lot of other headers). There's still a few header includes of genam.h, but not from central headers anymore. As a few headers are not indirectly included anymore, execnodes.h and relscan.h need a few additional includes. Some of the depended on types were replacable by using the underlying structs, but e.g. for Snapshot in execnodes.h that'd have gotten more invasive than reasonable in this commit. Like the aforementioned commit 4c850ecec649c, this requires adding new genam.h includes to a number of backend files, which likely is also required in a few external projects. Author: Andres Freund Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
2019-01-14  Don't include heapam.h from other headers.  (Andres Freund)
heapam.h previously was included in a number of widely used headers (e.g. execnodes.h, indirectly in executor.h, ...). That's problematic on its own, as heapam.h contains a lot of low-level details that don't need to be exposed that widely, but becomes more problematic with the upcoming introduction of pluggable table storage - it seems inappropriate for heapam.h to be included that widely afterwards. heapam.h was largely only included in other headers to get the HeapScanDesc typedef (which was defined in heapam.h, even though HeapScanDescData is defined in relscan.h). The better solution here seems to be to just use the underlying struct (forward declared where necessary). Similar for BulkInsertState. Another problem was that LockTupleMode was used in executor.h - parts of the file tried to cope without heapam.h, but due to the fact that it indirectly included it, several subsequent violations of that goal were not noticed. We could just reuse the approach of declaring parameters as int, but it seems nicer to move LockTupleMode to lockoptions.h - that's not a perfect location, but also doesn't seem bad. As a number of files relied on implicitly included heapam.h, a significant number of files grew an explicit include. It's quite probable that a few external projects will need to do the same. Author: Andres Freund Reviewed-By: Alvaro Herrera Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
2019-01-02  Update copyright for 2019  (Bruce Momjian)
Backpatch-through: certain files through 9.4
2018-12-22  Add some const decorations  (Peter Eisentraut)
These mainly help understanding the function signatures better.
2018-11-20  Remove WITH OIDS support, change oid catalog column visibility.  (Andres Freund)
Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing code, but upcoming work aiming to make table storage pluggable would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column).
- restoring a pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load a binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot of code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used); only oids assigned later will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for, the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide the oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to merge this now. It's painful to maintain externally, too complicated to commit after the code freeze, and a dependency of a number of other patches.
Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
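A hedged sketch of the resulting access pattern; the macros and functions are real, the snippet itself is illustrative:

    /* Read a catalog oid like any other column, instead of HeapTupleGetOid(): */
    Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);
    Oid         relid = classForm->oid;

    /* New catalog rows get their oid assigned explicitly: */
    Oid         newoid = GetNewOidWithIndex(rel, ClassOidIndexId, Anum_pg_class_oid);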
2018-11-16  Make TupleTableSlots extensible, finish split of existing slot type.  (Andres Freund)
This commit completes the work prepared in 1a0586de36, splitting the old TupleTableSlot implementation (which could store buffer, heap, minimal and virtual slots) into four different slot types. As described in the aforementioned commit, this is done with the goal of making tuple table slots extensible, to allow for pluggable table access methods. To achieve runtime extensibility for TupleTableSlots, operations on slots that can differ between types of slots are performed using the TupleTableSlotOps struct provided at slot creation time. That includes information ranging from the size of the TupleTableSlot struct to be allocated, to initialization, deforming etc. See the struct's definition for more detailed information about the TupleTableSlotOps callbacks. I decided to rename TTSOpsBufferTuple to TTSOpsBufferHeapTuple and ExecCopySlotTuple to ExecCopySlotHeapTuple, as that seems more consistent with other naming introduced in recent patches. There's plenty of optimization potential in the slot implementation, but according to benchmarking the state after this commit has similar performance characteristics to before this set of changes, which seems sufficient. There's a few changes in execReplication.c that currently need to poke through the slot abstraction; that'll be repaired once the pluggable storage patchset provides the necessary infrastructure. Author: Andres Freund and Ashutosh Bapat, with changes by Amit Khandekar Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
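For illustration, a hedged sketch of slot creation after this change, where the slot type is named via a TupleTableSlotOps pointer; the surrounding variables are illustrative:

    TupleTableSlot *vslot = MakeTupleTableSlot(tupdesc, &TTSOpsVirtual);
    TupleTableSlot *bslot = ExecInitExtraTupleSlot(estate, tupdesc,
                                                   &TTSOpsBufferHeapTuple);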
2018-11-16  Inline hot path of slot_getsomeattrs().  (Andres Freund)
This yields a minor speedup, which roughly balances the loss from the upcoming introduction of callbacks to do some operations on slots. Author: Andres Freund Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
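A hedged sketch of the inlined fast path, close to what the header now does:

    static inline void
    slot_getsomeattrs(TupleTableSlot *slot, int attnum)
    {
        if (slot->tts_nvalid < attnum)
            slot_getsomeattrs_int(slot, attnum);    /* out-of-line slow path */
    }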
2018-11-16  Redesign initialization of partition routing structures  (Alvaro Herrera)
This speeds up write operations (INSERT, UPDATE, DELETE, COPY, as well as the future MERGE) on partitioned tables. This changes the setup for tuple routing so that it does far less work during the initial setup and pushes more work out to when partitions receive tuples. PartitionDispatchData structs for sub-partitioned tables are only created when a tuple gets routed through them. The possibly large arrays in the PartitionTupleRouting struct have largely been removed. The partitions[] array remains but now never contains any NULL gaps. Previously the NULLs had to be skipped during ExecCleanupTupleRouting(), which could add a large overhead to the cleanup when the number of partitions was large. The partitions[] array is allocated small to start with and only enlarged when we route tuples to enough partitions that it runs out of space. This allows us to keep simple single-row partition INSERTs running quickly. The arrays in PartitionTupleRouting which stored the tuple translation maps have now been removed. These have been moved out into a PartitionRoutingInfo struct which is an additional field in ResultRelInfo. The find_all_inheritors() call still remains by far the slowest part of ExecSetupPartitionTupleRouting(). This commit just removes the other slow parts. In passing also rename the tuple translation maps from being ParentToChild and ChildToParent to being RootToPartition and PartitionToRoot. The old names mislead you into thinking that a partition of some sub-partitioned table would translate to the rowtype of the sub-partitioned table rather than the root partitioned table. Authors: David Rowley and Amit Langote, heavily revised by Álvaro Herrera Testing help from Jesper Pedersen and Kato Sho. Discussion: https://postgr.es/m/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=GVBwvGh4a6xA@mail.gmail.com
2018-11-15  Add dummy field to currently empty struct TupleTableSlotOps.  (Andres Freund)
Per MSVC complaint on buildfarm member dory.
2018-11-15  Compute information about EEOP_*_FETCHSOME at expression init time.  (Andres Freund)
Previously this information was computed when JIT compiling an expression. But the information is useful in the non-JIT case too (for assertions), therefore it makes sense to move it. This will, in a followup commit, allow to treat different slot types differently. E.g. for virtual slots there's no need to generate a JIT function to deform the slot. Author: Andres Freund Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
2018-11-15  Introduce notion of different types of slots (without implementing them).  (Andres Freund)
Upcoming work intends to allow pluggable ways to introduce new ways of storing table data. Accessing those table access methods from the executor requires TupleTableSlots to carry tuples in the native format of such storage methods; otherwise there'll be a significant conversion overhead. Different access methods will require different data to store tuples efficiently (just like virtual, minimal, heap already require fields in TupleTableSlot). To allow that without requiring additional pointer indirections, we want to have different structs (embedding TupleTableSlot) for different types of slots. Thus different types of slots are needed, which requires adapting creators of slots. The slot that most efficiently can represent a type of tuple in an executor node will often depend on the type of slot a child node uses. Therefore we need to track the type of slot returned by nodes, so parent nodes can create slots based on that. Relatedly, JIT compilation of tuple deforming needs to know which type of slot a certain expression refers to, so it can create an appropriate deforming function for the type of tuple in the slot. But not all nodes will only return one type of slot, e.g. an append node will potentially return different types of slots for each of its subplans. Therefore add a function that allows to query the type of a node's result slot, and whether it'll always be the same type (whether it's fixed). This can be queried using ExecGetResultSlotOps(). The scan, result, inner, outer types of slots are automatically inferred from ExecInitScanTupleSlot(), ExecInitResultSlot(), left/right subtrees respectively. If that's not correct for a node, that can be overwritten using new fields in PlanState. This commit does not introduce the actually abstracted implementation of different kinds of TupleTableSlots, that will be left for a followup commit. The different types of slots introduced will, for now, still use the same backing implementation. While this already partially invalidates the big comment in tuptable.h, it seems to make more sense to update it later, when the different TupleTableSlot implementations actually exist. Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
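For illustration, a hedged sketch of a parent node querying its child's slot type; the function is real, the surrounding usage is illustrative:

    bool        isfixed;
    const TupleTableSlotOps *ops = ExecGetResultSlotOps(outerPlanState(node),
                                                        &isfixed);

    if (isfixed && ops == &TTSOpsVirtual)
    {
        /* child always returns virtual slots: no JITed deforming needed */
    }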
2018-11-15  Rejigger materializing and fetching a HeapTuple from a slot.  (Andres Freund)
Previously materializing a slot always returned a HeapTuple. As current work aims to reduce the reliance on HeapTuples (so other storage systems can work efficiently), that needs to change. Thus split the tasks of materializing a slot (i.e. making it independent from the underlying storage / other memory contexts) from fetching a HeapTuple from the slot. For brevity, allow fetching a HeapTuple from a slot and materializing the slot at the same time, controlled by a parameter. For now some callers of ExecFetchSlotHeapTuple, with materialize = true, expect that changes to the heap tuple will be reflected in the underlying slot. Those places will be adapted in due course, so while not pretty, that's OK for now. Also rename ExecFetchSlotTuple to ExecFetchSlotHeapTuple and ExecFetchSlotTupleDatum to ExecFetchSlotHeapTupleDatum, as it's likely that future storage methods will need similar methods. There already is ExecFetchSlotMinimalTuple, so the new names make the naming scheme more coherent. Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
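A hedged sketch of the resulting API; signature as in the v12 headers, the surrounding context is illustrative:

    bool        shouldFree;
    HeapTuple   htup = ExecFetchSlotHeapTuple(slot, true /* materialize */ , &shouldFree);

    /* ... use htup ... */
    if (shouldFree)
        heap_freetuple(htup);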
2018-11-09  Don't require return slots for nodes without projection.  (Andres Freund)
In a lot of nodes the return slot is not required. That can either be because the node doesn't do any projection (say an Append node), or because the node does perform projections but the projection is optimized away because the projection would yield an identical row. Slots aren't that small, especially for wide rows, so it's worthwhile to avoid creating them. It's not possible to just skip creating the slot - it's currently used to determine the tuple descriptor returned by ExecGetResultType(). So separate the determination of the result type from the slot creation. The work previously done internally by ExecInitResultTupleSlotTL() can now also be done separately with ExecInitResultTypeTL() and ExecInitResultSlot(). That way nodes that aren't guaranteed to need a result slot can use ExecInitResultTypeTL() to determine the result type of the node, and if ExecAssignScanProjectionInfo() (via ExecConditionalAssignProjectionInfo()) determines that a result slot is needed, it is created with ExecInitResultSlot(). Besides the advantage of not creating slots that then go unused, this is necessary preparation for later patches around tuple table slot abstraction. In particular separating the return descriptor and slot is a prerequisite to allow JITing of tuple deforming with knowledge of the underlying tuple format, and to avoid unnecessarily creating JITed tuple deforming for virtual slots. This commit removes a redundant argument from ExecInitResultTupleSlotTL(). While this commit touches a lot of the relevant lines anyway, it'd normally still not be worthwhile to cause breakage, except that aforementioned later commits will touch *all* ExecInitResultTupleSlotTL() callers anyway (but fits worse thematically). Author: Andres Freund Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
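For illustration, a hedged sketch of the split initialization; the ExecInitResultSlot() argument list shown follows later releases and should be treated as illustrative:

    ExecInitResultTypeTL(&scanstate->ps);       /* result descriptor only, no slot */

    /* ... later, only if this node turns out to need to project ... */
    ExecInitResultSlot(&scanstate->ps, &TTSOpsVirtual);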
2018-11-07  Move EEOP_*_SYSVAR evaluation out of line.  (Andres Freund)
This mainly de-duplicates code. As evaluating a system variable isn't the hottest path and the current inline implementation ends up calling out to an external function anyway, this is OK from a performance POV. The main motivation for de-duplicating is the upcoming slot abstraction work, after which there's not guaranteed to be a HeapTuple backing the slot. Author: Andres Freund, Amit Khandekar Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
2018-10-15  Move TupleTableSlots boolean members into one flag variable.  (Andres Freund)
There's several reasons for this change:
1) It reduces the total size of TupleTableSlot / reduces alignment padding, making the commonly accessed members fit into a single cacheline (but we currently do not force proper alignment, so that's not yet guaranteed to be helpful)
2) Combining the booleans into a flag allows to combine read/writes from memory.
3) With the upcoming slot abstraction changes, it allows to have core and extended flags, in a memory efficient way.
Author: Ashutosh Bapat and Andres Freund Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de
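A hedged sketch of the flag-based layout; the exact bit assignments are illustrative:

    #define TTS_FLAG_EMPTY        (1 << 1)
    #define TTS_FLAG_SHOULDFREE   (1 << 2)

    #define TTS_EMPTY(slot)       (((slot)->tts_flags & TTS_FLAG_EMPTY) != 0)
    #define TTS_SHOULDFREE(slot)  (((slot)->tts_flags & TTS_FLAG_SHOULDFREE) != 0)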
2018-10-15  Move generic slot support functions from heaptuple.c into execTuples.c.  (Andres Freund)
heaptuple.c was never a particularly good fit for slot_getattr(), slot_getsomeattrs() and slot_getmissingattrs(), but in upcoming changes slots will be made more abstract (allowing slots that contain different types of tuples), making it clearly the wrong place. Note that slot_deform_tuple() remains in its current place, as it clearly deals with a HeapTuple. getmissingattrs() also remains, but it's less clear that that's correct - but execTuples.c wouldn't be the right place. Author: Ashutosh Bapat. Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de
2018-10-08  Advance transaction timestamp for intra-procedure transactions.  (Tom Lane)
Per discussion, this behavior seems less astonishing than not doing so. Peter Eisentraut and Tom Lane Discussion: https://postgr.es/m/20180920234040.GC29981@momjian.us
2018-10-06  Remove more redundant relation locking during executor startup.  (Tom Lane)
We already have appropriate locks on every relation listed in the query's rangetable before we reach the executor. Take the next step in exploiting that knowledge by removing code that worries about taking locks on non-leaf result relations in a partitioned table. In particular, get rid of ExecLockNonLeafAppendTables and a stanza in InitPlan that asserts we already have locks on certain such tables. In passing, clean up some now-obsolete comments in InitPlan. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
2018-10-04  In the executor, use an array of pointers to access the rangetable.  (Tom Lane)
Instead of doing a lot of list_nth() accesses to es_range_table, create a flattened pointer array during executor startup and index into that to get at individual RangeTblEntrys. This eliminates one source of O(N^2) behavior with lots of partitions. (I'm not exactly convinced that it's the most important source, but it's an easy one to fix.) Amit Langote and David Rowley Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
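For illustration, a hedged sketch of the array-based lookup that replaces list_nth(), close to the inline helper added to executor.h:

    static inline RangeTblEntry *
    exec_rt_fetch(Index rti, EState *estate)
    {
        Assert(rti > 0 && rti <= estate->es_range_table_size);
        return estate->es_range_table_array[rti - 1];
    }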
2018-10-04  Centralize executor's opening/closing of Relations for rangetable entries.  (Tom Lane)
Create an array estate->es_relations[] paralleling the es_range_table, and store references to Relations (relcache entries) there, so that any given RT entry is opened and closed just once per executor run. Scan nodes typically still call ExecOpenScanRelation, but ExecCloseScanRelation is no more; relation closing is now done centrally in ExecEndPlan. This is slightly more complex than one would expect because of the interactions with relcache references held in ResultRelInfo nodes. The general convention is now that ResultRelInfo->ri_RelationDesc does not represent a separate relcache reference and so does not need to be explicitly closed; but there is an exception for ResultRelInfos in the es_trig_target_relations list, which are manufactured by ExecGetTriggerResultRel and have to be cleaned up by ExecCleanUpTriggerState. (That much was true all along, but these ResultRelInfos are now more different from others than they used to be.) To allow the partition pruning logic to make use of es_relations[] rather than having its own relcache references, adjust PartitionedRelPruneInfo to store an RT index rather than a relation OID. Amit Langote, reviewed by David Rowley and Jesper Pedersen, some mods by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
2018-10-02  Use slots more widely in tuple mapping code and make naming more consistent.  (Andres Freund)
It's inefficient to use a single slot for mapping between tuple descriptors for multiple tuples, as previously done when using ConvertPartitionTupleSlot(), as that means the slot's tuple descriptors change for every tuple. Previously we also, via ConvertPartitionTupleSlot(), built new tuples after the mapping even in cases where we, immediately afterwards, access individual columns again. Refactor the code so one slot, on demand, is used for each partition. That avoids having to change the descriptor (and allows to use the more efficient "fixed" tuple slots). Then use slot->slot mapping, to avoid unnecessarily forming a tuple. As the naming between the tuple and slot mapping functions wasn't consistent, rename them to execute_attr_map_{tuple,slot}. It's likely that we'll also rename convert_tuples_by_* to denote that these functions "only" build a map, but that's left for later. Author: Amit Khandekar and Amit Langote, editorialized by me Reviewed-By: Amit Langote, Amit Khandekar, Andres Freund Discussion: https://postgr.es/m/CAJ3gD9fR0wRNeAE8VqffNTyONS_UfFPRpqxhnD9Q42vZB+Jvpg@mail.gmail.com https://postgr.es/m/e4f9d743-cd4b-efb0-7574-da21d86a7f36%40lab.ntt.co.jp Backpatch: -
2018-09-25  Split ExecStoreTuple into ExecStoreHeapTuple and ExecStoreBufferHeapTuple.  (Andres Freund)
Upcoming changes introduce further types of tuple table slots, in preparation for making table storage pluggable. New storage methods will have different representations of tuples, therefore the slot accessor should refer explicitly to heap tuples. Instead of just renaming the function, split it into one function that accepts heap tuples not residing in buffers, and one accepting ones in buffers. Previously one function was used for both, but that was a bit awkward already, and splitting will allow us to represent slot types for tuples in buffers and normal memory separately. This is split out from the patch introducing abstract slots, as this largely consists of mechanical changes. Author: Ashutosh Bapat Reviewed-By: Andres Freund Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de
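A hedged sketch of the split entry points; signatures as in the v12 headers, the surrounding context is illustrative:

    ExecStoreHeapTuple(tuple, slot, false);          /* palloc'd tuple; slot does not own it */
    ExecStoreBufferHeapTuple(tuple, slot, buffer);   /* tuple on a pinned buffer page */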
2018-09-25  Change TupleTableSlot->tts_nvalid to type AttrNumber.  (Andres Freund)
Previously it was an int / 4 bytes. The maximum number of attributes in a tuple is restricted by the maximum value Var->varattno, which is an AttrNumber/int16. Hence use the same data type for TupleTableSlot->tts_nvalid. Author: Ashutosh Bapat Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de
2018-09-25  Collect JIT instrumentation from workers.  (Andres Freund)
Previously, when using parallel query, EXPLAIN (ANALYZE)'s JIT compilation timings did not include the overhead from doing so on the workers. Fix that. We do so by simply aggregating the cost of doing JIT compilation on workers and the leader together. Arguably that's not quite accurate, because the total time spent doing so is spent in parallel - but it's hard to do much better. For additional detail, when VERBOSE is specified, the stats for workers are displayed separately. Author: Amit Khandekar and Andres Freund Discussion: https://postgr.es/m/CAJ3gD9eLrz51RK_gTkod+71iDcjpB_N8eC6vU2AW-VicsAERpQ@mail.gmail.com Backpatch: 11-