path: root/src/include/executor
Age | Commit message | Author
2018-02-26 | Update PartitionTupleRouting struct comment | Alvaro Herrera
Small review on edd44738bc88. Discussion: https://postgr.es/m/20180222165315.k27qfn4goskhoswj@alvherre.pgsql Reviewed-by: Robert Haas, Amit Langote
2018-02-22 | Be lazier about partition tuple routing. | Robert Haas
It's not necessary to fully initialize the executor data structures for partitions to which no tuples are ever routed. Consider, for example, an INSERT statement that inserts only one row: it only cares about the partition to which that one row is routed. The new function ExecInitPartitionInfo performs the initialization in question only when a particular partition is about to receive a tuple. This includes creating, validating, and saving a pointer to the ResultRelInfo, setting up for speculative insertions, translating WCOs and initializing the resulting expressions, translating returning lists and building the appropriate projection information, and setting up a tuple conversion map. One thing that's not deferred is locking the child partitions; that seems desirable but would need more thought. Still, testing shows that this makes single-row inserts significantly faster on a table with many partitions without harming the bulk-insert case. Amit Langote, reviewed by Etsuro Fujita, with a few changes by me Discussion: http://postgr.es/m/8975331d-d961-cbdd-f862-fdd3d97dc2d0@lab.ntt.co.jp
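A minimal sketch of the lazy-initialization idea, under a simplified model: `PartState` stands in for the per-partition executor state (ResultRelInfo, conversion map, projections), and `RoutingSketch`, `routing_get_partition()` and the other names are hypothetical, not PostgreSQL's API. Every partition slot starts out NULL and is filled only when the first tuple is routed there, so a single-row INSERT pays for one partition rather than all of them.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-partition executor state. */
typedef struct PartState
{
    int         partidx;
} PartState;

/* Hypothetical routing structure: one pointer per partition, all NULL
 * until a tuple is actually routed to that partition. */
typedef struct RoutingSketch
{
    int         nparts;
    PartState **parts;          /* lazily filled array of pointers */
    int         ninitialized;   /* partitions actually set up so far */
} RoutingSketch;

static RoutingSketch *
routing_setup(int nparts)
{
    RoutingSketch *r = malloc(sizeof(RoutingSketch));

    r->nparts = nparts;
    r->parts = calloc((size_t) nparts, sizeof(PartState *));
    r->ninitialized = 0;
    return r;
}

/* Return the state for partidx, initializing it on first use only. */
static PartState *
routing_get_partition(RoutingSketch *r, int partidx)
{
    assert(partidx >= 0 && partidx < r->nparts);
    if (r->parts[partidx] == NULL)
    {
        PartState  *p = malloc(sizeof(PartState));

        p->partidx = partidx;
        r->parts[partidx] = p;
        r->ninitialized++;
    }
    return r->parts[partidx];
}

/* One cleanup call releases everything setup and routing allocated. */
static void
routing_cleanup(RoutingSketch *r)
{
    for (int i = 0; i < r->nparts; i++)
        free(r->parts[i]);      /* free(NULL) is a no-op */
    free(r->parts);
    free(r);
}
```

Allocation-failure checks are omitted for brevity. A partition that never receives a tuple costs only one NULL pointer slot.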
2018-02-20 | Use platform independent type for TupleTableSlot->tts_off. | Andres Freund
Previously, tts_off was, for unknown reasons, of type long. For one, that's unnecessary, as tuples are restricted in length; for another, long would be a bad choice of type even if that weren't the case, as it's not reliably wider than an int. Also, HeapTupleHeader->t_len is a uint32. This is split off from a larger patch implementing JITed tuple deforming. It seems like an independent improvement, tiny as it is. Author: Andres Freund
2018-02-16 | Allow tupleslots to have a fixed tupledesc, use in executor nodes. | Andres Freund
The reason for doing so is that it will allow expression evaluation to optimize based on the underlying tupledesc. In particular, it will allow JITing tuple deforming together with the expression itself. For that, expression initialization needs to be moved after the relevant slots are initialized; this is mostly unproblematic, except in the case of nodeWorktablescan.c. After doing so, there's no need for ExecAssignResultType() and ExecAssignResultTypeFromTL() anymore, as all former callers have been converted to create a slot with a fixed descriptor. When creating a slot with a fixed descriptor, tts_values/isnull can be allocated together with the main slot, reducing allocation overhead and increasing cache density a bit. Author: Andres Freund Discussion: https://postgr.es/m/20171206093717.vqdxe5icqttpxs3p@alap3.anarazel.de
2018-02-16 | Do execGrouping.c via expression eval machinery, take two. | Andres Freund
This has a performance benefit on its own, although not a large one. The primary benefit is that it will allow us to JIT tuple deforming and comparator invocations. Large parts of this were previously committed (773aec7aa), but the commit contained an omission around cross-type comparisons and was thus reverted. Author: Andres Freund Discussion: https://postgr.es/m/20171129080934.amqqkke2zjtekd4t@alap3.anarazel.de
2018-02-15 | Revert "Do execGrouping.c via expression eval machinery." | Andres Freund
This reverts commit 773aec7aa98abd38d6d9435913bb8e14e392c274. There's an unresolved issue in the reverted commit: it only creates one comparator function, but for the nodeSubplan.c case we need more (cf. FindTupleHashEntry vs. LookupTupleHashEntry calls in nodeSubplan.c). This isn't too difficult to fix, but it's not entirely trivial either. The fact that the issue only causes breakage on 32-bit systems shows that the current test coverage isn't that great. To avoid turning half the buildfarm red until those two issues are addressed, revert.
2018-02-15 | Do execGrouping.c via expression eval machinery. | Andres Freund
This has a performance benefit on its own, although not a large one. The primary benefit is that it will allow us to JIT tuple deforming and comparator invocations. Author: Andres Freund Discussion: https://postgr.es/m/20171129080934.amqqkke2zjtekd4t@alap3.anarazel.de
2018-01-29 | Introduce ExecQualAndReset() helper. | Andres Freund
It's a common task to evaluate a qual and reset the corresponding expression context. Currently that requires storing the result of the qual evaluation, resetting the context, and then reacting to the result. As that's awkward, several places only reset the context the next time through a node. That's not great, so introduce a helper that evaluates and resets. It's a bit ugly that it currently uses MemoryContextReset() instead of ResetExprContext(), but that seems easier than reordering all of executor.h. Author: Andres Freund Discussion: https://postgr.es/m/20180109222544.f7loxrunqh3xjl5f@alap3.anarazel.de
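The helper's contract can be sketched as follows. `ArenaSketch` is a hypothetical bump allocator standing in for the per-tuple expression-context memory, and `eval_qual()`/`qual_and_reset()` are illustrative names, not the real `ExecQualAndReset()` signature. What matters is the ordering: save the qual's result, reset the context, then return the saved result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bump allocator standing in for per-tuple memory. */
typedef struct ArenaSketch
{
    char        buf[1024];
    size_t      used;
} ArenaSketch;

/* Only ints are allocated in this sketch, so 4-byte steps keep them aligned. */
static void *
arena_alloc(ArenaSketch *a, size_t n)
{
    void       *p;

    assert(a->used + n <= sizeof(a->buf));
    p = a->buf + a->used;
    a->used += n;
    return p;
}

static void
arena_reset(ArenaSketch *a)
{
    a->used = 0;
}

/* Hypothetical qual "value > threshold" that uses scratch memory in the
 * per-tuple context, as real expression evaluation may. */
static bool
eval_qual(ArenaSketch *ctx, int value, int threshold)
{
    int        *scratch = arena_alloc(ctx, sizeof(int));

    *scratch = value - threshold;
    return *scratch > 0;
}

/* Evaluate, reset, then return the saved result: callers can no longer
 * forget the reset or postpone it to the next call. */
static bool
qual_and_reset(ArenaSketch *ctx, int value, int threshold)
{
    bool        result = eval_qual(ctx, value, threshold);

    arena_reset(ctx);
    return result;
}
```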
2018-01-24 | Avoid referencing off the end of subplan_partition_offsets. | Robert Haas
Report by buildfarm member skink and Tom Lane. Analysis by me. Patch by Amit Khandekar. Discussion: http://postgr.es/m/CAJ3gD9fVA1iXQYhfqHP5n_TEd4U9=V8TL_cc-oKRnRmxgdvJrQ@mail.gmail.com
2018-01-22 | Transaction control in PL procedures | Peter Eisentraut
In each of the supplied procedural languages (PL/pgSQL, PL/Perl, PL/Python, PL/Tcl), add language-specific commit and rollback functions/commands to control transactions in procedures in that language. Add similar underlying functions to SPI. Some additional cleanup so that transaction commit or abort doesn't blow away data structures still used by the procedure call. Add execution context tracking to CALL and DO statements so that transaction control commands can only be issued in top-level procedure and block calls, not function calls or other procedure or block calls.

- SPI

Add a new function SPI_connect_ext() that is like SPI_connect() but allows passing option flags. The only option flag right now is SPI_OPT_NONATOMIC. A nonatomic SPI connection can execute transaction control commands, otherwise it's not allowed. This is meant to be passed down from CALL and DO statements which themselves know in which context they are called. A nonatomic SPI connection uses different memory management. A normal SPI connection allocates its memory in TopTransactionContext. For nonatomic connections we use PortalContext instead. As the comment in SPI_connect_ext() (previously SPI_connect()) indicates, one could potentially use PortalContext in all cases, but it seems safest to leave the existing uses alone, because this stuff is complicated enough already. SPI also gets new functions SPI_start_transaction(), SPI_commit(), and SPI_rollback(), which can be used by PLs to implement their transaction control logic.

- portalmem.c

Some adjustments were made in the code that cleans up portals at transaction abort. The portal code could already handle a command *committing* a transaction and continuing (e.g., VACUUM), but it was not quite prepared for a command *aborting* a transaction and continuing. In AtAbort_Portals(), remove the code that marks an active portal as failed. As the comment there already predicted, this doesn't work if the running command wants to keep running after transaction abort. And it's actually not necessary, because pquery.c is careful to run all portal code in a PG_TRY block and explicitly runs MarkPortalFailed() if there is an exception. So the code in AtAbort_Portals() is never used anyway. In AtAbort_Portals() and AtCleanup_Portals(), we need to be careful not to clean up active portals too much. This mirrors similar code in PreCommit_Portals().

- PL/Perl

Gets new functions spi_commit() and spi_rollback().

- PL/pgSQL

Gets new commands COMMIT and ROLLBACK. Update the PL/SQL porting example in the documentation to reflect that transactions are now possible in procedures.

- PL/Python

Gets new functions plpy.commit and plpy.rollback.

- PL/Tcl

Gets new commands commit and rollback.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
2018-01-19 | Allow UPDATE to move rows between partitions. | Robert Haas
When an UPDATE causes a row to no longer match the partition constraint, try to move it to a different partition where it does match the partition constraint. In essence, the UPDATE is split into a DELETE from the old partition and an INSERT into the new one. This can lead to surprising behavior in concurrency scenarios because EvalPlanQual rechecks won't work as they normally did; the known problems are documented. (There is a pending patch to improve the situation further, but it needs more review.) Amit Khandekar, reviewed and tested by Amit Langote, David Rowley, Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro Herrera, Amit Kapila, and me. A few final revisions by me. Discussion: http://postgr.es/m/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com
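A toy model of the row-movement logic, assuming integer range partitions and hypothetical helpers (`partition_check()`, `update_row()`, and so on). The constraint check returns a bool instead of raising an error, so the caller decides whether a failure means "update in place", "move the row", or "fail". Concurrency and EvalPlanQual rechecks, which the commit calls out as the tricky part, are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ROWS 16

/* Hypothetical range partition holding integer keys in [lo, hi). */
typedef struct RangePart
{
    int         lo;
    int         hi;
    int         rows[MAX_ROWS];
    int         nrows;
} RangePart;

/* Constraint check that reports instead of erroring out. */
static bool
partition_check(const RangePart *p, int key)
{
    return key >= p->lo && key < p->hi;
}

static int
find_partition(const RangePart *parts, int nparts, int key)
{
    for (int i = 0; i < nparts; i++)
        if (partition_check(&parts[i], key))
            return i;
    return -1;
}

static void
delete_row(RangePart *p, int rowidx)
{
    p->rows[rowidx] = p->rows[--p->nrows];  /* row order is irrelevant here */
}

static void
insert_row(RangePart *p, int key)
{
    assert(p->nrows < MAX_ROWS);
    p->rows[p->nrows++] = key;
}

/*
 * UPDATE row rowidx of partition src to newkey.  If src still accepts the
 * new value, update in place; otherwise split the UPDATE into a DELETE from
 * src and an INSERT into the matching partition.  Returns false (the error
 * case) when no partition accepts newkey; nothing is changed then.
 */
static bool
update_row(RangePart *parts, int nparts, int src, int rowidx, int newkey)
{
    int         dst;

    if (partition_check(&parts[src], newkey))
    {
        parts[src].rows[rowidx] = newkey;
        return true;
    }
    dst = find_partition(parts, nparts, newkey);
    if (dst < 0)
        return false;
    delete_row(&parts[src], rowidx);
    insert_row(&parts[dst], newkey);
    return true;
}
```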
2018-01-09 | Expression evaluation based aggregate transition invocation. | Andres Freund
Previously, aggregate transition and combination functions were invoked by special-case code in nodeAgg.c, which evaluated input and filters separately using the expression evaluation machinery. That turns out not to be great for performance, for several reasons:
- repeated expression evaluations have some cost
- the transition function invocations are poorly predicted: a query commonly has multiple aggregates, so the same call stack ends up invoking different functions
- filter and input computation had to be done separately
- the special-case code made it hard to implement JITing of the whole transition function invocation

Address this by building one large expression that computes input, evaluates filters, and invokes transition functions. This leads to moderate speedups in queries bottlenecked by aggregate computations, and enables large speedups for similar cases once JITing is done.

There's potential for further improvement:
- It'd be nice if we could simplify the somewhat expensive aggstate->all_pergroups lookups.
- Right now there's still an advance_transition_function invocation in nodeAgg.c, leading to some code duplication.

Author: Andres Freund Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
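To illustrate "one large expression" per input row, here is a toy step interpreter. The opcodes, `Step`, and `advance_aggregates()` are hypothetical and far simpler than PostgreSQL's ExprEvalStep machinery; the point is that input loading, FILTER evaluation (as a conditional skip), and the transition updates for several aggregates all run in a single dispatch loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a flattened per-row aggregate program. */
typedef enum StepOp
{
    STEP_LOAD_INPUT,            /* r = row[arg] */
    STEP_SKIP_IF_LE,            /* filter: if r <= arg, skip 'skip' steps */
    STEP_TRANS_SUM,             /* state[arg] += r */
    STEP_TRANS_COUNT,           /* state[arg] += 1 */
    STEP_DONE
} StepOp;

typedef struct Step
{
    StepOp      op;
    int         arg;            /* column, threshold, or aggregate number */
    int         skip;           /* steps to skip when a filter fails */
} Step;

/* sum(c0), count(*) FILTER (WHERE c1 > 0), compiled into one program. */
static const Step sum_and_filtered_count[] = {
    {STEP_LOAD_INPUT, 0, 0},    /* r = c0 */
    {STEP_TRANS_SUM, 0, 0},     /* state[0] += r */
    {STEP_LOAD_INPUT, 1, 0},    /* r = c1 */
    {STEP_SKIP_IF_LE, 0, 1},    /* FILTER (WHERE c1 > 0) */
    {STEP_TRANS_COUNT, 1, 0},   /* state[1] += 1 */
    {STEP_DONE, 0, 0}
};

/* One dispatch loop per input row: load inputs, evaluate filters, and
 * advance every aggregate's transition state. */
static void
advance_aggregates(const Step *prog, const int *row, long *state)
{
    long        r = 0;

    for (size_t pc = 0;; pc++)
    {
        const Step *s = &prog[pc];

        switch (s->op)
        {
            case STEP_LOAD_INPUT:
                r = row[s->arg];
                break;
            case STEP_SKIP_IF_LE:
                if (r <= s->arg)
                    pc += (size_t) s->skip;
                break;
            case STEP_TRANS_SUM:
                state[s->arg] += r;
                break;
            case STEP_TRANS_COUNT:
                state[s->arg] += 1;
                break;
            case STEP_DONE:
                return;
        }
    }
}
```

Feeding the rows (3, 1), (4, -2), and (5, 7) through `sum_and_filtered_count` leaves `state[0] == 12` and `state[1] == 2`. A flat program like this is also what a JIT compiler can turn into straight-line code.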
2018-01-05 | Factor error generation out of ExecPartitionCheck. | Robert Haas
At present, we always raise an ERROR if the partition constraint is violated, but a pending patch for UPDATE tuple routing will consider instead moving the tuple to the correct partition. Refactor to make that simpler. Amit Khandekar, reviewed by Amit Langote, David Rowley, and me. Discussion: http://postgr.es/m/CAJ3gD9cue54GbEzfV-61nyGpijvjZgCcghvLsB0_nL8Nm8HzCA@mail.gmail.com
2018-01-04 | Simplify and encapsulate tuple routing support code. | Robert Haas
Instead of having ExecSetupPartitionTupleRouting return multiple out parameters, have it return a pointer to a structure containing all of those different things. Also, provide and use a cleanup function, ExecCleanupTupleRouting, instead of cleaning up all of the resources allocated by ExecSetupPartitionTupleRouting individually. Amit Khandekar, reviewed by Amit Langote, David Rowley, and me Discussion: http://postgr.es/m/CAJ3gD9fWfxgKC+PfJZF3hkgAcNOy-LpfPxVYitDEXKHjeieWQQ@mail.gmail.com
2018-01-02 | Update copyright for 2018 | Bruce Momjian
Backpatch-through: certain files through 9.3
2018-01-02 | Ensure proper alignment of tuples in HashMemoryChunkData buffers. | Tom Lane
The previous coding relied (without any documentation) on the data[] member of HashMemoryChunkData being at a MAXALIGN'ed offset. If it was not, the tuples would not be maxaligned either, leading to failures on alignment-picky machines. While there seems to be no live bug on any platform we support, this is clearly pretty fragile: any addition to or rearrangement of the fields in HashMemoryChunkData could break it. Let's remove the hazard by getting rid of the data[] member and instead using pointer arithmetic with an explicitly maxalign'ed offset. Discussion: https://postgr.es/m/14483.1514938129@sss.pgh.pa.us
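A sketch of the technique the fix describes: a chunk header with no `data[]` member, and a data area located by pointer arithmetic at an explicitly MAXALIGN'ed offset. The struct and macro names here are hypothetical, and `_Alignof(max_align_t)` (C11) stands in for PostgreSQL's configure-time maximum alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed maximum alignment, taken from C11's max_align_t. */
#define SKETCH_MAXALIGNOF  _Alignof(max_align_t)
#define SKETCH_MAXALIGN(len) \
    (((uintptr_t) (len) + (SKETCH_MAXALIGNOF - 1)) & \
     ~((uintptr_t) (SKETCH_MAXALIGNOF - 1)))

/* Header only: adding or reordering fields can no longer silently
 * misalign the tuple data that follows it. */
typedef struct ChunkHeaderSketch
{
    struct ChunkHeaderSketch *next;
    size_t      maxlen;
    size_t      used;
    int         ntuples;
} ChunkHeaderSketch;

/* Data starts at the header size rounded up to the maximum alignment. */
#define CHUNK_HEADER_SIZE  SKETCH_MAXALIGN(sizeof(ChunkHeaderSketch))
#define CHUNK_DATA(hc)     (((char *) (hc)) + CHUNK_HEADER_SIZE)

static ChunkHeaderSketch *
chunk_alloc(size_t datalen)
{
    ChunkHeaderSketch *hc = malloc(CHUNK_HEADER_SIZE + datalen);

    if (hc == NULL)
        return NULL;
    hc->next = NULL;
    hc->maxlen = datalen;
    hc->used = 0;
    hc->ntuples = 0;
    return hc;
}
```

Since `malloc` returns memory aligned for any object type and the header size is rounded up to that same alignment, `CHUNK_DATA(hc)` stays maxaligned whatever fields the header gains or loses.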
2018-01-01 | Fix EXPLAIN ANALYZE output for Parallel Hash. | Andres Freund
In a race case, EXPLAIN ANALYZE could fail to display correct nbatch and size information. Refactor so that participants report only on batches they worked on rather than trying to report on all of them, and teach explain.c to consider the HashInstrumentation object from all participants instead of picking the first one it can find. This should fix an occasional build farm failure in the "join" regression test. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/30219.1514428346%40sss.pgh.pa.us
2017-12-29 | Perform slot validity checks in a separate pass over expression. | Andres Freund
This reduces code duplication a bit, but the primary benefit is that it makes JITing expression evaluation easier. When JITing, we can't really change the opcode without recompiling, as was previously done in the interpreted case. Nor do we want to carry around unnecessary branches just to avoid re-checking over and over. As a minor side effect, this makes ExecEvalStepOp() O(log(N)) rather than O(N). Author: Andres Freund Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
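One common way to get an O(log N) reverse lookup from a step's stored handler back to its opcode is to sort a table of (handler address, opcode) pairs once and search it with `bsearch`. The sketch assumes that approach (the commit message does not spell out the mechanism), and every name in it is hypothetical. Converting a function pointer to `uintptr_t` is implementation-defined in C but works on mainstream platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical opcodes and handlers (distinct bodies, distinct addresses). */
typedef enum { OP_IDENT, OP_INC, OP_DOUBLE, OP_NEGATE, OP_COUNT } OpSketch;

static int h_ident(int x) { return x; }
static int h_inc(int x) { return x + 1; }
static int h_double(int x) { return x * 2; }
static int h_negate(int x) { return -x; }

typedef int (*HandlerFn) (int);

/* Forward table: opcode -> handler, as an interpreter would dispatch. */
static const HandlerFn dispatch_table[OP_COUNT] = {
    [OP_IDENT] = h_ident, [OP_INC] = h_inc,
    [OP_DOUBLE] = h_double, [OP_NEGATE] = h_negate
};

/* Reverse table: handler address -> opcode, kept sorted by address. */
typedef struct ReverseEntry
{
    uintptr_t   addr;
    OpSketch    op;
} ReverseEntry;

static ReverseEntry reverse_table[OP_COUNT];

static int
cmp_reverse(const void *a, const void *b)
{
    uintptr_t   x = ((const ReverseEntry *) a)->addr;
    uintptr_t   y = ((const ReverseEntry *) b)->addr;

    return (x > y) - (x < y);
}

/* Build once, O(N log N). */
static void
build_reverse_table(void)
{
    for (int i = 0; i < OP_COUNT; i++)
    {
        reverse_table[i].addr = (uintptr_t) dispatch_table[i];
        reverse_table[i].op = (OpSketch) i;
    }
    qsort(reverse_table, OP_COUNT, sizeof(ReverseEntry), cmp_reverse);
}

/* Map a stored handler back to its opcode in O(log N) instead of
 * scanning the dispatch table linearly. */
static OpSketch
step_op(HandlerFn fn)
{
    ReverseEntry key = {.addr = (uintptr_t) fn};
    ReverseEntry *res = bsearch(&key, reverse_table, OP_COUNT,
                                sizeof(ReverseEntry), cmp_reverse);

    assert(res != NULL);
    return res->op;
}
```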
2017-12-26 | Add includes to make header files self-contained | Peter Eisentraut
2017-12-21 | Rearrange execution of PARAM_EXTERN Params for plpgsql's benefit. | Tom Lane
This patch does three interrelated things:

* Create a new expression execution step type EEOP_PARAM_CALLBACK and add the infrastructure needed for add-on modules to generate that. As discussed, the best control mechanism for that seems to be to add another hook function to ParamListInfo, which will be called by ExecInitExpr if it's supplied and a PARAM_EXTERN Param is found. For stand-alone expressions, we add a new entry point to allow the ParamListInfo to be specified directly, since it can't be retrieved from the parent plan node's EState.

* Redesign the API for the ParamListInfo paramFetch hook so that the ParamExternData array can be entirely virtual. This also lets us get rid of ParamListInfo.paramMask, instead leaving it to the paramFetch hook to decide which param IDs should be accessible or not. plpgsql_param_fetch was already doing the identical masking check, so having callers do it too seemed redundant. While I was at it, I added a "speculative" flag to paramFetch that the planner can specify as TRUE to avoid unwanted failures. This solves an ancient problem for plpgsql that it couldn't provide values of non-DTYPE_VAR variables to the planner for fear of triggering premature "record not assigned yet" or "field not found" errors during planning.

* Rework plpgsql to get rid of the need for "unshared" parameter lists, by dint of turning the single ParamListInfo per estate into a nearly read-only data structure that doesn't instantiate any per-variable data. Instead, the paramFetch hook controls access to per-variable data and can make the right decisions on the fly, replacing the cases that we used to need multiple ParamListInfos for. This might perhaps have been a performance loss on its own, but by using a paramCompile hook we can bypass plpgsql_param_fetch entirely during normal query execution. (It's now only called when, e.g., we copy the ParamListInfo into a cursor portal. copyParamList() or SerializeParamList() effectively instantiate the virtual parameter array as a simple physical array without a paramFetch hook, which is what we want in those cases.) This allows reverting most of commit 6c82d8d1f, though I kept the cosmetic code-consolidation aspects of that (e.g., the assign_simple_var function).

Performance testing shows this to be at worst a break-even change, and it can provide wins ranging up to 20% in test cases involving accesses to fields of "record" variables. The fact that values of such variables can now be exposed to the planner might produce wins in some situations, too, but I've not pursued that angle.

In passing, remove the "parent" pointer from the arguments to ExecInitExprRec and related functions, instead storing that pointer in a transient field in ExprState. The ParamListInfo pointer for a stand-alone expression is handled the same way; we'd otherwise have had to add yet another recursively-passed-down argument in expression compilation.

Discussion: https://postgr.es/m/32589.1513706441@sss.pgh.pa.us
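A loose sketch of the "virtual parameter array" idea: the parameter list holds no per-parameter array, and a fetch hook decides on the fly which IDs are visible and what their values are. `ParamListSketch`, `pl_param_fetch()` and the hook signature are hypothetical, only loosely shaped after the paramFetch hook described above. The `speculative` flag shows how a planner-time probe can get "unknown" back instead of an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parameter value slot. */
typedef struct ParamValueSketch
{
    long        value;
    bool        isnull;
} ParamValueSketch;

typedef struct ParamListSketch ParamListSketch;

/* Hook: fill *workspace and return it, or return NULL if the parameter
 * cannot be supplied.  With speculative = true it must never fail hard. */
typedef ParamValueSketch *(*ParamFetchHookSketch) (ParamListSketch *params,
                                                   int paramid,
                                                   bool speculative,
                                                   ParamValueSketch *workspace);

struct ParamListSketch
{
    ParamFetchHookSketch fetch;
    void       *arg;            /* hook-private state, e.g. PL variables */
    int         nparams;        /* assumed <= 4 in this sketch */
};

/* Hypothetical PL-side variable storage: values live only here. */
typedef struct PlVarsSketch
{
    long        values[4];
    bool        assigned[4];
} PlVarsSketch;

static ParamValueSketch *
pl_param_fetch(ParamListSketch *params, int paramid, bool speculative,
               ParamValueSketch *workspace)
{
    PlVarsSketch *vars = params->arg;

    if (paramid < 1 || paramid > params->nparams)
        return NULL;            /* the hook decides which IDs are visible */
    if (!vars->assigned[paramid - 1])
    {
        if (speculative)
            return NULL;        /* planner probe: report "unknown" */
        /* A real hook might raise an error here; the sketch returns NULL. */
        workspace->value = 0;
        workspace->isnull = true;
        return workspace;
    }
    workspace->value = vars->values[paramid - 1];
    workspace->isnull = false;
    return workspace;
}
```

Copying such a list into a plain array (as the commit describes for cursor portals) would simply call the hook once per ID and store the results.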
2017-12-21 | Add parallel-aware hash joins. | Andres Freund
Introduce parallel-aware hash joins that appear in EXPLAIN plans as Parallel Hash Join with Parallel Hash. While hash joins could already appear in parallel queries, they were previously always parallel-oblivious and had a partial subplan only on the outer side, meaning that the work of the inner subplan was duplicated in every worker.

After this commit, the planner will consider using a partial subplan on the inner side too, using the Parallel Hash node to divide the work over the available CPU cores and combine its results in shared memory. If the join needs to be split into multiple batches in order to respect work_mem, then workers process different batches as much as possible and then work together on the remaining batches.

The advantages of a parallel-aware hash join over a parallel-oblivious hash join used in a parallel query are that it:

* avoids wasting memory on duplicated hash tables
* avoids wasting disk space on duplicated batch files
* divides the work of building the hash table over the CPUs

One disadvantage is that there is some communication between the participating CPUs which might outweigh the benefits of parallelism in the case of small hash tables. This is avoided by the planner's existing reluctance to supply partial plans for small scans, but it may be necessary to estimate synchronization costs in future if that situation changes. Another is that outer batch 0 must be written to disk if multiple batches are required.

A potential future advantage of parallel-aware hash joins is that right and full outer joins could be supported, since there is a single set of matched bits for each hashtable, but that is not yet implemented.

A new GUC enable_parallel_hash is defined to control the feature, defaulting to on.
Author: Thomas Munro Reviewed-By: Andres Freund, Robert Haas Tested-By: Rafia Sabih, Prabhat Sahu Discussion: https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com https://postgr.es/m/CAEepm=37HKyJ4U6XOLi=JgfSHM3o6B-GaeO-6hkOmneTDkH+Uw@mail.gmail.com
2017-12-19 | Try again to fix accumulation of parallel worker instrumentation. | Robert Haas
When a Gather or Gather Merge node is started and stopped multiple times, accumulate instrumentation data only once, at the end, instead of after each execution, to avoid recording inflated totals. Commit 778e78ae9fa51e58f41cbdc72b293291d02d8984, the previous attempt at a fix, instead reset the state after every execution, which worked for the general instrumentation data but had problems for the additional instrumentation specific to Sort and Hash nodes. Report by hubert depesz lubaczewski. Analysis and fix by Amit Kapila, following a design proposal from Thomas Munro, with a comment tweak by me. Discussion: http://postgr.es/m/20171127175631.GA405@depesz.com
2017-12-13 | Allow executor nodes to change their ExecProcNode function. | Andres Freund
In order for executor nodes to be able to change their ExecProcNode function after ExecInitNode() has finished, provide ExecSetExecProcNode(). This allows any wrapper functions that only execProcnode.c knows about to be reinstalled. The motivation for wanting to change ExecProcNode after ExecInitNode() has finished is that it is not known until later whether parallel query is available, so if a parallel variant is to be installed, then ExecInitNode() is too soon to decide. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=09rr65VN+cAV5FgyM_z=D77Xy8Fuc9CDDDYbq3pQUezg@mail.gmail.com
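A sketch of the function-pointer swap on a simplified node struct rather than PlanState; all names are hypothetical. `set_exec_proc_node()` records the node's real implementation and re-installs a one-time wrapper, which on its first call replaces itself with the cheapest remaining wrapper (here, optional instrumentation) before delegating.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct NodeSketch NodeSketch;
typedef int (*ExecProcFn) (NodeSketch *node);

struct NodeSketch
{
    ExecProcFn  exec;           /* what callers invoke */
    ExecProcFn  exec_real;      /* the node's actual implementation */
    bool        instrument;
    int         calls_counted;  /* stand-in for instrumentation data */
    int         first_calls;    /* how often the one-time wrapper ran */
};

static int
exec_instr(NodeSketch *node)
{
    node->calls_counted++;
    return node->exec_real(node);
}

/* Runs on the first call after each (re)install: do one-time work, then
 * install the cheapest wrapper so later calls skip this function. */
static int
exec_first(NodeSketch *node)
{
    node->first_calls++;
    node->exec = node->instrument ? exec_instr : node->exec_real;
    return node->exec(node);
}

/* Hypothetical analogue of ExecSetExecProcNode(): record the new
 * implementation and rebuild the wrapper chain. */
static void
set_exec_proc_node(NodeSketch *node, ExecProcFn fn)
{
    node->exec_real = fn;
    node->exec = exec_first;
}

static int serial_impl(NodeSketch *node) { (void) node; return 1; }
static int parallel_impl(NodeSketch *node) { (void) node; return 2; }
```

Because the chain is rebuilt on every swap, a node can switch to its parallel variant after initialization without bypassing wrappers such as instrumentation.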
2017-12-05 | Support Parallel Append plan nodes. | Robert Haas
When we create an Append node, we can spread out the workers over the subplans instead of piling on to each subplan one at a time, which should typically be a bit more efficient, both because the startup cost of any plan executed entirely by one worker is paid only once and also because of reduced contention. We can also construct Append plans using a mix of partial and non-partial subplans, which may allow for parallelism in places that otherwise couldn't support it. Unfortunately, this patch doesn't handle the important case of parallelizing UNION ALL by running each branch in a separate worker; the executor infrastructure is added here, but more planner work is needed. Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and Rajkumar Raghuwanshi. Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-12-05 | Fix EXPLAIN ANALYZE of hash join when the leader doesn't participate. | Andres Freund
If a hash join appears in a parallel query, there may be no hash table available for explain.c to inspect even though a hash table may have been built in other processes. This could happen either because parallel_leader_participation was set to off or because the leader happened to hit the end of the outer relation immediately (even though the complete relation is not empty) and decided not to build the hash table. Commit bf11e7ee introduced a way for workers to exchange instrumentation via the DSM segment for Sort nodes even though they are not parallel-aware. This commit does the same for Hash nodes, so that explain.c has a way to find instrumentation data from an arbitrary participant that actually built the hash table. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm%3D3DUQC2-z252N55eOcZBer6DPdM%3DFzrxH9dZc5vYLsjaA%40mail.gmail.com
2017-12-01 | Re-allow INSERT .. ON CONFLICT DO NOTHING on partitioned tables. | Robert Haas
Commit 8355a011a0124bdf7ccbada206a967d427039553 was reverted in f05230752d53c4aa74cffa9b699983bbb6bcb118, but this attempt is hopefully better-considered: we now pass the correct value to ExecOpenIndices, which should avoid the crash that we hit before. Amit Langote, reviewed by Simon Riggs and by me. Some final editing by me. Discussion: http://postgr.es/m/7ff1e8ec-dc39-96b1-7f47-ff5965dceeac@lab.ntt.co.jp
2017-11-29 | Update typedefs.list and re-run pgindent | Robert Haas
Discussion: http://postgr.es/m/CA+TgmoaA9=1RWKtBWpDaj+sF3Stgc8sHgf5z=KGtbjwPLQVDMA@mail.gmail.com
2017-11-25 | Avoid projecting tuples unnecessarily in Gather and Gather Merge. | Robert Haas
It's most often the case that the target list for the Gather (Merge) node matches the target list supplied by the underlying plan node; when this is so, we can avoid the overhead of projecting. This depends on commit f455e1125e2588d4cd4fc663c6a10da4e003a3b5 for proper functioning. Idea by Andres Freund. Patch by me. Review by Amit Kapila. Discussion: http://postgr.es/m/CA+TgmoZ0ZL=cesZFq8c9NnfK6bqy-wwUd3_74iYGodYrSoQ7Fw@mail.gmail.com
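A simplified sketch of the check that makes projection skippable, assuming a target list modeled as an array of child column numbers, with `-1` marking a computed expression. `TlistSketch` and `tlist_matches_child()` are hypothetical; the real comparison works on target lists and tuple descriptors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical target list: entry i names the child column it outputs,
 * or -1 for a computed expression. */
typedef struct TlistSketch
{
    const int  *colrefs;
    size_t      len;
} TlistSketch;

/* True when the target list is exactly "child column 0, 1, ..., n-1" in
 * order and the child produces n columns, i.e. projecting would only copy
 * the child's tuple unchanged. */
static bool
tlist_matches_child(const TlistSketch *tlist, size_t child_ncols)
{
    if (tlist->len != child_ncols)
        return false;
    for (size_t i = 0; i < tlist->len; i++)
        if (tlist->colrefs[i] != (int) i)
            return false;
    return true;
}
```

A node would evaluate this once at initialization and, when it returns true, hand back the child's tuple directly instead of projecting each one.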
2017-11-16 | Provide DSM segment to ExecXXXInitializeWorker functions. | Andres Freund
Previously, executor nodes running in parallel worker processes didn't have access to the dsm_segment object used for parallel execution. In order to support resource management based on DSM segment lifetime, they need that. So create a ParallelWorkerContext object to hold it and pass it to all InitializeWorker functions. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
2017-11-16 | Pass InitPlan values to workers via Gather (Merge). | Robert Haas
If a PARAM_EXEC parameter is used below a Gather (Merge) but the InitPlan that computes it is attached to or above the Gather (Merge), force the value to be computed before starting parallelism and pass it down to all workers. This allows us to use parallelism in cases where it previously would have had to be rejected as unsafe. We do - in this case - lose the optimization that the value is only computed if it's actually used. An alternative strategy would be to have the first worker that needs the value compute it, but one downside of that approach is that we'd then need to select a parallel-safe path to compute the parameter value; it couldn't for example contain a Gather (Merge) node. At some point in the future, we might want to consider both approaches. Independent of that consideration, there is a great deal more work that could be done to make more kinds of PARAM_EXEC parameters parallel-safe. This infrastructure could be used to allow a Gather (Merge) on the inner side of a nested loop (although that's not a very appealing plan) and cases where the InitPlan is attached below the Gather (Merge) could be addressed as well using various techniques. But this is a good start. Amit Kapila, reviewed and revised by me. Reviewing and testing from Kuntal Ghosh, Haribabu Kommi, and Tushar Ahuja. Discussion: http://postgr.es/m/CAA4eK1LV0Y1AUV4cUCdC+sYOx0Z0-8NAJ2Pd9=UKsbQ5Sr7+JQ@mail.gmail.com
2017-11-15 | Centralize executor-related partitioning code. | Robert Haas
Some code is moved from partition.c, which has grown very quickly lately; splitting the executor parts out might help to keep it from getting totally out of control. Other code is moved from execMain.c. All is moved to a new file execPartition.c. get_partition_for_tuple now has a new interface that more clearly separates executor concerns from generic concerns. Amit Langote. A slight comment tweak by me. Discussion: http://postgr.es/m/1f0985f8-3b61-8bc4-4350-baa6d804cb6d@lab.ntt.co.jp
2017-11-10 | Add some const decorations to prototypes | Peter Eisentraut
Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
2017-11-08 | Change TRUE/FALSE to true/false | Peter Eisentraut
The lower case spellings are C and C++ standard and are used in most parts of the PostgreSQL sources. The upper case spellings are only used in some files/modules. So standardize on the standard spellings. The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so those are left as is when using those APIs. In code comments, we use the lower-case spelling for the C concepts and keep the upper-case spelling for the SQL concepts. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-10-12 | Use ResultRelInfo ** rather than ResultRelInfo * for tuple routing. | Robert Haas
The previous convention doesn't lend itself to creating ResultRelInfos lazily, as we already do in ExecGetTriggerResultRel. This patch doesn't make anything lazier than before, but the pending patch for UPDATE tuple routing proposes to do so (and there might be other opportunities as well). Amit Khandekar with some adjustments by me. Discussion: http://postgr.es/m/CA+TgmoYPVP9Lyf6vUFA5DwxS4c--x6LOj2y36BsJaYtp62eXPQ@mail.gmail.com
2017-10-08 | Reduce memory usage of targetlist SRFs. | Andres Freund
Previously, nodeProjectSet only released memory once per input tuple, rather than once per returned tuple. If the computation of an individual returned tuple requires a lot of memory, that can lead to problems. Instead, change things so that the expression context can be reset once per output tuple, which requires a new memory context to store SRF arguments in. This is a longstanding issue, but was hard to fix before 9.6, due to the way tSRFs were evaluated. But it's fairly easy to fix now. We could backpatch this into 10, but given there've been few complaints, that doesn't seem worth the risk so far. Reported-By: Lucas Fairchild Author: Andres Freund, per discussion with Tom Lane Discussion: https://postgr.es/m/4514.1507318623@sss.pgh.pa.us
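A size-only model of the memory-context change, assuming two hypothetical contexts that merely track bytes in use and a high-water mark. The SRF's arguments live in a separate `argctx`, so the per-tuple context can be reset after every output row. The 64-byte argument size and all names are illustrative, not PostgreSQL's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memory context that only tracks usage and peak usage. */
typedef struct CtxSketch
{
    size_t      used;
    size_t      peak;
} CtxSketch;

static void
ctx_alloc(CtxSketch *c, size_t n)
{
    c->used += n;
    if (c->used > c->peak)
        c->peak = c->used;
}

static void
ctx_reset(CtxSketch *c)
{
    c->used = 0;
}

/*
 * Emit nout rows from a set-returning function for one input row.  The
 * evaluated arguments live in argctx, so tuplectx can be reset after each
 * output row instead of once per input row.
 */
static void
project_set_row(CtxSketch *argctx, CtxSketch *tuplectx,
                int nout, size_t bytes_per_output)
{
    ctx_alloc(argctx, 64);      /* SRF arguments (size assumed) */
    for (int i = 0; i < nout; i++)
    {
        ctx_alloc(tuplectx, bytes_per_output);  /* build output row i */
        ctx_reset(tuplectx);    /* safe: arguments are not in tuplectx */
    }
    ctx_reset(argctx);          /* done with this input row */
}
```

For 1000 output rows of 256 bytes each, the per-tuple peak stays at 256 bytes; resetting only once per input row would let it grow to 1000 * 256 bytes.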
2017-10-06 | Fix crash when logical decoding is invoked from a PL function. | Tom Lane
The logical decoding functions do BeginInternalSubTransaction and RollbackAndReleaseCurrentSubTransaction to clean up after themselves. It turns out that AtEOSubXact_SPI has an unrecognized assumption that we always need to cancel the active SPI operation in the SPI context that surrounds the subtransaction (if there is one). That's true when the RollbackAndReleaseCurrentSubTransaction call is coming from the SPI-using function itself, but not when it's happening inside some unrelated function invoked by a SPI query. In practice the affected callers are the various PLs. To fix, record the current subtransaction ID when we begin a SPI operation, and clean up only if that ID is the subtransaction being canceled. Also, remove AtEOSubXact_SPI's assertion that it must have cleaned up the surrounding SPI context's active tuptable. That's proven wrong by the same test case. Also clarify (or, if you prefer, reinterpret) the calling conventions for _SPI_begin_call and _SPI_end_call. The memory context cleanup in the latter means that these have always had the flavor of a matched resource-management pair, but they weren't documented that way before. Per report from Ben Chobot. Back-patch to 9.4 where logical decoding came in. In principle, the SPI changes should go all the way back, since the problem dates back to commit 7ec1c5a86. But given the lack of field complaints it seems few people are using internal subtransactions in this way. So I don't feel a need to take any risks in 9.2/9.3. Discussion: https://postgr.es/m/73FBA179-C68C-4540-9473-71E865408B15@silentmedia.com
2017-09-30 | Support arrays over domains. | Tom Lane
Allowing arrays with a domain type as their element type was left un-done in the original domain patch, but not for any very good reason. This omission leads to such surprising results as array_agg() not working on a domain column, because the parser can't identify a suitable output type for the polymorphic aggregate. In order to fix this, first clean up the APIs of coerce_to_domain() and some internal functions in parse_coerce.c so that we consistently pass around a CoercionContext along with CoercionForm. Previously, we sometimes passed an "isExplicit" boolean flag instead, which is strictly less information; and coerce_to_domain() didn't even get that, but instead had to reverse-engineer isExplicit from CoercionForm. That's contrary to the documentation in primnodes.h that says that CoercionForm only affects display and not semantics. I don't think this change fixes any live bugs, but it makes things more consistent. The main reason for doing it though is that now build_coercion_expression() receives ccontext, which it needs in order to be able to recursively invoke coerce_to_target_type(). Next, reimplement ArrayCoerceExpr so that the node does not directly know any details of what has to be done to the individual array elements while performing the array coercion. Instead, the per-element processing is represented by a sub-expression whose input is a source array element and whose output is a target array element. This simplifies life in parse_coerce.c, because it can build that sub-expression by a recursive invocation of coerce_to_target_type(). The executor now handles the per-element processing as a compiled expression instead of hard-wired code. The main advantage of this is that we can use a single ArrayCoerceExpr to handle as many as three successive steps per element: base type conversion, typmod coercion, and domain constraint checking. 
The old code used two stacked ArrayCoerceExprs to handle type + typmod coercion, which was pretty inefficient, and adding yet another array deconstruction to do domain constraint checking seemed very unappetizing. In the case where we just need a single, very simple coercion function, doing this straightforwardly leads to a noticeable increase in the per-array-element runtime cost. Hence, add an additional shortcut evalfunc in execExprInterp.c that skips unnecessary overhead for that specific form of expression. The runtime speed of simple cases is within 1% or so of where it was before, while cases that previously required two levels of array processing are significantly faster. Finally, create an implicit array type for every domain type, as we do for base types, enums, etc. Everything except the array-coercion case seems to just work without further effort. Tom Lane, reviewed by Andrew Dunstan Discussion: https://postgr.es/m/9852.1499791473@sss.pgh.pa.us
2017-09-14 | Remove TupleDesc remapping logic from tqueue.c. | Andres Freund
With the introduction of a shared memory record typmod registry, it is no longer necessary to remap record typmods when sending tuples between backends, so most of tqueue.c can be removed. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-09-07 | Reduce excessive dereferencing of function pointers | Peter Eisentraut
It is equivalent in ANSI C to write (*funcptr) () and funcptr(). These two styles have been applied inconsistently. After discussion, we'll use the more verbose style for plain function pointer variables, to make it clear that it's a variable, and the shorter style when the function pointer is in a struct (s.func() or s->func()), because then it's clear that it's not a plain function name, and otherwise the excessive punctuation makes some of those invocations hard to read. Discussion: https://www.postgresql.org/message-id/f52c16db-14ed-757d-4b48-7ef360b1631d@2ndquadrant.com
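The two call styles can be shown side by side in a small C sketch; binop_fn, Ops, apply_plain and apply_member are hypothetical names used only for illustration. Both forms compile to the same call; the difference is purely one of readability.

```c
#include <assert.h>

typedef int (*binop_fn) (int a, int b);

static int
add(int a, int b)
{
    return a + b;
}

/* A struct carrying a function pointer, like a table of callbacks. */
typedef struct Ops
{
    binop_fn    combine;
} Ops;

/* Plain function pointer variable: the verbose (*fn) (...) style flags it as a variable. */
static int
apply_plain(binop_fn fn, int a, int b)
{
    return (*fn) (a, b);
}

/* Function pointer inside a struct: ops->combine(...) is already clearly not a function name. */
static int
apply_member(const Ops *ops, int a, int b)
{
    return ops->combine(a, b);
}
```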
2017-09-07 | Even if some partitions are foreign, allow tuple routing. | Robert Haas
This doesn't allow routing tuples to the foreign partitions themselves, but it permits tuples to be routed to regular partitions despite the presence of foreign partitions in the same inheritance hierarchy. Etsuro Fujita, reviewed by Amit Langote and by me. Discussion: http://postgr.es/m/bc3db4c1-1693-3b8a-559f-33ad2b50b7ad@lab.ntt.co.jp
2017-09-01 | Improve division of labor between execParallel.c and nodeGather[Merge].c. | Tom Lane
Move the responsibility for creating/destroying TupleQueueReaders into execParallel.c, to avoid duplicative coding in nodeGather.c and nodeGatherMerge.c. Also, instead of having DestroyTupleQueueReader do shm_mq_detach, do it in the caller (which is now only ExecParallelFinish). This means execParallel.c does both the attaching and detaching of the tuple-queue-reader shm_mqs, which seems less weird than the previous arrangement. These changes also eliminate a vestigial memory leak (of the pei->tqueue array). It's now demonstrable that rescans of Gather or GatherMerge don't leak memory. Discussion: https://postgr.es/m/8670.1504192177@sss.pgh.pa.us
2017-08-30 | Separate reinitialization of shared parallel-scan state from ExecReScan. | Tom Lane
Previously, the parallel executor logic did reinitialization of shared state within the ExecReScan code for parallel-aware scan nodes. This is problematic, because it means that the ExecReScan call has to occur synchronously (i.e., during the parent Gather node's ReScan call). That is swimming very much against the tide so far as the ExecReScan machinery is concerned; the fact that it works at all today depends on a lot of fragile assumptions, such as that no plan node between Gather and a parallel-aware scan node is parameterized. Another objection is that because ExecReScan might be called in workers as well as the leader, hacky extra tests are needed in some places to prevent unwanted shared-state resets. Hence, let's separate this code into two functions, a ReInitializeDSM call and the ReScan call proper. ReInitializeDSM is called only in the leader and is guaranteed to run before we start new workers. ReScan is returned to its traditional function of resetting only local state, which means that ExecReScan's usual habits of delaying or eliminating child rescan calls are safe again. As with the preceding commit 7df2c1f8d, it doesn't seem to be necessary to make these changes in 9.6, which is a good thing because the FDW and CustomScan APIs are impacted. Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com
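A hedged C sketch of the split, using hypothetical SharedScanState and LocalScanState structs in place of a real parallel-aware scan node. The DSM reset runs only in the leader, before any new workers are launched, while the rescan function touches nothing but process-local state, so it is harmless whenever and wherever it is called.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shared state, standing in for a parallel scan's DSM area. */
typedef struct SharedScanState
{
    int         next_block;     /* next block to hand out to any participant */
} SharedScanState;

/* Hypothetical per-process state. */
typedef struct LocalScanState
{
    SharedScanState *shared;
    int         tuples_returned;    /* local progress only */
    bool        done;
} LocalScanState;

/* Leader only, and only before new workers start: reset shared state. */
static void
scan_reinitialize_dsm(SharedScanState *shared)
{
    shared->next_block = 0;
}

/* Any process, at any time: reset local state only, never shared state. */
static void
scan_rescan(LocalScanState *local)
{
    local->tuples_returned = 0;
    local->done = false;
}

/* Leader-side ordering: shared reset first, then the ordinary local rescan. */
static void
leader_rescan(SharedScanState *shared, LocalScanState *leader_local)
{
    scan_reinitialize_dsm(shared);
    scan_rescan(leader_local);
    /* launching new workers would happen here, after the shared reset */
}
```

Because scan_rescan never touches shared state, a worker (or a delayed child rescan) calling it cannot clobber progress that other participants depend on.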
2017-08-29 | Propagate sort instrumentation from workers back to leader. | Robert Haas
Up until now, when parallel query was used, no details about the sort method or space used by the workers were available; details were shown only for any sorting done by the leader. Fix that. Commit 1177ab1dabf72bafee8f19d904cee3a299f25892 forced the test case added by commit 1f6d515a67ec98194c23a5db25660856c9aab944 to run without parallelism; now that we have this infrastructure, allow that again, with a little tweaking to make it pass with and without force_parallel_mode. Robert Haas and Tom Lane Discussion: http://postgr.es/m/CA+Tgmoa2VBZW6S8AAXfhpHczb=Rf6RqQ2br+zJvEgwJ0uoD_tQ@mail.gmail.com
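A minimal sketch of the reporting pattern, assuming a hypothetical fixed-size shared array with one slot per worker (SharedSortInfo, worker_report, max_worker_space and MAX_WORKERS are illustrative names; PostgreSQL's real structures differ). Each worker writes only its own slot, so no locking is needed, and the leader reads all slots once the workers have finished.

```c
#include <assert.h>

#define MAX_WORKERS 8           /* hypothetical fixed capacity */

typedef enum SortMethod
{
    SORT_NONE,                  /* worker never sorted anything */
    SORT_QUICKSORT,
    SORT_TOPN,
    SORT_EXTERNAL
} SortMethod;

typedef struct SortStats
{
    SortMethod  method;
    long        space_kb;
} SortStats;

/* Hypothetical shared area: slot i is written only by worker i. */
typedef struct SharedSortInfo
{
    int         num_workers;
    SortStats   worker[MAX_WORKERS];
} SharedSortInfo;

/* Worker side: publish this worker's sort details into its own slot. */
static void
worker_report(SharedSortInfo *shared, int worker_number, SortStats stats)
{
    assert(shared->num_workers <= MAX_WORKERS);
    assert(worker_number >= 0 && worker_number < shared->num_workers);
    shared->worker[worker_number] = stats;
}

/* Leader side, after workers finish: the most space any worker's sort used. */
static long
max_worker_space(const SharedSortInfo *shared)
{
    long        max = 0;

    for (int i = 0; i < shared->num_workers; i++)
    {
        const SortStats *st = &shared->worker[i];

        if (st->method != SORT_NONE && st->space_kb > max)
            max = st->space_kb;
    }
    return max;
}
```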
2017-08-29 | Push tuple limits through Gather and Gather Merge. | Robert Haas
If we only need, say, 10 tuples in total, then we certainly don't need more than 10 tuples from any single process. Pushing down the limit lets workers exit early when possible. For Gather Merge, there is an additional benefit: a Sort immediately below the Gather Merge can be done as a bounded sort if there is an applicable limit. Robert Haas and Tom Lane Discussion: http://postgr.es/m/CA+TgmoYa3QKKrLj5rX7UvGqhH73G1Li4B-EKxrmASaca2tFu9Q@mail.gmail.com
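A bounded sort keeps only the best N rows seen so far, so its memory stays proportional to N rather than to the input size. The C sketch below keeps a max-heap of size bound; bounded_sort and its helpers are hypothetical, and ints stand in for tuples. With the limit pushed below Gather Merge, each worker's Sort could run in this mode, since no single process ever needs to return more than N rows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static void
swap_int(int *a, int *b)
{
    int         t = *a;

    *a = *b;
    *b = t;
}

/* Restore max-heap order downward from index i in heap[0..n). */
static void
sift_down(int *heap, size_t n, size_t i)
{
    for (;;)
    {
        size_t      largest = i,
                    l = 2 * i + 1,
                    r = 2 * i + 2;

        if (l < n && heap[l] > heap[largest])
            largest = l;
        if (r < n && heap[r] > heap[largest])
            largest = r;
        if (largest == i)
            return;
        swap_int(&heap[i], &heap[largest]);
        i = largest;
    }
}

/* Restore max-heap order upward from index i. */
static void
sift_up(int *heap, size_t i)
{
    while (i > 0)
    {
        size_t      parent = (i - 1) / 2;

        if (heap[parent] >= heap[i])
            return;
        swap_int(&heap[parent], &heap[i]);
        i = parent;
    }
}

static int
cmp_int(const void *a, const void *b)
{
    int         x = *(const int *) a,
                y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Write the `bound` smallest values of in[0..n) to out[] in ascending order,
 * using only `bound` slots of memory.  Returns the number of values written.
 */
static size_t
bounded_sort(const int *in, size_t n, int *out, size_t bound)
{
    size_t      size = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (size < bound)
        {
            out[size] = in[i];
            sift_up(out, size);
            size++;
        }
        else if (bound > 0 && in[i] < out[0])
        {
            /* new value beats the current worst kept value: replace the root */
            out[0] = in[i];
            sift_down(out, size, 0);
        }
    }
    qsort(out, size, sizeof(int), cmp_int);
    return size;
}
```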
2017-08-18 | Fix interaction of triggers, partitioning, and EXPLAIN ANALYZE. | Robert Haas
Add a new EState member es_leaf_result_relations, so that the trigger code knows about ResultRelInfos created by tuple routing. Also make sure ExplainPrintTriggers knows about partition-related ResultRelInfos. Etsuro Fujita, reviewed by Amit Langote Discussion: http://postgr.es/m/57163e18-8e56-da83-337a-22f2c0008051@lab.ntt.co.jp
2017-07-30 | Move ExecProcNode from dispatch to function pointer based model. | Andres Freund
This allows us to add a stack-depth check the first time an executor node is called, and skip that overhead on subsequent calls. Additionally, it yields a nice speedup. While it'd probably have been a good idea to have that check all along, it has become more important after the new expression evaluation framework in b8d7f053c5c2bf2a7e: common paths no longer contain any stack depth check. We previously relied on ExecEvalExpr() being executed somewhere. We should move towards that model for further routines, but as this is required for v10, it seems better to make only the necessary changes (which are already quite large). Author: Andres Freund, Tom Lane Reported-By: Julien Rouhaud Discussion: https://postgr.es/m/22833.1490390175@sss.pgh.pa.us https://postgr.es/m/b0af9eaa-130c-60d0-9e4e-7a135b1e0c76@dalibo.com
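A minimal sketch of the function-pointer model with a one-time check, assuming a hypothetical PlanNode carrying two pointers: the one callers invoke and the node type's real implementation. The first call runs a stand-in for the stack-depth check (check_stack_depth_stub merely counts invocations here) and then swaps in the real function, so every later call dispatches straight to it with no switch on node type.

```c
#include <assert.h>

struct PlanNode;
typedef int (*ExecProcFn) (struct PlanNode *node);

/* Hypothetical node; a real executor node carries far more state. */
typedef struct PlanNode
{
    ExecProcFn  exec_proc;      /* what callers invoke on every call */
    ExecProcFn  exec_proc_real; /* the node type's actual implementation */
    int         counter;
} PlanNode;

static int  stack_checks = 0;   /* how many times the one-time check ran */

/* Stand-in for a real stack-depth check; it only counts calls. */
static void
check_stack_depth_stub(void)
{
    stack_checks++;
}

/* Installed initially: check once, then replace itself with the real function. */
static int
exec_proc_first(PlanNode *node)
{
    check_stack_depth_stub();
    node->exec_proc = node->exec_proc_real;
    return node->exec_proc(node);
}

/* Example node implementation: returns 1, 2, 3, ... on successive calls. */
static int
exec_counter(PlanNode *node)
{
    return ++node->counter;
}

static void
init_node(PlanNode *node, ExecProcFn real)
{
    node->exec_proc_real = real;
    node->exec_proc = exec_proc_first;
    node->counter = 0;
}

/* Callers dispatch through the pointer. */
static inline int
exec_proc_node(PlanNode *node)
{
    return node->exec_proc(node);
}
```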
2017-07-17 | Use a real RT index when setting up partition tuple routing. | Robert Haas
Before, we always used a dummy value of 1, but that's not right when the partitioned table being modified is inside of a WITH clause rather than part of the main query. Amit Langote, reported and reviewed by Etsuro Fujita, with a comment change by me. Discussion: http://postgr.es/m/ee12f648-8907-77b5-afc0-2980bcb0aa37@lab.ntt.co.jp
2017-07-14 | Code review for NextValueExpr expression node type. | Tom Lane
Add missing infrastructure for this node type, notably in ruleutils.c where its lack could demonstrably cause EXPLAIN to fail. Add outfuncs/readfuncs support. (outfuncs support is useful today for debugging purposes. The readfuncs support may never be needed, since at present it would only matter for parallel query and NextValueExpr should never appear in a parallelizable query; but it seems like a bad idea to have a primnode type that isn't fully supported here.) Teach planner infrastructure that NextValueExpr is a volatile, parallel-unsafe, non-leaky expression node with cost cpu_operator_cost. Given its limited scope of usage, there *might* be no live bug today from the lack of that knowledge, but it's certainly going to bite us on the rear someday. Teach pg_stat_statements about the new node type, too. While at it, also teach cost_qual_eval() that MinMaxExpr, SQLValueFunction, XmlExpr, and CoerceToDomain should be charged as cpu_operator_cost. Failing to do this for SQLValueFunction was an oversight in my commit 0bb51aa96. The others are longer-standing oversights, but no time like the present to fix them. (In principle, CoerceToDomain could have cost much higher than this, but it doesn't presently seem worth trying to examine the domain's constraints here.) Modify execExprInterp.c to execute NextValueExpr as an out-of-line function; it seems quite unlikely to me that it's worth insisting that it be inlined in all expression eval methods. Besides, providing the out-of-line function doesn't stop anyone from inlining if they want to. Adjust some places where NextValueExpr support had been inserted with the aid of a dartboard rather than keeping it in the same order as elsewhere. Discussion: https://postgr.es/m/23862.1499981661@sss.pgh.pa.us
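To illustrate why providing an out-of-line function does not stop anyone from inlining, here is a tiny stack-based step interpreter in C. The step kinds, the Sequence struct and eval_next_value are hypothetical stand-ins for NextValueExpr and its sequence access. Cheap steps are handled directly in the switch, while the next-value step calls a separate function that any other evaluation method could call as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum StepOp
{
    STEP_CONST,                 /* push a constant */
    STEP_ADD,                   /* pop two values, push their sum */
    STEP_NEXTVAL,               /* push the next sequence value */
    STEP_DONE                   /* return the top of the stack */
} StepOp;

/* Hypothetical in-memory sequence. */
typedef struct Sequence
{
    int64_t     last_value;
    int64_t     increment;
} Sequence;

typedef struct ExprStep
{
    StepOp      op;
    int64_t     constval;       /* used by STEP_CONST */
    Sequence   *seq;            /* used by STEP_NEXTVAL */
} ExprStep;

/* Out-of-line helper: shared by every evaluation method that needs it. */
static int64_t
eval_next_value(Sequence *seq)
{
    seq->last_value += seq->increment;
    return seq->last_value;
}

static int64_t
eval_steps(const ExprStep *steps)
{
    int64_t     stack[16];
    int         sp = 0;

    for (const ExprStep *s = steps;; s++)
    {
        switch (s->op)
        {
            case STEP_CONST:    /* cheap: handled inline */
                assert(sp < 16);
                stack[sp++] = s->constval;
                break;
            case STEP_ADD:      /* cheap: handled inline */
                assert(sp >= 2);
                sp--;
                stack[sp - 1] += stack[sp];
                break;
            case STEP_NEXTVAL:  /* rarer and costlier: out-of-line call */
                assert(sp < 16);
                stack[sp++] = eval_next_value(s->seq);
                break;
            case STEP_DONE:
                assert(sp >= 1);
                return stack[sp - 1];
        }
    }
}
```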
2017-06-21 | Phase 3 of pgindent updates. | Tom Lane
Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 | Phase 2 of pgindent updates. | Tom Lane
Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
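The effect of the tab-stop change can be worked through with a simplified model of the placement rule described above: column 33 by default, otherwise the first tab stop after the code. This is an approximation under stated assumptions, not pg_bsd_indent's actual algorithm; next_tab_stop and comment_column are hypothetical helpers, and columns are 1-based with tab stops at 1, 1+tab, 1+2*tab, and so on.

```c
#include <assert.h>

/* First tab stop strictly to the right of 1-based column col. */
static int
next_tab_stop(int col, int tab)
{
    assert(col >= 1 && tab >= 1);
    return ((col - 1) / tab + 1) * tab + 1;
}

/*
 * Simplified model: a trailing comment starts at column 33 unless the code
 * reaches that far, in which case it starts at the first tab stop after the
 * first free column.  code_end is the 1-based column of the code's last
 * character.
 */
static int
comment_column(int code_end, int tab)
{
    int         col = next_tab_stop(code_end + 1, tab);

    return (col < 33) ? 33 : col;
}
```

Under this model, code ending at column 34 puts the comment at column 41 with 8-space stops but at column 37 with 4-space stops, while code ending at column 38 gives column 41 either way. That is consistent with comments moving one 4-space stop left in about half the cases.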