path: root/src/backend/executor
2016-02-17  Reuse abbreviated keys in ordered [set] aggregates.  (Robert Haas)
When processing ordered aggregates following a sort that could make use of the abbreviated key optimization, only call the equality operator to compare successive pairs of tuples when their abbreviated keys compare as equal; unequal abbreviated keys already prove the tuples are distinct. Peter Geoghegan, reviewed by Andreas Karlsson and by me.
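A minimal sketch of the idea; the names here (abbrev_prev, abbrev_cur, equalfn) are illustrative, not the actual nodeAgg.c variables:

    /*
     * Unequal abbreviated keys prove the full datums are unequal, so the
     * (possibly expensive) equality operator only needs to run when the
     * abbreviations collide.
     */
    static bool
    datums_equal_sketch(Datum abbrev_prev, Datum abbrev_cur,
                        Datum prev, Datum cur, FmgrInfo *equalfn)
    {
        if (abbrev_prev != abbrev_cur)
            return false;       /* definitely distinct; skip the call */
        return DatumGetBool(FunctionCall2(equalfn, prev, cur));
    }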
2016-02-07  ExecHashRemoveNextSkewBucket must physically copy tuples to main hashtable.  (Tom Lane)
Commit 45f6240a8fa9d355 added an assumption in ExecHashIncreaseNumBatches and ExecHashIncreaseNumBuckets that they could find all tuples in the main hash table by iterating over the "dense storage" introduced by that patch. However, ExecHashRemoveNextSkewBucket continued its old practice of simply re-linking deleted skew tuples into the main table's hashchains. Hence, such tuples got lost during any subsequent increase in nbatch or nbuckets, and would never get joined, as reported in bug #13908 from Seth P.

I (tgl) think that the aforesaid commit has got multiple design issues and should be reworked rather completely; but there is no time for that right now, so band-aid the problem by making ExecHashRemoveNextSkewBucket physically copy deleted skew tuples into the "dense storage" arena.

The added test case is able to exhibit the problem by means of fooling the planner with a WHERE condition that it will underestimate the selectivity of, causing the initial nbatch estimate to be too small.

Tomas Vondra and Tom Lane. Thanks to David Johnston for initial investigation into the bug report.
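The band-aid takes roughly this shape (a sketch based on the description above; dense_alloc is nodeHash.c's arena allocator, and the surrounding variables are illustrative):

    /* Physically copy the deleted skew tuple into the dense-storage arena,
     * then link the copy, not the original, into the main bucket chain, so
     * that later nbatch/nbuckets increases (which walk only dense storage)
     * still find it. */
    copyTuple = (HashJoinTuple) dense_alloc(hashtable, tupleSize);
    memcpy(copyTuple, hashTuple, tupleSize);
    copyTuple->next = hashtable->buckets[bucketno];
    hashtable->buckets[bucketno] = copyTuple;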
2016-02-06  Fix comment block trashed by pgindent.  (Tom Lane)
Looks like I put the protective dashes in the wrong place in f4e4b32743.
2016-02-06  Improve HJDEBUG code a bit.  (Tom Lane)
Commit 30d7ae3c76d2de144232ae6ab328ca86b70e72c3 introduced an HJDEBUG stanza that probably didn't compile at the time, and definitely doesn't compile now, because it refers to a nonexistent variable. It doesn't seem terribly useful anyway, so just get rid of it.

While I'm fooling with it, use the %z modifier instead of the obsolete hack of casting size_t to unsigned long, and include the HashJoinTable's address in each printout so that it's possible to distinguish the activities of multiple hashjoins occurring in one query.

Noted while trying to use HJDEBUG to investigate bug #13908. Back-patch to 9.5, because code that doesn't compile is certainly not very helpful.
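The improved printout takes roughly this shape (illustrative; the exact message text may differ):

    #ifdef HJDEBUG
        printf("Hashjoin %p: increasing nbatch to %d because space = %zu\n",
               (void *) hashtable, nbatch, hashtable->spaceUsed);
    #endif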
2016-02-04  When modifying a foreign table, initialize tableoid field properly.  (Robert Haas)
Failure to do this can cause AFTER ROW triggers or RETURNING expressions that reference this field to misbehave. Etsuro Fujita, reviewed by Thom Brown
2016-02-03  Allow parallel custom and foreign scans.  (Robert Haas)
This patch doesn't put the new infrastructure to use anywhere, and indeed it's not clear how it could ever be used for something like postgres_fdw, which has to send an SQL query and wait for a reply; but there might be FDWs or custom scan providers that are CPU-bound, so let's give them a way to join the parallel club. KaiGai Kohei, reviewed by me.
2016-01-28  Only try to push down foreign joins if the user mapping OIDs match.  (Robert Haas)
Previously, the foreign join pushdown infrastructure left the question of security entirely up to individual FDWs, but it would be easy for a foreign data wrapper to inadvertently open up subtle security holes that way. So, make it the core code's job to determine which user mapping OID is relevant, and don't attempt join pushdown unless it's the same for all relevant relations. Per a suggestion from Tom Lane. Shigeru Hanada and Ashutosh Bapat, reviewed by Etsuro Fujita and KaiGai Kohei, with some further changes by me.
2016-01-20  Support parallel joins, and make related improvements.  (Robert Haas)
The core innovation of this patch is the introduction of the concept of a partial path; that is, a path which if executed in parallel will generate a subset of the output rows in each process. Gathering a partial path produces an ordinary (complete) path. This allows us to generate paths for parallel joins by joining a partial path for one side (which at the baserel level is currently always a Partial Seq Scan) to an ordinary path on the other side. This is subject to various restrictions at present, especially that this strategy seems unlikely to be sensible for merge joins, so only nested loop and hash join paths are generated.

This also allows an Append node to be pushed below a Gather node in the case of a partitioned table.

Testing revealed that early versions of this patch made poor decisions in some cases, which turned out to be caused by the fact that the original cost model for Parallel Seq Scan wasn't very good. So this patch tries to make some modest improvements in that area.

There is much more to be done in the area of generating good parallel plans in all cases, but this seems like a useful step forward.

Patch by me, reviewed by Dilip Kumar and Amit Kapila.
2016-01-20  Support multi-stage aggregation.  (Robert Haas)
Aggregate nodes now have two new modes: a "partial" mode where they output the unfinalized transition state, and a "finalize" mode where they accept unfinalized transition states rather than individual values as input.

These new modes are not used anywhere yet, but they will be necessary for parallel aggregation. The infrastructure also figures to be useful for cases where we want to aggregate local data and remote data via the FDW interface, and want to bring back partial aggregates from the remote side that can then be combined with locally generated partial aggregates to produce the final value. It may also be useful even when neither FDWs nor parallelism are in play, as explained in the comments in nodeAgg.c.

David Rowley and Simon Riggs, reviewed by KaiGai Kohei, Heikki Linnakangas, Haribabu Kommi, and me.
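A sketch of how the two modes change the node's output step; the enum and function below are illustrative, not the actual nodeAgg.c names:

    typedef enum AggOutputMode
    {
        AGG_OUTPUT_FINAL,       /* apply the final function, as before */
        AGG_OUTPUT_PARTIAL      /* emit the unfinalized transition state */
    } AggOutputMode;

    static Datum
    agg_emit_sketch(AggOutputMode mode, Datum transvalue, FmgrInfo *finalfn)
    {
        if (mode == AGG_OUTPUT_PARTIAL)
            return transvalue;  /* a "finalize"-mode Aggregate combines these later */
        return FunctionCall1(finalfn, transvalue);
    }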
2016-01-17  Restructure index access method API to hide most of it at the C level.  (Tom Lane)
This patch reduces pg_am to just two columns, a name and a handler function. All the data formerly obtained from pg_am is now provided in a C struct returned by the handler function. This is similar to the designs we've adopted for FDWs and tablesample methods.

There are multiple advantages. For one, the index AM's support functions are now simple C functions, making them faster to call and much less error-prone, since the C compiler can now check function signatures. For another, this will make it far more practical to define index access methods in installable extensions.

A disadvantage is that SQL-level code can no longer see attributes of index AMs; in particular, some of the crosschecks in the opr_sanity regression test are no longer possible from SQL. We've addressed that by adding a facility for the index AM to perform such checks instead. (Much more could be done in that line, but for now we're content if the amvalidate functions more or less replace what opr_sanity used to do.) We might also want to expose some sort of reporting functionality, but this patch doesn't do that.

Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily editorialized on by me.
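A new-style handler looks roughly like this (a sketch; the my_* support functions are hypothetical placeholders):

    #include "postgres.h"
    #include "access/amapi.h"

    PG_FUNCTION_INFO_V1(my_am_handler);

    Datum
    my_am_handler(PG_FUNCTION_ARGS)
    {
        IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);

        amroutine->ambuild = my_ambuild;        /* plain C function pointers, */
        amroutine->aminsert = my_aminsert;      /* signature-checked by the compiler */
        amroutine->amvalidate = my_amvalidate;  /* stands in for opr_sanity crosschecks */
        /* ... remaining fields omitted ... */

        PG_RETURN_POINTER(amroutine);
    }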
2016-01-14  Fix spelling mistakes.  (Robert Haas)
Same patch submitted independently by David Rowley and Peter Geoghegan.
2016-01-02  Update copyright for 2016  (Bruce Momjian)
Backpatch certain files through 9.1
2015-12-23  Read from the same worker repeatedly until it returns no tuple.  (Robert Haas)
The original coding read tuples from workers in round-robin fashion, but performance testing shows that it works much better to read enough to empty one queue before moving on to the next. I believe the reason for this is that, with the old approach, we could easily wake up a worker repeatedly to write only one new tuple into the shm_mq each time. With this approach, by the time the process gets scheduled, it has a decent chance of being able to fill the entire buffer in one go. Patch by me. Dilip Kumar helped with performance testing.
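A sketch of the revised policy (TupleQueueReaderNext is the real tqueue.c entry point; the loop is illustrative and assumes that a blocking read returns NULL only when the worker is done, while the real code also deals with latch waits):

    static HeapTuple
    read_next_sketch(TupleQueueReader **readers, int *nreaders, int *cur)
    {
        while (*nreaders > 0)
        {
            bool        done = false;
            HeapTuple   tup = TupleQueueReaderNext(readers[*cur], false, &done);

            if (tup != NULL)
                return tup;     /* stay on this queue until it yields nothing */

            /* this worker is finished; forget its reader and move on */
            readers[*cur] = readers[--(*nreaders)];
            if (*cur >= *nreaders)
                *cur = 0;
        }
        return NULL;
    }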
2015-12-22  Allow omitting one or both boundaries in an array slice specifier.  (Tom Lane)
Omitted boundaries represent the upper or lower limit of the corresponding array subscript. This allows simpler specification of many common use-cases. (Revised version of commit 9246af6799819847faa33baf441251003acbb8fe) YUriy Zhuravlev
2015-12-18  Revert 9246af6799819847faa33baf441251003acbb8fe because  (Teodor Sigaev)
I missed too much. The patch is returned to the commitfest process.
2015-12-18  Fix TupleQueueReaderNext not to ignore its nowait argument.  (Robert Haas)
This was a silly goof on my (rhaas's) part. Report and fix by Rushabh Lathia.
2015-12-18  Allow to omit boundaries in array subscript  (Teodor Sigaev)
Allow omitting the lower or upper boundary, or both, in an array subscript for selecting a slice of an array. Author: YUriy Zhuravlev
2015-12-10  Improve some messages  (Peter Eisentraut)
2015-12-10  Fix ON CONFLICT UPDATE bug breaking AFTER UPDATE triggers.  (Andres Freund)
ExecOnConflictUpdate() passed t_ctid of the to-be-updated tuple to ExecUpdate(). That's problematic primarily for two reasons: First and foremost, t_ctid could point to a different tuple. Secondly, and that's what triggered the complaint by Stanislav, t_ctid is changed by heap_update() to point to the new tuple version. The behavior of AFTER UPDATE triggers was therefore broken, with NEW.* and OLD.* tuples spuriously identical within AFTER UPDATE triggers.

To fix both issues, pass a pointer to t_self of an on-stack HeapTuple instead.

Fixing this bug led to one change in the regression tests, which previously failed due to the first issue mentioned above. There's a reasonable expectation that the test fails, as it updates one row repeatedly within one INSERT ... ON CONFLICT statement. That is only possible if the second update is triggered via ON CONFLICT ... SET, ON CONFLICT ... WHERE, or by a WITH CHECK expression, as those are executed after ExecOnConflictUpdate() does a visibility check. That could easily be prohibited, but given it's allowed for plain UPDATEs and a rare corner case, it doesn't seem worthwhile.

Reported-By: Stanislav Grozev
Author: Andres Freund and Peter Geoghegan
Discussion: CAA78GVqy1+LisN-8DygekD_Ldfy=BJLarSpjGhytOsgkpMavfQ@mail.gmail.com
Backpatch: 9.5, where ON CONFLICT was introduced
2015-12-09  Allow EXPLAIN (ANALYZE, VERBOSE) to display per-worker statistics.  (Robert Haas)
The original parallel sequential scan commit included only very limited changes to the EXPLAIN output. Aggregated totals from all workers were displayed, but there was no way to see what each individual worker did or to distinguish the effort made by the workers from the effort made by the leader. Per a gripe by Thom Brown (and maybe others). Patch by me, reviewed by Amit Kapila.
2015-12-08  Allow foreign and custom joins to handle EvalPlanQual rechecks.  (Robert Haas)
Commit e7cb7ee14555cc9c5773e2c102efd6371f6f2005 provided basic infrastructure for allowing a foreign data wrapper or custom scan provider to replace a join of one or more tables with a scan. However, this infrastructure failed to take into account the need for possible EvalPlanQual rechecks, and ExecScanFetch would fail an assertion (or just overwrite memory) if such a check was attempted for a plan containing a pushed-down join.

To fix, adjust the EPQ machinery to skip some processing steps when scanrelid == 0, making those the responsibility of scan's recheck method, which also has the responsibility in this case of correctly populating the relevant slot. To allow foreign scans to gain control in the right place to make use of this new facility, add a new, optional RecheckForeignScan method. Also, allow a foreign scan to have a child plan, which can be used to correctly populate the slot (or perhaps for something else, but this is the only use currently envisioned).

KaiGai Kohei, reviewed by Robert Haas, Etsuro Fujita, and Kyotaro Horiguchi.
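The new callback has this signature per the description above; the body below is an illustrative stub:

    /*
     * Called during an EvalPlanQual recheck when scanrelid == 0.  The FDW is
     * responsible for repopulating the slot (for instance by executing the
     * new optional child plan) and re-evaluating the pushed-down join quals.
     */
    static bool
    myRecheckForeignScan(ForeignScanState *node, TupleTableSlot *slot)
    {
        /* illustrative: rebuild the joined tuple, then recheck the quals */
        return true;            /* tuple still satisfies the pushed-down quals */
    }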
2015-11-30  Fix obsolete comment.  (Robert Haas)
It's amazing how fast things become obsolete these days. Amit Langote
2015-11-20  Avoid server crash when worker registration fails at execution time.  (Robert Haas)
The previous coding attempted to destroy the DSM in this case, but child nodes might have stored data there and still be holding onto pointers. So don't do that. Also, free the reader array instead of leaking it. Extracted from two different patch versions, both by Amit Kapila.
2015-11-18  Avoid aggregating worker instrumentation multiple times.  (Robert Haas)
Amit Kapila, per design ideas from me.
2015-11-18  Fix dumb bug in tqueue.c  (Robert Haas)
When I wrote this code originally, the intention was to recompute the remapinfo only when the tupledesc changes. This presumably only happens once per query, but I copied the design pattern from other DestReceivers. However, due to a silly oversight on my part, tqueue->tupledesc never got set, leading to recomputation for every tuple. This should improve the performance of parallel scans that return a significant number of tuples. Report by Amit Kapila; patch by me, reviewed by him.
2015-11-15  Remove accidentally-committed debugging code.  (Robert Haas)
Amit Kapila
2015-11-11  Make sequential scans parallel-aware.  (Robert Haas)
In addition, this patch fills in a number of missing bits and pieces in the parallel infrastructure. Paths and plans now have a parallel_aware flag indicating whether whatever parallel-aware logic they have should be engaged. It is believed that we will need this flag for a number of path/plan types, not just sequential scans, which is why the flag is generic rather than part of the SeqScan structures specifically. Also, execParallel.c now gives parallel nodes a chance to initialize their PlanState nodes from the DSM during parallel worker startup.

Amit Kapila, with a fair amount of adjustment by me. Review of previous patch versions by Haribabu Kommi and others.
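A sketch of the worker-startup hook-up (ExecSeqScanInitializeWorker is the real sequential-scan entry point; the dispatch shown here is illustrative):

    /* During worker startup, let each parallel-aware node attach to the
     * state its leader placed in the DSM segment. */
    if (planstate->plan->parallel_aware)
    {
        switch (nodeTag(planstate))
        {
            case T_SeqScanState:
                ExecSeqScanInitializeWorker((SeqScanState *) planstate, toc);
                break;
            default:
                break;
        }
    }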
2015-11-10  Add missing "static" qualifier.  (Tom Lane)
Per buildfarm member pademelon.
2015-11-09  Fix rebasing mistake in nodeGather.c  (Robert Haas)
The patches committed as 6e71dd7ce9766582da453f493bc371d64977282f and 3a1f8611f2582df0a16bcd35caed2e1526387643 were developed in parallel but dependent on each other in a way that I failed to notice. This patch to fix the problem was prepared by Amit Kapila.
2015-11-09  Add a dummy return statement to TupleQueueRemap.  (Robert Haas)
This is unreachable for multiple reasons, but per Amit Kapila the Windows compiler he is using still thinks we can get there.
2015-11-07  Remove set-but-not-used variables.  (Robert Haas)
Reported by both Peter Eisentraut and Kevin Grittner.
2015-11-06  Try to convince gcc that TupleQueueRemap never falls off the end.  (Robert Haas)
Without this, MacOS gcc version 4.2.1 isn't convinced.
2015-11-06  Modify tqueue infrastructure to support transient record types.  (Robert Haas)
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this mechanism, failed to account for the fact that the RECORD pseudo-type uses transient typmods that are only meaningful within a single backend. Transferring such tuples without modification between two cooperating backends does not work.

This commit installs a system for passing the tuple descriptors over the same shm_mq being used to send the tuples themselves. The two sides might not assign the same transient typmod to any given tuple descriptor, so we must also substitute the appropriate receiver-side typmod for the one used by the sender. That adds some CPU overhead, but still seems better than being unable to pass records between cooperating parallel processes.

Along the way, move the logic for handling multiple tuple queues from tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader, which reads from a single queue, rather than a TupleQueueFunnel, which potentially reads from multiple queues. This change was suggested previously as a way to make sure that nodeGather.c rather than tqueue.c had policy control over the order in which to read from queues, but it wasn't clear to me until now how good an idea it was. typmod mapping needs to be performed separately for each queue, and it is much simpler if the tqueue.c code handles that and leaves multiplexing multiple queues to higher layers of the stack.
2015-11-02  Fix problems with ParamListInfo serialization mechanism.  (Robert Haas)
Commit d1b7c1ffe72e86932b5395f29e006c3f503bc53d introduced a mechanism for serializing a ParamListInfo structure to be passed to a parallel worker. However, this mechanism failed to handle external expanded values, as pointed out by Noah Misch. Repair.

Moreover, plpgsql_param_fetch requires adjustment because the serialization mechanism needs it to skip evaluating unused parameters just as we would do when it is called from copyParamList, but params == estate->paramLI in that case. To fix, make the bms_is_member test in that function unconditional.

Finally, have setup_param_list set a new ParamListInfo field, paramMask, to the parameters actually used in the expression, so that we don't try to fetch those that are not needed when serializing a parameter list. This isn't necessary for correctness, but it makes the performance of the parallel executor code comparable to what we do for cases involving cursors.

Design suggestions and extensive review by Noah Misch. Patch by me.
2015-10-30  Update parallel executor support to reuse the same DSM.  (Robert Haas)
Commit b0b0d84b3d663a148022e900ebfc164284a95f55 purported to make it possible to relaunch workers using the same parallel context, but it had an unpleasant race condition: we might reinitialize after the workers have sent their last control message but before they have detached the DSM, leading to crashes. Repair by introducing a new ParallelContext operation, ReinitializeParallelDSM.

Adjust execParallel.c to use this new support, so that we can rescan a Gather node by relaunching workers but without needing to recreate the DSM.

Amit Kapila, with some adjustments by me. Extracted from latest parallel sequential scan patch.
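With the new operation, a rescan becomes roughly the following (both calls are the real parallel.c API; error handling omitted):

    /* Reset the mutable shared state, then relaunch workers against the
     * same DSM segment instead of destroying and recreating it. */
    ReinitializeParallelDSM(pcxt);
    LaunchParallelWorkers(pcxt);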
2015-10-28  Message style improvements  (Peter Eisentraut)
Message style, plurals, quoting, spelling, consistency with similar messages
2015-10-28  Make Gather node projection-capable.  (Robert Haas)
The original Gather code failed to mark a Gather node as not able to do projection, but it couldn't actually project, even though it did initialize its projection info via ExecAssignProjectionInfo. There doesn't seem to be any good reason for this node not to have projection capability, so clean things up so that it does. Without this, plans using Gather nodes might need to carry extra Result nodes to do projection.
2015-10-22  Fix typos in comments.  (Robert Haas)
CharSyam
2015-10-22  Fix a couple of bugs in recent parallelism-related commits.  (Robert Haas)
Commit 816e336f12ecabdc834d4cc31bcf966b2dd323dc added the wrong error check to async.c; sending notifications is restricted to the leader, not altogether unsafe.

Commit 3bd909b220930f21d6e15833a17947be749e7fde added ExecShutdownNode to traverse the planstate tree and call shutdown functions, but made a Gather node, the only node that actually has such a function, abort the tree traversal, which is wrong.
2015-10-22  Add header comments to execParallel.c and nodeGather.c.  (Robert Haas)
Patch by me, per a note from Simon Riggs. Reviewed by Amit Kapila and Amit Langote.
2015-10-20  Remove duplicate word.  (Robert Haas)
Amit Langote
2015-10-16  Rewrite interaction of parallel mode with parallel executor support.  (Robert Haas)
In the previous coding, before returning from ExecutorRun, we'd shut down all parallel workers. This was dead wrong if ExecutorRun was called with a non-zero tuple count; it had the effect of truncating the query output.

To fix, give ExecutePlan control over whether to enter parallel mode, and have it refuse to do so if the tuple count is non-zero. Rewrite the Gather logic so that it can cope with being called outside parallel mode.

Commit 7aea8e4f2daa4b39ca9d1309a0c4aadb0f7ed81b is largely to blame for this problem, though this patch modifies some subsequently-committed code which relied on the guarantees it purported to make.
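The heart of the fix is a condition of roughly this shape in ExecutePlan (variable names are illustrative):

    /* Only enter parallel mode when the caller wants the whole result;
     * a non-zero tuple count means we may stop early, and shutting down
     * workers at that point would truncate the query output. */
    use_parallel_mode = plannedstmt->parallelModeNeeded &&
                        numberTuples == 0;
    if (use_parallel_mode)
        EnterParallelMode();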
2015-10-15  Allow FDWs to push down quals without breaking EvalPlanQual rechecks.  (Robert Haas)
This fixes a long-standing bug which was discovered while investigating the interaction between the new join pushdown code and the EvalPlanQual machinery: if a ForeignScan appears on the inner side of a parameterized nestloop, an EPQ recheck would re-return the original tuple even if it no longer satisfied the pushed-down quals due to changed parameter values.

This fix adds a new member to ForeignScan and ForeignScanState and a new argument to make_foreignscan, and requires changes to FDWs which push down quals to populate that new argument with a list of quals they have chosen to push down. Therefore, I'm only back-patching to 9.5, even though the bug is not new in 9.5.

Etsuro Fujita, reviewed by me and by Kyotaro Horiguchi.
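For an FDW, the change amounts to passing the pushed-down quals through the new make_foreignscan argument (a sketch; the variable names are illustrative):

    /* remote_exprs: quals we chose to push down; local_exprs: quals the
     * executor still checks.  The new final argument lets the EPQ machinery
     * re-verify pushed-down quals against changed parameter values. */
    return make_foreignscan(tlist,
                            local_exprs,
                            scan_relid,
                            params_list,
                            fdw_private,
                            fdw_scan_tlist,
                            remote_exprs);  /* NEW: fdw_recheck_quals */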
2015-10-13  Improve INSERT .. ON CONFLICT error message.  (Robert Haas)
Peter Geoghegan, reviewed by me.
2015-10-04  Further twiddling of nodeHash.c hashtable sizing calculation.  (Tom Lane)
On reflection, the submitted patch didn't really work to prevent the request size from exceeding MaxAllocSize, because of the fact that we'd happily round nbuckets up to the next power of 2 after we'd limited it to max_pointers. The simplest way to enforce the limit correctly is to round max_pointers down to a power of 2 when it isn't one already. (Note that the constraint to INT_MAX / 2, if it were doing anything useful at all, is properly applied after that.)
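The rounding step is simple arithmetic (illustrative form):

    /* Round max_pointers down to a power of 2, so that later rounding
     * nbuckets up to the next power of 2 cannot push the allocation past
     * MaxAllocSize. */
    size_t      pow2 = 1;

    while (pow2 * 2 <= max_pointers)
        pow2 *= 2;
    max_pointers = pow2;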
2015-10-04  Fix some issues in new hashtable size calculations in nodeHash.c.  (Tom Lane)
Limit the size of the hashtable pointer array to not more than MaxAllocSize, per reports from Kouhei Kaigai and others of "invalid memory alloc request size" failures. There was discussion of allowing the array to get larger than that by using the "huge" palloc API, but so far no proof that that is actually a good idea, and at this point in the 9.5 cycle major changes from old behavior don't seem like the way to go.

Fix a rather serious secondary bug in the new code, which was that it didn't ensure nbuckets remained a power of 2 when recomputing it for the multiple-batch case.

Clean up sloppy division of labor between ExecHashIncreaseNumBuckets and its sole call site.
2015-10-03  Add missing "static" specifier.  (Tom Lane)
Per buildfarm (pademelon, at least, doesn't like this).
2015-09-30  Add a Gather executor node.  (Robert Haas)
A Gather executor node runs any number of copies of a plan in an equal number of workers and merges all of the results into a single tuple stream. It can also run the plan itself, if the workers are unavailable or haven't started up yet. It is intended to work with the Partial Seq Scan node which will be added in future commits.

It could also be used to implement parallel query of a different sort by itself, without help from Partial Seq Scan, if the single_copy mode is used. In that mode, a worker executes the plan, and the parallel leader does not, merely collecting the worker's results. So, a Gather node could be inserted into a plan to split the execution of that plan across two processes. Nested Gather nodes aren't currently supported, but we might want to add support for that in the future.

There's nothing in the planner to actually generate Gather nodes yet, so it's not quite time to break out the champagne. But we're getting close.

Amit Kapila. Some design suggestions were provided by me, and I also reviewed the patch. Single-copy mode, documentation, and other minor changes also by me.
2015-09-28  Parallel executor support.  (Robert Haas)
This code provides infrastructure for a parallel leader to start up parallel workers to execute subtrees of the plan tree being executed in the master. User-supplied parameters from ParamListInfo are passed down, but PARAM_EXEC parameters are not. Various other constructs, such as initplans, subplans, and CTEs, are also not currently shared. Nevertheless, there's enough here to support a basic implementation of parallel query, and we can lift some of the current restrictions as needed. Amit Kapila and Robert Haas
2015-09-28  Fix ON CONFLICT DO UPDATE for tables with oids.  (Andres Freund)
When taking the UPDATE path in INSERT .. ON CONFLICT .. UPDATE, tables with oids were not supported. The tuple generated by the update target list was projected without space for an oid - a simple oversight.

Reported-By: Peter Geoghegan
Author: Andres Freund
Backpatch: 9.5, where ON CONFLICT was introduced