summaryrefslogtreecommitdiff
path: root/src/backend/optimizer/util/pathnode.c
AgeCommit message (Collapse)Author
2018-03-20Don't pass the grouping target around unnecessarily.Robert Haas
Since commit 4f15e5d09de276fb77326be5567dd9796008ca2e made grouped_rel set reltarget, a variety of other functions can just get it from grouped_rel instead of having to pass it around explicitly. Simplify accordingly. Patch by me, reviewed by Ashutosh Bapat. Discussion: http://postgr.es/m/CA+TgmoZ+ZJTVad-=vEq393N99KTooxv9k7M+z73qnTAqkb49BQ@mail.gmail.com
2018-01-23Teach reparameterize_path() to handle AppendPaths.Tom Lane
If we're inside a lateral subquery, there may be no unparameterized paths for a particular child relation of an appendrel, in which case we *must* be able to create similarly-parameterized paths for each other child relation, else the planner will fail with "could not devise a query plan for the given query". This means that there are situations where we'd better be able to reparameterize at least one path for each child. This calls into question the assumption in reparameterize_path() that it can just punt if it feels like it. However, the only case that is known broken right now is where the child is itself an appendrel so that all its paths are AppendPaths. (I think possibly I disregarded that in the original coding on the theory that nested appendrels would get folded together --- but that only happens *after* reparameterize_path(), so it's not excused from handling a child AppendPath.) Given that this code's been like this since 9.3 when LATERAL was introduced, it seems likely we'd have heard of other cases by now if there were a larger problem. Per report from Elvis Pranskevichus. Back-patch to 9.3. Discussion: https://postgr.es/m/5981018.zdth1YWmNy@hammer.magicstack.net
2018-01-19Allow UPDATE to move rows between partitions.Robert Haas
When an UPDATE causes a row to no longer match the partition constraint, try to move it to a different partition where it does match the partition constraint. In essence, the UPDATE is split into a DELETE from the old partition and an INSERT into the new one. This can lead to surprising behavior in concurrency scenarios because EvalPlanQual rechecks won't work as they normally did; the known problems are documented. (There is a pending patch to improve the situation further, but it needs more review.) Amit Khandekar, reviewed and tested by Amit Langote, David Rowley, Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro Herrera, Amit Kapila, and me. A few final revisions by me. Discussion: http://postgr.es/m/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com
2018-01-17Reorder C includesBruce Momjian
Reorder header files in joinrels.c and pathnode.c in alphabetical order, removing unnecessary ones. Author: Etsuro Fujita
2018-01-09Improve the heuristic for ordering child paths of a parallel append.Tom Lane
Commit ab7271677 introduced code that attempts to order the child scans of a Parallel Append node in a way that will minimize execution time, based on total cost and startup cost. However, it failed to think hard about what to do when estimated costs are exactly equal; a case that's particularly likely to occur when comparing on startup cost. In such a case the ordering of the child paths would be left to the whims of qsort, an algorithm that isn't even stable. We can improve matters by applying the rule used elsewhere in the planner: if total costs are equal, sort on startup cost, and vice versa. When both cost estimates are exactly equal, rather than letting qsort do something unpredictable, sort based on the child paths' relids, which should typically result in sorting in inheritance order. (The latter provision requires inventing a qsort-style comparator for bitmapsets, but maybe we'll have use for that for other reasons in future.) This results in a few plan changes in the select_parallel test, but those all look more reasonable than before, when the actual underlying cost numbers are taken into account. Discussion: https://postgr.es/m/4944.1515446989@sss.pgh.pa.us
2018-01-02Update copyright for 2018Bruce Momjian
Backpatch-through: certain files through 9.3
2017-12-21Add parallel-aware hash joins.Andres Freund
Introduce parallel-aware hash joins that appear in EXPLAIN plans as Parallel Hash Join with Parallel Hash. While hash joins could already appear in parallel queries, they were previously always parallel-oblivious and had a partial subplan only on the outer side, meaning that the work of the inner subplan was duplicated in every worker. After this commit, the planner will consider using a partial subplan on the inner side too, using the Parallel Hash node to divide the work over the available CPU cores and combine its results in shared memory. If the join needs to be split into multiple batches in order to respect work_mem, then workers process different batches as much as possible and then work together on the remaining batches. The advantages of a parallel-aware hash join over a parallel-oblivious hash join used in a parallel query are that it: * avoids wasting memory on duplicated hash tables * avoids wasting disk space on duplicated batch files * divides the work of building the hash table over the CPUs One disadvantage is that there is some communication between the participating CPUs which might outweigh the benefits of parallelism in the case of small hash tables. This is avoided by the planner's existing reluctance to supply partial plans for small scans, but it may be necessary to estimate synchronization costs in future if that situation changes. Another is that outer batch 0 must be written to disk if multiple batches are required. A potential future advantage of parallel-aware hash joins is that right and full outer joins could be supported, since there is a single set of matched bits for each hashtable, but that is not yet implemented. A new GUC enable_parallel_hash is defined to control the feature, defaulting to on. Author: Thomas Munro Reviewed-By: Andres Freund, Robert Haas Tested-By: Rafia Sabih, Prabhat Sahu Discussion: https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com https://postgr.es/m/CAEepm=37HKyJ4U6XOLi=JgfSHM3o6B-GaeO-6hkOmneTDkH+Uw@mail.gmail.com
2017-12-05Support Parallel Append plan nodes.Robert Haas
When we create an Append node, we can spread out the workers over the subplans instead of piling on to each subplan one at a time, which should typically be a bit more efficient, both because the startup cost of any plan executed entirely by one worker is paid only once and also because of reduced contention. We can also construct Append plans using a mix of partial and non-partial subplans, which may allow for parallelism in places that otherwise couldn't support it. Unfortunately, this patch doesn't handle the important case of parallelizing UNION ALL by running each branch in a separate worker; the executor infrastructure is added here, but more planner work is needed. Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and Rajkumar Raghuwanshi. Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-11-30Make create_unique_path manage memory like mark_dummy_rel.Robert Haas
Put the unique path in the same context as the owning RelOptInfo, rather than the toplevel planner context. This is how this function worked originally, but commit f41803bb39bc2949db200116a609fd242d0ec221 changed it without explanation. mark_dummy_rel adopted the older (or newer?) technique in commit eca75a12a27d28b972fc269c1c8813cd8eb15441, which also featured a much better explanation of why it is correct. So, switch back to that technique here, with the same explanation given there. Although this fixes a possible memory leak when GEQO is in use, the leak is minor and probably nobody cares, so no back-patch. Ashutosh Bapat, reviewed by Tom Lane and by me Discussion: http://postgr.es/m/CAFjFpRcXkHHrXyD9BCvkgGJV4TnHG2SWJ0PhJfrDu3NAcQvh7g@mail.gmail.com
2017-11-29Update typedefs.list and re-run pgindentRobert Haas
Discussion: http://postgr.es/m/CA+TgmoaA9=1RWKtBWpDaj+sF3Stgc8sHgf5z=KGtbjwPLQVDMA@mail.gmail.com
2017-11-13Push target list evaluation through Gather Merge.Robert Haas
We already do this for Gather, but it got overlooked for Gather Merge. Amit Kapila, with review and minor revisions by Rushabh Lathia and by me. Discussion: http://postgr.es/m/CAA4eK1KUC5Uyu7qaifxrjpHxbSeoQh3yzwN3bThnJsmJcZ-qtA@mail.gmail.com
2017-11-02Teach planner to account for HAVING quals in aggregation plan nodes.Tom Lane
For some reason, we have never accounted for either the evaluation cost or the selectivity of filter conditions attached to Agg and Group nodes (which, in practice, are always conditions from a HAVING clause). Applying our regular selectivity logic to post-grouping conditions is a bit bogus, but it's surely better than taking the selectivity as 1.0. Perhaps someday the extended-statistics mechanism can be taught to provide statistics that would help us in getting non-default estimates here. Per a gripe from Benjamin Coutu. This is surely a bug fix, but I'm hesitant to back-patch because of the prospect of destabilizing existing plan choices. Given that it took us this long to notice the bug, it's probably not hurting too many people in the field. Discussion: https://postgr.es/m/20968.1509486337@sss.pgh.pa.us
2017-10-06Basic partition-wise join functionality.Robert Haas
Instead of joining two partitioned tables in their entirety we can, if it is an equi-join on the partition keys, join the matching partitions individually. This involves teaching the planner about "other join" rels, which are related to regular join rels in the same way that other member rels are related to baserels. This can use significantly more CPU time and memory than regular join planning, because there may now be a set of "other" rels not only for every base relation but also for every join relation. In most practical cases, this probably shouldn't be a problem, because (1) it's probably unusual to join many tables each with many partitions using the partition keys for all joins and (2) if you do that scenario then you probably have a big enough machine to handle the increased memory cost of planning and (3) the resulting plan is highly likely to be better, so what you spend in planning you'll make up on the execution side. All the same, for now, turn this feature off by default. Currently, we can only perform joins between two tables whose partitioning schemes are absolutely identical. It would be nice to cope with other scenarios, such as extra partitions on one side or the other with no match on the other side, but that will have to wait for a future patch. Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi, Amit Langote, Rafia Sabih, Thomas Munro, Dilip Kumar, Antonin Houska, Amit Khandekar, and by me. A few final adjustments by me. Discussion: http://postgr.es/m/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com Discussion: http://postgr.es/m/CAFjFpRcitjfrULr5jfuKWRPsGUX0LQ0k8-yG0Qw2+1LBGNpMdw@mail.gmail.com
2017-08-15Assorted preparatory refactoring for partition-wise join.Robert Haas
Instead of duplicating the logic to search for a matching ParamPathInfo in multiple places, factor it out into a separate function. Pass only the relevant bits of the PartitionKey to partition_bounds_equal instead of the whole thing, because partition-wise join will want to call this without having a PartitionKey available. Adjust allow_star_schema_join and calc_nestloop_required_outer to take relevant Relids rather than the entire Path, because partition-wise join will want to call it with the top-parent relids to determine whether a child join is allowable. Ashutosh Bapat. Review and testing of the larger patch set of which this is a part by Amit Langote, Rajkumar Raghuwanshi, Rafia Sabih, Thomas Munro, Dilip Kumar, and me. Discussion: http://postgr.es/m/CA+TgmobQK80vtXjAsPZWWXd7c8u13G86gmuLupN+uUJjA+i4nA@mail.gmail.com
2017-06-22Document partitioned_rels in create_modifytable_path header comment.Robert Haas
Etsuro Fujita, slightly adjusted by me. Discussion: http://postgr.es/m/e87c4a6d-23d7-5e7c-e8db-44ed418eb5d1@lab.ntt.co.jp
2017-06-21Phase 3 of pgindent updates.Tom Lane
Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21Phase 2 of pgindent updates.Tom Lane
Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-05-19Copy partitioned_rels lists to avoid shared substructure.Robert Haas
Otherwise, set_plan_refs() can get applied to the same list multiple times through different references, leading to chaos. Amit Langote, Dilip Kumar, and Robert Haas, reviewed by Ashutosh Bapat. Original report by Sveinn Sveinsson. Discussion: http://postgr.es/m/20170517141151.1435.79890@wrigleys.postgresql.org
2017-05-17Post-PG 10 beta1 pgindent runBruce Momjian
perltidy run not included.
2017-04-07Optimize joins when the inner relation can be proven unique.Tom Lane
If there can certainly be no more than one matching inner row for a given outer row, then the executor can move on to the next outer row as soon as it's found one match; there's no need to continue scanning the inner relation for this outer row. This saves useless scanning in nestloop and hash joins. In merge joins, it offers the opportunity to skip mark/restore processing, because we know we have not advanced past the first possible match for the next outer row. Of course, the devil is in the details: the proof of uniqueness must depend only on joinquals (not otherquals), and if we want to skip mergejoin mark/restore then it must depend only on merge clauses. To avoid adding more planning overhead than absolutely necessary, the present patch errs in the conservative direction: there are cases where inner_unique or skip_mark_restore processing could be used, but it will not do so because it's not sure that the uniqueness proof depended only on "safe" clauses. This could be improved later. David Rowley, reviewed and rather heavily editorialized on by me Discussion: https://postgr.es/m/CAApHDvqF6Sw-TK98bW48TdtFJ+3a7D2mFyZ7++=D-RyPsL76gw@mail.gmail.com
2017-03-31Add infrastructure to support EphemeralNamedRelation references.Kevin Grittner
A QueryEnvironment concept is added, which allows new types of objects to be passed into queries from parsing on through execution. At this point, the only thing implemented is a collection of EphemeralNamedRelation objects -- relations which can be referenced by name in queries, but do not exist in the catalogs. The only type of ENR implemented is NamedTuplestore, but provision is made to add more types fairly easily. An ENR can carry its own TupleDesc or reference a relation in the catalogs by relid. Although these features can be used without SPI, convenience functions are added to SPI so that ENRs can easily be used by code run through SPI. The initial use of all this is going to be transition tables in AFTER triggers, but that will be added to each PL as a separate commit. An incidental effect of this patch is to produce a more informative error message if an attempt is made to modify the contents of a CTE from a referencing DML statement. No tests previously covered that possibility, so one is added. Kevin Grittner and Thomas Munro Reviewed by Heikki Linnakangas, David Fetter, and Thomas Munro with valuable comments and suggestions from many others
2017-03-31Fix parallel query so it doesn't spoil row estimates above Gather.Robert Haas
Commit 45be99f8cd5d606086e0a458c9c72910ba8a613d removed GatherPath's num_workers field, but this is entirely bogus. Normally, a path's parallel_workers flag is supposed to indicate the number of workers that it wants, and should be 0 for a non-partial path. In that commit, I mistakenly thought that GatherPath could also use that field to indicate the number of workers that it would try to start, but that's disastrous, because then it can propagate up to higher nodes in the plan tree, which will then get incorrect rowcounts because the parallel_workers flag is involved in computing those values. Repair by putting the separate field back. Report by Tomas Vondra. Patch by me, reviewed by Amit Kapila. Discussion: http://postgr.es/m/f91b4a44-f739-04bd-c4b6-f135bd643669@2ndquadrant.com
2017-03-27Support hashed aggregation with grouping sets.Andrew Gierth
This extends the Aggregate node with two new features: HashAggregate can now run multiple hashtables concurrently, and a new strategy MixedAggregate populates hashtables while doing sorted grouping. The planner will now attempt to save as many sorts as possible when planning grouping sets queries, while not exceeding work_mem for the estimated combined sizes of all hashtables used. No SQL-level changes are required. There should be no user-visible impact other than the new EXPLAIN output and possible changes to result ordering when ORDER BY was not used (which affected a few regression tests). The enable_hashagg option is respected. Author: Andrew Gierth Reviewers: Mark Dilger, Andres Freund Discussion: https://postgr.es/m/87vatszyhj.fsf@news-spur.riddles.org.uk
2017-03-21Don't scan partitioned tables.Robert Haas
Partitioned tables do not contain any data; only their unpartitioned descendents need to be scanned. However, the partitioned tables still need to be locked, even though they're not scanned. To make that work, Append and MergeAppend relations now need to carry a list of (unscanned) partitioned relations that must be locked, and InitPlan must lock all partitioned result relations. Aside from the obvious advantage of avoiding some work at execution time, this has two other advantages. First, it may improve the planner's decision-making in some cases since the empty relation might throw things off. Second, it paves the way to getting rid of the storage for partitioned tables altogether. Amit Langote, reviewed by me. Discussion: http://postgr.es/m/6837c359-45c4-8044-34d1-736756335a15@lab.ntt.co.jp
2017-03-09Add a Gather Merge executor node.Robert Haas
Like Gather, we spawn multiple workers and run the same plan in each one; however, Gather Merge is used when each worker produces the same output ordering and we want to preserve that output ordering while merging together the streams of tuples from various workers. (In a way, Gather Merge is like a hybrid of Gather and MergeAppend.) This works out to a win if it saves us from having to perform an expensive Sort. In cases where only a small amount of data would need to be sorted, it may actually be faster to use a regular Gather node and then sort the results afterward, because Gather Merge sometimes needs to wait synchronously for tuples whereas a pure Gather generally doesn't. But if this avoids an expensive sort then it's a win. Rushabh Lathia, reviewed and tested by Amit Kapila, Thomas Munro, and Neha Sharma, and reviewed and revised by me. Discussion: http://postgr.es/m/CAGPqQf09oPX-cQRpBKS0Gq49Z+m6KBxgxd_p9gX8CKk_d75HoQ@mail.gmail.com
2017-03-08Support parallel bitmap heap scans.Robert Haas
The index is scanned by a single process, but then all cooperating processes can iterate jointly over the resulting set of heap blocks. In the future, we might also want to support using a parallel bitmap index scan to set up for a parallel bitmap heap scan, but that's a job for another day. Dilip Kumar, with some corrections and cosmetic changes by me. The larger patch set of which this is a part has been reviewed and tested by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia Sabih, Haribabu Kommi, Thomas Munro, and me. Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08Support XMLTABLE query expressionAlvaro Herrera
XMLTABLE is defined by the SQL/XML standard as a feature that allows turning XML-formatted data into relational form, so that it can be used as a <table primary> in the FROM clause of a query. This new construct provides significant simplicity and performance benefit for XML data processing; what in a client-side custom implementation was reported to take 20 minutes can be executed in 400ms using XMLTABLE. (The same functionality was said to take 10 seconds using nested PostgreSQL XPath function calls, and 5 seconds using XMLReader under PL/Python). The implemented syntax deviates slightly from what the standard requires. First, the standard indicates that the PASSING clause is optional and that multiple XML input documents may be given to it; we make it mandatory and accept a single document only. Second, we don't currently support a default namespace to be specified. This implementation relies on a new executor node based on a hardcoded method table. (Because the grammar is fixed, there is no extensibility in the current approach; further constructs can be implemented on top of this such as JSON_TABLE, but they require changes to core code.) Author: Pavel Stehule, Álvaro Herrera Extensively reviewed by: Craig Ringer Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com
2017-02-15Add optimizer and executor support for parallel index scans.Robert Haas
In combination with 569174f1be92be93f5366212cc46960d28a5c5cd, which taught the btree AM how to perform parallel index scans, this allows parallel index scan plans on btree indexes. This infrastructure should be general enough to support parallel index scans for other index AMs as well, if someone updates them to support parallel scans. Amit Kapila, reviewed and tested by Anastasia Lubennikova, Tushar Ahuja, and Haribabu Kommi, and me.
2017-01-18Move targetlist SRF handling from expression evaluation to new executor node.Andres Freund
Evaluation of set returning functions (SRFs_ in the targetlist (like SELECT generate_series(1,5)) so far was done in the expression evaluation (i.e. ExecEvalExpr()) and projection (i.e. ExecProject/ExecTargetList) code. This meant that most executor nodes performing projection, and most expression evaluation functions, had to deal with the possibility that an evaluated expression could return a set of return values. That's bad because it leads to repeated code in a lot of places. It also, and that's my (Andres's) motivation, made it a lot harder to implement a more efficient way of doing expression evaluation. To fix this, introduce a new executor node (ProjectSet) that can evaluate targetlists containing one or more SRFs. To avoid the complexity of the old way of handling nested expressions returning sets (e.g. having to pass up ExprDoneCond, and dealing with arguments to functions returning sets etc.), those SRFs can only be at the top level of the node's targetlist. The planner makes sure (via split_pathtarget_at_srfs()) that SRF evaluation is only necessary in ProjectSet nodes and that SRFs are only present at the top level of the node's targetlist. If there are nested SRFs the planner creates multiple stacked ProjectSet nodes. The ProjectSet nodes always get input from an underlying node. We also discussed and prototyped evaluating targetlist SRFs using ROWS FROM(), but that turned out to be more complicated than we'd hoped. While moving SRF evaluation to ProjectSet would allow to retain the old "least common multiple" behavior when multiple SRFs are present in one targetlist (i.e. continue returning rows until all SRFs are at the end of their input at the same time), we decided to instead only return rows till all SRFs are exhausted, returning NULL for already exhausted ones. We deemed the previous behavior to be too confusing, unexpected and actually not particularly useful. As a side effect, the previously prohibited case of multiple set returning arguments to a function, is now allowed. Not because it's particularly desirable, but because it ends up working and there seems to be no argument for adding code to prohibit it. Currently the behavior for COALESCE and CASE containing SRFs has changed, returning multiple rows from the expression, even when the SRF containing "arm" of the expression is not evaluated. That's because the SRFs are evaluated in a separate ProjectSet node. As that's quite confusing, we're likely to instead prohibit SRFs in those places. But that's still being discussed, and the code would reside in places not touched here, so that's a task for later. There's a lot of, now superfluous, code dealing with set return expressions around. But as the changes to get rid of those are verbose largely boring, it seems better for readability to keep the cleanup as a separate commit. Author: Tom Lane and Andres Freund Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
2017-01-03Update copyright via script for 2017Bruce Momjian
2016-11-19Fix latent costing error in create_merge_append_path.Tom Lane
create_merge_append_path should use the path rowcount it just computed, not rel->tuples, for costing purposes. Those numbers should always be the same at present, but if we ever support parameterized MergeAppend paths (a case this function is otherwise prepared for), the former would be right and the latter wrong. No need for back-patch since the problem is only latent. Ashutosh Bapat Discussion: <CAFjFpRek+cLCnTo24youuGtsq4zRphEB8EUUPjDxZjnL4n4HYQ@mail.gmail.com>
2016-08-19Speed up planner's scanning for parallel-query hazards.Tom Lane
We need to scan the whole parse tree for parallel-unsafe functions. If there are none, we'll later need to determine whether particular subtrees contain any parallel-restricted functions. The previous coding retained no knowledge from the first scan, even though this is very wasteful in the common case where the query contains only parallel-safe functions. We can bypass all of the later scans by remembering that fact. This provides a small but measurable speed improvement when the case applies, and shouldn't cost anything when it doesn't. Patch by me, reviewed by Robert Haas Discussion: <3740.1471538387@sss.pgh.pa.us>
2016-07-01Set consider_parallel correctly for upper planner rels.Robert Haas
Commit 3fc6e2d7f5b652b417fa6937c34de2438d60fa9f introduced new "upper" RelOptInfo structures but didn't set consider_parallel for them correctly, a point I completely missed when reviewing it. Later, commit e06a38965b3bcdaa881e7e06892d4d8ab6c2c980 made the situation worse by doing it incorrectly for the grouping relation. Try to straighten all of that out. Along the way, get rid of the annoying wholePlanParallelSafe flag, which was only necessarily because of the fact that upper planning stages didn't use paths at the time that code was written. The most important immediate impact of these changes is that force_parallel_mode will provide useful test coverage in quite a few more scenarios than it did previously, but it's also necessary preparation for fixing some problems related to subqueries. Patch by me, reviewed by Tom Lane.
2016-06-26Rethink node-level representation of partial-aggregation modes.Tom Lane
The original coding had three separate booleans representing partial aggregation behavior, which was confusing, unreadable, and error-prone, not least because the booleans weren't always listed in the same order. It was also inadequate for the allegedly-desirable future extension to support intermediate partial aggregation, because we'd need separate markers for serialization and deserialization in such a case. Merge these bools into an enum "AggSplit" to provide symbolic names for the supported operating modes (and document what those are). By assigning the values of the enum constants carefully, we can treat AggSplit values as options bitmasks so that tests of what to do aren't noticeably more expensive than before. While at it, get rid of Aggref.aggoutputtype. That's not needed since commit 59a3795c2 got rid of setrefs.c's special-purpose Aggref comparison code, and it likewise seemed more confusing than helpful. Assorted comment cleanup as well (there's still more that I want to do in that line). catversion bump for change in Aggref node contents. Should be the last one for partial-aggregation changes. Discussion: <29309.1466699160@sss.pgh.pa.us>
2016-06-21Refactor planning of projection steps that don't need a Result plan node.Tom Lane
The original upper-planner-pathification design (commit 3fc6e2d7f5b652b4) assumed that we could always determine during Path formation whether or not we would need a Result plan node to perform projection of a targetlist. That turns out not to work very well, though, because createplan.c still has some responsibilities for choosing the specific target list associated with sorting/grouping nodes (in particular it might choose to add resjunk columns for sorting). We might not ever refactor that --- doing so would push more work into Path formation, which isn't attractive --- and we certainly won't do so for 9.6. So, while create_projection_path and apply_projection_to_path can tell for sure what will happen if the subpath is projection-capable, they can't tell for sure when it isn't. This is at least a latent bug in apply_projection_to_path, which might think it can apply a target to a non-projecting node when the node will end up computing something different. Also, I'd tied the creation of a ProjectionPath node to whether or not a Result is needed, but it turns out that we sometimes need a ProjectionPath node anyway to avoid modifying a possibly-shared subpath node. Callers had to use create_projection_path for such cases, and we added code to them that knew about the potential omission of a Result node and attempted to adjust the cost estimates for that. That was uncertainly correct and definitely ugly/unmaintainable. To fix, have create_projection_path explicitly check whether a Result is needed and adjust its cost estimate accordingly, though it creates a ProjectionPath in either case. apply_projection_to_path is now mostly just an optimized version that can avoid creating an extra Path node when the input is known to not be shared with any other live path. (There is one case that create_projection_path doesn't handle, which is pushing parallel-safe expressions below a Gather node. We could make it do that by duplicating the GatherPath, but there seems no need as yet.) create_projection_plan still has to recheck the tlist-match condition, which means that if the matching situation does get changed by createplan.c then we'll have made a slightly incorrect cost estimate. But there seems no help for that in the near term, and I doubt it occurs often enough, let alone would change planning decisions often enough, to be worth stressing about. I added a "dummypp" field to ProjectionPath to track whether create_projection_path thinks a Result is needed. This is not really necessary as-committed because create_projection_plan doesn't look at the flag; but it seems like a good idea to remember what we thought when forming the cost estimate, if only for debugging purposes. In passing, get rid of the target_parallel parameter added to apply_projection_to_path by commit 54f5c5150. I don't think that's a good idea because it involves callers in what should be an internal decision, and opens us up to missing optimization opportunities if callers think they don't need to provide a valid flag, as most don't. For the moment, this just costs us an extra has_parallel_hazard call when planning a Gather. If that starts to look expensive, I think a better solution would be to teach PathTarget to carry/cache knowledge of parallel-safety of its contents.
2016-06-17Try again to fix the way the scanjoin_target is used with partial paths.Robert Haas
Commit 04ae11f62e643e07c411c4935ea6af46cb112aa9 removed some broken code to apply the scan/join target to partial paths, but its theory that this processing step is totally unnecessary turns out to be wrong. Put similar code back again, but this time, check for parallel-safety and avoid in-place modifications to paths that may already have been used as part of some other path. (This is not an entirely elegant solution to this problem; it might be better, for example, to postpone generate_gather_paths for the topmost scan/join rel until after the scan/join target has been applied. But this is not the time for such redesign work.) Amit Kapila and Robert Haas
2016-06-09Eliminate "parallel degree" terminology.Robert Haas
This terminology provoked widespread complaints. So, instead, rename the GUC max_parallel_degree to max_parallel_workers_per_gather (leaving room for a possible future GUC max_parallel_workers that acts as a system-wide limit), and rename the parallel_degree reloption to parallel_workers. Rename structure members to match. These changes create a dump/restore hazard for users of PostgreSQL 9.6beta1 who have set the reloption (or applied the GUC using ALTER USER or ALTER DATABASE).
2016-04-30Fix planner crash from pfree'ing a partial path that a GatherPath uses.Tom Lane
We mustn't run generate_gather_paths() during add_paths_to_joinrel(), because that function can be invoked multiple times for the same target joinrel. Not only is it wasteful to build GatherPaths repeatedly, but a later add_partial_path() could delete the partial path that a previously created GatherPath depends on. Instead establish the convention that we do generate_gather_paths() for a rel only just before set_cheapest(). The code was accidentally not broken for baserels, because as of today there never is more than one partial path for a baserel. But that assumption obviously has a pretty short half-life, so move the generate_gather_paths() calls for those cases as well. Also add some generic comments explaining how and why this all works. Per fuzz testing by Andreas Seltenreich. Report: <871t5pgwdt.fsf@credativ.de>
2016-03-29Allow aggregate transition states to be serialized and deserialized.Robert Haas
This is necessary infrastructure for supporting parallel aggregation for aggregates whose transition type is "internal". Such values can't be passed between cooperating processes, because they are just pointers. David Rowley, reviewed by Tomas Vondra and by me.
2016-03-21Support parallel aggregation.Robert Haas
Parallel workers can now partially aggregate the data and pass the transition values back to the leader, which can combine the partial results to produce the final answer. David Rowley, based on earlier work by Haribabu Kommi. Reviewed by Álvaro Herrera, Tomas Vondra, Amit Kapila, James Sewell, and me.
2016-03-18Push scan/join target list beneath Gather when possible.Robert Haas
This means that, for example, "SELECT expensive_func(a) FROM bigtab WHERE something" can compute expensive_func(a) in the workers rather than the leader if it happens to be parallel-safe, which figures to be a big win in some practical cases. Currently, we can only do this if the entire target list is parallel-safe. If we worked harder, we might be able to evaluate parallel-safe targets in the worker and any parallel-restricted targets in the leader, but that would be more complicated, and there aren't that many parallel-restricted functions that people are likely to use in queries anyway. I think. So just do the simple thing for the moment. Robert Haas, Amit Kapila, and Tom Lane
2016-03-14Add a GetForeignUpperPaths callback function for FDWs.Tom Lane
This is basically like the just-added create_upper_paths_hook, but control is funneled only to the FDW responsible for all the baserels of the current query; so providing such a callback is much less likely to add useless overhead than using the hook function is. The documentation is a bit sketchy. We'll likely want to improve it, and/or adjust the call conventions, when we get some experience with actually using this callback. Hopefully somebody will find time to experiment with it before 9.6 feature freeze.
2016-03-14Allow callers of create_foreignscan_path to specify nondefault PathTarget.Tom Lane
Although the default choice of rel->reltarget should typically be sufficient for scan or join paths, it's not at all sufficient for the purposes PathTargets were invented for; in particular not for upper-relation Paths. So break API compatibility by adding a PathTarget argument to create_foreignscan_path(). To ease updating of existing code, accept a NULL value of the argument as selecting rel->reltarget.
2016-03-14Rethink representation of PathTargets.Tom Lane
In commit 19a541143a09c067 I did not make PathTarget a subtype of Node, and embedded a RelOptInfo's reltarget directly into it rather than having a separately-allocated Node. In hindsight that was misguided micro-optimization, enabled by the fact that at that point we didn't have any Paths with custom PathTargets. Now that PathTarget processing has been fleshed out some more, it's easier to see that it's better to have PathTarget as an indepedent Node type, even if it does cost us one more palloc to create a RelOptInfo. So change it while we still can. This commit just changes the representation, without doing anything more interesting than that.
2016-03-08Improve handling of group-column indexes in GroupingSetsPath.Tom Lane
Instead of having planner.c compute a groupColIdx array and store it in GroupingSetsPaths, make create_groupingsets_plan() find the grouping columns by searching in the child plan node's tlist. Although that's probably a bit slower for create_groupingsets_plan(), it's more like the way every other plan node type does this, and it provides positive confirmation that we know which child output columns we're supposed to be grouping on. (Indeed, looking at this now, I'm not at all sure that it wasn't broken before, because create_groupingsets_plan() isn't demanding an exact tlist match from its child node.) Also, this allows substantial simplification in planner.c, because it no longer needs to compute the groupColIdx array at all; no other cases were using it. I'd intended to put off this refactoring until later (like 9.7), but in view of the likely bug fix and the need to rationalize planner.c's tlist handling so we can do something sane with Konstantin Knizhnik's function-evaluation-postponement patch, I think it can't wait.
2016-03-08Finish refactoring make_foo() functions in createplan.c.Tom Lane
This patch removes some redundant cost calculations that I left for later cleanup in commit 3fc6e2d7f5b652b4. There's now a uniform policy that the make_foo() convenience functions don't do any cost calculations. Most of their callers copy costs from the source Path node, and for those that don't, the calculation in the make_foo() function wasn't necessarily right anyhow. (make_result() was particularly a mess, as it was serving multiple callers using cost calcs designed for only the first one or two that had ever existed.) Aside from saving a few cycles, this ensures that what EXPLAIN prints matches the costs we used for planning purposes. It does not change any planner decisions, since the decisions are already made.
2016-03-07Make the upper part of the planner work by generating and comparing Paths.Tom Lane
I've been saying we needed to do this for more than five years, and here it finally is. This patch removes the ever-growing tangle of spaghetti logic that grouping_planner() used to use to try to identify the best plan for post-scan/join query steps. Now, there is (nearly) independent consideration of each execution step, and entirely separate construction of Paths to represent each of the possible ways to do that step. We choose the best Path or set of Paths using the same add_path() logic that's been used inside query_planner() for years. In addition, this patch removes the old restriction that subquery_planner() could return only a single Plan. It now returns a RelOptInfo containing a set of Paths, just as query_planner() does, and the parent query level can use each of those Paths as the basis of a SubqueryScanPath at its level. This allows finding some optimizations that we missed before, wherein a subquery was capable of returning presorted data and thereby avoiding a sort in the parent level, making the overall cost cheaper even though delivering sorted output was not the cheapest plan for the subquery in isolation. (A couple of regression test outputs change in consequence of that. However, there is very little change in visible planner behavior overall, because the point of this patch is not to get immediate planning benefits but to create the infrastructure for future improvements.) There is a great deal left to do here. This patch unblocks a lot of planner work that was basically impractical in the old code structure, such as allowing FDWs to implement remote aggregation, or rewriting plan_set_operations() to allow consideration of multiple implementation orders for set operations. (The latter will likely require a full rewrite of plan_set_operations(); what I've done here is only to fix it to return Paths not Plans.) I have also left unfinished some localized refactoring in createplan.c and planner.c, because it was not necessary to get this patch to a working state. Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-02-18Add an explicit representation of the output targetlist to Paths.Tom Lane
Up to now, there's been an assumption that all Paths for a given relation compute the same output column set (targetlist). However, there are good reasons to remove that assumption. For example, an indexscan on an expression index might be able to return the value of an expensive function "for free". While we have the ability to generate such a plan today in simple cases, we don't have a way to model that it's cheaper than a plan that computes the function from scratch, nor a way to create such a plan in join cases (where the function computation would normally happen at the topmost join node). Also, we need this so that we can have Paths representing post-scan/join steps, where the targetlist may well change from one step to the next. Therefore, invent a "struct PathTarget" representing the columns we expect a plan step to emit. It's convenient to include the output tuple width and tlist evaluation cost in this struct, and there will likely be additional fields in future. While Path nodes that actually do have custom outputs will need their own PathTargets, it will still be true that most Paths for a given relation will compute the same tlist. To reduce the overhead added by this patch, keep a "default PathTarget" in RelOptInfo, and allow Paths that compute that column set to just point to their parent RelOptInfo's reltarget. (In the patch as committed, actually every Path is like that, since we do not yet have any cases of custom PathTargets.) I took this opportunity to provide some more-honest costing of PlaceHolderVar evaluation. Up to now, the assumption that "scan/join reltargetlists have cost zero" was applied not only to Vars, where it's reasonable, but also PlaceHolderVars where it isn't. Now, we add the eval cost of a PlaceHolderVar's expression to the first plan level where it can be computed, by including it in the PathTarget cost field and adding that to the cost estimates for Paths. This isn't perfect yet but it's much better than before, and there is a way forward to improve it more. This costing change affects the join order chosen for a couple of the regression tests, changing expected row ordering.
2016-01-20Support parallel joins, and make related improvements.Robert Haas
The core innovation of this patch is the introduction of the concept of a partial path; that is, a path which if executed in parallel will generate a subset of the output rows in each process. Gathering a partial path produces an ordinary (complete) path. This allows us to generate paths for parallel joins by joining a partial path for one side (which at the baserel level is currently always a Partial Seq Scan) to an ordinary path on the other side. This is subject to various restrictions at present, especially that this strategy seems unlikely to be sensible for merge joins, so only nested loops and hash joins paths are generated. This also allows an Append node to be pushed below a Gather node in the case of a partitioned table. Testing revealed that early versions of this patch made poor decisions in some cases, which turned out to be caused by the fact that the original cost model for Parallel Seq Scan wasn't very good. So this patch tries to make some modest improvements in that area. There is much more to be done in the area of generating good parallel plans in all cases, but this seems like a useful step forward. Patch by me, reviewed by Dilip Kumar and Amit Kapila.
2016-01-02Update copyright for 2016Bruce Momjian
Backpatch certain files through 9.1