summaryrefslogtreecommitdiff
path: root/src/backend/optimizer/plan/createplan.c
AgeCommit message (Collapse)Author
2008-08-07Improve INTERSECT/EXCEPT hashing by realizing that we don't need to make anyTom Lane
hashtable entries for tuples that are found only in the second input: they can never contribute to the output. Furthermore, this implies that the planner should endeavor to put first the smaller (in number of groups) input relation for an INTERSECT. Implement that, and upgrade prepunion's estimation of the number of rows returned by setops so that there's some amount of sanity in the estimate of which one is smaller.
2008-08-07Support hashing for duplicate-elimination in INTERSECT and EXCEPT queries.Tom Lane
This completes my project of improving usage of hashing for duplicate elimination (aggregate functions with DISTINCT remain undone, but that's for some other day). As with the previous patches, this means we can INTERSECT/EXCEPT on datatypes that can hash but not sort, and it means that INTERSECT/EXCEPT without ORDER BY are no longer certain to produce sorted output.
2008-08-02Rearrange the querytree representation of ORDER BY/GROUP BY/DISTINCT itemsTom Lane
as per my recent proposal: 1. Fold SortClause and GroupClause into a single node type SortGroupClause. We were already relying on them to be struct-equivalent, so using two node tags wasn't accomplishing much except to get in the way of comparing items with equal(). 2. Add an "eqop" field to SortGroupClause to carry the associated equality operator. This is cheap for the parser to get at the same time it's looking up the sort operator, and storing it eliminates the need for repeated not-so-cheap lookups during planning. In future this will also let us represent GROUP/DISTINCT operations on datatypes that have hash opclasses but no btree opclasses (ie, they have equality but no natural sort order). The previous representation simply didn't work for that, since its only indicator of comparison semantics was a sort operator. 3. Add a hasDistinctOn boolean to struct Query to explicitly record whether the distinctClause came from DISTINCT or DISTINCT ON. This allows removing some complicated and not 100% bulletproof code that attempted to figure that out from the distinctClause alone. This patch doesn't in itself create any new capability, but it's necessary infrastructure for future attempts to use hash-based grouping for DISTINCT and UNION/INTERSECT/EXCEPT.
2008-06-27Improve planner's estimation of the size of an append relation: rather thanTom Lane
taking the maximum of any child rel's width, we should weight the widths proportionally to the number of rows expected from each child. In hindsight this is obviously correct because row width is really a proxy for the total physical size of the relation. Per discussion with Scott Carey (bug #4264).
2008-04-17Fix a couple of oversights associated with the "physical tlist" optimization:Tom Lane
we had several code paths where a physical tlist could be used for the input to a Sort node, which is a dumb idea because any unneeded table columns will increase the volume of data the sort has to push around. (Unfortunately the easy-looking fix of calling disuse_physical_tlist during make_sort_xxx doesn't work because in most cases we're already committed to the current input tlist --- it's been marked with sort column numbers, or we've built grouping column numbers using it, etc. The tlist has to be selected properly at the calling level before we start constructing sort-col information. This is easy enough to do, we were just failing to take the point into consideration.) Back-patch to 8.3. I believe the problem probably exists clear back to 7.4 when the physical tlist optimization was added, but I'm afraid to back-patch further than 8.3 without a great deal more study than I want to put into it. The code in this area has drifted a lot over time. The real-world importance of these code paths is uncertain anyway --- I think in many cases we'd probably prefer hash-based methods.
2008-04-13Since createplan.c no longer cares whether index operators are lossy, it hasTom Lane
no particular need to do get_op_opfamily_properties() while building an indexscan plan. Postpone that lookup until executor start. This simplifies createplan.c a lot more than it complicates nodeIndexscan.c, and makes things more uniform since we already had to do it that way for RowCompare expressions. Should be a bit faster too, at least for plans that aren't re-used many times, since we avoid palloc'ing and perhaps copying the intermediate list data structure.
2008-04-13Phase 2 of project to make index operator lossiness be determined at runtimeTom Lane
instead of plan time. Extend the amgettuple API so that the index AM returns a boolean indicating whether the indexquals need to be rechecked, and make that rechecking happen in nodeIndexscan.c (currently the only place where it's expected to be needed; other callers of index_getnext are just erroring out for now). For the moment, GIN and GIST have stub logic that just always sets the recheck flag to TRUE --- I'm hoping to get Teodor to handle pushing that control down to the opclass consistent() functions. The planner no longer pays any attention to amopreqcheck, and that catalog column will go away in due course.
2008-01-01Update copyrights in source tree to 2008.Bruce Momjian
2007-11-15Re-run pgindent with updated list of typedefs. (Updated README shouldBruce Momjian
avoid this problem in the future.)
2007-11-15pgindent run for 8.3.Bruce Momjian
2007-11-08Fix EquivalenceClass code to handle volatile sort expressions in a moreTom Lane
predictable manner; in particular that if you say ORDER BY output-column-ref, it will in fact sort by that specific column even if there are multiple syntactic matches. An example is SELECT random() AS a, random() AS b FROM ... ORDER BY b, a; While the use-case for this might be a bit debatable, it worked as expected in earlier releases, so we should preserve the behavior for 8.3. Per my recent proposal. While at it, fix convert_subquery_pathkeys() to handle RelabelType stripping in both directions; it needs this for the same reasons make_sort_from_pathkeys does.
2007-11-08Last week's patch for make_sort_from_pathkeys wasn't good enough: it hasTom Lane
to be able to discard top-level RelabelType nodes on *both* sides of the equivalence-class-to-target-list comparison, since make_pathkey_from_sortinfo might either add or remove a RelabelType. Also fix the latter to do the removal case cleanly. Per example from Peter.
2007-11-02Ensure that EquivalenceClasses generated from ORDER BY keys contain properTom Lane
RelabelType nodes when the sort key is binary-compatible with the sort operator rather than having exactly its input type. We did this correctly for index columns but not sort keys, leading to failure to notice that a varchar index matches an ORDER BY request. This requires a bit more work in make_sort_from_pathkeys, but not anyplace else that I can find. Per bug report and subsequent discussion.
2007-05-21Teach tuplestore.c to throw away data before the "mark" point when the callerTom Lane
is using mark/restore but not rewind or backward-scan capability. Insert a materialize plan node between a mergejoin and its inner child if the inner child is a sort that is expected to spill to disk. The materialize shields the sort from the need to do mark/restore and thereby allows it to perform its final merge pass on-the-fly; while the materialize itself is normally cheap since it won't spill to disk unless the number of tuples with equal key values exceeds work_mem. Greg Stark, with some kibitzing from Tom Lane.
2007-05-04Teach tuplesort.c about "top N" sorting, in which only the first N tuplesTom Lane
need be returned. We keep a heap of the current best N tuples and sift-up new tuples into it as we scan the input. For M input tuples this means only about M*log(N) comparisons instead of M*log(M), not to mention a lot less workspace when N is small --- avoiding spill-to-disk for large M is actually the most attractive thing about it. Patch includes planner and executor support for invoking this facility in ORDER BY ... LIMIT queries. Greg Stark, with some editorialization by moi.
2007-04-21Some further performance tweaks for planning large inheritance trees thatTom Lane
are mostly excluded by constraints: do the CE test a bit earlier to save some adjust_appendrel_attrs() work on excluded children, and arrange to use array indexing rather than rt_fetch() to fetch RTEs in the main body of the planner. The latter is something I'd wanted to do for awhile anyway, but seeing list_nth_cell() as 35% of the runtime gets one's attention.
2007-04-06Make 'col IS NULL' clauses be indexable conditions.Tom Lane
Teodor Sigaev, with some kibitzing from Tom Lane.
2007-02-25Put back copyObject() call I removed in a fit of brain fade. This oneTom Lane
is still needed despite cleanups in setrefs.c, because the point is to let the inserted Result node compute a different tlist than its input node does. Per example from Jeremy Drake.
2007-02-22Turn the rangetable used by the executor into a flat list, and avoid storingTom Lane
useless substructure for its RangeTblEntry nodes. (I chose to keep using the same struct node type and just zero out the link fields for unneeded info, rather than making a separate ExecRangeTblEntry type --- it seemed too fragile to have two different rangetable representations.) Along the way, put subplans into a list in the toplevel PlannedStmt node, and have SubPlan nodes refer to them by list index instead of direct pointers. Vadim wanted to do that years ago, but I never understood what he was on about until now. It makes things a *whole* lot more robust, because we can stop worrying about duplicate processing of subplans during expression tree traversals. That's been a constant source of bugs, and it's finally gone. There are some consequent simplifications yet to be made, like not using a separate EState for subplans in the executor, but I'll tackle that later.
2007-02-19Put function expressions and values lists into FunctionScan and ValuesScanTom Lane
plan nodes, so that the executor does not need to get these items from the range table at runtime. This will avoid needing to include these fields in the compact range table I'm expecting to make the executor use.
2007-01-30Add support for cross-type hashing in hash index searches and hash joins.Tom Lane
Hashing for aggregation purposes still needs work, so it's not time to mark any cross-type operators as hashable for general use, but these cases work if the operators are so marked by hand in the system catalogs.
2007-01-22Add COST and ROWS options to CREATE/ALTER FUNCTION, plus underlying pg_procTom Lane
columns procost and prorows, to allow simple user adjustment of the estimated cost of a function call, as well as control of the estimated number of rows returned by a set-returning function. We might eventually wish to extend this to allow function-specific estimation routines, but there seems to be consensus that we should try a simple constant estimate first. In particular this provides a relatively simple way to control the order in which different WHERE clauses are applied in a plan node, which is a Good Thing in view of the fact that the recent EquivalenceClass planner rewrite made that much less predictable than before.
2007-01-20Refactor planner's pathkeys data structure to create a separate, explicitTom Lane
representation of equivalence classes of variables. This is an extensive rewrite, but it brings a number of benefits: * planner no longer fails in the presence of "incomplete" operator families that don't offer operators for every possible combination of datatypes. * avoid generating and then discarding redundant equality clauses. * remove bogus assumption that derived equalities always use operators named "=". * mergejoins can work with a variety of sort orders (e.g., descending) now, instead of tying each mergejoinable operator to exactly one sort order. * better recognition of redundant sort columns. * can make use of equalities appearing underneath an outer join.
2007-01-10Change the planner-to-executor API so that the planner tells the executorTom Lane
which comparison operators to use for plan nodes involving tuple comparison (Agg, Group, Unique, SetOp). Formerly the executor looked up the default equality operator for the datatype, which was really pretty shaky, since it's possible that the data being fed to the node is sorted according to some nondefault operator class that could have an incompatible idea of equality. The planner knows what it has sorted by and therefore can provide the right equality operator to use. Also, this change moves a couple of catalog lookups out of the executor and into the planner, which should help startup time for pre-planned queries by some small amount. Modify the planner to remove some other cavalier assumptions about always being able to use the default operators. Also add "nulls first/last" info to the Plan node for a mergejoin --- neither the executor nor the planner can cope yet, but at least the API is in place.
2007-01-09Support ORDER BY ... NULLS FIRST/LAST, and add ASC/DESC/NULLS FIRST/NULLS LASTTom Lane
per-column options for btree indexes. The planner's support for this is still pretty rudimentary; it does not yet know how to plan mergejoins with nondefault ordering options. The documentation is pretty rudimentary, too. I'll work on improving that stuff later. Note incompatible change from prior behavior: ORDER BY ... USING will now be rejected if the operator is not a less-than or greater-than member of some btree opclass. This prevents less-than-sane behavior if an operator that doesn't actually define a proper sort ordering is selected.
2007-01-05Update CVS HEAD for 2007 copyright. Back branches are typically notBruce Momjian
back-stamped for this.
2006-12-23Restructure operator classes to allow improved handling of cross-data-typeTom Lane
cases. Operator classes now exist within "operator families". While most families are equivalent to a single class, related classes can be grouped into one family to represent the fact that they are semantically compatible. Cross-type operators are now naturally adjunct parts of a family, without having to wedge them into a particular opclass as we had done originally. This commit restructures the catalogs and cleans up enough of the fallout so that everything still works at least as well as before, but most of the work needed to actually improve the planner's behavior will come later. Also, there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way to create a new family right now is to allow CREATE OPERATOR CLASS to make one by default. I owe some more documentation work, too. But that can all be done in smaller pieces once this infrastructure is in place.
2006-10-04pgindent run for 8.2.Bruce Momjian
2006-08-02Add support for multi-row VALUES clauses as part of INSERT statementsJoe Conway
(e.g. "INSERT ... VALUES (...), (...), ...") and elsewhere as allowed by the spec. (e.g. similar to a FROM clause subselect). initdb required. Joe Conway and Tom Lane.
2006-07-26Change LIMIT/OFFSET to use int8Bruce Momjian
Dhanaraj M
2006-07-14Remove 576 references of include files that were not needed.Bruce Momjian
2006-07-11Sort reference of include files, "A" - "F".Bruce Momjian
2006-07-01Revise the planner's handling of "pseudoconstant" WHERE clauses, that isTom Lane
clauses containing no variables and no volatile functions. Such a clause can be used as a one-time qual in a gating Result plan node, to suppress plan execution entirely when it is false. Even when the clause is true, putting it in a gating node wins by avoiding repeated evaluation of the clause. In previous PG releases, query_planner() would do this for pseudoconstant clauses appearing at the top level of the jointree, but there was no ability to generate a gating Result deeper in the plan tree. To fix it, get rid of the special case in query_planner(), and instead process pseudoconstant clauses through the normal RestrictInfo qual distribution mechanism. When a pseudoconstant clause is found attached to a path node in create_plan(), pull it out and generate a gating Result at that point. This requires special-casing pseudoconstants in selectivity estimation and cost_qual_eval, but on the whole it's pretty clean. It probably even makes the planner a bit faster than before for the normal case of no pseudoconstants, since removing pull_constant_clauses saves one useless traversal of the qual tree. Per gripe from Phil Frost.
2006-05-18When a bitmap indexscan is using a partial index, it is necessary to includeTom Lane
the partial index predicate in the scan's "recheck condition". Otherwise, if the scan becomes lossy for lack of bitmap memory, we would fail to enforce that returned rows satisfy the predicate. Noted while studying bug #2441 from Arjen van der Meijden.
2006-04-30Improve the representation of FOR UPDATE/FOR SHARE so that we canTom Lane
support both FOR UPDATE and FOR SHARE in one command, as well as both NOWAIT and normal WAIT behavior. The more general code is actually simpler and cleaner.
2006-04-25The 8.1 planner removes WHERE quals from the plan when the quals areTom Lane
implied by the predicate of a partial index being used to scan a table. However, this optimization is unsafe in an UPDATE, DELETE, or SELECT FOR UPDATE query, because the quals need to be rechecked by EvalPlanQual if there's an update conflict. Per example from Jean-Samuel Reynaud.
2006-03-05Update copyright for 2006. Update scripts.Bruce Momjian
2006-01-29When building a bitmap scan, must copy the bitmapqualorig expression treeTom Lane
to avoid sharing substructure with the lower-level indexquals. This is currently only an issue if there are SubPlans in the indexquals, which is uncommon but not impossible --- see bug #2218 reported by Nicholas Vinen. We use the same kluge for indexqual vs indexqualorig in the index scans themselves ... would be nice to clean this up someday.
2006-01-25Allow row comparisons to be used as indexscan qualifications.Tom Lane
This completes the project to upgrade our handling of row comparisons.
2005-11-26Teach tid-scan code to make use of "ctid = ANY (array)" clauses, so thatTom Lane
"ctid IN (list)" will still work after we convert IN to ScalarArrayOpExpr. Make some minor efficiency improvements while at it, such as ensuring that multiple TIDs are fetched in physical heap order. And fix EXPLAIN so that it shows what's really going on for a TID scan.
2005-11-25Teach planner and executor to handle ScalarArrayOpExpr as an indexableTom Lane
qualification when the underlying operator is indexable and useOr is true. That is, indexkey op ANY (ARRAY[...]) is effectively translated into an OR combination of one indexscan for each array element. This only works for bitmap index scans, of course, since regular indexscans no longer support OR'ing of scans. There are still some loose ends to clean up before changing 'x IN (list)' to translate as a ScalarArrayOpExpr; for instance predtest.c ought to be taught about it. But this gets the basic functionality in place.
2005-11-22Re-run pgindent, fixing a problem where comment lines after a blankBruce Momjian
comment line where output as too long, and update typedefs for /lib directory. Also fix case where identifiers were used as variable names in the backend, but as typedefs in ecpg (favor the backend for indenting). Backpatch to 8.1.X.
2005-10-19Fix oversight in recent changes to enable the 'physical tlist'Tom Lane
optimization for subquery and function scan nodes: we can't just do it unconditionally, we still have to check whether there is any need for a whole-row Var. I had been thinking that these node types couldn't have any system columns, which is true, but that loop is also checking for attno zero, ie, whole-row Var. Fix comment to not be so misleading. Per test case from Richard Huxton.
2005-10-15Standard pgindent run for 8.1.Bruce Momjian
2005-10-13Don't try to remove duplicate OR-subclauses in create_bitmap_subplan andTom Lane
make_restrictinfo_from_bitmapqual. The likelihood of finding duplicates seems much less than in the AND-subclause case, and the cost much higher, because OR lists with hundreds or even thousands of subclauses are not uncommon. Per discussion with Ilia Kantor and andrew@supernews.
2005-10-06Fix oversight in indexscan plan creation. I recently added code to useTom Lane
predicate_implied_by() to detect redundant filter conditions, but forgot that predicate_implied_by() assumes its first argument contains only immutable functions. Add a check to guarantee that. Also, test to see if filter conditions can be discarded because they are redundant with the predicate of a partial index.
2005-09-24Clean up possibly-uninitialized-variable warnings reported by gcc 4.x.Tom Lane
2005-08-18Fix up LIMIT/OFFSET planning so that we cope with non-constant LIMITTom Lane
or OFFSET clauses by using estimate_expression_value(). The main advantage of this is that if the expression is a Param and we have a value for the Param, we'll use that value rather than defaulting. Also, fix some thinkos in the logic for combining LIMIT/OFFSET with an externally supplied tuple fraction (this covers cases like EXISTS(...LIMIT...)). And make sure the results of all this are shown by EXPLAIN. Per a gripe from Merlin Moncure.
2005-07-28Fix a bunch of bad interactions between partial indexes and the newTom Lane
planning logic for bitmap indexscans. Partial indexes create corner cases in which a scan might be done with no explicit index qual conditions, and the code wasn't handling those cases nicely. Also be a little tenser about eliminating redundant clauses in the generated plan. Per report from Dmitry Karasik.
2005-07-23Simple constraint exclusion. For now, only child tables of inheritanceTom Lane
scans are candidates for exclusion; this should be fixed eventually. Simon Riggs, with some help from Tom Lane.