summaryrefslogtreecommitdiff
path: root/src/backend/optimizer
AgeCommit message (Collapse)Author
2007-03-13First phase of plan-invalidation project: create a plan cache managementTom Lane
module and teach PREPARE and protocol-level prepared statements to use it. In service of this, rearrange utility-statement processing so that parse analysis does not assume table schemas can't change before execution for utility statements (necessary because we don't attempt to re-acquire locks for utility statements when reusing a stored plan). This requires some refactoring of the ProcessUtility API, but it ends up cleaner anyway, for instance we can get rid of the QueryContext global. Still to do: fix up SPI and related code to use the plan cache; I'm tempted to try to make SQL functions use it too. Also, there are at least some aspects of system state that we want to ensure remain the same during a replan as in the original processing; search_path certainly ought to behave that way for instance, and perhaps there are others.
2007-03-06Fix oversight in original coding of inline_function(): sinceTom Lane
check_sql_fn_retval allows binary-compatibility cases, the expression extracted from an inline-able SQL function might have a type that is only binary-compatible with the declared function result type. To avoid possibly changing the semantics of the expression, we should insert a RelabelType node in such cases. This has only been shown to have bad consequences in recent 8.1 and up releases, but I suspect there may be failure cases in the older branches too, so patch it all the way back. Per bug #3116 from Greg Mullane. Along the way, fix an omission in eval_const_expressions_mutator: it failed to copy the relabelformat field when processing a RelabelType. No known observable failures from this, but it definitely isn't intended behavior.
2007-02-27Get rid of the separate EState for subplans, and just let them share theTom Lane
parent query's EState. Now that there's a single flat rangetable for both the main plan and subplans, there's no need anymore for a separate EState, and removing it allows cleaning up some crufty code in nodeSubplan.c and nodeSubqueryscan.c. Should be a tad faster too, although any difference will probably be hard to measure. This is the last bit of subsidiary mop-up work from changing to a flat rangetable.
2007-02-25Put back copyObject() call I removed in a fit of brain fade. This oneTom Lane
is still needed despite cleanups in setrefs.c, because the point is to let the inserted Result node compute a different tlist than its input node does. Per example from Jeremy Drake.
2007-02-23Now that plans have flat rangetable lists, it's a lot easier to get EXPLAIN toTom Lane
drill down into subplan targetlists to print the referent expression for an OUTER or INNER var in an upper plan node. Hence, make it do that always, and banish the old hack of showing "?columnN?" when things got too complicated. Along the way, fix an EXPLAIN bug I introduced by suppressing subqueries from execution-time range tables: get_name_for_var_field() assumed it could look at rte->subquery to find out the real type of a RECORD var. That doesn't work anymore, but instead we can look at the input plan of the SubqueryScan plan node.
2007-02-22Change Agg and Group nodes so that Vars contained in their targetlistsTom Lane
and quals have varno OUTER, rather than zero, to indicate a reference to an output of their lefttree subplan. This is consistent with the way that every other upper-level node type does it, and allows some simplifications in setrefs.c and EXPLAIN.
2007-02-22Turn the rangetable used by the executor into a flat list, and avoid storingTom Lane
useless substructure for its RangeTblEntry nodes. (I chose to keep using the same struct node type and just zero out the link fields for unneeded info, rather than making a separate ExecRangeTblEntry type --- it seemed too fragile to have two different rangetable representations.) Along the way, put subplans into a list in the toplevel PlannedStmt node, and have SubPlan nodes refer to them by list index instead of direct pointers. Vadim wanted to do that years ago, but I never understood what he was on about until now. It makes things a *whole* lot more robust, because we can stop worrying about duplicate processing of subplans during expression tree traversals. That's been a constant source of bugs, and it's finally gone. There are some consequent simplifications yet to be made, like not using a separate EState for subplans in the executor, but I'll tackle that later.
2007-02-20Remove the Query structure from the executor's API. This allows us to stopTom Lane
storing mostly-redundant Query trees in prepared statements, portals, etc. To replace Query, a new node type called PlannedStmt is inserted by the planner at the top of a completed plan tree; this carries just the fields of Query that are still needed at runtime. The statement lists kept in portals etc. now consist of intermixed PlannedStmt and bare utility-statement nodes --- no Query. This incidentally allows us to remove some fields from Query and Plan nodes that shouldn't have been there in the first place. Still to do: simplify the execution-time range table; at the moment the range table passed to the executor still contains Query trees for subqueries. initdb forced due to change of stored rules.
2007-02-19Get rid of some old and crufty global variables in the planner. WhenTom Lane
this code was last gone over, there wasn't really any alternative to globals because we didn't have the PlannerInfo struct being passed all through the planner code. Now that we do, we can restructure things to avoid non-reentrancy. I'm fooling with this because otherwise I'd have had to add another global variable for the planned compact range table list.
2007-02-19Put function expressions and values lists into FunctionScan and ValuesScanTom Lane
plan nodes, so that the executor does not need to get these items from the range table at runtime. This will avoid needing to include these fields in the compact range table I'm expecting to make the executor use.
2007-02-16Teach find_nonnullable_rels to handle OR cases: if every arm of an ORTom Lane
forces a particular relation nonnullable, then we can say that the OR does. This is worth a little extra trouble since it may allow reduction of outer joins to plain joins.
2007-02-16Adjust the definition of is_pushed_down so that it's always true for INNERTom Lane
JOIN quals, just like WHERE quals, even if they reference every one of the join's relations. Now that we can reorder outer and inner joins, it's possible for such a qual to end up being assigned to an outer join plan node, and we mustn't have it treated as a join qual rather than a filter qual for the node. (If it were, the join could produce null-extended rows that it shouldn't.) Per bug report from Pelle Johansson.
2007-02-16Fix another problem in 8.2 changes that allowed "one-time" qual conditions toTom Lane
be checked at plan levels below the top; namely, we have to allow for Result nodes inserted just above a nestloop inner indexscan. Should think about using the general Param mechanism to pass down outer-relation variables, but for the moment we need a back-patchable solution. Per report from Phil Frost.
2007-02-16Restructure code that is responsible for ensuring that clauseless joins areTom Lane
considered when it is necessary to do so because of a join-order restriction (that is, an outer-join or IN-subselect construct). The former coding was a bit ad-hoc and inconsistent, and it missed some cases, as exposed by Mario Weilguni's recent bug report. His specific problem was that an IN could be turned into a "clauseless" join due to constant-propagation removing the IN's joinclause, and if the IN's subselect involved more than one relation and there was more than one such IN linking to the same upper relation, then the only valid join orders involve "bushy" plans but we would fail to consider the specific paths needed to get there. (See the example case added to the join regression test.) On examining the code I wonder if there weren't some other problem cases too; in particular it seems that GEQO was defending against a different set of corner cases than the main planner was. There was also an efficiency problem, in that when we did realize we needed a clauseless join because of an IN, we'd consider clauseless joins against every other relation whether this was sensible or not. It seems a better design is to use the outer-join and in-clause lists as a backup heuristic, just as the rule of joining only where there are joinclauses is a heuristic: we'll join two relations if they have a usable joinclause *or* this might be necessary to satisfy an outer-join or IN-clause join order restriction. I refactored the code to have just one place considering this instead of three, and made sure that it covered all the cases that any of them had been considering. Backpatch as far as 8.1 (which has only the IN-clause form of the disease). By rights 8.0 and 7.4 should have the bug too, but they accidentally fail to fail, because the joininfo structure used in those releases preserves some memory of there having once been a joinclause between the inner and outer sides of an IN, and so it leads the code in the right direction anyway. I'll be conservative and not touch them.
2007-02-13Repair bug in 8.2's new logic for planning outer joins: we have to allow joinsTom Lane
that overlap an outer join's min_righthand but aren't fully contained in it, to support joining within the RHS after having performed an outer join that can commute with this one. Aside from the direct fix in make_join_rel(), fix has_join_restriction() and GEQO's desirable_join() to consider this possibility. Per report from Ian Harding.
2007-02-09Replace useless uses of := by = in makefiles.Peter Eisentraut
2007-02-06Fix a performance regression in 8.2: optimization of MIN/MAX into indexscansTom Lane
had stopped working for tables buried inside views or sub-selects. This is because I had gotten rid of the simplify_jointree() preprocessing step, and optimize_minmax_aggregates() wasn't smart enough to deal with a non-canonical FromExpr. Per gripe from Bill Howe.
2007-02-06Add support for cross-type hashing in hashed subplans (hashed IN/NOT IN casesTom Lane
that aren't turned into true joins). Since this is the last missing bit of infrastructure, go ahead and fill out the hash integer_ops and float_ops opfamilies with cross-type operators. The operator family project is now DONE ... er, except for documentation ...
2007-02-02Repair insufficiently careful type checking for SQL-language functions:Tom Lane
we should check that the function code returns the claimed result datatype every time we parse the function for execution. Formerly, for simple scalar result types we assumed the creation-time check was sufficient, but this fails if the function selects from a table that's been redefined since then, and even more obviously fails if check_function_bodies had been OFF. This is a significant security hole: not only can one trivially crash the backend, but with appropriate misuse of pass-by-reference datatypes it is possible to read out arbitrary locations in the server process's memory, which could allow retrieving database content the user should not be able to see. Our thanks to Jeff Trout for the initial report. Security: CVE-2007-0555
2007-02-01Wording cleanup for error messages. Also change can't -> cannot.Bruce Momjian
Standard English uses "may", "can", and "might" in different ways: may - permission, "You may borrow my rake." can - ability, "I can lift that log." might - possibility, "It might rain today." Unfortunately, in conversational English, their use is often mixed, as in, "You may use this variable to do X", when in fact, "can" is a better choice. Similarly, "It may crash" is better stated, "It might crash".
2007-01-30Add support for cross-type hashing in hash index searches and hash joins.Tom Lane
Hashing for aggregation purposes still needs work, so it's not time to mark any cross-type operators as hashable for general use, but these cases work if the operators are so marked by hand in the system catalogs.
2007-01-28Repair oversight in creation of "append relations": we should set upTom Lane
rel->tuples as well as rel->rows, since some estimation functions expect both to be valid in every baserel. Per report from Dave Dutcher.
2007-01-22Put back planner's ability to cache the results of mergejoinscansel(),Tom Lane
which I had removed in the first cut of the EquivalenceClass rewrite to simplify that patch a little. But it's still important --- in a four-way join problem mergejoinscansel() was eating about 40% of the planning time according to gprof. Also, improve the EquivalenceClass code to re-use join RestrictInfos rather than generating fresh ones for each join considered. This saves some memory space but more importantly improves the effectiveness of caching planning info in RestrictInfos.
2007-01-22Add COST and ROWS options to CREATE/ALTER FUNCTION, plus underlying pg_procTom Lane
columns procost and prorows, to allow simple user adjustment of the estimated cost of a function call, as well as control of the estimated number of rows returned by a set-returning function. We might eventually wish to extend this to allow function-specific estimation routines, but there seems to be consensus that we should try a simple constant estimate first. In particular this provides a relatively simple way to control the order in which different WHERE clauses are applied in a plan node, which is a Good Thing in view of the fact that the recent EquivalenceClass planner rewrite made that much less predictable than before.
2007-01-21Refactor some lsyscache routines to eliminate duplicate code and saveTom Lane
a couple of syscache lookups in make_pathkey_from_sortinfo().
2007-01-20Simplify pg_am representation of ordering-capable access methods:Tom Lane
provide just a boolean 'amcanorder', instead of fields that specify the sort operator strategy numbers. We have decided to require ordering-capable AMs to use btree-compatible strategy numbers, so the old fields are overkill (and indeed misleading about what's allowed).
2007-01-20Refactor planner's pathkeys data structure to create a separate, explicitTom Lane
representation of equivalence classes of variables. This is an extensive rewrite, but it brings a number of benefits: * planner no longer fails in the presence of "incomplete" operator families that don't offer operators for every possible combination of datatypes. * avoid generating and then discarding redundant equality clauses. * remove bogus assumption that derived equalities always use operators named "=". * mergejoins can work with a variety of sort orders (e.g., descending) now, instead of tying each mergejoinable operator to exactly one sort order. * better recognition of redundant sort columns. * can make use of equalities appearing underneath an outer join.
2007-01-20Remove remains of old depend target.Peter Eisentraut
2007-01-17Add a note pointing out that is_pseudo_constant_clause() doesn't checkTom Lane
for aggregates. This is OK for current uses but could burn somebody someday...
2007-01-10Change the planner-to-executor API so that the planner tells the executorTom Lane
which comparison operators to use for plan nodes involving tuple comparison (Agg, Group, Unique, SetOp). Formerly the executor looked up the default equality operator for the datatype, which was really pretty shaky, since it's possible that the data being fed to the node is sorted according to some nondefault operator class that could have an incompatible idea of equality. The planner knows what it has sorted by and therefore can provide the right equality operator to use. Also, this change moves a couple of catalog lookups out of the executor and into the planner, which should help startup time for pre-planned queries by some small amount. Modify the planner to remove some other cavalier assumptions about always being able to use the default operators. Also add "nulls first/last" info to the Plan node for a mergejoin --- neither the executor nor the planner can cope yet, but at least the API is in place.
2007-01-09Support ORDER BY ... NULLS FIRST/LAST, and add ASC/DESC/NULLS FIRST/NULLS LASTTom Lane
per-column options for btree indexes. The planner's support for this is still pretty rudimentary; it does not yet know how to plan mergejoins with nondefault ordering options. The documentation is pretty rudimentary, too. I'll work on improving that stuff later. Note incompatible change from prior behavior: ORDER BY ... USING will now be rejected if the operator is not a less-than or greater-than member of some btree opclass. This prevents less-than-sane behavior if an operator that doesn't actually define a proper sort ordering is selected.
2007-01-08Tweak joinlist creation to avoid generating useless one-element subproblemsTom Lane
when collapsing of JOIN trees is stopped by join_collapse_limit. For instance a list of 11 LEFT JOINs with limit 8 now produces something like ((1 2 3 4 5 6 7 8) 9 10 11 12) instead of (((1 2 3 4 5 6 7 8) (9)) 10 11 12) The latter structure is really only required for a FULL JOIN. Noted while studying an example from Shane Ambler.
2007-01-08Remove cost_hashjoin's very ancient hack to discourage (once, entirely forbid)Tom Lane
hash joins with the estimated-larger relation on the inside. There are several cases where doing that makes perfect sense, and in cases where it doesn't, the regular cost computation really ought to be able to figure that out. Make some marginal tweaks in said computation to try to get results approximating reality a bit better. Per an example from Shane Ambler. Also, fix an oversight in the original patch to add seq_page_cost: the costs of spilling a hash join to disk should be scaled by seq_page_cost.
2007-01-05Update CVS HEAD for 2007 copyright. Back branches are typically notBruce Momjian
back-stamped for this.
2006-12-28Enable btree_predicate_proof() to make proofs involving cross-data-typeTom Lane
predicate operators. The hard stuff turns out to be already done in the previous commit, we need merely open the floodgates...
2006-12-24Code review for XML patch. Instill a bit of sanity in the location ofTom Lane
the XmlExpr code in various lists, use a representation that has some hope of reverse-listing correctly (though it's still a de-escaping function shy of correctness), generally try to make it look more like Postgres coding conventions.
2006-12-23Restructure operator classes to allow improved handling of cross-data-typeTom Lane
cases. Operator classes now exist within "operator families". While most families are equivalent to a single class, related classes can be grouped into one family to represent the fact that they are semantically compatible. Cross-type operators are now naturally adjunct parts of a family, without having to wedge them into a particular opclass as we had done originally. This commit restructures the catalogs and cleans up enough of the fallout so that everything still works at least as well as before, but most of the work needed to actually improve the planner's behavior will come later. Also, there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way to create a new family right now is to allow CREATE OPERATOR CLASS to make one by default. I owe some more documentation work, too. But that can all be done in smaller pieces once this infrastructure is in place.
2006-12-21Initial SQL/XML support: xml data type and initial set of functions.Peter Eisentraut
2006-12-18Set pg_am.amstrategies to zero for index AMs that don't have fixedTom Lane
operator strategy numbers, ie, GiST and GIN. This is almost cosmetic enough to not need a catversion bump, but since the opr_sanity regression test has to change in sync with the catalog entry, I figured I'd better do one.
2006-12-15Fix some planner bugs exposed by reports from Arjen van der Meijden. TheseTom Lane
are all in new-in-8.2 logic associated with indexability of ScalarArrayOpExpr (IN-clauses) or amortization of indexscan costs across repeated indexscans on the inside of a nestloop. In particular: Fix some logic errors in the estimation for multiple scans induced by a ScalarArrayOpExpr indexqual. Include a small cost component in bitmap index scans to reflect the costs of manipulating the bitmap itself; this is mainly to prevent a bitmap scan from appearing to have the same cost as a plain indexscan for fetching a single tuple. Also add a per-index-scan-startup CPU cost component; while prior releases were clearly too pessimistic about the cost of repeated indexscans, the original 8.2 coding allowed the cost of an indexscan to effectively go to zero if repeated often enough, which is overly optimistic. Pay some attention to index correlation when estimating costs for a nestloop inner indexscan: this is significant when the plan fetches multiple heap tuples per iteration, since high correlation means those tuples are probably on the same or adjacent heap pages.
2006-12-12Fix planner to do the right thing when a degenerate outer join (one whoseTom Lane
joinclause doesn't use any outer-side vars) requires a "bushy" plan to be created. The normal heuristic to avoid joins with no joinclause has to be overridden in that case. Problem is new in 8.2; before that we forced the outer join order anyway. Per example from Teodor.
2006-12-10Add a paramtypmod field to Param nodes. This is dead weight for ParamsTom Lane
representing externally-supplied values, since the APIs that carry such values only specify type not typmod. However, for PARAM_SUBLINK Params it is handy to carry the typmod of the sublink's output column. This is a much cleaner solution for the recently reported 'could not find pathkey item to sort' and 'failed to find unique expression in subplan tlist' bugs than my original 8.2-compatible patch. Besides, someday we might want to support typmods for external parameters ...
2006-12-07Repair incorrect placement of WHERE clauses when there are multiple,Tom Lane
rearrangeable outer joins and the WHERE clause is non-strict and mentions only nullable-side relations. New bug in 8.2, caused by new logic to allow rearranging outer joins. Per bug #2807 from Ross Cohen; thanks to Jeff Davis for producing a usable test case.
2006-12-06Fix planning of SubLinks to ensure that Vars generated from transformation ofTom Lane
a sublink's test expression have the correct vartypmod, rather than defaulting to -1. There's at least one place where this is important because we're expecting these Vars to be exactly equal() to those appearing in the subplan itself. This is a pretty klugy solution --- it would likely be cleaner to change Param nodes to include a typmod field --- but we can't do that in the already-released 8.2 branch. Per bug report from Hubert Fongarnand.
2006-11-11Suppress a few 'uninitialized variable' warnings that gcc emits only atTom Lane
-O3 or higher (presumably because it inlines more things). Per gripe from Mark Mielke.
2006-11-10Fix set_joinrel_size_estimates() to estimate outer-join sizes moreTom Lane
accurately: we have to distinguish the effects of the join's own ON clauses from the effects of pushed-down clauses. Failing to do so was a quick hack long ago, but it's time to be smarter. Per example from Thomas H.
2006-10-25expression_tree_walker failed to let walker function see the immediate childTom Lane
node of a SubLink or SubPlan testexpr field. Bug resulted from replacing the old lefthand/exprs list fields with a simple expression field, and not remembering that expression_tree_walker is coded to save a few cycles by recursing directly to self on list fields (on the assumption the walker isn't interested in List nodes per se). On non-list fields it must of course call the walker. Possibly that hack isn't worth the risk of more such bugs, but I'll leave it be for now. Per bug report from James Robinson.
2006-10-24Fix check for whether a clauseless join has to be forced in the presence ofTom Lane
outer joins. Originally it was only looking for overlap of the righthand side of a left join, but we have to do it on the lefthand side too. Per example from Jean-Pierre Pelletier.
2006-10-04pgindent run for 8.2.Bruce Momjian
2006-09-28Fix IS NULL and IS NOT NULL tests on row-valued expressions to conform toTom Lane
the SQL spec, viz IS NULL is true if all the row's fields are null, IS NOT NULL is true if all the row's fields are not null. The former coding got this right for a limited number of cases with IS NULL (ie, those where it could disassemble a ROW constructor at parse time), but was entirely wrong for IS NOT NULL. Per report from Teodor. I desisted from changing the behavior for arrays, since on closer inspection it's not clear that there's any support for that in the SQL spec. This probably needs more consideration.