summaryrefslogtreecommitdiff
path: root/src/backend/optimizer
AgeCommit message (Collapse)Author
2007-08-31Apply a band-aid fix for the problem that 8.2 and up completely misestimateTom Lane
the number of rows likely to be produced by a query such as SELECT * FROM t1 LEFT JOIN t2 USING (key) WHERE t2.key IS NULL; What this is doing is selecting for t1 rows with no match in t2, and thus it may produce a significant number of rows even if the t2.key table column contains no nulls at all. 8.2 thinks the table column's null fraction is relevant and thus may estimate no rows out, which results in terrible plans if there are more joins above this one. A proper fix for this will involve passing much more information about the context of a clause to the selectivity estimator functions than we ever have. There's no time left to write such a patch for 8.3, and it wouldn't be back-patchable into 8.2 anyway. Instead, put in an ad-hoc test to defeat the normal table-stats-based estimation when an IS NULL test is evaluated at an outer join, and just use a constant estimate instead --- I went with 0.5 for lack of a better idea. This won't catch every case but it will catch the typical ways of writing such queries, and it seems unlikely to make things worse for other queries.
2007-08-31Rewrite make_outerjoininfo's construction of min_lefthand and min_righthandTom Lane
sets for outer joins, in the light of bug #3588 and additional thought and experimentation. The original methodology was fatally flawed for nests of more than two outer joins: it got the relationships between adjacent joins right, but didn't always come to the right conclusions about whether a join could be interchanged with one two or more levels below it. This was largely caused by a mistaken idea that we should use the min_lefthand + min_righthand sets of a sub-join as the minimum left or right input set of an upper join when we conclude that the sub-join can't commute with the upper one. If there's a still-lower join that the sub-join *can* commute with, this method led us to think that that one could commute with the topmost join; which it can't. Another problem (not directly connected to bug #3588) was that make_outerjoininfo's processing-order-dependent method for enforcing outer join identity #3 didn't work right: if we decided that join A could safely commute with lower join B, we dropped all information about sub-joins under B that join A could perhaps not safely commute with, because we removed B's entire min_righthand from A's. To fix, make an explicit computation of all inner join combinations that occur below an outer join, and add to that the full syntactic relsets of any lower outer joins that we determine it can't commute with. This method gives much more direct enforcement of the outer join rearrangement identities, and it turns out not to cost a lot of additional bookkeeping. Thanks to Richard Harris for the bug report and test case.
2007-07-31Fix a bug in the original implementation of redundant-join-clause removal:Tom Lane
clauses in which one side or the other references both sides of the join cannot be removed as redundant, because that expression won't have been constrained below the join. Per report from Sergey Burladyan.
2007-07-24Fix predicate-proving logic to cope with binary-compatibility cases whenTom Lane
checking whether an IS NULL/IS NOT NULL clause is implied or refuted by a strict function. Per example from Dawid Kuroczko. Backpatch to 8.2 since this is arguably a performance bug.
2007-07-18Fix an old thinko in SS_make_initplan_from_plan, which is used when optimizingTom Lane
a MIN or MAX aggregate call into an indexscan: the initplan is being made at the current query nesting level and so we shouldn't increment query_level. Though usually harmless, this mistake could lead to bogus "plan should not reference subplan's variable" failures on complex queries. Per bug report from David Sanchez i Gregori.
2007-07-12Fix mistaken Assert in adjust_appendrel_attr_needed, per Greg Stark.Tom Lane
2007-05-22Repair planner bug introduced in 8.2 by ability to rearrange outer joins:Tom Lane
in cases where a sub-SELECT inserts a WHERE clause between two outer joins, that clause may prevent us from re-ordering the two outer joins. The code was considering only the joins' own ON-conditions in determining reordering safety, which is not good enough. Add a "delay_upper_joins" flag to OuterJoinInfo to flag that we have detected such a clause and higher-level outer joins shouldn't be permitted to commute with this one. (This might seem overly coarse, but given the current rules for OJ reordering, it's sufficient AFAICT.) The failure case is actually pretty narrow: it needs a WHERE clause within the RHS of a left join that checks the RHS of a lower left join, but is not strict for that RHS (else we'd have simplified the lower join to a plain join). Even then no failure will be manifest unless the planner chooses to rearrange the join order. Per bug report from Adam Terrey.
2007-05-22Fix best_inner_indexscan to return both the cheapest-total-cost andTom Lane
cheapest-startup-cost innerjoin indexscans, and make joinpath.c consider both of these (when different) as the inside of a nestloop join. The original design was based on the assumption that indexscan paths always have negligible startup cost, and so total cost is the only important figure of merit; an assumption that's obviously broken by bitmap indexscans. This oversight could lead to choosing poor plans in cases where fast-start behavior is more important than total cost, such as LIMIT and IN queries. 8.1-vintage brain fade exposed by an example from Chuck D.
2007-05-12Improve predicate_refuted_by_simple_clause() to handle IS NULL and IS NOT NULLTom Lane
more completely. The motivation for having it understand IS NULL at all was to allow use of "foo IS NULL" as one of the subsets of a partitioning on "foo", but as reported by Aleksander Kmetec, it wasn't really getting the job done. Backpatch to 8.2 since this is arguably a performance bug.
2007-05-01Fix a thinko in my patch of a couple months ago for bug #3116: it did theTom Lane
wrong thing when inlining polymorphic SQL functions, because it was using the function's declared return type where it should have used the actual result type of the current call. In 8.1 and 8.2 this causes obvious failures even if you don't have assertions turned on; in 8.0 and 7.4 it would only be a problem if the inlined expression were used as an input to a function that did run-time type determination on its inputs. Add a regression test, since this is evidently an under-tested area.
2007-04-17Rewrite choose_bitmap_and() to make it more robust in the presence ofTom Lane
competing alternatives for indexes to use in a bitmap scan. The former coding took estimated selectivity as an overriding factor, causing it to sometimes choose indexes that were much slower to scan than ones with a slightly worse selectivity. It was also too narrow-minded about which combinations of indexes to consider ANDing. The rewrite makes it pay more attention to index scan cost than selectivity; this seems sane since it's impossible to have very bad selectivity with low cost, whereas the reverse isn't true. Also, we now consider each index alone, as well as adding each index to an AND-group led by each prior index, for a total of about O(N^2) rather than O(N) combinations considered. This makes the results much less dependent on the exact order in which the indexes are considered. It's still a lot cheaper than an O(2^N) exhaustive search. A prefilter step eliminates all but the cheapest of those indexes using the same set of WHERE conditions, to keep the effective value of N down in scenarios where the DBA has created lots of partially-redundant indexes.
2007-03-06Fix oversight in original coding of inline_function(): sinceTom Lane
check_sql_fn_retval allows binary-compatibility cases, the expression extracted from an inline-able SQL function might have a type that is only binary-compatible with the declared function result type. To avoid possibly changing the semantics of the expression, we should insert a RelabelType node in such cases. This has only been shown to have bad consequences in recent 8.1 and up releases, but I suspect there may be failure cases in the older branches too, so patch it all the way back. Per bug #3116 from Greg Mullane. Along the way, fix an omission in eval_const_expressions_mutator: it failed to copy the relabelformat field when processing a RelabelType. No known observable failures from this, but it definitely isn't intended behavior.
2007-02-16Adjust the definition of is_pushed_down so that it's always true for INNERTom Lane
JOIN quals, just like WHERE quals, even if they reference every one of the join's relations. Now that we can reorder outer and inner joins, it's possible for such a qual to end up being assigned to an outer join plan node, and we mustn't have it treated as a join qual rather than a filter qual for the node. (If it were, the join could produce null-extended rows that it shouldn't.) Per bug report from Pelle Johansson.
2007-02-16Fix another problem in 8.2 changes that allowed "one-time" qual conditions toTom Lane
be checked at plan levels below the top; namely, we have to allow for Result nodes inserted just above a nestloop inner indexscan. Should think about using the general Param mechanism to pass down outer-relation variables, but for the moment we need a back-patchable solution. Per report from Phil Frost.
2007-02-16Restructure code that is responsible for ensuring that clauseless joins areTom Lane
considered when it is necessary to do so because of a join-order restriction (that is, an outer-join or IN-subselect construct). The former coding was a bit ad-hoc and inconsistent, and it missed some cases, as exposed by Mario Weilguni's recent bug report. His specific problem was that an IN could be turned into a "clauseless" join due to constant-propagation removing the IN's joinclause, and if the IN's subselect involved more than one relation and there was more than one such IN linking to the same upper relation, then the only valid join orders involve "bushy" plans but we would fail to consider the specific paths needed to get there. (See the example case added to the join regression test.) On examining the code I wonder if there weren't some other problem cases too; in particular it seems that GEQO was defending against a different set of corner cases than the main planner was. There was also an efficiency problem, in that when we did realize we needed a clauseless join because of an IN, we'd consider clauseless joins against every other relation whether this was sensible or not. It seems a better design is to use the outer-join and in-clause lists as a backup heuristic, just as the rule of joining only where there are joinclauses is a heuristic: we'll join two relations if they have a usable joinclause *or* this might be necessary to satisfy an outer-join or IN-clause join order restriction. I refactored the code to have just one place considering this instead of three, and made sure that it covered all the cases that any of them had been considering. Backpatch as far as 8.1 (which has only the IN-clause form of the disease). By rights 8.0 and 7.4 should have the bug too, but they accidentally fail to fail, because the joininfo structure used in those releases preserves some memory of there having once been a joinclause between the inner and outer sides of an IN, and so it leads the code in the right direction anyway. I'll be conservative and not touch them.
2007-02-13Repair bug in 8.2's new logic for planning outer joins: we have to allow joinsTom Lane
that overlap an outer join's min_righthand but aren't fully contained in it, to support joining within the RHS after having performed an outer join that can commute with this one. Aside from the direct fix in make_join_rel(), fix has_join_restriction() and GEQO's desirable_join() to consider this possibility. Per report from Ian Harding.
2007-02-06Fix a performance regression in 8.2: optimization of MIN/MAX into indexscansTom Lane
had stopped working for tables buried inside views or sub-selects. This is because I had gotten rid of the simplify_jointree() preprocessing step, and optimize_minmax_aggregates() wasn't smart enough to deal with a non-canonical FromExpr. Per gripe from Bill Howe.
2007-02-02Repair insufficiently careful type checking for SQL-language functions:Tom Lane
we should check that the function code returns the claimed result datatype every time we parse the function for execution. Formerly, for simple scalar result types we assumed the creation-time check was sufficient, but this fails if the function selects from a table that's been redefined since then, and even more obviously fails if check_function_bodies had been OFF. This is a significant security hole: not only can one trivially crash the backend, but with appropriate misuse of pass-by-reference datatypes it is possible to read out arbitrary locations in the server process's memory, which could allow retrieving database content the user should not be able to see. Our thanks to Jeff Trout for the initial report. Security: CVE-2007-0555
2007-01-28Repair oversight in creation of "append relations": we should set upTom Lane
rel->tuples as well as rel->rows, since some estimation functions expect both to be valid in every baserel. Per report from Dave Dutcher.
2007-01-08Tweak joinlist creation to avoid generating useless one-element subproblemsTom Lane
when collapsing of JOIN trees is stopped by join_collapse_limit. For instance a list of 11 LEFT JOINs with limit 8 now produces something like ((1 2 3 4 5 6 7 8) 9 10 11 12) instead of (((1 2 3 4 5 6 7 8) (9)) 10 11 12) The latter structure is really only required for a FULL JOIN. Noted while studying an example from Shane Ambler.
2007-01-08Remove cost_hashjoin's very ancient hack to discourage (once, entirely forbid)Tom Lane
hash joins with the estimated-larger relation on the inside. There are several cases where doing that makes perfect sense, and in cases where it doesn't, the regular cost computation really ought to be able to figure that out. Make some marginal tweaks in said computation to try to get results approximating reality a bit better. Per an example from Shane Ambler. Also, fix an oversight in the original patch to add seq_page_cost: the costs of spilling a hash join to disk should be scaled by seq_page_cost.
2006-12-15Fix some planner bugs exposed by reports from Arjen van der Meijden. TheseTom Lane
are all in new-in-8.2 logic associated with indexability of ScalarArrayOpExpr (IN-clauses) or amortization of indexscan costs across repeated indexscans on the inside of a nestloop. In particular: Fix some logic errors in the estimation for multiple scans induced by a ScalarArrayOpExpr indexqual. Include a small cost component in bitmap index scans to reflect the costs of manipulating the bitmap itself; this is mainly to prevent a bitmap scan from appearing to have the same cost as a plain indexscan for fetching a single tuple. Also add a per-index-scan-startup CPU cost component; while prior releases were clearly too pessimistic about the cost of repeated indexscans, the original 8.2 coding allowed the cost of an indexscan to effectively go to zero if repeated often enough, which is overly optimistic. Pay some attention to index correlation when estimating costs for a nestloop inner indexscan: this is significant when the plan fetches multiple heap tuples per iteration, since high correlation means those tuples are probably on the same or adjacent heap pages.
2006-12-12Fix planner to do the right thing when a degenerate outer join (one whoseTom Lane
joinclause doesn't use any outer-side vars) requires a "bushy" plan to be created. The normal heuristic to avoid joins with no joinclause has to be overridden in that case. Problem is new in 8.2; before that we forced the outer join order anyway. Per example from Teodor.
2006-12-07Repair incorrect placement of WHERE clauses when there are multiple,Tom Lane
rearrangeable outer joins and the WHERE clause is non-strict and mentions only nullable-side relations. New bug in 8.2, caused by new logic to allow rearranging outer joins. Per bug #2807 from Ross Cohen; thanks to Jeff Davis for producing a usable test case.
2006-12-06Fix planning of SubLinks to ensure that Vars generated from transformation ofTom Lane
a sublink's test expression have the correct vartypmod, rather than defaulting to -1. There's at least one place where this is important because we're expecting these Vars to be exactly equal() to those appearing in the subplan itself. This is a pretty klugy solution --- it would likely be cleaner to change Param nodes to include a typmod field --- but we can't do that in the already-released 8.2 branch. Per bug report from Hubert Fongarnand.
2006-11-11Suppress a few 'uninitialized variable' warnings that gcc emits only atTom Lane
-O3 or higher (presumably because it inlines more things). Per gripe from Mark Mielke.
2006-11-10Fix set_joinrel_size_estimates() to estimate outer-join sizes moreTom Lane
accurately: we have to distinguish the effects of the join's own ON clauses from the effects of pushed-down clauses. Failing to do so was a quick hack long ago, but it's time to be smarter. Per example from Thomas H.
2006-10-25expression_tree_walker failed to let walker function see the immediate childTom Lane
node of a SubLink or SubPlan testexpr field. Bug resulted from replacing the old lefthand/exprs list fields with a simple expression field, and not remembering that expression_tree_walker is coded to save a few cycles by recursing directly to self on list fields (on the assumption the walker isn't interested in List nodes per se). On non-list fields it must of course call the walker. Possibly that hack isn't worth the risk of more such bugs, but I'll leave it be for now. Per bug report from James Robinson.
2006-10-24Fix check for whether a clauseless join has to be forced in the presence ofTom Lane
outer joins. Originally it was only looking for overlap of the righthand side of a left join, but we have to do it on the lefthand side too. Per example from Jean-Pierre Pelletier.
2006-10-04pgindent run for 8.2.Bruce Momjian
2006-09-28Fix IS NULL and IS NOT NULL tests on row-valued expressions to conform toTom Lane
the SQL spec, viz IS NULL is true if all the row's fields are null, IS NOT NULL is true if all the row's fields are not null. The former coding got this right for a limited number of cases with IS NULL (ie, those where it could disassemble a ROW constructor at parse time), but was entirely wrong for IS NOT NULL. Per report from Teodor. I desisted from changing the behavior for arrays, since on closer inspection it's not clear that there's any support for that in the SQL spec. This probably needs more consideration.
2006-09-19Improve usage of effective_cache_size parameter by assuming that all theTom Lane
tables in the query compete for cache space, not just the one we are currently costing an indexscan for. This seems more realistic, and it definitely will help in examples recently exhibited by Stefan Kaltenbrunner. To get the total size of all the tables involved, we must tweak the handling of 'append relations' a bit --- formerly we looked up information about the child tables on-the-fly during set_append_rel_pathlist, but it needs to be done before we start doing any cost estimation, so push it into the add_base_rels_to_query scan.
2006-09-08Put back plan-time check for trying to apply SELECT FOR UPDATE/SHARETom Lane
to a relation on the nullable side of an outer join. I had removed this during the outer join planning rewrite a few months ago ... I think I intended to put it somewhere else, but forgot ...
2006-09-06Change processing of extended-Query mode so that an unnamed statementTom Lane
that has parameters is always planned afresh for each Bind command, treating the parameter values as constants in the planner. This removes the performance penalty formerly often paid for using out-of-line parameters --- with this definition, the planner can do constant folding, LIKE optimization, etc. After a suggestion by Andrew@supernews.
2006-08-28Tweak trivial_subqueryscan() to consider a SubqueryScan's targetlistTom Lane
trivial if it contains either Vars referencing the corresponding subplan columns, or Consts equaling the corresponding subplan columns. This lets the planner eliminate the SubqueryScan in some cases generated by generate_setop_tlist().
2006-08-25Add the ability to create indexes 'concurrently', that is, withoutTom Lane
blocking concurrent writes to the table. Greg Stark, with a little help from Tom Lane.
2006-08-19Suppress subquery pullup/pushdown when a subquery contains volatileTom Lane
functions in its targetlist, to avoid introducing multiple evaluations of volatile functions that textually appear only once. This is a slightly tighter version of Jaime Casanova's recent patch.
2006-08-17Fix an oversight in mergejoin planning: the planner would reject aTom Lane
mergejoin possibility where the inner rel was less well sorted than the outer (ie, it matches some but not all of the merge clauses that can work with the outer), if the inner path in question is also the overall cheapest path for its rel. This is an old bug, but I'm not sure it's worth back-patching, because it's such a corner case. Noted while investigating a test case from Peter Hardman.
2006-08-17Teach convert_subquery_pathkeys() to handle the case where theTom Lane
subquery's pathkey is a RelabelType applied to something that appears in the subquery's output; for example where the subquery returns a varchar Var and the sort order is shown as that Var coerced to text. This comes up because varchar doesn't have its own sort operator. Per example from Peter Hardman.
2006-08-12Tweak SPI_cursor_open to allow INSERT/UPDATE/DELETE RETURNING; this wasTom Lane
merely a matter of fixing the error check, since the underlying Portal infrastructure already handles it. This in turn allows these statements to be used in some existing plpgsql and plperl contexts, such as a plpgsql FOR loop. Also, do some marginal code cleanup in places that were being sloppy about distinguishing SELECT from SELECT INTO.
2006-08-12Add INSERT/UPDATE/DELETE RETURNING, with basic docs and regression tests.Tom Lane
plpgsql support to come later. Along the way, convert execMain's SELECT INTO support into a DestReceiver, in order to eliminate some ugly special cases. Jonah Harris and Tom Lane
2006-08-10Fix UNION/INTERSECT/EXCEPT so that when two inputs being merged haveTom Lane
same data type and same typmod, we show that typmod as the output typmod, rather than generic -1. This responds to several complaints over the past few years about UNIONs unexpectedly dropping length or precision info.
2006-08-05Fix inheritance_planner() to delete dummy subplans from its Append planTom Lane
list, when some of the child rels have been excluded by constraint exclusion. This doesn't save a huge amount of time but it'll save some, and it makes the EXPLAIN output look saner. We already did the equivalent thing in set_append_rel_pathlist(), but not here.
2006-08-05Extend relation_excluded_by_constraints() to check for mutuallyTom Lane
contradictory WHERE-clauses applied to a relation. This makes the GUC variable constraint_exclusion rather inappropriately named, but I've refrained for the moment from renaming it. Per example from Martin Lesser.
2006-08-05Teach predicate_refuted_by() how to do proofs involving NOT-clauses.Tom Lane
This doesn't matter too much for ordinary NOTs, since prepqual.c does its best to get rid of those, but it helps with IS NOT TRUE clauses which the rule rewriter likes to insert. Per example from Martin Lesser.
2006-08-04Teach eval_const_expressions to simplify BooleanTest nodes that haveTom Lane
constant input. Seems worth doing because rule rewriter inserts IS NOT TRUE tests into WHERE clauses.
2006-08-02Add support for multi-row VALUES clauses as part of INSERT statementsJoe Conway
(e.g. "INSERT ... VALUES (...), (...), ...") and elsewhere as allowed by the spec. (e.g. similar to a FROM clause subselect). initdb required. Joe Conway and Tom Lane.
2006-07-31Change the relation_open protocol so that we obtain lock on a relationTom Lane
(table or index) before trying to open its relcache entry. This fixes race conditions in which someone else commits a change to the relation's catalog entries while we are in process of doing relcache load. Problems of that ilk have been reported sporadically for years, but it was not really practical to fix until recently --- for instance, the recent addition of WAL-log support for in-place updates helped. Along the way, remove pg_am.amconcurrent: all AMs are now expected to support concurrent update.
2006-07-27Aggregate functions now support multiple input arguments. I also tookTom Lane
the opportunity to treat COUNT(*) as a zero-argument aggregate instead of the old hack that equated it to COUNT(1); this is materially cleaner (no more weird ANYOID cases) and ought to be at least a tiny bit faster. Original patch by Sergey Koposov; review, documentation, simple regression tests, pg_dump and psql support by moi.
2006-07-26Code review for bigint-LIMIT patch. Fix missed planner dependency,Tom Lane
eliminate unnecessary code, force initdb because stored rules change (limit nodes are now supposed to be int8 not int4 expressions). Update comments and error messages, which still all said 'integer'.