Age | Commit message (Collapse) | Author |
|
outer join clauses. Given, say,
... from a left join b on a.a1 = b.b1 where a.a1 = 42;
we'll deduce a clause b.b1 = 42 and then mark the original join clause
redundant (we can't remove it completely for reasons I don't feel like
squeezing into this log entry). However the original implementation of
that wasn't bulletproof, because clause_selectivity() wouldn't honor
this_selec if given nonzero varRelid --- which in practice meant that
it worked as desired *except* when considering index scan quals. Which
resulted in bogus underestimation of the size of the indexscan result for
an inner indexscan in an outer join, and consequently a possibly bad
choice of indexscan vs. bitmap scan. Fix by introducing an explicit test
into clause_selectivity(). Also, to make sure we don't trigger that test
in corner cases, change the convention to be that this_selec > 1, not
this_selec = 1, means it's been marked redundant. Per trouble report from
Scara Maccai.
Back-patch to 8.2, where the problem was introduced.
|
|
AND, OR, or equivalent clauses: if there are too many (more than 100) just
exit without proving anything. This ensures that we don't spend O(N^2) time
trying (and most likely failing) to prove anything about very long IN lists
and similar cases.
Also, install a couple of CHECK_FOR_INTERRUPTS calls to ensure that a long
proof attempt can be interrupted.
Per gripe from Sergey Konoplev.
Back-patch the whole patch to 8.2 and just the CHECK_FOR_INTERRUPTS addition
to 8.1. (The rest of the patch doesn't apply cleanly, and since 8.1 doesn't
show the complained-of behavior anyway, it doesn't seem necessary to work
hard on it.)
|
|
we extended the appendrel mechanism to support UNION ALL optimization. The
reason nobody noticed was that we are not actually using attr_needed data for
appendrel children; hence it seems more reasonable to rip it out than fix it.
Back-patch to 8.2 because an Assert failure is possible in corner cases.
Per examination of an example from Jim Nasby.
In HEAD, also get rid of AppendRelInfo.col_mappings, which is quite inadequate
to represent UNION ALL situations; depend entirely on translated_vars instead.
|
|
parent, not only those with RangeTblRefs. We need them in ExecCheckRTPerms.
Report by Brendan O'Shea. Back-patch to 8.2, where pull_up_simple_union_all
was introduced.
|
|
bug #4290. The fundamental bug is that masking extParam by outer_params,
as finalize_plan had been doing, caused us to lose the information that
an initPlan depended on the output of a sibling initPlan. On reflection
the best thing to do seemed to be not to try to adjust outer_params for
this case but get rid of it entirely. The only thing it was really doing
for us was to filter out param IDs associated with SubPlan nodes, and that
can be done (with greater accuracy) while processing individual SubPlan
nodes in finalize_primnode. This approach was vindicated by the discovery
that the masking method was hiding a second bug: SS_finalize_plan failed to
remove extParam bits for initPlan output params that were referenced in the
main plan tree (it only got rid of those referenced by other initPlans).
It's not clear that this caused any real problems, given the limited use
of extParam by the executor, but it's certainly not what was intended.
I originally thought that there was also a problem with needing to include
indirect dependencies on external params in initPlans' param sets, but it
turns out that the executor handles this correctly so long as the depended-on
initPlan is earlier in the initPlans list than the one using its output.
That seems a bit of a fragile assumption, but it is true at the moment,
so I just documented it in some code comments rather than making what would
be rather invasive changes to remove the assumption.
Back-patch to 8.1. Previous versions don't have the case of initPlans
referring to other initPlans' outputs, so while the existing logic is still
questionable for them, there are not any known bugs to be fixed. So I'll
refrain from changing them for now.
|
|
subquery output column exactly once left-to-right. Although this is the case
in the original parser output, it might not be so after rewriting and
constant-folding, as illustrated by bug #3882 from Jan Mate. Instead
scan the subquery's target list to obtain needed per-column information;
this is duplicative of what the parser did, but only a couple dozen lines
need be copied, and we can clean up a couple of notational uglinesses.
Bug was introduced in 8.2 as part of revision of SubLink representation.
|
|
constraint yields TRUE for every row of its table, only that it does not
yield FALSE (a NULL result isn't disallowed). This breaks a couple of
implications that would be true in two-valued logic. I had put in one such
mistake in an 8.2.5 patch: foo IS NULL doesn't refute a strict operator
on foo. But there was another in the original 8.2 release: NOT foo doesn't
refute an expression whose truth would imply the truth of foo.
Per report from Rajesh Kumar Mallah.
To preserve the ability to do constraint exclusion with one partition
holding NULL values, extend relation_excluded_by_constraints() to check
for attnotnull flags, and add col IS NOT NULL expressions to the set of
constraints we hope to refute.
|
|
clauseless joins of relations that have unexploited join clauses. Rather
than looking at every other base relation in the query, the correct thing is
to examine the other relations in the "initial_rels" list of the current
make_rel_from_joinlist() invocation, because those are what we actually have
the ability to join against. This might be a subset of the whole query in
cases where join_collapse_limit or from_collapse_limit or full joins have
prevented merging the whole query into a single join problem. This is a bit
untidy because we have to pass those rels down through a new PlannerInfo
field, but it's necessary. Per bug #3865 from Oleg Kharin.
|
|
report of poor planning in 8.3: it's unsafe to push a constant across an
outer join when the outer-join condition is delayed by lower outer joins,
unless we recheck the outer-join condition at the upper outer join.
8.2.x doesn't really have the ability to tell whether this is the case
or not, but fortunately it doesn't matter --- it seems most desirable to
keep the join condition whether it's entirely redundant or not. However,
it's usually mostly redundant, so force its selectivity to 1.0.
It might be a good idea to back-patch this into 8.1 as well, but I'll
refrain until/unless there's evidence that 8.1 actually fails on any
cases that this would fix.
|
|
make_greater_string() try harder to generate a string that's actually greater
than its input string. Before we just assumed that making a string that was
memcmp-greater was enough, but it is easy to generate examples where this is
not so when the locale is not C. Instead, loop until the relevant comparison
function agrees that the generated string is greater than the input.
Unfortunately this is probably not enough to guarantee that the generated
string is greater than all extensions of the input, so we cannot relax the
restriction to C locale for the LIKE/regex index optimization. But it should
at least improve the odds of getting a useful selectivity estimate in
prefix_selectivity(). Per example from Guillaume Smet.
Backpatch to 8.1, mainly because that's what the complainant is using...
|
|
if either of the input relations can legally be joined to any other rels using
join clauses. This avoids uselessly (and expensively) considering a lot of
really stupid join paths when there is a join restriction with a large
footprint, that is, lots of relations inside its LHS or RHS. My patch of
15-Feb-2007 had been causing the code to consider joining *every* combination
of rels inside such a group, which is exponentially bad :-(. With this
behavior, clauseless bushy joins will be done if necessary, but they'll be
put off as long as possible. Per report from Jakub Ouhrabka.
Backpatch to 8.2. We might someday want to backpatch to 8.1 as well, but 8.1
does not have the problem for OUTER JOIN nests, only for IN-clauses, so it's
not clear anyone's very likely to hit it in practice; and the current patch
doesn't apply cleanly to 8.1.
|
|
neglected to test whether an outer join's join-condition actually refers to
the lower outer join it is looking at. (The comment correctly described what
was supposed to happen, but the code didn't do it...) This often resulted in
adding an unnecessary constraint on the join order of the two outer joins,
which was bad enough. However, it also seems to expose a performance
problem in an older patch (from 15-Feb): once we've decided that there is a
join ordering constraint, we will start trying clauseless joins between every
combination of rels within the constraint, which pointlessly eats up lots of
time and space if there are numerous rels below the outer join. That probably
needs to be revisited :-(. Per gripe from Jakub Ouhrabka.
|
|
simplification gets detoasted before it is incorporated into a Const node.
Otherwise, if an immutable function were to return a TOAST pointer (an
unlikely case, but it can be made to happen), we would end up with a plan
that depends on the continued existence of the out-of-line toast datum.
|
|
eval_const_expressions simplifies this to just "WHERE false", but we have
already done pull_up_IN_clauses so the IN join will be done, or at least
planned, anyway. The trouble case comes when the sub-SELECT is itself a join
and we decide to implement the IN by unique-ifying the sub-SELECT outputs:
with no remaining reference to the output Vars in WHERE, we won't have
propagated the Vars up to the upper join point, leading to "variable not found
in subplan target lists" error. Fix by adding an extra scan of in_info_list
and forcing all Vars mentioned therein to be propagated up to the IN join
point. Per bug report from Miroslav Sulc.
|
|
the number of rows likely to be produced by a query such as
SELECT * FROM t1 LEFT JOIN t2 USING (key) WHERE t2.key IS NULL;
What this is doing is selecting for t1 rows with no match in t2, and thus
it may produce a significant number of rows even if the t2.key table column
contains no nulls at all. 8.2 thinks the table column's null fraction is
relevant and thus may estimate no rows out, which results in terrible plans
if there are more joins above this one. A proper fix for this will involve
passing much more information about the context of a clause to the selectivity
estimator functions than we ever have. There's no time left to write such a
patch for 8.3, and it wouldn't be back-patchable into 8.2 anyway. Instead,
put in an ad-hoc test to defeat the normal table-stats-based estimation when
an IS NULL test is evaluated at an outer join, and just use a constant
estimate instead --- I went with 0.5 for lack of a better idea. This won't
catch every case but it will catch the typical ways of writing such queries,
and it seems unlikely to make things worse for other queries.
|
|
sets for outer joins, in the light of bug #3588 and additional thought and
experimentation. The original methodology was fatally flawed for nests of
more than two outer joins: it got the relationships between adjacent joins
right, but didn't always come to the right conclusions about whether a join
could be interchanged with one two or more levels below it. This was largely
caused by a mistaken idea that we should use the min_lefthand + min_righthand
sets of a sub-join as the minimum left or right input set of an upper join
when we conclude that the sub-join can't commute with the upper one. If
there's a still-lower join that the sub-join *can* commute with, this method
led us to think that that one could commute with the topmost join; which it
can't. Another problem (not directly connected to bug #3588) was that
make_outerjoininfo's processing-order-dependent method for enforcing outer
join identity #3 didn't work right: if we decided that join A could safely
commute with lower join B, we dropped all information about sub-joins under B
that join A could perhaps not safely commute with, because we removed B's
entire min_righthand from A's.
To fix, make an explicit computation of all inner join combinations that occur
below an outer join, and add to that the full syntactic relsets of any lower
outer joins that we determine it can't commute with. This method gives much
more direct enforcement of the outer join rearrangement identities, and it
turns out not to cost a lot of additional bookkeeping.
Thanks to Richard Harris for the bug report and test case.
|
|
clauses in which one side or the other references both sides of the join
cannot be removed as redundant, because that expression won't have been
constrained below the join. Per report from Sergey Burladyan.
|
|
checking whether an IS NULL/IS NOT NULL clause is implied or refuted by
a strict function. Per example from Dawid Kuroczko.
Backpatch to 8.2 since this is arguably a performance bug.
|
|
a MIN or MAX aggregate call into an indexscan: the initplan is being made at
the current query nesting level and so we shouldn't increment query_level.
Though usually harmless, this mistake could lead to bogus "plan should not
reference subplan's variable" failures on complex queries. Per bug report
from David Sanchez i Gregori.
|
|
|
|
in cases where a sub-SELECT inserts a WHERE clause between two outer joins,
that clause may prevent us from re-ordering the two outer joins. The code
was considering only the joins' own ON-conditions in determining reordering
safety, which is not good enough. Add a "delay_upper_joins" flag to
OuterJoinInfo to flag that we have detected such a clause and higher-level
outer joins shouldn't be permitted to commute with this one. (This might
seem overly coarse, but given the current rules for OJ reordering, it's
sufficient AFAICT.)
The failure case is actually pretty narrow: it needs a WHERE clause within
the RHS of a left join that checks the RHS of a lower left join, but is not
strict for that RHS (else we'd have simplified the lower join to a plain
join). Even then no failure will be manifest unless the planner chooses to
rearrange the join order.
Per bug report from Adam Terrey.
|
|
cheapest-startup-cost innerjoin indexscans, and make joinpath.c consider
both of these (when different) as the inside of a nestloop join. The
original design was based on the assumption that indexscan paths always
have negligible startup cost, and so total cost is the only important
figure of merit; an assumption that's obviously broken by bitmap
indexscans. This oversight could lead to choosing poor plans in cases
where fast-start behavior is more important than total cost, such as
LIMIT and IN queries. 8.1-vintage brain fade exposed by an example from
Chuck D.
|
|
more completely. The motivation for having it understand IS NULL at all was
to allow use of "foo IS NULL" as one of the subsets of a partitioning on
"foo", but as reported by Aleksander Kmetec, it wasn't really getting the job
done. Backpatch to 8.2 since this is arguably a performance bug.
|
|
wrong thing when inlining polymorphic SQL functions, because it was using the
function's declared return type where it should have used the actual result
type of the current call. In 8.1 and 8.2 this causes obvious failures even if
you don't have assertions turned on; in 8.0 and 7.4 it would only be a problem
if the inlined expression were used as an input to a function that did
run-time type determination on its inputs. Add a regression test, since this
is evidently an under-tested area.
|
|
competing alternatives for indexes to use in a bitmap scan. The former
coding took estimated selectivity as an overriding factor, causing it to
sometimes choose indexes that were much slower to scan than ones with a
slightly worse selectivity. It was also too narrow-minded about which
combinations of indexes to consider ANDing. The rewrite makes it pay more
attention to index scan cost than selectivity; this seems sane since it's
impossible to have very bad selectivity with low cost, whereas the reverse
isn't true. Also, we now consider each index alone, as well as adding
each index to an AND-group led by each prior index, for a total of about
O(N^2) rather than O(N) combinations considered. This makes the results
much less dependent on the exact order in which the indexes are
considered. It's still a lot cheaper than an O(2^N) exhaustive search.
A prefilter step eliminates all but the cheapest of those indexes using
the same set of WHERE conditions, to keep the effective value of N down in
scenarios where the DBA has created lots of partially-redundant indexes.
|
|
check_sql_fn_retval allows binary-compatibility cases, the expression
extracted from an inline-able SQL function might have a type that is only
binary-compatible with the declared function result type. To avoid possibly
changing the semantics of the expression, we should insert a RelabelType node
in such cases. This has only been shown to have bad consequences in recent
8.1 and up releases, but I suspect there may be failure cases in the older
branches too, so patch it all the way back. Per bug #3116 from Greg Mullane.
Along the way, fix an omission in eval_const_expressions_mutator: it failed
to copy the relabelformat field when processing a RelabelType. No known
observable failures from this, but it definitely isn't intended behavior.
|
|
JOIN quals, just like WHERE quals, even if they reference every one of the
join's relations. Now that we can reorder outer and inner joins, it's
possible for such a qual to end up being assigned to an outer join plan node,
and we mustn't have it treated as a join qual rather than a filter qual for
the node. (If it were, the join could produce null-extended rows that it
shouldn't.) Per bug report from Pelle Johansson.
|
|
be checked at plan levels below the top; namely, we have to allow for Result
nodes inserted just above a nestloop inner indexscan. Should think about
using the general Param mechanism to pass down outer-relation variables, but
for the moment we need a back-patchable solution. Per report from Phil Frost.
|
|
considered when it is necessary to do so because of a join-order restriction
(that is, an outer-join or IN-subselect construct). The former coding was a
bit ad-hoc and inconsistent, and it missed some cases, as exposed by Mario
Weilguni's recent bug report. His specific problem was that an IN could be
turned into a "clauseless" join due to constant-propagation removing the IN's
joinclause, and if the IN's subselect involved more than one relation and
there was more than one such IN linking to the same upper relation, then the
only valid join orders involve "bushy" plans but we would fail to consider the
specific paths needed to get there. (See the example case added to the join
regression test.) On examining the code I wonder if there weren't some other
problem cases too; in particular it seems that GEQO was defending against a
different set of corner cases than the main planner was. There was also an
efficiency problem, in that when we did realize we needed a clauseless join
because of an IN, we'd consider clauseless joins against every other relation
whether this was sensible or not. It seems a better design is to use the
outer-join and in-clause lists as a backup heuristic, just as the rule of
joining only where there are joinclauses is a heuristic: we'll join two
relations if they have a usable joinclause *or* this might be necessary to
satisfy an outer-join or IN-clause join order restriction. I refactored the
code to have just one place considering this instead of three, and made sure
that it covered all the cases that any of them had been considering.
Backpatch as far as 8.1 (which has only the IN-clause form of the disease).
By rights 8.0 and 7.4 should have the bug too, but they accidentally fail
to fail, because the joininfo structure used in those releases preserves some
memory of there having once been a joinclause between the inner and outer
sides of an IN, and so it leads the code in the right direction anyway.
I'll be conservative and not touch them.
|
|
that overlap an outer join's min_righthand but aren't fully contained in it,
to support joining within the RHS after having performed an outer join that
can commute with this one. Aside from the direct fix in make_join_rel(),
fix has_join_restriction() and GEQO's desirable_join() to consider this
possibility. Per report from Ian Harding.
|
|
had stopped working for tables buried inside views or sub-selects. This is
because I had gotten rid of the simplify_jointree() preprocessing step, and
optimize_minmax_aggregates() wasn't smart enough to deal with a non-canonical
FromExpr. Per gripe from Bill Howe.
|
|
we should check that the function code returns the claimed result datatype
every time we parse the function for execution. Formerly, for simple
scalar result types we assumed the creation-time check was sufficient, but
this fails if the function selects from a table that's been redefined since
then, and even more obviously fails if check_function_bodies had been OFF.
This is a significant security hole: not only can one trivially crash the
backend, but with appropriate misuse of pass-by-reference datatypes it is
possible to read out arbitrary locations in the server process's memory,
which could allow retrieving database content the user should not be able
to see. Our thanks to Jeff Trout for the initial report.
Security: CVE-2007-0555
|
|
rel->tuples as well as rel->rows, since some estimation functions expect both
to be valid in every baserel. Per report from Dave Dutcher.
|
|
when collapsing of JOIN trees is stopped by join_collapse_limit. For instance
a list of 11 LEFT JOINs with limit 8 now produces something like
((1 2 3 4 5 6 7 8) 9 10 11 12)
instead of
(((1 2 3 4 5 6 7 8) (9)) 10 11 12)
The latter structure is really only required for a FULL JOIN.
Noted while studying an example from Shane Ambler.
|
|
hash joins with the estimated-larger relation on the inside. There are
several cases where doing that makes perfect sense, and in cases where it
doesn't, the regular cost computation really ought to be able to figure that
out. Make some marginal tweaks in said computation to try to get results
approximating reality a bit better. Per an example from Shane Ambler.
Also, fix an oversight in the original patch to add seq_page_cost: the costs
of spilling a hash join to disk should be scaled by seq_page_cost.
|
|
are all in new-in-8.2 logic associated with indexability of ScalarArrayOpExpr
(IN-clauses) or amortization of indexscan costs across repeated indexscans
on the inside of a nestloop. In particular:
Fix some logic errors in the estimation for multiple scans induced by a
ScalarArrayOpExpr indexqual.
Include a small cost component in bitmap index scans to reflect the costs of
manipulating the bitmap itself; this is mainly to prevent a bitmap scan from
appearing to have the same cost as a plain indexscan for fetching a single
tuple.
Also add a per-index-scan-startup CPU cost component; while prior releases
were clearly too pessimistic about the cost of repeated indexscans, the
original 8.2 coding allowed the cost of an indexscan to effectively go to zero
if repeated often enough, which is overly optimistic.
Pay some attention to index correlation when estimating costs for a nestloop
inner indexscan: this is significant when the plan fetches multiple heap
tuples per iteration, since high correlation means those tuples are probably
on the same or adjacent heap pages.
|
|
joinclause doesn't use any outer-side vars) requires a "bushy" plan to be
created. The normal heuristic to avoid joins with no joinclause has to be
overridden in that case. Problem is new in 8.2; before that we forced the
outer join order anyway. Per example from Teodor.
|
|
rearrangeable outer joins and the WHERE clause is non-strict and mentions
only nullable-side relations. New bug in 8.2, caused by new logic to allow
rearranging outer joins. Per bug #2807 from Ross Cohen; thanks to Jeff
Davis for producing a usable test case.
|
|
a sublink's test expression have the correct vartypmod, rather than defaulting
to -1. There's at least one place where this is important because we're
expecting these Vars to be exactly equal() to those appearing in the subplan
itself. This is a pretty klugy solution --- it would likely be cleaner to
change Param nodes to include a typmod field --- but we can't do that in the
already-released 8.2 branch.
Per bug report from Hubert Fongarnand.
|
|
-O3 or higher (presumably because it inlines more things). Per gripe
from Mark Mielke.
|
|
accurately: we have to distinguish the effects of the join's own ON
clauses from the effects of pushed-down clauses. Failing to do so
was a quick hack long ago, but it's time to be smarter. Per example
from Thomas H.
|
|
node of a SubLink or SubPlan testexpr field. Bug resulted from replacing
the old lefthand/exprs list fields with a simple expression field, and not
remembering that expression_tree_walker is coded to save a few cycles by
recursing directly to self on list fields (on the assumption the walker
isn't interested in List nodes per se). On non-list fields it must of
course call the walker. Possibly that hack isn't worth the risk of more
such bugs, but I'll leave it be for now. Per bug report from James Robinson.
|
|
outer joins. Originally it was only looking for overlap of the righthand
side of a left join, but we have to do it on the lefthand side too.
Per example from Jean-Pierre Pelletier.
|
|
|
|
the SQL spec, viz IS NULL is true if all the row's fields are null, IS NOT
NULL is true if all the row's fields are not null. The former coding got
this right for a limited number of cases with IS NULL (ie, those where it
could disassemble a ROW constructor at parse time), but was entirely wrong
for IS NOT NULL. Per report from Teodor.
I desisted from changing the behavior for arrays, since on closer inspection
it's not clear that there's any support for that in the SQL spec. This
probably needs more consideration.
|
|
tables in the query compete for cache space, not just the one we are
currently costing an indexscan for. This seems more realistic, and it
definitely will help in examples recently exhibited by Stefan
Kaltenbrunner. To get the total size of all the tables involved, we must
tweak the handling of 'append relations' a bit --- formerly we looked up
information about the child tables on-the-fly during set_append_rel_pathlist,
but it needs to be done before we start doing any cost estimation, so
push it into the add_base_rels_to_query scan.
|
|
to a relation on the nullable side of an outer join. I had removed
this during the outer join planning rewrite a few months ago ... I think
I intended to put it somewhere else, but forgot ...
|
|
that has parameters is always planned afresh for each Bind command,
treating the parameter values as constants in the planner. This removes
the performance penalty formerly often paid for using out-of-line
parameters --- with this definition, the planner can do constant folding,
LIKE optimization, etc. After a suggestion by Andrew@supernews.
|
|
trivial if it contains either Vars referencing the corresponding subplan
columns, or Consts equaling the corresponding subplan columns. This
lets the planner eliminate the SubqueryScan in some cases generated by
generate_setop_tlist().
|
|
blocking concurrent writes to the table. Greg Stark, with a little help
from Tom Lane.
|