Age | Commit message (Collapse) | Author |
|
module and teach PREPARE and protocol-level prepared statements to use it.
In service of this, rearrange utility-statement processing so that parse
analysis does not assume table schemas can't change before execution for
utility statements (necessary because we don't attempt to re-acquire locks
for utility statements when reusing a stored plan). This requires some
refactoring of the ProcessUtility API, but it ends up cleaner anyway,
for instance we can get rid of the QueryContext global.
Still to do: fix up SPI and related code to use the plan cache; I'm tempted to
try to make SQL functions use it too. Also, there are at least some aspects
of system state that we want to ensure remain the same during a replan as in
the original processing; search_path certainly ought to behave that way for
instance, and perhaps there are others.
|
|
check_sql_fn_retval allows binary-compatibility cases, the expression
extracted from an inline-able SQL function might have a type that is only
binary-compatible with the declared function result type. To avoid possibly
changing the semantics of the expression, we should insert a RelabelType node
in such cases. This has only been shown to have bad consequences in recent
8.1 and up releases, but I suspect there may be failure cases in the older
branches too, so patch it all the way back. Per bug #3116 from Greg Mullane.
Along the way, fix an omission in eval_const_expressions_mutator: it failed
to copy the relabelformat field when processing a RelabelType. No known
observable failures from this, but it definitely isn't intended behavior.
|
|
parent query's EState. Now that there's a single flat rangetable for both
the main plan and subplans, there's no need anymore for a separate EState,
and removing it allows cleaning up some crufty code in nodeSubplan.c and
nodeSubqueryscan.c. Should be a tad faster too, although any difference
will probably be hard to measure. This is the last bit of subsidiary
mop-up work from changing to a flat rangetable.
|
|
is still needed despite cleanups in setrefs.c, because the point is to
let the inserted Result node compute a different tlist than its input
node does. Per example from Jeremy Drake.
|
|
drill down into subplan targetlists to print the referent expression for an
OUTER or INNER var in an upper plan node. Hence, make it do that always, and
banish the old hack of showing "?columnN?" when things got too complicated.
Along the way, fix an EXPLAIN bug I introduced by suppressing subqueries from
execution-time range tables: get_name_for_var_field() assumed it could look at
rte->subquery to find out the real type of a RECORD var. That doesn't work
anymore, but instead we can look at the input plan of the SubqueryScan plan
node.
|
|
and quals have varno OUTER, rather than zero, to indicate a reference to
an output of their lefttree subplan. This is consistent with the way
that every other upper-level node type does it, and allows some simplifications
in setrefs.c and EXPLAIN.
|
|
useless substructure for its RangeTblEntry nodes. (I chose to keep using the
same struct node type and just zero out the link fields for unneeded info,
rather than making a separate ExecRangeTblEntry type --- it seemed too
fragile to have two different rangetable representations.)
Along the way, put subplans into a list in the toplevel PlannedStmt node,
and have SubPlan nodes refer to them by list index instead of direct pointers.
Vadim wanted to do that years ago, but I never understood what he was on about
until now. It makes things a *whole* lot more robust, because we can stop
worrying about duplicate processing of subplans during expression tree
traversals. That's been a constant source of bugs, and it's finally gone.
There are some consequent simplifications yet to be made, like not using
a separate EState for subplans in the executor, but I'll tackle that later.
|
|
storing mostly-redundant Query trees in prepared statements, portals, etc.
To replace Query, a new node type called PlannedStmt is inserted by the
planner at the top of a completed plan tree; this carries just the fields of
Query that are still needed at runtime. The statement lists kept in portals
etc. now consist of intermixed PlannedStmt and bare utility-statement nodes
--- no Query. This incidentally allows us to remove some fields from Query
and Plan nodes that shouldn't have been there in the first place.
Still to do: simplify the execution-time range table; at the moment the
range table passed to the executor still contains Query trees for subqueries.
initdb forced due to change of stored rules.
|
|
this code was last gone over, there wasn't really any alternative to
globals because we didn't have the PlannerInfo struct being passed all
through the planner code. Now that we do, we can restructure things
to avoid non-reentrancy. I'm fooling with this because otherwise I'd
have had to add another global variable for the planned compact
range table list.
|
|
plan nodes, so that the executor does not need to get these items from
the range table at runtime. This will avoid needing to include these
fields in the compact range table I'm expecting to make the executor use.
|
|
forces a particular relation nonnullable, then we can say that the OR does.
This is worth a little extra trouble since it may allow reduction of
outer joins to plain joins.
|
|
JOIN quals, just like WHERE quals, even if they reference every one of the
join's relations. Now that we can reorder outer and inner joins, it's
possible for such a qual to end up being assigned to an outer join plan node,
and we mustn't have it treated as a join qual rather than a filter qual for
the node. (If it were, the join could produce null-extended rows that it
shouldn't.) Per bug report from Pelle Johansson.
|
|
be checked at plan levels below the top; namely, we have to allow for Result
nodes inserted just above a nestloop inner indexscan. Should think about
using the general Param mechanism to pass down outer-relation variables, but
for the moment we need a back-patchable solution. Per report from Phil Frost.
|
|
considered when it is necessary to do so because of a join-order restriction
(that is, an outer-join or IN-subselect construct). The former coding was a
bit ad-hoc and inconsistent, and it missed some cases, as exposed by Mario
Weilguni's recent bug report. His specific problem was that an IN could be
turned into a "clauseless" join due to constant-propagation removing the IN's
joinclause, and if the IN's subselect involved more than one relation and
there was more than one such IN linking to the same upper relation, then the
only valid join orders involve "bushy" plans but we would fail to consider the
specific paths needed to get there. (See the example case added to the join
regression test.) On examining the code I wonder if there weren't some other
problem cases too; in particular it seems that GEQO was defending against a
different set of corner cases than the main planner was. There was also an
efficiency problem, in that when we did realize we needed a clauseless join
because of an IN, we'd consider clauseless joins against every other relation
whether this was sensible or not. It seems a better design is to use the
outer-join and in-clause lists as a backup heuristic, just as the rule of
joining only where there are joinclauses is a heuristic: we'll join two
relations if they have a usable joinclause *or* this might be necessary to
satisfy an outer-join or IN-clause join order restriction. I refactored the
code to have just one place considering this instead of three, and made sure
that it covered all the cases that any of them had been considering.
Backpatch as far as 8.1 (which has only the IN-clause form of the disease).
By rights 8.0 and 7.4 should have the bug too, but they accidentally fail
to fail, because the joininfo structure used in those releases preserves some
memory of there having once been a joinclause between the inner and outer
sides of an IN, and so it leads the code in the right direction anyway.
I'll be conservative and not touch them.
|
|
that overlap an outer join's min_righthand but aren't fully contained in it,
to support joining within the RHS after having performed an outer join that
can commute with this one. Aside from the direct fix in make_join_rel(),
fix has_join_restriction() and GEQO's desirable_join() to consider this
possibility. Per report from Ian Harding.
|
|
|
|
had stopped working for tables buried inside views or sub-selects. This is
because I had gotten rid of the simplify_jointree() preprocessing step, and
optimize_minmax_aggregates() wasn't smart enough to deal with a non-canonical
FromExpr. Per gripe from Bill Howe.
|
|
that aren't turned into true joins). Since this is the last missing bit of
infrastructure, go ahead and fill out the hash integer_ops and float_ops
opfamilies with cross-type operators. The operator family project is now
DONE ... er, except for documentation ...
|
|
we should check that the function code returns the claimed result datatype
every time we parse the function for execution. Formerly, for simple
scalar result types we assumed the creation-time check was sufficient, but
this fails if the function selects from a table that's been redefined since
then, and even more obviously fails if check_function_bodies had been OFF.
This is a significant security hole: not only can one trivially crash the
backend, but with appropriate misuse of pass-by-reference datatypes it is
possible to read out arbitrary locations in the server process's memory,
which could allow retrieving database content the user should not be able
to see. Our thanks to Jeff Trout for the initial report.
Security: CVE-2007-0555
|
|
Standard English uses "may", "can", and "might" in different ways:
may - permission, "You may borrow my rake."
can - ability, "I can lift that log."
might - possibility, "It might rain today."
Unfortunately, in conversational English, their use is often mixed, as
in, "You may use this variable to do X", when in fact, "can" is a better
choice. Similarly, "It may crash" is better stated, "It might crash".
|
|
Hashing for aggregation purposes still needs work, so it's not time to
mark any cross-type operators as hashable for general use, but these cases
work if the operators are so marked by hand in the system catalogs.
|
|
rel->tuples as well as rel->rows, since some estimation functions expect both
to be valid in every baserel. Per report from Dave Dutcher.
|
|
which I had removed in the first cut of the EquivalenceClass rewrite to
simplify that patch a little. But it's still important --- in a four-way
join problem mergejoinscansel() was eating about 40% of the planning time
according to gprof. Also, improve the EquivalenceClass code to re-use
join RestrictInfos rather than generating fresh ones for each join
considered. This saves some memory space but more importantly improves
the effectiveness of caching planning info in RestrictInfos.
|
|
columns procost and prorows, to allow simple user adjustment of the estimated
cost of a function call, as well as control of the estimated number of rows
returned by a set-returning function. We might eventually wish to extend this
to allow function-specific estimation routines, but there seems to be
consensus that we should try a simple constant estimate first. In particular
this provides a relatively simple way to control the order in which different
WHERE clauses are applied in a plan node, which is a Good Thing in view of the
fact that the recent EquivalenceClass planner rewrite made that much less
predictable than before.
|
|
a couple of syscache lookups in make_pathkey_from_sortinfo().
|
|
provide just a boolean 'amcanorder', instead of fields that specify the
sort operator strategy numbers. We have decided to require ordering-capable
AMs to use btree-compatible strategy numbers, so the old fields are
overkill (and indeed misleading about what's allowed).
|
|
representation of equivalence classes of variables. This is an extensive
rewrite, but it brings a number of benefits:
* planner no longer fails in the presence of "incomplete" operator families
that don't offer operators for every possible combination of datatypes.
* avoid generating and then discarding redundant equality clauses.
* remove bogus assumption that derived equalities always use operators
named "=".
* mergejoins can work with a variety of sort orders (e.g., descending) now,
instead of tying each mergejoinable operator to exactly one sort order.
* better recognition of redundant sort columns.
* can make use of equalities appearing underneath an outer join.
|
|
|
|
for aggregates. This is OK for current uses but could burn somebody
someday...
|
|
which comparison operators to use for plan nodes involving tuple comparison
(Agg, Group, Unique, SetOp). Formerly the executor looked up the default
equality operator for the datatype, which was really pretty shaky, since it's
possible that the data being fed to the node is sorted according to some
nondefault operator class that could have an incompatible idea of equality.
The planner knows what it has sorted by and therefore can provide the right
equality operator to use. Also, this change moves a couple of catalog lookups
out of the executor and into the planner, which should help startup time for
pre-planned queries by some small amount. Modify the planner to remove some
other cavalier assumptions about always being able to use the default
operators. Also add "nulls first/last" info to the Plan node for a mergejoin
--- neither the executor nor the planner can cope yet, but at least the API is
in place.
|
|
per-column options for btree indexes. The planner's support for this is still
pretty rudimentary; it does not yet know how to plan mergejoins with
nondefault ordering options. The documentation is pretty rudimentary, too.
I'll work on improving that stuff later.
Note incompatible change from prior behavior: ORDER BY ... USING will now be
rejected if the operator is not a less-than or greater-than member of some
btree opclass. This prevents less-than-sane behavior if an operator that
doesn't actually define a proper sort ordering is selected.
|
|
when collapsing of JOIN trees is stopped by join_collapse_limit. For instance
a list of 11 LEFT JOINs with limit 8 now produces something like
((1 2 3 4 5 6 7 8) 9 10 11 12)
instead of
(((1 2 3 4 5 6 7 8) (9)) 10 11 12)
The latter structure is really only required for a FULL JOIN.
Noted while studying an example from Shane Ambler.
|
|
hash joins with the estimated-larger relation on the inside. There are
several cases where doing that makes perfect sense, and in cases where it
doesn't, the regular cost computation really ought to be able to figure that
out. Make some marginal tweaks in said computation to try to get results
approximating reality a bit better. Per an example from Shane Ambler.
Also, fix an oversight in the original patch to add seq_page_cost: the costs
of spilling a hash join to disk should be scaled by seq_page_cost.
|
|
back-stamped for this.
|
|
predicate operators. The hard stuff turns out to be already done in the
previous commit, we need merely open the floodgates...
|
|
the XmlExpr code in various lists, use a representation that has some hope
of reverse-listing correctly (though it's still a de-escaping function
shy of correctness), generally try to make it look more like Postgres
coding conventions.
|
|
cases. Operator classes now exist within "operator families". While most
families are equivalent to a single class, related classes can be grouped
into one family to represent the fact that they are semantically compatible.
Cross-type operators are now naturally adjunct parts of a family, without
having to wedge them into a particular opclass as we had done originally.
This commit restructures the catalogs and cleans up enough of the fallout so
that everything still works at least as well as before, but most of the work
needed to actually improve the planner's behavior will come later. Also,
there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way
to create a new family right now is to allow CREATE OPERATOR CLASS to make
one by default. I owe some more documentation work, too. But that can all
be done in smaller pieces once this infrastructure is in place.
|
|
|
|
operator strategy numbers, ie, GiST and GIN. This is almost cosmetic
enough to not need a catversion bump, but since the opr_sanity regression
test has to change in sync with the catalog entry, I figured I'd better
do one.
|
|
are all in new-in-8.2 logic associated with indexability of ScalarArrayOpExpr
(IN-clauses) or amortization of indexscan costs across repeated indexscans
on the inside of a nestloop. In particular:
Fix some logic errors in the estimation for multiple scans induced by a
ScalarArrayOpExpr indexqual.
Include a small cost component in bitmap index scans to reflect the costs of
manipulating the bitmap itself; this is mainly to prevent a bitmap scan from
appearing to have the same cost as a plain indexscan for fetching a single
tuple.
Also add a per-index-scan-startup CPU cost component; while prior releases
were clearly too pessimistic about the cost of repeated indexscans, the
original 8.2 coding allowed the cost of an indexscan to effectively go to zero
if repeated often enough, which is overly optimistic.
Pay some attention to index correlation when estimating costs for a nestloop
inner indexscan: this is significant when the plan fetches multiple heap
tuples per iteration, since high correlation means those tuples are probably
on the same or adjacent heap pages.
|
|
joinclause doesn't use any outer-side vars) requires a "bushy" plan to be
created. The normal heuristic to avoid joins with no joinclause has to be
overridden in that case. Problem is new in 8.2; before that we forced the
outer join order anyway. Per example from Teodor.
|
|
representing externally-supplied values, since the APIs that carry such
values only specify type not typmod. However, for PARAM_SUBLINK Params
it is handy to carry the typmod of the sublink's output column. This
is a much cleaner solution for the recently reported 'could not find
pathkey item to sort' and 'failed to find unique expression in subplan
tlist' bugs than my original 8.2-compatible patch. Besides, someday we
might want to support typmods for external parameters ...
|
|
rearrangeable outer joins and the WHERE clause is non-strict and mentions
only nullable-side relations. New bug in 8.2, caused by new logic to allow
rearranging outer joins. Per bug #2807 from Ross Cohen; thanks to Jeff
Davis for producing a usable test case.
|
|
a sublink's test expression have the correct vartypmod, rather than defaulting
to -1. There's at least one place where this is important because we're
expecting these Vars to be exactly equal() to those appearing in the subplan
itself. This is a pretty klugy solution --- it would likely be cleaner to
change Param nodes to include a typmod field --- but we can't do that in the
already-released 8.2 branch.
Per bug report from Hubert Fongarnand.
|
|
-O3 or higher (presumably because it inlines more things). Per gripe
from Mark Mielke.
|
|
accurately: we have to distinguish the effects of the join's own ON
clauses from the effects of pushed-down clauses. Failing to do so
was a quick hack long ago, but it's time to be smarter. Per example
from Thomas H.
|
|
node of a SubLink or SubPlan testexpr field. Bug resulted from replacing
the old lefthand/exprs list fields with a simple expression field, and not
remembering that expression_tree_walker is coded to save a few cycles by
recursing directly to self on list fields (on the assumption the walker
isn't interested in List nodes per se). On non-list fields it must of
course call the walker. Possibly that hack isn't worth the risk of more
such bugs, but I'll leave it be for now. Per bug report from James Robinson.
|
|
outer joins. Originally it was only looking for overlap of the righthand
side of a left join, but we have to do it on the lefthand side too.
Per example from Jean-Pierre Pelletier.
|
|
|
|
the SQL spec, viz IS NULL is true if all the row's fields are null, IS NOT
NULL is true if all the row's fields are not null. The former coding got
this right for a limited number of cases with IS NULL (ie, those where it
could disassemble a ROW constructor at parse time), but was entirely wrong
for IS NOT NULL. Per report from Teodor.
I desisted from changing the behavior for arrays, since on closer inspection
it's not clear that there's any support for that in the SQL spec. This
probably needs more consideration.
|