path: root/src/backend/optimizer
2008-06-27 (Tom Lane) Improve planner's estimation of the size of an append relation: rather than
taking the maximum of any child rel's width, we should weight the widths proportionally to the number of rows expected from each child. In hindsight this is obviously correct because row width is really a proxy for the total physical size of the relation. Per discussion with Scott Carey (bug #4264).
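A minimal sketch of the difference between the two estimates, assuming a hypothetical `ChildEstimate` struct and function names that are not taken from the PostgreSQL source. It only illustrates why weighting by row count tracks total physical size better than taking the maximum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one child relation's size estimates. */
typedef struct ChildEstimate
{
    double rows;   /* estimated number of rows from this child */
    int    width;  /* estimated average row width in bytes */
} ChildEstimate;

/* Old behaviour described in the commit: use the widest child. */
int
append_width_max(const ChildEstimate *children, size_t n)
{
    int width = 0;

    for (size_t i = 0; i < n; i++)
        if (children[i].width > width)
            width = children[i].width;
    return width;
}

/* New behaviour: average the widths, weighted by expected row count. */
int
append_width_weighted(const ChildEstimate *children, size_t n)
{
    double total_rows = 0.0;
    double total_bytes = 0.0;

    for (size_t i = 0; i < n; i++)
    {
        total_rows += children[i].rows;
        total_bytes += children[i].rows * children[i].width;
    }
    if (total_rows <= 0.0)
        return 0;               /* no rows expected, nothing to weight */
    return (int) (total_bytes / total_rows + 0.5);  /* round to nearest */
}
```

With one child of a million narrow rows and another of ten very wide rows, the maximum reports the wide width for the whole append relation, while the weighted average stays close to the narrow width that dominates the data volume.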
2008-06-19 (Alvaro Herrera) Improve our #include situation by moving pointer types away from the
corresponding struct definitions. This allows other headers to avoid including certain highly-loaded headers such as rel.h and relscan.h, instead using just relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less unnecessary dependencies.
2008-06-17 (Tom Lane) Fix the code that adds regclass constants to a plan's list of relation OIDs
that it depends on for replan-forcing purposes. We need to consider plain OID constants too, because eval_const_expressions folds a RelabelType atop a Const to just a Const. This change could result in OID values that aren't really for tables getting added to the dependency list, but the worst-case consequence would be occasional useless replans. Per report from Gabriele Messineo.
2008-05-27 (Tom Lane) Alter the xxx_pattern_ops opclasses to use the regular equality operator of
the associated datatype as their equality member. This means that these opclasses can now support plain equality comparisons along with LIKE tests, thus avoiding the need for an extra index in some applications. This optimization was not possible when the pattern opclasses were first introduced, because we didn't insist that text equality meant bitwise equality; but we do now, so there is no semantic difference between regular and pattern equality operators. I removed the name_pattern_ops opclass altogether, since it's really useless: name's regular comparisons are just strcmp() and are unlikely to become something different. Instead teach indxpath.c that btree name_ops can be used for LIKE whether or not the locale is C. This might lead to a useful speedup in LIKE queries on the system catalogs in non-C locales. The ~=~ and ~<>~ operators are gone altogether. (It would have been nice to keep them for backward compatibility's sake, but since the pg_amop structure doesn't allow multiple equality operators per opclass, there's no way.) A not-immediately-obvious incompatibility is that the sort order within bpchar_pattern_ops indexes changes --- it had been identical to plain strcmp, but is now trailing-blank-insensitive. This will impact in-place upgrades, if those ever happen. Per discussions a couple months ago.
2008-05-16 (Tom Lane) Extend GIN to support partial-match searches, and extend tsquery to support
prefix matching using this facility. Teodor Sigaev and Oleg Bartunov
2008-05-15 (Tom Lane) Add code to eval_const_expressions() to support const-simplification of
CoerceViaIO nodes. This improves the ability of the planner to deal with cases where the node input is a constant. Per bug #4170.
2008-05-12 (Alvaro Herrera) Restructure some header files a bit, in particular heapam.h, by removing some
unnecessary #include lines in it. Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.
2008-05-02 (Tom Lane) Allow the planner's estimate of the fraction of a cursor's rows that will be
retrieved to be controlled through a GUC variable. Robert Hell
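A rough sketch of why the assumed fetch fraction matters, under the common assumption that run cost accrues linearly between startup and total cost. `PlanCost`, `cost_for_fraction` and `cheaper_plan` are hypothetical names for illustration, not the planner's own.

```c
#include <assert.h>

/* Hypothetical cost pair: cost to return the first row, and all rows. */
typedef struct PlanCost
{
    double startup;
    double total;
} PlanCost;

/* Estimated cost of fetching only `fraction` (0..1) of the result,
 * assuming run cost is spread evenly over the rows. */
double
cost_for_fraction(PlanCost plan, double fraction)
{
    return plan.startup + fraction * (plan.total - plan.startup);
}

/* Returns 1 if plan b is cheaper than plan a for the given fraction,
 * otherwise 0. */
int
cheaper_plan(PlanCost a, PlanCost b, double fraction)
{
    return cost_for_fraction(b, fraction) < cost_for_fraction(a, fraction) ? 1 : 0;
}
```

A plan with a high startup cost but low total cost (for example a sort) loses to a fast-start plan when only a small fraction is expected to be fetched, and wins when the whole result is read. Making the fraction configurable lets the user tell the planner which case applies to cursors.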
2008-04-21 (Tom Lane) Fix convert_IN_to_join to properly handle the case where the subselect's
output is not of the same type that's needed for the IN comparison (ie, where the parser inserted an implicit coercion above the subselect result). We should record the coerced expression, not just a raw Var referencing the subselect output, as the quantity that needs to be unique-ified if we choose to implement the IN as Unique followed by a plain join. As of 8.3 this error was causing crashes, as seen in bug #4113 from Javier Hernandez, because the executor was being told to hash or sort the raw subselect output column using operators appropriate to the coerced type. In prior versions there was no crash because the executor chose the hash or sort operators for itself based on the column type it saw. However, that's still not really right, because what's unique for one data type might not be unique for another. In corner cases we could get multiple outputs of a row that should appear only once, as demonstrated by the regression test case included in this commit. However, this patch doesn't apply cleanly to 8.2 or before, and the code involved has shifted enough over time that I'm hesitant to try to back-patch. Given the lack of complaints from the field about such corner cases, I think the bug may not be important enough to risk breaking other things with a back-patch.
2008-04-21 (Tom Lane) Allow float8, int8, and related datatypes to be passed by value on machines
where Datum is 8 bytes wide. Since this will break old-style C functions (those still using version 0 calling convention) that have arguments or results of these types, provide a configure option to disable it and retain the old pass-by-reference behavior. Likewise, provide a configure option to disable the recently-committed float4 pass-by-value change. Zoltan Boszormenyi, plus configurability stuff by me.
2008-04-17 (Tom Lane) Fix a couple of oversights associated with the "physical tlist" optimization:
we had several code paths where a physical tlist could be used for the input to a Sort node, which is a dumb idea because any unneeded table columns will increase the volume of data the sort has to push around. (Unfortunately the easy-looking fix of calling disuse_physical_tlist during make_sort_xxx doesn't work because in most cases we're already committed to the current input tlist --- it's been marked with sort column numbers, or we've built grouping column numbers using it, etc. The tlist has to be selected properly at the calling level before we start constructing sort-col information. This is easy enough to do, we were just failing to take the point into consideration.) Back-patch to 8.3. I believe the problem probably exists clear back to 7.4 when the physical tlist optimization was added, but I'm afraid to back-patch further than 8.3 without a great deal more study than I want to put into it. The code in this area has drifted a lot over time. The real-world importance of these code paths is uncertain anyway --- I think in many cases we'd probably prefer hash-based methods.
2008-04-13 (Tom Lane) Since createplan.c no longer cares whether index operators are lossy, it has
no particular need to do get_op_opfamily_properties() while building an indexscan plan. Postpone that lookup until executor start. This simplifies createplan.c a lot more than it complicates nodeIndexscan.c, and makes things more uniform since we already had to do it that way for RowCompare expressions. Should be a bit faster too, at least for plans that aren't re-used many times, since we avoid palloc'ing and perhaps copying the intermediate list data structure.
2008-04-13 (Tom Lane) Phase 2 of project to make index operator lossiness be determined at runtime
instead of plan time. Extend the amgettuple API so that the index AM returns a boolean indicating whether the indexquals need to be rechecked, and make that rechecking happen in nodeIndexscan.c (currently the only place where it's expected to be needed; other callers of index_getnext are just erroring out for now). For the moment, GIN and GIST have stub logic that just always sets the recheck flag to TRUE --- I'm hoping to get Teodor to handle pushing that control down to the opclass consistent() functions. The planner no longer pays any attention to amopreqcheck, and that catalog column will go away in due course.
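The control flow of a runtime recheck can be sketched with a toy scan, assuming invented types (`ToyIndexScan`, `QualFn`) that only mimic the shape of the idea: the access method reports, per tuple, whether the qual must be re-evaluated by the scan node. This is not the real amgettuple signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*QualFn)(int value);

/* Toy "index" that hands back candidate values one at a time. */
typedef struct ToyIndexScan
{
    const int *values;
    size_t     n;
    size_t     pos;
    bool       lossy;   /* true: candidates may include false matches */
} ToyIndexScan;

/* Fetch the next candidate; *recheck tells the caller whether the
 * qual still has to be verified against the actual value. */
bool
toy_gettuple(ToyIndexScan *scan, int *value, bool *recheck)
{
    if (scan->pos >= scan->n)
        return false;
    *value = scan->values[scan->pos++];
    *recheck = scan->lossy;
    return true;
}

/* Scan-node side: apply the qual only when the index asks for it. */
size_t
toy_index_scan_count(ToyIndexScan *scan, QualFn qual)
{
    size_t matches = 0;
    int    value;
    bool   recheck;

    while (toy_gettuple(scan, &value, &recheck))
    {
        if (recheck && !qual(value))
            continue;           /* lossy candidate failed the recheck */
        matches++;
    }
    return matches;
}

/* Example qual used to exercise the sketch. */
bool
is_even(int value)
{
    return value % 2 == 0;
}
```

Because the decision is made per returned tuple at run time, the planner no longer needs a catalog flag to know in advance whether an operator is lossy.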
2008-04-09 (Bruce Momjian) Small wording improvements for source code READMEs.
2008-04-09 (Bruce Momjian) Revert README cleanups.
2008-04-09 (Bruce Momjian) Revert sentence removal from nickname in FAQ.
2008-04-01 (Tom Lane) Fix an oversight I made in a cleanup patch over a year ago:
eval_const_expressions needs to be passed the PlannerInfo ("root") structure, because in some cases we want it to substitute values for Param nodes. (So "constant" is not so constant as all that ...) This mistake partially disabled optimization of unnamed extended-Query statements in 8.3: in particular the LIKE-to-indexscan optimization would never be applied if the LIKE pattern was passed as a parameter, and constraint exclusion depending on a parameter value didn't work either.
2008-03-31 (Tom Lane) Apply my original fix for Taiki Yamaguchi's bug report about DISTINCT MAX().
Add some regression tests for plausible failures in this area.
2008-03-29 (Tom Lane) Revert my erroneous fix for Taiki Yamaguchi's DISTINCT MAX() bug.
Whatever we do about that, this isn't the path to the solution.
2008-03-28 (Tom Lane) Department of second thoughts: the rule that ORDER BY and DISTINCT are
useless for an ungrouped-aggregate query holds regardless of whether optimize_minmax_aggregates succeeds. So we might as well apply the optimization in any case. I'll leave 8.3 as it was, since this version is a tad more invasive than my earlier patch.
2008-03-27 (Tom Lane) When we have successfully optimized a MIN or MAX aggregate into an indexscan,
the query result must be exactly one row (since we don't do this when there's any GROUP BY). Therefore any ORDER BY or DISTINCT attached to the query is useless and can be dropped. Aside from saving useless cycles, this protects us against problems with matching the hacked-up tlist entries to sort clauses, as seen in a bug report from Taiki Yamaguchi. We might need to work harder if we ever try to optimize grouped queries with this approach, but this solution will do for now.
2008-03-26 (Alvaro Herrera) Move the HTSU_Result enum definition into snapshot.h, to avoid including
tqual.h into heapam.h. This makes all inclusion of tqual.h explicit. I also sorted alphabetically the includes on some source files.
2008-03-26 (Alvaro Herrera) Rename snapmgmt.c/h to snapmgr.c/h, for consistency with other files.
Per complaint from Tom Lane.
2008-03-26 (Alvaro Herrera) Separate snapshot management code from tuple visibility code, create a
snapmgmt.c file for the former. The header files have also been reorganized in three parts: the most basic snapshot definitions are now in a new file snapshot.h, and the also new snapmgmt.h keeps the definitions for snapmgmt.c. tqual.h has been reduced to the bare minimum. This patch is just a first step towards managing live snapshots within a transaction; there is no functionality change. Per my proposal to pgsql-patches on 20080318191940.GB27458@alvh.no-ip.org and subsequent discussion.
2008-03-25 (Tom Lane) Simplify and standardize conversions between TEXT datums and ordinary C
strings. This patch introduces four support functions cstring_to_text, cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and two macros CStringGetTextDatum and TextDatumGetCString. A number of existing macros that provided variants on these themes were removed. Most of the places that need to make such conversions now require just one function or macro call, in place of the multiple notational layers that used to be needed. There are no longer any direct calls of textout or textin, and we got most of the places that were using handmade conversions via memcpy (there may be a few still lurking, though). This commit doesn't make any serious effort to eliminate transient memory leaks caused by detoasting toasted text objects before they reach text_to_cstring. We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few places where it was easy, but much more could be done. Brendan Jurd and Tom Lane
2008-03-24 (Tom Lane) When a relation has been proven empty by constraint exclusion, propagate that
knowledge up through any joins it participates in. We were doing that already in some special cases but not in the general case. Also, defend against zero row estimates for the input relations in cost_mergejoin --- this fix may have eliminated the only scenario in which that can happen, but be safe. Per report from Alex Solovey.
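A small sketch of the two ideas in this message, restricted to plain inner joins and using hypothetical names (`RelEst`, `clamp_rows`, `inner_join_estimate`). The real planner also has to reason about outer joins, which this toy ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-relation estimate. */
typedef struct RelEst
{
    bool   proven_empty;  /* e.g. excluded by a constraint */
    double rows;          /* estimated row count */
} RelEst;

/* Defensive clamp so later cost arithmetic never divides by zero. */
double
clamp_rows(double rows)
{
    return rows < 1.0 ? 1.0 : rows;
}

/* An inner join with a provably empty input is itself provably empty;
 * otherwise estimate rows from the inputs and the join selectivity. */
RelEst
inner_join_estimate(RelEst outer, RelEst inner, double selectivity)
{
    RelEst result;

    if (outer.proven_empty || inner.proven_empty)
    {
        result.proven_empty = true;
        result.rows = 0.0;
        return result;
    }
    result.proven_empty = false;
    result.rows = clamp_rows(outer.rows * inner.rows * selectivity);
    return result;
}
```

Once emptiness is carried upward this way, an enclosing join can be recognised as empty too, and the clamp covers any remaining path where a zero estimate might still reach the cost code.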
2008-03-21 (Bruce Momjian) More README src cleanups.
2008-03-20 (Bruce Momjian) Make source code READMEs more consistent. Add CVS tags to all README files.
2008-03-18 (Tom Lane) Arrange to "inline" SQL functions that appear in a query's FROM clause,
are declared to return set, and consist of just a single SELECT. We can replace the FROM-item with a sub-SELECT and then optimize much as if we were dealing with a view. Patch from Richard Rowell, cleaned up by me.
2008-03-15 (Tom Lane) Change hash index creation so that rather than always establishing exactly
two buckets at the start, we create a number of buckets appropriate for the estimated size of the table. This avoids a lot of expensive bucket-split actions during initial index build on an already-populated table. This is one of the two core ideas of Tom Raney and Shreya Bhargava's patch to reduce hash index build time. I'm committing it separately to make it easier for people to test the effects of this separately from the effects of their other core idea (pre-sorting the index entries by bucket number).
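One way to size the initial bucket array can be sketched as follows. The target of a fixed number of tuples per bucket, the rounding up to a power of two, and the upper cap are all assumptions made for this illustration, and `initial_bucket_count` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Pick an initial bucket count for an index over est_tuples entries,
 * aiming for roughly tuples_per_bucket entries in each bucket.
 * Never returns fewer than 2, the old fixed starting size. */
uint32_t
initial_bucket_count(double est_tuples, double tuples_per_bucket)
{
    uint32_t buckets = 2;
    double   wanted;

    if (tuples_per_bucket < 1.0)
        tuples_per_bucket = 1.0;
    wanted = est_tuples / tuples_per_bucket;

    /* Double until large enough; the cap keeps the shift well defined. */
    while ((double) buckets < wanted && buckets < (UINT32_C(1) << 30))
        buckets <<= 1;
    return buckets;
}
```

Starting an index on an already-populated table at the computed size avoids the long chain of bucket splits that would otherwise be needed to grow from two buckets.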
2008-02-19 (Peter Eisentraut) Refactor backend makefiles to remove lots of duplicate code
2008-02-07 (Tom Lane) Fix silly mistake in expand_indexqual_rowcompare --- in converting a forboth()
into an iteration over three parallel lists, I had accidentally put the lnext steps outside the loop. Sigh. Per bug #3938.
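The class of mistake is easy to see in a hand-written loop over three lists. The sketch below uses a toy singly linked `Cell` type rather than PostgreSQL's List API, and shows the correct placement of the advance steps.

```c
#include <assert.h>
#include <stddef.h>

/* Toy list cell, standing in for a real list implementation. */
typedef struct Cell
{
    int          value;
    struct Cell *next;
} Cell;

/* Walk three lists in lockstep, combining one element of each per step. */
int
sum_triples(const Cell *a, const Cell *b, const Cell *c)
{
    int total = 0;

    while (a != NULL && b != NULL && c != NULL)
    {
        total += a->value * b->value + c->value;

        /* The advance steps belong inside the loop body.  Placed after
         * the closing brace, the loop would never move past the first
         * cells and would not terminate. */
        a = a->next;
        b = b->next;
        c = c->next;
    }
    return total;
}
```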
2008-01-17 (Tom Lane) Fix subselect.c to avoid assuming that a SubLink's testexpr references each
subquery output column exactly once left-to-right. Although this is the case in the original parser output, it might not be so after rewriting and constant-folding, as illustrated by bug #3882 from Jan Mate. Instead scan the subquery's target list to obtain needed per-column information; this is duplicative of what the parser did, but only a couple dozen lines need be copied, and we can clean up a couple of notational uglinesses. Bug was introduced in 8.2 as part of revision of SubLink representation.
2008-01-12 (Tom Lane) Fix logical errors in constraint exclusion: we cannot assume that a CHECK
constraint yields TRUE for every row of its table, only that it does not yield FALSE (a NULL result isn't disallowed). This breaks a couple of implications that would be true in two-valued logic. I had put in one such mistake in an 8.2.5 patch: foo IS NULL doesn't refute a strict operator on foo. But there was another in the original 8.2 release: NOT foo doesn't refute an expression whose truth would imply the truth of foo. Per report from Rajesh Kumar Mallah. To preserve the ability to do constraint exclusion with one partition holding NULL values, extend relation_excluded_by_constraints() to check for attnotnull flags, and add col IS NOT NULL expressions to the set of constraints we hope to refute.
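The three-valued rule can be made concrete with a tiny model. The types and function names here (`TriValue`, `NullableInt`, `toy_is_positive`, `check_constraint_admits`) are hypothetical and exist only to show that a NULL constraint result admits the row.

```c
#include <assert.h>
#include <stdbool.h>

/* SQL-style three-valued truth value. */
typedef enum TriValue
{
    TV_FALSE,
    TV_TRUE,
    TV_NULL
} TriValue;

typedef struct NullableInt
{
    bool isnull;
    int  value;
} NullableInt;

/* A strict test such as "x > 0": NULL input yields NULL, not FALSE. */
TriValue
toy_is_positive(NullableInt x)
{
    if (x.isnull)
        return TV_NULL;
    return x.value > 0 ? TV_TRUE : TV_FALSE;
}

/* A CHECK constraint rejects a row only when it evaluates to FALSE. */
bool
check_constraint_admits(TriValue result)
{
    return result != TV_FALSE;
}
```

A table with CHECK (x > 0) can therefore hold rows where x is NULL, so a query condition such as x IS NULL does not refute that constraint unless the column is also known to be NOT NULL, which is what the added attnotnull check supplies.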
2008-01-11 (Tom Lane) The original implementation of polymorphic aggregates didn't really get the
checking of argument compatibility right; although the problem is only exposed with multiple-input aggregates in which some arguments are polymorphic and some are not. Per bug #3852 from Sokolov Yura.
2008-01-11 (Tom Lane) Fix an old error in clause_selectivity: the default selectivity estimate
for unhandled clause types ought to be 0.5, not 1.0. I fear I introduced this silliness due to misreading the intent of the very-poorly-structured code that was there when we inherited the file from Berkeley. The lack of sanity in this behavior was exposed by an example from Sim Zacks. (Arguably this is a bug fix and should be back-patched, but I'm a bit hesitant to introduce a possible planner behavior change in the back branches; it might detune queries that worked acceptably in the past.) While at it, make estimation for DistinctExpr do something marginally realistic, rather than just defaulting.
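The effect of the default can be sketched with a toy estimator. `ClauseKind` and the `toy_` functions are invented for this example, and multiplying selectivities assumes the clauses are independent, which is a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical clause categories for the sketch. */
typedef enum ClauseKind
{
    CLAUSE_CONST_TRUE,
    CLAUSE_CONST_FALSE,
    CLAUSE_UNHANDLED     /* anything the estimator has no rule for */
} ClauseKind;

/* Per-clause estimate: unknown clause types get a neutral 0.5
 * rather than 1.0, so they still reduce the row estimate. */
double
toy_clause_selectivity(ClauseKind kind)
{
    switch (kind)
    {
        case CLAUSE_CONST_TRUE:
            return 1.0;
        case CLAUSE_CONST_FALSE:
            return 0.0;
        default:
            return 0.5;
    }
}

/* Combine clause estimates, assuming independence. */
double
toy_clauselist_selectivity(const ClauseKind *clauses, size_t n)
{
    double selectivity = 1.0;

    for (size_t i = 0; i < n; i++)
        selectivity *= toy_clause_selectivity(clauses[i]);
    return selectivity;
}
```

With a default of 1.0, any number of unrecognised filter clauses would leave the row estimate untouched, as if they filtered nothing at all.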
2008-01-11 (Tom Lane) Fix a conceptual error in my patch of 2007-10-26 that avoided considering
clauseless joins of relations that have unexploited join clauses. Rather than looking at every other base relation in the query, the correct thing is to examine the other relations in the "initial_rels" list of the current make_rel_from_joinlist() invocation, because those are what we actually have the ability to join against. This might be a subset of the whole query in cases where join_collapse_limit or from_collapse_limit or full joins have prevented merging the whole query into a single join problem. This is a bit untidy because we have to pass those rels down through a new PlannerInfo field, but it's necessary. Per bug #3865 from Oleg Kharin.
2008-01-09 (Tom Lane) Fix some planner issues found while investigating Kevin Grittner's report
of poorer planning in 8.3 than 8.2:
1. After pushing a constant across an outer join --- ie, given "a LEFT JOIN b ON (a.x = b.y) WHERE a.x = 42", we can deduce that b.y is sort of equal to 42, in the sense that we needn't fetch any b rows where it isn't 42 --- loop to see if any additional deductions can be made. Previous releases did that by recursing, but I had mistakenly thought that this was no longer necessary given the EquivalenceClass machinery.
2. Allow pushing constants across outer join conditions even if the condition is outerjoin_delayed due to a lower outer join. This is safe as long as the condition is strict and we re-test it at the upper join.
3. Keep the outer-join clause even if we successfully push a constant across it. This is *necessary* in the outerjoin_delayed case, but even in the simple case, it seems better to do this to ensure that the join search order heuristics will consider the join as reasonable to make. Mark such a clause as having selectivity 1.0, though, since it's not going to eliminate very many rows after application of the constant condition.
4. Tweak have_relevant_eclass_joinclause to report that two relations are joinable when they have vars that are equated to the same constant. We won't actually generate any joinclause from such an EquivalenceClass, but again it seems that in such a case it's a good idea to consider the join as worth costing out.
5. Fix a bug in select_mergejoin_clauses that was exposed by these changes: we have to reject candidate mergejoin clauses if either side was equated to a constant, because we can't construct a canonical pathkey list for such a clause. This is an implementation restriction that might be worth fixing someday, but it doesn't seem critical to get it done for 8.3.
2008-01-01 (Bruce Momjian) Update copyrights in source tree to 2008.
2007-12-08 (Tom Lane) Fix mergejoin cost estimation so that we consider the statistical ranges of
the two join variables at both ends: not only trailing rows that need not be scanned because there cannot be a match on the other side, but initial rows that will be scanned without possibly having a match. This allows a more realistic estimate of startup cost to be made, per recent pgsql-performance discussion. In passing, fix a couple of bugs that had crept into mergejoinscansel: it was not quite up to speed for the task of estimating descending-order scans, which is a new requirement in 8.3.
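A simplified picture of the two-ended estimate, assuming values are uniformly distributed over a known range. The real estimator works from column statistics, so `Range`, `ScanFraction` and `toy_mergejoin_fractions` are purely illustrative names.

```c
#include <assert.h>

typedef struct Range
{
    double min;
    double max;
} Range;

typedef struct ScanFraction
{
    double start;  /* fraction read before the first possible match */
    double end;    /* fraction read by the time the scan can stop */
} ScanFraction;

static double
clamp01(double x)
{
    if (x < 0.0)
        return 0.0;
    if (x > 1.0)
        return 1.0;
    return x;
}

/* For one sorted merge input, estimate which part of it is actually
 * scanned, given the value range of the other input. */
ScanFraction
toy_mergejoin_fractions(Range self, Range other)
{
    ScanFraction f;
    double       span = self.max - self.min;

    if (span <= 0.0)
    {
        f.start = 0.0;   /* degenerate range: assume a full scan */
        f.end = 1.0;
        return f;
    }
    f.start = clamp01((other.min - self.min) / span);
    f.end = clamp01((other.max - self.min) / span);
    return f;
}
```

The `end` fraction corresponds to the trailing rows that need not be scanned, which was already considered. The `start` fraction is the newly considered part: rows that must be read before any match can appear, which raises the estimated startup cost.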
2007-12-03 (Tom Lane) Fix build_minmax_path() to cope if an IS NULL clause turns up in the
indexable-clauses list for a btree index. Formerly it just Asserted that all such clauses were opclauses, but that's no longer true in 8.3. Per bug #3796 from Matthias Schoeneich.
2007-11-24 (Tom Lane) Save another little bit of planner overhead on simple queries, by having
clauselist_selectivity skip some analysis that's useless when there's only one clause in the given list. Actually this can win even for not-so-simple queries, because we also apply clauselist_selectivity to sublists such as the quals matching an index; which are likely to have only a single entry even when the total query is quite complicated.
2007-11-24 (Tom Lane) Change fix_scan_expr() to avoid copying the input node tree in the common case
where rtoffset == 0. In that case there is no need to change Var nodes, and since filling in unset opfuncid fields is always safe, scribbling on the input tree to that extent is not objectionable. This brings the cost of this operation back down to what it was in 8.2 for simple queries. Per investigation of performance gripe from Guillaume Smet.
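The copy-avoidance pattern can be sketched on a flat array instead of a node tree. `ToyVar` and `toy_fix_vars` are hypothetical stand-ins, and the real function walks an expression tree rather than an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy variable reference: just a range-table index. */
typedef struct ToyVar
{
    int varno;
} ToyVar;

/* Renumber variable references by rtoffset.  When the offset is zero
 * nothing would change, so hand back the input instead of copying it.
 * The caller owns (and must free) the result only when it differs
 * from the input pointer. */
ToyVar *
toy_fix_vars(ToyVar *vars, size_t n, int rtoffset)
{
    ToyVar *copy;

    if (rtoffset == 0)
        return vars;            /* common case: no copy needed */

    copy = malloc(n * sizeof(ToyVar));
    if (copy == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        copy[i].varno = vars[i].varno + rtoffset;
    return copy;
}
```

Skipping the copy in the zero-offset case is what recovers the per-query cost for simple queries, since those are the ones where the offset is typically zero.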
2007-11-23 (Tom Lane) Avoid uselessly building a duplicate of the original clause in trivial cases
where the EquivalenceClass machinery is unable to deduce anything more from a simple "var = const" qual clause. There are probably some more cases where this could be done, but this seems to take care of most of the added overhead for simple queries. Per gripe from Guillaume Smet. In passing, fix a problem that was exposed by this change: reconsider_outer_join_clause and friends were passing the wrong relids to build_implied_join_equality, resulting in RestrictInfos with the wrong required_relids. This mistake was masked in typical cases since the bogus RestrictInfos would never have escaped from the EquivalenceClass machinery, but I think there might be corner cases involving "broken" ECs where there would have been a visible failure even without the new optimization. In any case the code was certainly not operating as intended.
2007-11-22 (Tom Lane) Save one syscache lookup when examining volatility or strictness of
OpExpr and related nodes. We're going to have to set the opfuncid of such nodes eventually (if we haven't already), so we might as well exploit the opportunity to cache the function OID. Buys back some of the extra planner overhead noted by Guillaume Smet, though I still need to fool with equivclass.c to really respond to that.
2007-11-15 (Bruce Momjian) Re-run pgindent with updated list of typedefs. (Updated README should
avoid this problem in the future.)
2007-11-15 (Bruce Momjian) pgindent run for 8.3.
2007-11-08 (Tom Lane) Fix EquivalenceClass code to handle volatile sort expressions in a more
predictable manner; in particular that if you say ORDER BY output-column-ref, it will in fact sort by that specific column even if there are multiple syntactic matches. An example is SELECT random() AS a, random() AS b FROM ... ORDER BY b, a; While the use-case for this might be a bit debatable, it worked as expected in earlier releases, so we should preserve the behavior for 8.3. Per my recent proposal. While at it, fix convert_subquery_pathkeys() to handle RelabelType stripping in both directions; it needs this for the same reasons make_sort_from_pathkeys does.
2007-11-08 (Tom Lane) Last week's patch for make_sort_from_pathkeys wasn't good enough: it has
to be able to discard top-level RelabelType nodes on *both* sides of the equivalence-class-to-target-list comparison, since make_pathkey_from_sortinfo might either add or remove a RelabelType. Also fix the latter to do the removal case cleanly. Per example from Peter.
2007-11-07 (Tom Lane) Improve the performance of LIKE/regex estimation in non-C locales, by making
make_greater_string() try harder to generate a string that's actually greater than its input string. Before we just assumed that making a string that was memcmp-greater was enough, but it is easy to generate examples where this is not so when the locale is not C. Instead, loop until the relevant comparison function agrees that the generated string is greater than the input. Unfortunately this is probably not enough to guarantee that the generated string is greater than all extensions of the input, so we cannot relax the restriction to C locale for the LIKE/regex index optimization. But it should at least improve the odds of getting a useful selectivity estimate in prefix_selectivity(). Per example from Guillaume Smet. Backpatch to 8.1, mainly because that's what the complainant is using...
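The "loop until the comparison function agrees" idea can be sketched as follows. This is a byte-oriented toy, not the real make_greater_string(): `toy_make_greater_string` and `CompareFn` are invented names, and the sketch ignores multibyte encodings, where incrementing a single byte can produce an invalid character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Comparator with the same shape as strcmp() or strcoll(). */
typedef int (*CompareFn)(const char *a, const char *b);

/* Try to build in buf a string that cmp() considers greater than input.
 * Increment the last byte until the comparator agrees; if that byte
 * is exhausted, drop it and retry with the previous one. */
bool
toy_make_greater_string(const char *input, char *buf, size_t bufsize,
                        CompareFn cmp)
{
    size_t len = strlen(input);

    if (len + 1 > bufsize)
        return false;
    memcpy(buf, input, len + 1);

    while (len > 0)
    {
        unsigned char *last = (unsigned char *) &buf[len - 1];

        while (*last < 255)
        {
            (*last)++;
            if (cmp(buf, input) > 0)
                return true;    /* comparator confirms it is greater */
        }
        buf[--len] = '\0';      /* give up on this byte, shorten */
    }
    return false;               /* no greater string could be built */
}
```

Passing `strcmp` reproduces the old byte-wise notion of "greater", while passing `strcoll` asks the current locale. As the message notes, even a result the comparator accepts is not guaranteed to sort above every extension of the input, so this helps the selectivity estimate without making the index optimization safe outside the C locale.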