summaryrefslogtreecommitdiff
path: root/src/backend/optimizer/path
AgeCommit message (Collapse)Author
2023-10-26Add trailing commas to enum definitionsPeter Eisentraut
Since C99, there can be a trailing comma after the last value in an enum definition. A lot of new code has been introducing this style on the fly. Some new patches are now taking an inconsistent approach to this. Some add the last comma on the fly if they add a new last value, some are trying to preserve the existing style in each place, some are even dropping the last comma if there was one. We could nudge this all in a consistent direction if we just add the trailing commas everywhere once. I omitted a few places where there was a fixed "last" value that will always stay last. I also skipped the header files of libpq and ecpg, in case people want to use those with older compilers. There were also a small number of cases where the enum type wasn't used anywhere (but the enum values were), which ended up confusing pgindent a bit, so I left those alone. Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
2023-10-25Remove useless self-joinsAlexander Korotkov
The Self Join Elimination (SJE) feature removes an inner join of a plain table to itself in the query tree if is proved that the join can be replaced with a scan without impacting the query result. Self join and inner relation are replaced with the outer in query, equivalence classes, and planner info structures. Also, inner restrictlist moves to the outer one with removing duplicated clauses. Thus, this optimization reduces the length of the range table list (this especially makes sense for partitioned relations), reduces the number of restriction clauses === selectivity estimations, and potentially can improve total planner prediction for the query. The SJE proof is based on innerrel_is_unique machinery. We can remove a self-join when for each outer row: 1. At most one inner row matches the join clause. 2. Each matched inner row must be (physically) the same row as the outer one. In this patch we use the next approach to identify a self-join: 1. Collect all merge-joinable join quals which look like a.x = b.x 2. Add to the list above the baseretrictinfo of the inner table. 3. Check innerrel_is_unique() for the qual list. If it returns false, skip this pair of joining tables. 4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the possibility of self-join elimination inner and outer clauses must have an exact match. The relation replacement procedure is not trivial and it is partly combined with the one, used to remove useless left joins. Tests, covering this feature, were added to join.sql. Some regression tests changed due to self-join removal logic. Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru Author: Andrey Lepikhov, Alexander Kuzmenkov Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina Reviewed-by: Alexander Korotkov
2023-10-10Replace has_multiple_baserels() with a bitmap test on all_baserels.Tom Lane
Since we added the PlannerInfo.all_baserels set, it's not really necessary to grovel over the rangetable to count baserels in the current query. So let's drop has_multiple_baserels() in favor of a bms_membership() test. This might be microscopically faster, but the main point is to remove some unnecessary code. Richard Guo Discussion: https://postgr.es/m/CAMbWs4_8RcSbbfs1ASZLrMuL0c0EQgXWcoLTQD8swBRY_pQQiA@mail.gmail.com
2023-10-10Fix possible crash in add_paths_to_append_rel()David Rowley
While working on a8a968a82, I failed to consider that cheapest_startup_path can be NULL when there is no non-parameterized path in the pathlist. This is well documented in set_cheapest(), I just failed to notice. Here we adjust the code to just check if the RelOptInfo has a cheapest_startup_path set before adding it to the startup_subpaths list. Reported-by: Richard Guo Author: Richard Guo Discussion: https://postgr.es/m/CAMbWs49w3t03V69XhdCuw+GDwivny4uQUxrkVp6Gejaspt0wMQ@mail.gmail.com
2023-10-09Remove debug_print_rel and replace usages with pprintDavid Rowley
Going by c4a1933b4, b33ef397a and 05893712c (to name just a few), it seems that maintaining debug_print_rel() is often forgotten. In the case of c4a1933b4, it was several years before anyone noticed that a path type was not handled by debug_print_rel(). (debug_print_rel() is only compiled when building with OPTIMIZER_DEBUG). After a quick survey on the pgsql-hackers mailing list, nobody came forward to admit that they use OPTIMIZER_DEBUG. So to prevent any future maintenance neglect, let's just remove debug_print_rel() and have OPTIMIZER_DEBUG make use of pprint() instead (as suggested by Tom Lane). If anyone wants to come forward to claim they make use of OPTIMIZER_DEBUG in a way that they need debug_print_rel() then they have around 10 months remaining in the v17 cycle where we could revert this. If nobody comes forward in that time, then we can likely safely declare debug_print_rel() as not worth keeping. Discussion: https://postgr.es/m/CAApHDvoCdjo8Cu2zEZF4-AxWG-90S+pYXAnoDDa9J3xH-OrczQ@mail.gmail.com
2023-10-05Consider cheap startup paths in add_paths_to_append_relDavid Rowley
6b94e7a6d did this for ordered append paths to allow fast startup MergeAppends, however, nothing was done for the Append case. Here we adjust add_paths_to_append_rel() to have it build an AppendPath containing the cheapest startup paths from each of the child relations when the append rel has "consider_startup" set. Author: Andy Fan, David Rowley Discussion: https://www.postgresql.org/message-id/CAKU4AWrXSkUV=Pt-gRxQT7EbfUeNssprGyNsB=5mJibFZ6S3ww@mail.gmail.com
2023-09-29C comment: add optimizer function referenceBruce Momjian
Reported-by: James Coleman Discussion: https://postgr.es/m/CAAaqYe9F6uoMhAr+8rMLwvGzaKaSknPA0Wi3Ehtv8pbSYmJq-Q@mail.gmail.com Backpatch-through: master
2023-09-29Add missing TidRangePath handling in print_path()David Rowley
Tid Range scans were added back in bb437f995. That commit forgot to add handling for TidRangePaths in print_path(). Only people building with OPTIMIZER_DEBUG might have noticed this, which likely is the reason it's taken 4 years for anyone to notice. Author: Andrey Lepikhov Reported-by: Andrey Lepikhov Discussion: https://postgr.es/m/379082d6-1b6a-4cd6-9ecf-7157d8c08635@postgrespro.ru Backpatch-through: 14, where bb437f995 was introduced
2023-09-21Update comment about set_join_pathlist_hook().Etsuro Fujita
The comment introduced by commit e7cb7ee14 was a bit too terse, which could lead to extensions doing different things within the hook function than we intend to allow. Extend the comment to explain what they can do within the hook function. Back-patch to all supported branches. In passing, I rephrased a nearby comment that I recently added to the back branches. Reviewed by David Rowley and Andrei Lepikhov. Discussion: https://postgr.es/m/CAPmGK15SBPA1nr3Aqsdm%2BYyS-ay0Ayo2BRYQ8_A2To9eLqwopQ%40mail.gmail.com
2023-09-07Reorder tests in get_cheapest_path_for_pathkeys().Robert Haas
Checking parallel safety should be even cheaper than cost comparison, so do that first. Also make some minor, related comment improvements. Richard Guo, reviewed by Aleksander Alekseev, Andy Fan, and me. Discussion: http://postgr.es/m/CAMbWs4-KE2wf4QPj_Sr5mX4QFtBNNKGmxK=+e=KZEGUjdG33=g@mail.gmail.com
2023-08-15Re-allow FDWs and custom scan providers to replace joins with pseudoconstant ↵Etsuro Fujita
quals. This was disabled in commit 6f80a8d9c due to the lack of support for handling of pseudoconstant quals assigned to replaced joins in createplan.c. To re-allow it, this patch adds the support by 1) modifying the ForeignPath and CustomPath structs so that if they represent foreign and custom scans replacing a join with a scan, they store the list of RestrictInfo nodes to apply to the join, as in JoinPaths, and by 2) modifying create_scan_plan() in createplan.c so that it uses that list in that case, instead of the baserestrictinfo list, to get pseudoconstant quals assigned to the join, as mentioned in the commit message for that commit. Important item for the release notes: this is non-backwards-compatible since it modifies the ForeignPath and CustomPath structs, as mentioned above, and changes the argument lists for FDW helper functions create_foreignscan_path(), create_foreign_join_path(), and create_foreign_upper_path(). Richard Guo, with some additional changes by me, reviewed by Nishant Sharma, Suraj Kharage, and Richard Guo. Discussion: https://postgr.es/m/CADrsxdbcN1vejBaf8a%2BQhrZY5PXL-04mCd4GDu6qm6FigDZd6Q%40mail.gmail.com
2023-08-07Don't Memoize lateral joins with volatile join conditionsDavid Rowley
The use of Memoize was already disabled in normal joins when the join conditions had volatile functions per the code in match_opclause_to_indexcol(). Ordinarily, the parameterization for the inner side of a nested loop will be an Index Scan or at least eventually lead to an index scan (perhaps nested several joins deep). However, for lateral joins, that's not the case and seq scans can be parameterized too, so we can't rely on match_opclause_to_indexcol(). Here we explicitly check the parameterization for volatile functions and don't consider the generation of a Memoize path when such functions are present. Author: Richard Guo Discussion: https://postgr.es/m/CAMbWs49nHFnHbpepLsv_yF3qkpCS4BdB-v8HoJVv8_=Oat0u_w@mail.gmail.com Backpatch-through: 14, where Memoize was introduced
2023-08-07Fix misleading comment in paraminfo_get_equal_hashopsDavid Rowley
The comment mistakenly claimed the code was checking PlaceHolderVars for volatile functions when the code was actually checking lateral vars. Update the comment to reflect the reality of the code. Author: Richard Guo Discussion: https://postgr.es/m/CAMbWs48HZGZOV85g0fx8z1qDx6NNKHexJPT2FCnKnZhxBWkd-A@mail.gmail.com
2023-08-06Tidy up join_search_one_level codeDavid Rowley
The code in join_search_one_level was a bit convoluted. With a casual glance, you might think that other_rels_list was being set to something other than joinrels[1] when level == 2, however, joinrels[level - 1] is joinrels[1] when level == 2, so nothing special needs to happen to set other_rels_list. Let's clean that up to avoid confusing anyone. In passing, we may as well modernize the loop in make_rels_by_clause_joins() and instead of passing in the ListCell to start looping from, let's just pass in the index where to start from and make use of for_each_from(). Ever since 1cff1b95a, Lists are arrays under the hood. lnext() and list_head() both seem a little too linked-list like. Author: Alex Hsieh, David Rowley, Richard Guo Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/CANWNU8x9P9aCXGF%3DaT-A_8mLTAT0LkcZ_ySYrGbcuHzMQw2-1g%40mail.gmail.com
2023-08-04Minor adjustments to WindowAgg startup cost codeDavid Rowley
This is a follow-on of 3900a02c9 containing some changes which I forgot to commit locally before forming a patch with git format-patch. Discussion: https://postgr.es/m/CAApHDvrB0S5BMv+0-wTTqWFE-BJ0noWqTnDu9QQfjZ2VSpLv_g@mail.gmail.com
2023-08-04Account for startup rows when costing WindowAggsDavid Rowley
Here we adjust the costs for WindowAggs so that they properly take into account how much of their subnode they must read before outputting the first row. Without this, we always assumed that the startup cost for the WindowAgg was not much more expensive than the startup cost of its subnode, however, that's going to be completely wrong in many cases. The WindowAgg may have to read *all* of its subnode to output a single row with certain window bound options. Here we estimate how many rows we'll need to read from the WindowAgg's subnode and proportionally add more of the subnode's run costs onto the WindowAgg's startup costs according to how much of it we expect to have to read in order to produce the first WindowAgg row. The reason this is more important than we might have initially thought is that we may end up making use of a path from the lower planner that works well as a cheap startup plan when the query has a LIMIT clause, however, the WindowAgg might mean we need to read far more rows than what the LIMIT specifies. No backpatch on this so as not to cause plan changes in released versions. Bug: #17862 Reported-by: Tim Palmer Author: David Rowley Reviewed-by: Andy Fan Discussion: https://postgr.es/m/17862-1ab8f74b0f7b0611@postgresql.org Discussion: https://postgr.es/m/CAApHDvrB0S5BMv+0-wTTqWFE-BJ0noWqTnDu9QQfjZ2VSpLv_g@mail.gmail.com
2023-07-28Disallow replacing joins with scans in problematic cases.Etsuro Fujita
Commit e7cb7ee14, which introduced the infrastructure for FDWs and custom scan providers to replace joins with scans, failed to add support handling of pseudoconstant quals assigned to replaced joins in createplan.c, leading to an incorrect plan without a gating Result node when postgres_fdw replaced a join with such a qual. To fix, we could add the support by 1) modifying the ForeignPath and CustomPath structs to store the list of RestrictInfo nodes to apply to the join, as in JoinPaths, if they represent foreign and custom scans replacing a join with a scan, and by 2) modifying create_scan_plan() in createplan.c to use that list in that case, instead of the baserestrictinfo list, to get pseudoconstant quals assigned to the join; but #1 would cause an ABI break. So fix by modifying the infrastructure to just disallow replacing joins with such quals. Back-patch to all supported branches. Reported by Nishant Sharma. Patch by me, reviewed by Nishant Sharma and Richard Guo. Discussion: https://postgr.es/m/CADrsxdbcN1vejBaf8a%2BQhrZY5PXL-04mCd4GDu6qm6FigDZd6Q%40mail.gmail.com
2023-07-22Avoid compiler warning in non-assert builds.Tom Lane
After 3c90dcd03, try_partitionwise_join's child_joinrelids variable is read only in an Assert, provoking a compiler warning in non-assert builds. Rearrange code to avoid the warning and eliminate unnecessary work in the non-assert case. Per CI testing (via Jeff Davis and Bharath Rupireddy) Discussion: https://postgr.es/m/ef0de9713e605451f1b60b30648c5ee900b2394c.camel@j-davis.com
2023-07-21Fix calculation of relid sets for partitionwise child joins.Tom Lane
Applying add_outer_joins_to_relids() to a child join doesn't actually work, even if we've built a SpecialJoinInfo specialized to the child, because that function will also compare the join's relids to elements of the main join_info_list, which only deal in regular relids not child relids. This mistake escaped detection by the existing partitionwise join tests because they didn't test any cases where add_outer_joins_to_relids() needs to add additional OJ relids (that is, any cases where join reordering per identity 3 is possible). Instead, let's apply adjust_child_relids() to the relids of the parent join. This requires minor code reordering to collect the relevant AppendRelInfo structures first, but that's work we'd do shortly anyway. Report and fix by Richard Guo; cosmetic changes by me Discussion: https://postgr.es/m/CAMbWs49NCNbyubZWgci3o=_OTY=snCfAPtMnM-32f3mm-K-Ckw@mail.gmail.com
2023-07-04Allow Incremental Sorts on GiST and SP-GiST indexesDavid Rowley
Previously an "amcanorderbyop" index would only be used when the index could provide sorted results which satisfied all query_pathkeys. Here we relax this so that we also allow these indexes to be considered by the planner when they only provide partially sorted results. This allows the planner to later consider making use of an Incremental Sort to satisfy the remaining pathkeys. This change is particularly useful for KNN-type queries which contain a LIMIT clause and an additional ORDER BY clause for a non-indexed column. Author: Miroslav Bendik Reviewed-by: Richard Guo, David Rowley Discussion: https://postgr.es/m/CAPoEpV0QYDtzjwamwWUBqyWpaCVbJV2d6qOD7Uy09bWn47PJtw%40mail.gmail.com
2023-06-29Defend against bogus parameterization of join input paths.Tom Lane
An outer join cannot be formed using an input path that is parameterized by a value that is supposed to be nulled by the outer join. This is obviously nonsensical, and it could lead to a bad plan being selected; although currently it seems that we'll hit various sanity-check assertions first. I think that such cases were formerly prevented by the delay_upper_joins mechanism, but now that that's gone we need an explicit check. (Perhaps we should avoid generating baserel paths that could lead to this situation in the first place; but it seems like having a defense at the join level would be a good idea anyway.) Richard Guo and Tom Lane, per report from Jaime Casanova Discussion: https://postgr.es/m/CAJKUy5g2uZRrUDZJ8p-=giwcSHVUn0c9nmdxPSY0jF0Ov8VoEA@mail.gmail.com
2023-06-20Centralize fixups for mismatched nullingrels in nestloop params.Tom Lane
It turns out that the fixes we applied in commits bfd332b3f and 63e4f13d2 were not nearly enough to solve the problem. We'd focused narrowly on subquery RTEs with lateral references, but lateral references can occur in several other RTE kinds such as function RTEs. Putting the same hack into half a dozen code paths seems quite unattractive. Hence, revert the code changes (but not the test cases) from those commits and instead solve it centrally in identify_current_nestloop_params(), as Richard proposed originally. This is a bit annoying because it could mask erroneous nullingrels in nestloop params that are generated from non-LATERAL parameterized paths; but on balance I don't see a better way. Maybe at some future time we'll be motivated to find a more rigorous approach to nestloop params, but that's not happening for beta2. Richard Guo and Tom Lane Discussion: https://postgr.es/m/CAMbWs48Jcw-NvnxT23WiHP324wG44DvzcH1j4hc0Zn+3sR9cfg@mail.gmail.com
2023-06-19Don't use partial unique indexes for unique proofs in the plannerDavid Rowley
Here we adjust relation_has_unique_index_for() so that it no longer makes use of partial unique indexes as uniqueness proofs. It is incorrect to use these as the predicates used by check_index_predicates() to set predOK makes use of not only baserestrictinfo quals as proofs, but also qual from join conditions. For relation_has_unique_index_for()'s case, we need to know the relation is unique for a given set of columns before any joins are evaluated, so if predOK was only set to true due to some join qual, then it's unsafe to use such indexes in relation_has_unique_index_for(). The final plan may not even make use of that index, which could result in reading tuples that are not as unique as the planner previously expected them to be. Bug: #17975 Reported-by: Tor Erik Linnerud Backpatch-through: 11, all supported versions Discussion: https://postgr.es/m/17975-98a90c156f25c952%40postgresql.org
2023-06-13Fix "wrong varnullingrels" for Memoize's lateral references, too.Tom Lane
The issue fixed in commit bfd332b3f can also bite Memoize plans, because of the separate copies of lateral reference Vars made by paraminfo_get_equal_hashops. Apply the same hacky fix there. (In passing, clean up shaky grammar in the existing comments for this function.) Richard Guo Discussion: https://postgr.es/m/CAMbWs4-krwk0Wbd6WdufMAupuou_Ua73ijQ4XQCr1Mb5BaVtKQ@mail.gmail.com
2023-05-25Fix filtering of "cloned" outer-join quals some more.Tom Lane
We've had multiple issues with the clause_is_computable_at logic that I introduced in 2489d76c4: it's been known to accept more than one clone of the same qual at the same plan node, and also to accept no clones at all. It's looking impractical to get it 100% right on the basis of the currently-stored information, so fix it by introducing a new RestrictInfo field "incompatible_relids" that explicitly shows which outer joins a given clone mustn't be pushed above. In principle we could populate this field in every RestrictInfo, but that would cost space and there doesn't presently seem to be a need for it in general. Also, while deconstruct_distribute_oj_quals can easily fill the field with the remaining members of the commutative join set that it's considering, computing it in the general case seems again pretty complicated. So for now, just fill it for clone quals. Along the way, fix a bug that may or may not be only latent: equivclass.c was generating replacement clauses with is_pushed_down and has_clone/is_clone markings that didn't match their required_relids. This led me to conclude that leaving the clone flags out of make_restrictinfo's purview wasn't such a great idea after all, so add them. Per report from Richard Guo. Discussion: https://postgr.es/m/CAMbWs48EYi_9-pSd0ORes1kTmTeAjT4Q3gu49hJtYCbSn2JyeA@mail.gmail.com
2023-05-19Pre-beta mechanical code beautification.Tom Lane
Run pgindent, pgperltidy, and reformat-dat-files. This set of diffs is a bit larger than typical. We've updated to pg_bsd_indent 2.1.2, which properly indents variable declarations that have multi-line initialization expressions (the continuation lines are now indented one tab stop). We've also updated to perltidy version 20230309 and changed some of its settings, which reduces its desire to add whitespace to lines to make assignments etc. line up. Going forward, that should make for fewer random-seeming changes to existing code. Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
2023-05-17Fix some issues with improper placement of outer join clauses.Tom Lane
After applying outer-join identity 3 in the forward direction, it was possible for the planner to mistakenly apply a qual clause from above the two outer joins at the now-lower join level. This can give the wrong answer, since a value that would get nulled by the now-upper join might not yet be null. To fix, when we perform such a transformation, consider that the now-lower join hasn't really completed the outer join it's nominally responsible for and thus its relid set should not include that OJ's relid (nor should its output Vars have that nullingrel bit set). Instead we add those bits when the now-upper join is performed. The existing rules for qual placement then suffice to prevent higher qual clauses from dropping below the now-upper join. There are a few complications from needing to consider transitive closures in case multiple pushdowns have happened, but all in all it's not a very complex patch. This is all new logic (from 2489d76c4) so no need to back-patch. The added test cases all have the same results as in v15. Tom Lane and Richard Guo Discussion: https://postgr.es/m/0b819232-4b50-f245-1c7d-c8c61bf41827@postgrespro.ru
2023-05-17Add back SQLValueFunction for SQL keywordsMichael Paquier
This is equivalent to a revert of f193883 and fb32748, with the addition that the declaration of the SQLValueFunction node needs to gain a couple of node_attr for query jumbling. The performance impact of removing the function call inlining is proving to be too huge for some workloads where these are used. A worst-case test case of involving only simple SELECT queries with a SQL keyword is proving to lead to a reduction of 10% in TPS via pgbench and prepared queries on a high-end machine. None of the tests I ran back for this set of changes saw such a huge gap, but Alexander Lakhin and Andres Freund have found that this can be noticeable. Keeping the older performance would mean to do more inlining in the executor when using COERCE_SQL_SYNTAX for a function expression, similarly to what SQLValueFunction does. This requires more redesign work and there is little time until 16beta1 is released, so for now reverting the change is the best way forward, bringing back the previous performance. Bump catalog version. Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/b32bed1b-0746-9b20-1472-4bdc9ca66d52@gmail.com
2023-04-18Fix some typos and some incorrectly duplicated wordsDavid Rowley
Author: Justin Pryzby Reviewed-by: David Rowley Discussion: https://postgr.es/m/ZD3D1QxoccnN8A1V@telsasoft.com
2023-04-05Support "Right Anti Join" plan shapes.Tom Lane
Merge and hash joins can support antijoin with the non-nullable input on the right, using very simple combinations of their existing logic for right join and anti join. This gives the planner more freedom about how to order the join. It's particularly useful for hash join, since we may now have the option to hash the smaller table instead of the larger. Richard Guo, reviewed by Ronan Dunklau and myself Discussion: https://postgr.es/m/CAMbWs48xh9hMzXzSy3VaPzGAz+fkxXXTUbCLohX1_L8THFRm2Q@mail.gmail.com
2023-04-05Remove comment obsoleted by 11c2d6fd.Thomas Munro
Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1604497.1680637072%40sss.pgh.pa.us
2023-03-31Parallel Hash Full Join.Thomas Munro
Full and right outer joins were not supported in the initial implementation of Parallel Hash Join because of deadlock hazards (see discussion). Therefore FULL JOIN inhibited parallelism, as the other join strategies can't do that in parallel either. Add a new PHJ phase PHJ_BATCH_SCAN that scans for unmatched tuples on the inner side of one batch's hash table. For now, sidestep the deadlock problem by terminating parallelism there. The last process to arrive at that phase emits the unmatched tuples, while others detach and are free to go and work on other batches, if there are any, but otherwise they finish the join early. That unfairness is considered acceptable for now, because it's better than no parallelism at all. The build and probe phases are run in parallel, and the new scan-for-unmatched phase, while serial, is usually applied to the smaller of the two relations and is either limited by some multiple of work_mem, or it's too big and is partitioned into batches and then the situation is improved by batch-level parallelism. Author: Melanie Plageman <melanieplageman@gmail.com> Author: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKG%2BA6ftXPz4oe92%2Bx8Er%2BxpGZqto70-Q_ERwRaSyA%3DafNg%40mail.gmail.com
2023-03-22Correct Memoize's estimated cache hit ratio calculationDavid Rowley
As demonstrated by David Johnston, the Memoize cache hit ratio calculation wasn't quite correct. This change only affects the estimated hit ratio when the estimated number of entries to cache is estimated not to fit inside the cache. For example, if we expect 2000 distinct cache key values and only expect to be able to cache 1000 of those at once due to memory constraints, with an estimate of 10000 calls, if we could store all entries then the hit ratio should be 80% to account for the first 2000 of the 10000 calls to be a cache miss due to the value not being cached yet. If we can only store 1000 entries for each of the 2000 distinct possible values at once then the 80% should be reduced by half to make the final estimate of 40%. Previously, the calculation would have produced an estimated hit ratio of 30%, which wasn't correct. Apply to master only so as not to destabilize plans in the back branches. Reported-by: David G. Johnston Discussion: https://postgr.es/m/CAKFQuwZEmcNk3YQo2Xj4EDUOdY6qakad31rOD1Vc4q1_s68-Ew@mail.gmail.com Discussion: https://postgr.es/m/CAApHDvrV44LwiF4W_qf_RpbGYWSgp1kF=cZr+kTRRaALUfmXqw@mail.gmail.com
2023-03-20Have the planner account for the Memoize cache key memoryDavid Rowley
The Memoize executor node stores the cache key values along with the tuple(s) which were found in the outer node which match each key value, however, when the planner tried to estimate how many entries could be stored in the cache, it didn't take into account that the cache key must also be stored. In many cases, this won't make a large difference as the key is likely small in comparison to the tuple(s) being stored, however, it's not impossible to craft cases where the key could take more memory than the tuple(s) stored for it. Here we adjust the planner so it takes into account the estimated amount of memory to store the cache key. Effectively, this change will reduce the estimated cache hit ratio when it's thought that not all items will fit in the cache, thus Memoize will become more expensive in such cases. The executor already takes into account the memory consumed by the cache key, so here we only need to adjust the planner. Discussion: https://postgr.es/m/CAApHDvqGErGuyBfQvBQrTCHDbzLTqoiW=_G9sOzeFxWEc_7auA@mail.gmail.com
2023-03-17Fix incorrect logic for determining safe WindowAgg run conditionsDavid Rowley
The logic added in 9d9c02ccd to determine when a qual can be used as a WindowClause run condition failed to correctly check for subqueries in the qual. This was being done correctly for normal subquery qual pushdowns, it's just that 9d9c02ccd failed to follow the lead on that. This also fixes various other cases where transforming the qual into a WindowClause run condition in the subquery should have been disallowed. Bug: #17826 Reported-by: Anban Company Discussion: https://postgr.es/m/17826-7d8750952f19a5f5@postgresql.org Backpatch-through: 15, where 9d9c02ccd was introduced.
2023-03-15Support PlaceHolderVars in MERGE actions.Tom Lane
preprocess_targetlist thought PHVs couldn't appear here. It was mistaken, as per report from Önder Kalacı. Surveying other pull_var_clause calls, I noted no similar errors, but I did notice that qual_is_pushdown_safe's assertion about !contain_window_function was pointless, because the following pull_var_clause call would complain about them anyway. In HEAD only, remove the redundant Assert and improve the commentary. Discussion: https://postgr.es/m/CACawEhUuum-gC_2S3sXLTcsk7bUSPSHOD+g1ZpfKaDK-KKPPWA@mail.gmail.com
2023-03-12Work around implementation restriction in adjust_appendrel_attrs.Tom Lane
adjust_appendrel_attrs can't transfer nullingrel labeling to a non-Var translation expression (mainly because it's too late to wrap such an expression in a PlaceHolderVar). I'd supposed in commit 2489d76c4 that that restriction was unreachable because we'd not attempt to push problematic clauses down to an appendrel child relation. I forgot that set_append_rel_size blindly converts all the parent rel's joininfo clauses to child clauses, and that list could well contain clauses from above a nulling outer join. We might eventually have to devise a direct fix for this implementation restriction, but for now it seems enough to filter out troublesome clauses while constructing the child's joininfo list. Such clauses are certainly not useful while constructing paths for the child rel; they'll have to be applied later when we join the completed appendrel to something else. So we don't need them here, and omitting them from the list should save a few cycles while processing the child rel. Per bug #17832 from Marko Tiikkaja. Discussion: https://postgr.es/m/17832-d0a8106cdf1b722e@postgresql.org
2023-03-02Remove local optimizations of empty Bitmapsets into null pointers.Tom Lane
These are all dead code now that it's done centrally. Patch by me; thanks to Nathan Bossart and Richard Guo for review. Discussion: https://postgr.es/m/1159933.1677621588@sss.pgh.pa.us
2023-02-23Fix mis-handling of outer join quals generated by EquivalenceClasses.Tom Lane
It's possible, in admittedly-rather-contrived cases, for an eclass to generate a derived "join" qual that constrains the post-outer-join value(s) of some RHS variable(s) without mentioning the LHS at all. While the mechanisms were set up to work for this, we fell foul of the "get_common_eclass_indexes" filter installed by commit 3373c7155: it could decide that such an eclass wasn't relevant to the join, so that the required qual clause wouldn't get emitted there or anywhere else. To fix, apply get_common_eclass_indexes only at inner joins, where its rule is still valid. At an outer join, fall back to examining all eclasses that mention either input (or the OJ relid, though it should be impossible for an eclass to mention that without mentioning either input). Perhaps we can improve on that later, but the cost/benefit of adding more complexity to skip some irrelevant eclasses is dubious. To allow cheaply distinguishing outer from inner joins, pass the ojrelid to generate_join_implied_equalities as a separate argument. This also allows cleaning up some sloppiness that had crept into the definition of its join_relids argument, and it allows accurate calculation of nominal_join_relids for a child outer join. (The latter oversight seems not to have been a live bug, but it certainly could have caused problems in future.) Also fix what might be a live bug in check_index_predicates: it was being sloppy about what it passed to generate_join_implied_equalities. Per report from Richard Guo. Discussion: https://postgr.es/m/CAMbWs4-DsTBfOvXuw64GdFss2=M5cwtEhY=0DCS7t2gT7P6hSA@mail.gmail.com
2023-02-22Fix some issues with wrong placement of pseudo-constant quals.Tom Lane
initsplan.c figured that it could push Var-free qual clauses to the top of the current JoinDomain, which is okay in the abstract. But if the current domain is inside some outer join, and we later commute an inside-the-domain outer join with one outside it, we end up placing the pushed-up qual clause incorrectly. In distribute_qual_to_rels, avoid this by using the syntactic scope of the qual clause; with the exception that if we're in the top-level join domain we can still use the full query relid set, ensuring the resulting gating Result node goes to the top of the plan. (This is approximately as smart as the pre-v16 code was. Perhaps we can do better later, but it's not clear that such cases are worth a lot of sweat.) In process_implied_equality, we don't have a clear notion of syntactic scope, but we do have the results of SpecialJoinInfo construction. Thumb through those and remove any lower outer joins that might get commuted to above the join domain. Again, we can make an exception for the top-level join domain. It'd be possible to work harder here (for example, by keeping outer joins that aren't shown as potentially commutable), but I'm going to stop here for the moment. This issue has convinced me that the current representation of join domains probably needs further refinement, so I'm disinclined to write inessential dependent logic just yet. In passing, tighten the qualscope passed to process_implied_equality by generate_base_implied_equalities_no_const; there's no need for it to be larger than the rel we are currently considering. Tom Lane and Richard Guo, per report from Tender Wang. Discussion: https://postgr.es/m/CAHewXNk9eJ35ru5xATWioTV4+xZPHptjy9etdcNPjUfY9RQ+uQ@mail.gmail.com
2023-02-20Optimize generate_orderedappend_pathsDavid Rowley
In generate_orderedappend_paths(), when match_partition_order_desc was true, we would lcons() items to various lists in a loop over each live partition. When the number of live partitions was large, the lcons() could show up in profiles due to it having to perform memmove() to make way for the new list item. Here we adjust things so that we just perform the loop over the live partitions backwards when match_partition_order_desc is true. This allows us to simplify the logic in the loop. Now, as far as the guts of the loop knows, there's no difference between match_partition_order and match_partition_order_desc. We can just set match_partition_order to true so that we build the correct list of paths for the asc and desc case. Per idea from Andres Freund. Discussion: https://postgr.es/m/20230217002351.nyt4y5tdzg6hugdt@awork3.anarazel.de
2023-02-13When removing a relation from the query, drop its RelOptInfo.Tom Lane
In commit b78f6264e I opined that it was "too risky" to delete a relation's RelOptInfo from the planner's data structures when we have realized that we don't need to join to it; so instead we just marked it as a dead relation. In hindsight that judgment seems flawed: any subsequent access to such a dead relation is arguably a bug in itself, so leaving the RelOptInfo present just helps to mask bugs. Let's delete it instead, allowing removal of the whole notion of a "dead relation". So far as the regression tests can find, this requires no other code changes, except for one Assert in equivclass.c that was very dubiously not complaining about access to a dead rel. Discussion: https://postgr.es/m/229905.1676062220@sss.pgh.pa.us
2023-02-13Fix thinkos in have_unsafe_outer_join_ref; reduce to Assert check.Tom Lane
Late in the development of commit 2489d76c4, I (tgl) incorrectly concluded that the new function have_unsafe_outer_join_ref couldn't ever reach its inner loop. That should be the case if the inner rel's parameterization is based on just one Var, but it could be based on Vars from several relations, and then not only is the inner loop reachable but it's wrongly coded. Despite those errors, it still appears that the whole thing is redundant given previous join_is_legal checks, so let's arrange to only run it in assert-enabled builds. Diagnosis and patch by Richard Guo, per fuzz testing by Justin Pryzby. Discussion: https://postgr.es/m/20230212235823.GW1653@telsasoft.com
2023-02-01Remove dead NoMovementScanDirection codeDavid Rowley
Here remove some dead code from heapgettup() and heapgettup_pagemode() which was trying to support NoMovementScanDirection scans. This code can never be reached as standard_ExecutorRun() never calls ExecutePlan with NoMovementScanDirection. Additionally, plans which were scanning an unordered index would use NoMovementScanDirection rather than ForwardScanDirection. There was no real need for this, so here we adjust this so we use ForwardScanDirection for unordered index scans. A comment in pathnodes.h claimed that NoMovementScanDirection was used for PathKey reasons, but if that was true, it no longer is, per code in build_index_paths(). This does change the non-text format of the EXPLAIN output so that unordered index scans now have a "Forward" scan direction rather than "NoMovement". The text format of EXPLAIN has not changed. Author: Melanie Plageman Reviewed-by: Tom Lane, David Rowley Discussion: https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_T8QEqZqOL02rpi2bw@mail.gmail.com
2023-01-30Invent "join domains" to replace the below_outer_join hack.Tom Lane
EquivalenceClasses are now understood as applying within a "join domain", which is a set of inner-joined relations (possibly underneath an outer join). We no longer need to treat an EC from below an outer join as a second-class citizen. I have hopes of eventually being able to treat outer-join clauses via EquivalenceClasses, by means of only applying deductions within the EC's join domain. There are still problems in the way of that, though, so for now the reconsider_outer_join_clause logic is still here. I haven't been able to get rid of RestrictInfo.is_pushed_down either, but I wonder if that could be recast using JoinDomains. I had to hack one test case in postgres_fdw.sql to make it still test what it was meant to, because postgres_fdw is inconsistent about how it deals with quals containing non-shippable expressions; see https://postgr.es/m/1691374.1671659838@sss.pgh.pa.us. That should be improved, but I don't think it's within the scope of this patch series. Patch by me; thanks to Richard Guo for review. Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
2023-01-30Do assorted mop-up in the planner.Tom Lane
Remove RestrictInfo.nullable_relids, along with a good deal of infrastructure that calculated it. One use-case for it was in join_clause_is_movable_to, but we can now replace that usage with a check to see if the clause's relids include any outer join that can null the target relation. The other use-case was in join_clause_is_movable_into, but that test can just be dropped entirely now that the clause's relids include outer joins. Furthermore, join_clause_is_movable_into should now be accurate enough that it will accept anything returned by generate_join_implied_equalities, so we can restore the Assert that was diked out in commit 95f4e59c3. Remove the outerjoin_delayed mechanism. We needed this before to prevent quals from getting evaluated below outer joins that should null some of their vars. Now that we consider varnullingrels while placing quals, that's taken care of automatically, so throw the whole thing away. Teach remove_useless_result_rtes to also remove useless FromExprs. Having done that, the delay_upper_joins flag serves no purpose any more and we can remove it, largely reverting 11086f2f2. Use constant TRUE for "dummy" clauses when throwing back outer joins. This improves on a hack I introduced in commit 6a6522529. If we have a left-join clause l.x = r.y, and a WHERE clause l.x = constant, we generate r.y = constant and then don't really have a need for the join clause. But we must throw the join clause back anyway after marking it redundant, so that the join search heuristics won't think this is a clauseless join and avoid it. That was a kluge introduced under time pressure, and after looking at it I thought of a better way: let's just introduce constant-TRUE "join clauses" instead, and get rid of them at the end. This improves the generated plans for such cases by not having to test a redundant join clause. We can also get rid of the ugly hack used to mark such clauses as redundant for selectivity estimation. Patch by me; thanks to Richard Guo for review. Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
2023-01-30Make Vars be outer-join-aware.Tom Lane
Traditionally we used the same Var struct to represent the value of a table column everywhere in parse and plan trees. This choice predates our support for SQL outer joins, and it's really a pretty bad idea with outer joins, because the Var's value can depend on where it is in the tree: it might go to NULL above an outer join. So expression nodes that are equal() per equalfuncs.c might not represent the same value, which is a huge correctness hazard for the planner. To improve this, decorate Var nodes with a bitmapset showing which outer joins (identified by RTE indexes) may have nulled them at the point in the parse tree where the Var appears. This allows us to trust that equal() Vars represent the same value. A certain amount of klugery is still needed to cope with cases where we re-order two outer joins, but it's possible to make it work without sacrificing that core principle. PlaceHolderVars receive similar decoration for the same reason. In the planner, we include these outer join bitmapsets into the relids that an expression is considered to depend on, and in consequence also add outer-join relids to the relids of join RelOptInfos. This allows us to correctly perceive whether an expression can be calculated above or below a particular outer join. This change affects FDWs that want to plan foreign joins. They *must* follow suit when labeling foreign joins in order to match with the core planner, but for many purposes (if postgres_fdw is any guide) they'd prefer to consider only base relations within the join. To support both requirements, redefine ForeignScan.fs_relids as base+OJ relids, and add a new field fs_base_relids that's set up by the core planner. Large though it is, this commit just does the minimum necessary to install the new mechanisms and get check-world passing again. Follow-up patches will perform some cleanup. (The README additions and comments mention some stuff that will appear in the follow-up.) Patch by me; thanks to Richard Guo for review. Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
2023-01-18Remove redundant grouping and DISTINCT columns.Tom Lane
Avoid explicitly grouping by columns that we know are redundant for sorting, for example we need group by only one of x and y in SELECT ... WHERE x = y GROUP BY x, y This comes up more often than you might think, as shown by the changes in the regression tests. It's nearly free to detect too, since we are just piggybacking on the existing logic that detects redundant pathkeys. (In some of the existing plans that change, it's visible that a sort step preceding the grouping step already didn't bother to sort by the redundant column, making the old plan a bit silly-looking.) To do this, build processed_groupClause and processed_distinctClause lists that omit any provably-redundant sort items, and consult those not the originals where relevant. This means that within the planner, one should usually consult root->processed_groupClause or root->processed_distinctClause if one wants to know which columns are to be grouped on; but to check whether grouping or distinct-ing is happening at all, check non-NIL-ness of parse->groupClause or parse->distinctClause. This is comparable to longstanding rules about handling the HAVING clause, so I don't think it'll be a huge maintenance problem. nodeAgg.c also needs minor mods, because it's now possible to generate AGG_PLAIN and AGG_SORTED Agg nodes with zero grouping columns. Patch by me; thanks to Richard Guo and David Rowley for review. Discussion: https://postgr.es/m/185315.1672179489@sss.pgh.pa.us
2023-01-02Update copyright for 2023Bruce Momjian
Backpatch-through: 11
2022-12-20Add copyright notices to meson filesAndrew Dunstan
Discussion: https://postgr.es/m/222b43a5-2fb3-2c1b-9cd0-375d376c8246@dunslane.net