summaryrefslogtreecommitdiff
path: root/src/backend/optimizer/util
AgeCommit message (Collapse)Author
2022-10-17Guard against table-AM-less relations in planner.Tom Lane
The executor will dump core if it's asked to execute a seqscan on a relation having no table AM, such as a view. While that shouldn't really happen, it's possible to get there via catalog corruption, such as a missing ON SELECT rule. It seems worth installing a defense against that. There are multiple plausible places for such a defense, but I picked the planner's get_relation_info(). Per discussion of bug #17646 from Kui Liu. Back-patch to v12 where the tableam APIs were introduced; in older versions you won't get a SIGSEGV, so it seems less pressing. Discussion: https://postgr.es/m/17646-70c93cfa40365776@postgresql.org
2022-10-05Rename shadowed local variablesDavid Rowley
In a similar effort to f01592f91, here we mostly rename shadowed local variables to remove the warnings produced when compiling with -Wshadow=compatible-local. This fixes 63 warnings and leaves just 5. Author: Justin Pryzby, David Rowley Reviewed-by: Justin Pryzby Discussion https://postgr.es/m/20220817145434.GC26426%40telsasoft.com
2022-10-03Revert "Optimize order of GROUP BY keys".Tom Lane
This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and several follow-on fixes. The idea of making a cost-based choice of the order of the sorting columns is not fundamentally unsound, but it requires cost information and data statistics that we don't really have. For example, relying on procost to distinguish the relative costs of different sort comparators is pretty pointless so long as most such comparator functions are labeled with cost 1.0. Moreover, estimating the number of comparisons done by Quicksort requires more than just an estimate of the number of distinct values in the input: you also need some idea of the sizes of the larger groups, if you want an estimate that's good to better than a factor of three or so. That's data that's often unknown or not very reliable. Worse, to arrive at estimates of the number of calls made to the lower-order-column comparison functions, the code needs to make estimates of the numbers of distinct values of multiple columns, which are necessarily even less trustworthy than per-column stats. Even if all the inputs are perfectly reliable, the cost algorithm as-implemented cannot offer useful information about how to order sorting columns beyond the point at which the average group size is estimated to drop to 1. Close inspection of the code added by db0d67db2 shows that there are also multiple small bugs. These could have been fixed, but there's not much point if we don't trust the estimates to be accurate in-principle. Finally, the changes in cost_sort's behavior made for very large changes (often a factor of 2 or so) in the cost estimates for all sorting operations, not only those for multi-column GROUP BY. That naturally changes plan choices in many situations, and there's precious little evidence to show that the changes are for the better. Given the above doubts about whether the new estimates are really trustworthy, it's hard to summon much confidence that these changes are better on the average. Since we're hard up against the release deadline for v15, let's revert these changes for now. We can always try again later. Note: in v15, I left T_PathKeyInfo in place in nodes.h even though it's unreferenced. Removing it would be an ABI break, and it seems a bit late in the release cycle for that. Discussion: https://postgr.es/m/TYAPR01MB586665EB5FB2C3807E893941F5579@TYAPR01MB5866.jpnprd01.prod.outlook.com
2022-09-21meson: Add initial version of meson based build systemAndres Freund
Autoconf is showing its age, fewer and fewer contributors know how to wrangle it. Recursive make has a lot of hard to resolve dependency issues and slow incremental rebuilds. Our home-grown MSVC build system is hard to maintain for developers not using Windows and runs tests serially. While these and other issues could individually be addressed with incremental improvements, together they seem best addressed by moving to a more modern build system. After evaluating different build system choices, we chose to use meson, to a good degree based on the adoption by other open source projects. We decided that it's more realistic to commit a relatively early version of the new build system and mature it in tree. This commit adds an initial version of a meson based build system. It supports building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD, Solaris and Windows (however only gcc is supported on aix, solaris). For Windows/MSVC postgres can now be built with ninja (faster, particularly for incremental builds) and msbuild (supporting the visual studio GUI, but building slower). Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM bitcode generation, documentation adjustments) are done in subsequent commits requiring further review. Other aspects (e.g. not installing test-only extensions) are not yet addressed. When building on Windows with msbuild, builds are slower when using a visual studio version older than 2019, because those versions do not support MultiToolTask, required by meson for intra-target parallelism. The plan is to remove the MSVC specific build system in src/tools/msvc soon after reaching feature parity. However, we're not planning to remove the autoconf/make build system in the near future. Likely we're going to keep at least the parts required for PGXS to keep working around until all supported versions build with meson. Some initial help for postgres developers is at https://wiki.postgresql.org/wiki/Meson With contributions from Thomas Munro, John Naylor, Stone Tickle and others. Author: Andres Freund <andres@anarazel.de> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Author: Peter Eisentraut <peter@eisentraut.org> Reviewed-By: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/20211012083721.hvixq4pnh2pixr3j@alap3.anarazel.de
2022-09-20Harmonize more parameter names in bulk.Peter Geoghegan
Make sure that function declarations use names that exactly match the corresponding names from function definitions in optimizer, parser, utility, libpq, and "commands" code, as well as in remaining library code. Do the same for all code related to frontend programs (with the exception of pg_dump/pg_dumpall related code). Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. Later commits will handle ecpg and pg_dump/pg_dumpall. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
2022-09-15Fix outdated convert_saop_to_hashed_saop commentDavid Rowley
In 29f45e299, we added support for optimizing the execution of NOT IN(values) by using a hash table instead of a linear search over the array. That commit neglected to update the header comment for convert_saop_to_hashed_saop() to mention this fact. Here we fix that. Author: James Coleman Discussion: https://postgr.es/m/CAAaqYe99NUpAPcxgchGstgM23fmiGjqQPot8627YgkBgNt=BfA@mail.gmail.com Backpatch-through: 15, where 29f45e299 was added.
2022-09-01Revert SQL/JSON featuresAndrew Dunstan
The reverts the following and makes some associated cleanups: commit f79b803dc: Common SQL/JSON clauses commit f4fb45d15: SQL/JSON constructors commit 5f0adec25: Make STRING an unreserved_keyword. commit 33a377608: IS JSON predicate commit 1a36bc9db: SQL/JSON query functions commit 606948b05: SQL JSON functions commit 49082c2cc: RETURNING clause for JSON() and JSON_SCALAR() commit 4e34747c8: JSON_TABLE commit fadb48b00: PLAN clauses for JSON_TABLE commit 2ef6f11b0: Reduce running time of jsonb_sqljson test commit 14d3f24fa: Further improve jsonb_sqljson parallel test commit a6baa4bad: Documentation for SQL/JSON features commit b46bcf7a4: Improve readability of SQL/JSON documentation. commit 112fdb352: Fix finalization for json_objectagg and friends commit fcdb35c32: Fix transformJsonBehavior commit 4cd8717af: Improve a couple of sql/json error messages commit f7a605f63: Small cleanups in SQL/JSON code commit 9c3d25e17: Fix JSON_OBJECTAGG uniquefying bug commit a79153b7a: Claim SQL standard compliance for SQL/JSON features commit a1e7616d6: Rework SQL/JSON documentation commit 8d9f9634e: Fix errors in copyfuncs/equalfuncs support for JSON node types. commit 3c633f32b: Only allow returning string types or bytea from json_serialize commit 67b26703b: expression eval: Fix EEOP_JSON_CONSTRUCTOR and EEOP_JSONEXPR size. The release notes are also adjusted. Backpatch to release 15. Discussion: https://postgr.es/m/40d2c882-bcac-19a9-754d-4299e1d87ac7@postgresql.org
2022-08-18Improve performance of adjust_appendrel_attrs_multilevel.Tom Lane
The present implementations of adjust_appendrel_attrs_multilevel and its sibling adjust_child_relids_multilevel are very messy, because they work by reconstructing the relids of the child's immediate parent and then seeing if that's bms_equal to the relids of the target parent. Aside from being quite inefficient, this will not work with planned future changes to make joinrels' relid sets contain outer-join relids in addition to baserels. The whole thing can be solved at a stroke by adding explicit parent and top_parent links to child RelOptInfos, and making these functions work with RelOptInfo pointers instead of relids. Doing that is simpler for most callers, too. In my original version of this patch, I got rid of RelOptInfo.top_parent_relids on the grounds that it was now redundant. However, that adds a lot of code churn in places that otherwise would not need changing, and arguably the extra indirection needed to fetch top_parent->relids in those places costs something. So this version leaves that field in place. Discussion: https://postgr.es/m/553080.1657481916@sss.pgh.pa.us
2022-08-17Refactor addition of PlaceHolderVars to joinrel targetlists.Tom Lane
Make build_joinrel_tlist() responsible for adding PHVs that were already computed in one or the other input relation, and therefore change add_placeholders_to_joinrel() to only add PHVs that will be newly computed in this joinrel's output. This makes the handling of PHVs in build_joinrel_tlist() more like its handling of plain Vars, which seems like a good thing on intelligibility grounds and will simplify planned future changes. There is a purely cosmetic side-effect that the order of entries in the joinrel's tlist may change; but since it becomes more like the order of entries in the input tlists, that's not bad. The reason it wasn't done like this originally was the potential cost of looking up PlaceHolderInfo entries to consult ph_needed. Now that that's O(1) it shouldn't hurt. Discussion: https://postgr.es/m/1405792.1660677844@sss.pgh.pa.us
2022-08-17Use an explicit state flag to control PlaceHolderInfo creation.Tom Lane
Up to now, callers of find_placeholder_info() were required to pass a flag indicating if it's OK to make a new PlaceHolderInfo. That'd be fine if the callers had free choice, but they do not. Once we begin deconstruct_jointree() it's no longer OK to make more PHIs; while callers before that always want to create a PHI if it's not there already. So there's no freedom of action, only the opportunity to cause bugs by creating PHIs too late. Let's get rid of that in favor of adding a state flag PlannerInfo.placeholdersFrozen, which we can set at the point where it's no longer OK to make more PHIs. This patch also simplifies a couple of call sites that were using complicated logic to avoid calling find_placeholder_info() as much as possible. Now that that lookup is O(1) thanks to the previous commit, the extra bitmap manipulations are probably a net negative. Discussion: https://postgr.es/m/1405792.1660677844@sss.pgh.pa.us
2022-08-17Make PlaceHolderInfo lookup O(1).Tom Lane
Up to now we've just searched the placeholder_list when we want to find the PlaceHolderInfo with a given ID. While there's no evidence of that being a problem in the field, an upcoming patch will add find_placeholder_info() calls in build_joinrel_tlist(), which seems likely to make it more of an issue: a joinrel emitting lots of PlaceHolderVars would incur O(N^2) cost, and we might be building a lot of joinrels in complex queries. Hence, add an array that can be indexed directly by phid to make the lookups constant-time. Discussion: https://postgr.es/m/1405792.1660677844@sss.pgh.pa.us
2022-08-03Fix incorrect tests for SRFs in relation_can_be_sorted_early().Tom Lane
Commit fac1b470a thought we could check for set-returning functions by testing only the top-level node in an expression tree. This is wrong in itself, and to make matters worse it encouraged others to make the same mistake, by exporting tlist.c's special-purpose IS_SRF_CALL() as a widely-visible macro. I can't find any evidence that anyone's taken the bait, but it was only a matter of time. Use expression_returns_set() instead, and stuff the IS_SRF_CALL() genie back in its bottle, this time with a warning label. I also added a couple of cross-reference comments. After a fair amount of fooling around, I've despaired of making a robust test case that exposes the bug reliably, so no test case here. (Note that the test case added by fac1b470a is itself broken, in that it doesn't notice if you remove the code change. The repro given by the bug submitter currently doesn't fail either in v15 or HEAD, though I suspect that may indicate an unrelated bug.) Per bug #17564 from Martijn van Oosterhout. Back-patch to v13, as the faulty patch was. Discussion: https://postgr.es/m/17564-c7472c2f90ef2da3@postgresql.org
2022-07-19Estimate cost of elided SubqueryScan, Append, MergeAppend nodes better.Tom Lane
setrefs.c contains logic to discard no-op SubqueryScan nodes, that is, ones that have no qual to check and copy the input targetlist unchanged. (Formally it's not very nice to be applying such optimizations so late in the planner, but there are practical reasons for it; mostly that we can't unify relids between the subquery and the parent query until we flatten the rangetable during setrefs.c.) This behavior falsifies our previous cost estimates, since we would've charged cpu_tuple_cost per row just to pass data through the node. Most of the time that's little enough to not matter, but there are cases where this effect visibly changes the plan compared to what you would've gotten with no sub-select. To improve the situation, make the callers of cost_subqueryscan tell it whether they think the targetlist is trivial. cost_subqueryscan already has the qual list, so it can check the other half of the condition easily. It could make its own determination of tlist triviality too, but doing so would be repetitive (for callers that may call it several times) or unnecessarily expensive (for callers that can determine this more cheaply than a general test would do). This isn't a 100% solution, because createplan.c also does things that can falsify any earlier estimate of whether the tlist is trivial. However, it fixes nearly all cases in practice, if results for the regression tests are anything to go by. setrefs.c also contains logic to discard no-op Append and MergeAppend nodes. We did have knowledge of that behavior at costing time, but somebody failed to update it when a check on parallel-awareness was added to the setrefs.c logic. Fix that while we're here. These changes result in two minor changes in query plans shown in our regression tests. Neither is relevant to the purposes of its test case AFAICT. Patch by me; thanks to Richard Guo for review. Discussion: https://postgr.es/m/2581077.1651703520@sss.pgh.pa.us
2022-07-01Remove no-longer-used parameter for create_groupingsets_path().Tom Lane
numGroups is unused since commit b5635948a; let's get rid of it. XueJing Zhao, reviewed by Richard Guo Discussion: https://postgr.es/m/DM6PR05MB64923CC8B63A2CAF3B2E5D47B7AD9@DM6PR05MB6492.namprd05.prod.outlook.com
2022-05-12Pre-beta mechanical code beautification.Tom Lane
Run pgindent, pgperltidy, and reformat-dat-files. I manually fixed a couple of comments that pgindent uglified.
2022-05-12Make pull_var_clause() handle GroupingFuncs exactly like Aggrefs.Tom Lane
This follows in the footsteps of commit 2591ee8ec by removing one more ill-advised shortcut from planning of GroupingFuncs. It's true that we don't intend to execute the argument expression(s) at runtime, but we still have to process any Vars appearing within them, or we risk failure at setrefs.c time (or more fundamentally, in EXPLAIN trying to print such an expression). Vars in upper plan nodes have to have referents in the next plan level, whether we ever execute 'em or not. Per bug #17479 from Michael J. Sullivan. Back-patch to all supported branches. Richard Guo Discussion: https://postgr.es/m/17479-6260deceaf0ad304@postgresql.org
2022-04-08Teach planner and executor about monotonic window funcsDavid Rowley
Window functions such as row_number() always return a value higher than the previously returned value for tuples in any given window partition. Traditionally queries such as; SELECT * FROM ( SELECT *, row_number() over (order by c) rn FROM t ) t WHERE rn <= 10; were executed fairly inefficiently. Neither the query planner nor the executor knew that once rn made it to 11 that nothing further would match the outer query's WHERE clause. It would blindly continue until all tuples were exhausted from the subquery. Here we implement means to make the above execute more efficiently. This is done by way of adding a pg_proc.prosupport function to various of the built-in window functions and adding supporting code to allow the support function to inform the planner if the window function is monotonically increasing, monotonically decreasing, both or neither. The planner is then able to make use of that information and possibly allow the executor to short-circuit execution by way of adding a "run condition" to the WindowAgg to allow it to determine if some of its execution work can be skipped. This "run condition" is not like a normal filter. These run conditions are only built using quals comparing values to monotonic window functions. For monotonic increasing functions, quals making use of the btree operators for <, <= and = can be used (assuming the window function column is on the left). You can see here that once such a condition becomes false that a monotonic increasing function could never make it subsequently true again. For monotonically decreasing functions the >, >= and = btree operators for the given type can be used for run conditions. The best-case situation for this is when there is a single WindowAgg node without a PARTITION BY clause. Here when the run condition becomes false the WindowAgg node can simply return NULL. No more tuples will ever match the run condition. It's a little more complex when there is a PARTITION BY clause. In this case, we cannot return NULL as we must still process other partitions. To speed this case up we pull tuples from the outer plan to check if they're from the same partition and simply discard them if they are. When we find a tuple belonging to another partition we start processing as normal again until the run condition becomes false or we run out of tuples to process. When there are multiple WindowAgg nodes to evaluate then this complicates the situation. For intermediate WindowAggs we must ensure we always return all tuples to the calling node. Any filtering done could lead to incorrect results in WindowAgg nodes above. For all intermediate nodes, we can still save some work when the run condition becomes false. We've no need to evaluate the WindowFuncs anymore. Other WindowAgg nodes cannot reference the value of these and these tuples will not appear in the final result anyway. The savings here are small in comparison to what can be saved in the top-level WingowAgg, but still worthwhile. Intermediate WindowAgg nodes never filter out tuples, but here we change WindowAgg so that the top-level WindowAgg filters out tuples that don't match the intermediate WindowAgg node's run condition. Such filters appear in the "Filter" clause in EXPLAIN for the top-level WindowAgg node. Here we add prosupport functions to allow the above to work for; row_number(), rank(), dense_rank(), count(*) and count(expr). It appears technically possible to do the same for min() and max(), however, it seems unlikely to be useful enough, so that's not done here. Bump catversion Author: David Rowley Reviewed-by: Andy Fan, Zhihong Yu Discussion: https://postgr.es/m/CAApHDvqvp3At8++yF8ij06sdcoo1S_b2YoaT9D4Nf+MObzsrLQ@mail.gmail.com
2022-03-31Optimize order of GROUP BY keysTomas Vondra
When evaluating a query with a multi-column GROUP BY clause using sort, the cost may be heavily dependent on the order in which the keys are compared when building the groups. Grouping does not imply any ordering, so we're allowed to compare the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg, we simply compared keys in the order as specified in the query. This commit explores alternative ordering of the keys, trying to find a cheaper one. In principle, we might generate grouping paths for all permutations of the keys, and leave the rest to the optimizer. But that might get very expensive, so we try to pick only a couple interesting orderings based on both local and global information. When planning the grouping path, we explore statistics (number of distinct values, cost of the comparison function) for the keys and reorder them to minimize comparison costs. Intuitively, it may be better to perform more expensive comparisons (for complex data types etc.) last, because maybe the cheaper comparisons will be enough. Similarly, the higher the cardinality of a key, the lower the probability we’ll need to compare more keys. The patch generates and costs various orderings, picking the cheapest ones. The ordering of group keys may interact with other parts of the query, some of which may not be known while planning the grouping. E.g. there may be an explicit ORDER BY clause, or some other ordering-dependent operation, higher up in the query, and using the same ordering may allow using either incremental sort or even eliminate the sort entirely. The patch generates orderings and picks those minimizing the comparison cost (for various pathkeys), and then adds orderings that might be useful for operations higher up in the plan (ORDER BY, etc.). Finally, it always keeps the ordering specified in the query, on the assumption the user might have additional insights. This introduces a new GUC enable_group_by_reordering, so that the optimization may be disabled if needed. The original patch was proposed by Teodor Sigaev, and later improved and reworked by Dmitry Dolgov. Reviews by a number of people, including me, Andrey Lepikhov, Claudio Freire, Ibrar Ahmed and Zhihong Yu. Author: Dmitry Dolgov, Teodor Sigaev, Tomas Vondra Reviewed-by: Tomas Vondra, Andrey Lepikhov, Claudio Freire, Ibrar Ahmed, Zhihong Yu Discussion: https://postgr.es/m/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru Discussion: https://postgr.es/m/CA%2Bq6zcW_4o2NC0zutLkOJPsFt80megSpX_dVRo6GK9PC-Jx_Ag%40mail.gmail.com
2022-03-29SQL/JSON query functionsAndrew Dunstan
This introduces the SQL/JSON functions for querying JSON data using jsonpath expressions. The functions are: JSON_EXISTS() JSON_QUERY() JSON_VALUE() All of these functions only operate on jsonb. The workaround for now is to cast the argument to jsonb. JSON_EXISTS() tests if the jsonpath expression applied to the jsonb value yields any values. JSON_VALUE() must return a single value, and an error occurs if it tries to return multiple values. JSON_QUERY() must return a json object or array, and there are various WRAPPER options for handling scalar or multi-value results. Both these functions have options for handling EMPTY and ERROR conditions. Nikita Glukhov Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
2022-03-28Add support for MERGE SQL commandAlvaro Herrera
MERGE performs actions that modify rows in the target table using a source table or query. MERGE provides a single SQL statement that can conditionally INSERT/UPDATE/DELETE rows -- a task that would otherwise require multiple PL statements. For example, MERGE INTO target AS t USING source AS s ON t.tid = s.sid WHEN MATCHED AND t.balance > s.delta THEN UPDATE SET balance = t.balance - s.delta WHEN MATCHED THEN DELETE WHEN NOT MATCHED AND s.delta > 0 THEN INSERT VALUES (s.sid, s.delta) WHEN NOT MATCHED THEN DO NOTHING; MERGE works with regular tables, partitioned tables and inheritance hierarchies, including column and row security enforcement, as well as support for row and statement triggers and transition tables therein. MERGE is optimized for OLTP and is parameterizable, though also useful for large scale ETL/ELT. MERGE is not intended to be used in preference to existing single SQL commands for INSERT, UPDATE or DELETE since there is some overhead. MERGE can be used from PL/pgSQL. MERGE does not support targetting updatable views or foreign tables, and RETURNING clauses are not allowed either. These limitations are likely fixable with sufficient effort. Rewrite rules are also not supported, but it's not clear that we'd want to support them. Author: Pavan Deolasee <pavan.deolasee@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Author: Amit Langote <amitlangote09@gmail.com> Author: Simon Riggs <simon.riggs@enterprisedb.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions) Reviewed-by: Peter Geoghegan <pg@bowt.ie> (earlier versions) Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions) Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Discussion: https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com Discussion: https://postgr.es/m/20201231134736.GA25392@alvherre.pgsql
2022-03-27SQL/JSON constructorsAndrew Dunstan
This patch introduces the SQL/JSON standard constructors for JSON: JSON() JSON_ARRAY() JSON_ARRAYAGG() JSON_OBJECT() JSON_OBJECTAGG() For the most part these functions provide facilities that mimic existing json/jsonb functions. However, they also offer some useful additional functionality. In addition to text input, the JSON() function accepts bytea input, which it will decode and constuct a json value from. The other functions provide useful options for handling duplicate keys and null values. This series of patches will be followed by a consolidated documentation patch. Nikita Glukhov Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
2022-03-27Common SQL/JSON clausesAndrew Dunstan
This introduces some of the building blocks used by the SQL/JSON constructor and query functions. Specifically, it provides node executor and grammar support for the FORMAT JSON [ENCODING foo] clause, and values decorated with it, and for the RETURNING clause. The following SQL/JSON patches will leverage these. Nikita Glukhov (who probably deserves an award for perseverance). Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
2022-03-27Remove useless variable.Tom Lane
flatten_join_alias_vars_mutator counted attnums, but didn't do anything with the count (no doubt it did at one time). Noted by buildfarm member seawasp.
2022-03-22Revert "Common SQL/JSON clauses"Andrew Dunstan
This reverts commit 865fe4d5df560a6f5353da652018ff876978ad2d. This has caused issues with a significant number of buildfarm members
2022-03-22Common SQL/JSON clausesAndrew Dunstan
This introduces some of the building blocks used by the SQL/JSON constructor and query functions. Specifically, it provides node executor and grammar support for the FORMAT JSON [ENCODING foo] clause, and values decorated with it, and for the RETURNING clause. The following SQL/JSON patches will leverage these. Nikita Glukhov (who probably deserves an award for perseverance). Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup. Erik Rijkers, Zihong Yu and Himanshu Upadhyaya. Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
2022-03-17Don't bother to attach column name lists to RowExprs of named types.Tom Lane
If a RowExpr is marked as returning a named composite type, we aren't going to consult its colnames list; we'll use the attribute names shown for the type in pg_attribute. Hence, skip storing that list, to save a few nanoseconds when copying the expression tree around. Discussion: https://postgr.es/m/2950001.1638729947@sss.pgh.pa.us
2022-03-04Parse/analyze function renamingPeter Eisentraut
There are three parallel ways to call parse/analyze: with fixed parameters, with variable parameters, and by supplying your own parser callback. Some of the involved functions were confusingly named and made this API structure more confusing. This patch renames some functions to make this clearer: parse_analyze() -> parse_analyze_fixedparams() pg_analyze_and_rewrite() -> pg_analyze_and_rewrite_fixedparams() (Otherwise one might think this variant doesn't accept parameters, but in fact all three ways accept parameters.) pg_analyze_and_rewrite_params() -> pg_analyze_and_rewrite_withcb() (Before, and also when considering pg_analyze_and_rewrite(), one might think this is the only way to pass parameters. Moreover, the parser callback doesn't necessarily need to parse only parameters, it's just one of the things it could do.) parse_fixed_parameters() -> setup_parse_fixed_parameters() parse_variable_parameters() -> setup_parse_variable_parameters() (These functions don't actually do any parsing, they just set up callbacks to use during parsing later.) This patch also adds some const decorations to the fixed-parameters API, so the distinction from the variable-parameters API is more clear. Reviewed-by: Nathan Bossart <bossartn@amazon.com> Discussion: https://www.postgresql.org/message-id/flat/c67ce276-52b4-0239-dc0e-39875bf81840@enterprisedb.com
2022-01-25Fix various typos, grammar and code style in comments and docsMichael Paquier
This fixes a set of issues that have accumulated over the past months (or years) in various code areas. Most fixes are related to some recent additions, as of the development of v15. Author: Justin Pryzby Discussion: https://postgr.es/m/20220124030001.GQ23027@telsasoft.com
2022-01-16Add stxdinherit flag to pg_statistic_ext_dataTomas Vondra
Add pg_statistic_ext_data.stxdinherit flag, so that for each extended statistics definition we can store two versions of data - one for the relation alone, one for the whole inheritance tree. This is analogous to pg_statistic.stainherit, but we failed to include such flag in catalogs for extended statistics, and we had to work around it (see commits 859b3003de, 36c4bc6e72 and 20b9fa308e). This changes the relationship between the two catalogs storing extended statistics objects (pg_statistic_ext and pg_statistic_ext_data). Until now, there was a simple 1:1 mapping - for each definition there was one pg_statistic_ext_data row, and this row was inserted while creating the statistics (and then updated during ANALYZE). With the stxdinherit flag, we don't know how many rows there will be (child relations may be added after the statistics object is defined), so there may be up to two rows. We could make CREATE STATISTICS to always create both rows, but that seems wasteful - without partitioning we only need stxdinherit=false rows, and declaratively partitioned tables need only stxdinherit=true. So we no longer initialize pg_statistic_ext_data in CREATE STATISTICS, and instead make that a responsibility of ANALYZE. Which is what we do for regular statistics too. Patch by me, with extensive improvements and fixes by Justin Pryzby. Author: Tomas Vondra, Justin Pryzby Reviewed-by: Tomas Vondra, Justin Pryzby Discussion: https://postgr.es/m/20210923212624.GI831%40telsasoft.com
2022-01-09Make pg_get_expr() more bulletproof.Tom Lane
Since this function is defined to accept pg_node_tree values, it could get applied to any nodetree that can appear in a cataloged pg_node_tree column. Some such cases can't be supported --- for example, its API doesn't allow providing referents for more than one relation --- but we should try to throw a user-facing error rather than an internal error when encountering such a case. In support of this, extend expression_tree_walker/mutator to be sure they'll work on any such node tree (which basically means adding support for relpartbound node types). That allows us to run pull_varnos and check for the case of multiple relations before we start processing the tree. The alternative of changing the low-level error thrown for an out-of-range varno isn't appealing, because that could mask actual bugs in other usages of ruleutils. Per report from Justin Pryzby. This is basically cosmetic, so no back-patch. Discussion: https://postgr.es/m/20211219205422.GT17618@telsasoft.com
2022-01-07Update copyright for 2022Bruce Momjian
Backpatch-through: 10
2021-12-03Some RELKIND macro refactoringPeter Eisentraut
Add more macros to group some RELKIND_* macros: - RELKIND_HAS_PARTITIONS() - RELKIND_HAS_TABLESPACE() - RELKIND_HAS_TABLE_AM() Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/flat/a574c8f1-9c84-93ad-a9e5-65233d6fc00f%40enterprisedb.com
2021-11-24Flush Memoize cache when non-key parameters change, take 2David Rowley
It's possible that a subplan below a Memoize node contains a parameter from above the Memoize node. If this parameter changes then cache entries may become out-dated due to the new parameter value. Previously Memoize was mistakenly not aware of this. We fix this here by flushing the cache whenever a parameter that's not part of the cache key changes. Bug: #17213 Reported by: Elvis Pranskevichus Author: David Rowley Discussion: https://postgr.es/m/17213-988ed34b225a2862@postgresql.org Backpatch-through: 14, where Memoize was added
2021-11-24Revert "Flush Memoize cache when non-key parameters change"David Rowley
This reverts commit 1050048a315790a505465bfcceb26eaf8dbc7e2e.
2021-11-24Flush Memoize cache when non-key parameters changeDavid Rowley
It's possible that a subplan below a Memoize node contains a parameter from above the Memoize node. If this parameter changes then cache entries may become out-dated due to the new parameter value. Previously Memoize was mistakenly not aware of this. We fix this here by flushing the cache whenever a parameter that's not part of the cache key changes. Bug: #17213 Reported by: Elvis Pranskevichus Author: David Rowley Discussion: https://postgr.es/m/17213-988ed34b225a2862@postgresql.org Backpatch-through: 14, where Memoize was added
2021-11-24Allow Memoize to operate in binary comparison modeDavid Rowley
Memoize would always use the hash equality operator for the cache key types to determine if the current set of parameters were the same as some previously cached set. Certain types such as floating points where -0.0 and +0.0 differ in their binary representation but are classed as equal by the hash equality operator may cause problems as unless the join uses the same operator it's possible that whichever join operator is being used would be able to distinguish the two values. In which case we may accidentally return in the incorrect rows out of the cache. To fix this here we add a binary mode to Memoize to allow it to the current set of parameters to previously cached values by comparing bit-by-bit rather than logically using the hash equality operator. This binary mode is always used for LATERAL joins and it's used for normal joins when any of the join operators are not hashable. Reported-by: Tom Lane Author: David Rowley Discussion: https://postgr.es/m/3004308.1632952496@sss.pgh.pa.us Backpatch-through: 14, where Memoize was added
2021-11-08Fix incorrect hash equality operator bug in MemoizeDavid Rowley
In v14, because we don't have a field in RestrictInfo to cache both the left and right type's hash equality operator, we just restrict the scope of Memoize to only when the left and right types of a RestrictInfo are the same. In master we add another field to RestrictInfo and cache both hash equality operators. Reported-by: Jaime Casanova Author: David Rowley Discussion: https://postgr.es/m/20210929185544.GB24346%40ahch-to Backpatch-through: 14
2021-11-02Avoid O(N^2) behavior in SyncPostCheckpoint().Tom Lane
As in commits 6301c3ada and e9d9ba2a4, avoid doing repetitive list_delete_first() operations, since that would be expensive when there are many files waiting to be unlinked. This is a slightly larger change than in those cases. We have to keep the list state valid for calls to AbsorbSyncRequests(), so it's necessary to invent a "canceled" field instead of immediately deleting PendingUnlinkEntry entries. Also, because we might not be able to process all the entries, we need a new list primitive list_delete_first_n(). list_delete_first_n() is almost list_copy_tail(), but it modifies the input List instead of making a new copy. I found a couple of existing uses of the latter that could profitably use the new function. (There might be more, but the other callers look like they probably shouldn't overwrite the input List.) As before, back-patch to v13. Discussion: https://postgr.es/m/CD2F0E7F-9822-45EC-A411-AE56F14DEA9F@amazon.com
2021-09-27Fix typos and grammar in code commentsMichael Paquier
Several mistakes have piled in the code comments over the time, including incorrect grammar, function names and simple typos. This commit takes care of a portion of these. No backpatch is done as this is only cosmetic. Author: Justin Pryzby Discussion: https://postgr.es/m/20210924215827.GS831@telsasoft.com
2021-09-17Fix pull_varnos to cope with translated PlaceHolderVars.Tom Lane
Commit 55dc86eca changed pull_varnos to use (if possible) the associated ph_eval_at for a PlaceHolderVar. I missed a fine point though: we might be looking at a PHV in the quals or tlist of a child appendrel, in which case we need to compute a ph_eval_at value that's been translated in the same way that the PHV itself has been (cf. adjust_appendrel_attrs). Fortunately, enough info is available in the PlaceHolderInfo to make such translation possible without additional outside data, so we don't need another round of uglification of planner APIs. This is a little bit complicated, but since it's a hard-to-hit corner case, I'm not much worried about adding cycles here. Per report from Jaime Casanova. Back-patch to v12, like the previous commit. Discussion: https://postgr.es/m/20210915230959.GB17635@ahch-to
2021-09-08Clean up some code using "(expr) ? true : false"Michael Paquier
All the code paths simplified here were already using a boolean or used an expression that led to zero or one, making the extra bits unnecessary. Author: Justin Pryzby Reviewed-by: Tom Lane, Michael Paquier, Peter Smith Discussion: https://postgr.es/m/20210428182936.GE27406@telsasoft.com
2021-08-31Fix missed lock acquisition while inlining new-style SQL functions.Tom Lane
When starting to use a query parsetree loaded from the catalogs, we must begin by applying AcquireRewriteLocks(), to obtain the same relation locks that the parser would have gotten if the query were entered interactively, and to do some other cleanup such as dealing with later-dropped columns. New-style SQL functions are just as subject to this rule as other stored parsetrees; however, of the places dealing with such functions, only init_sql_fcache had gotten the memo. In particular, if we successfully inlined a new-style set-returning SQL function that contained any relation references, we'd either get an assertion failure or attempt to use those relation(s) sans locks. I also added AcquireRewriteLocks calls to fmgr_sql_validator and print_function_sqlbody. Desultory experiments didn't demonstrate any failures in those, but I suspect that I just didn't try hard enough. Certainly we don't expect nearby code paths to operate without locks. On the same logic of it-ought-to-have-the-same-effects-as-the-old-code, call pg_rewrite_query() in fmgr_sql_validator, too. It's possible that neither code path there needs to bother with rewriting, but doing the analysis to prove that is beyond my goals for today. Per bug #17161 from Alexander Lakhin. Discussion: https://postgr.es/m/17161-048a1cdff8422800@postgresql.org
2021-08-08Change NestPath node to contain JoinPath nodePeter Eisentraut
This makes the structure of all JoinPath-derived nodes the same, independent of whether they have additional fields. Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce@enterprisedb.com
2021-08-03Track a Bitmapset of non-pruned partitions in RelOptInfoDavid Rowley
For partitioned tables with large numbers of partitions where queries are able to prune all but a very small number of partitions, the time spent in the planner looping over RelOptInfo.part_rels checking for non-NULL RelOptInfos could become a large portion of the overall planning time. Here we add a Bitmapset that records the non-pruned partitions. This allows us to more efficiently skip the pruned partitions by looping over the Bitmapset. This will cause a very slight slow down in cases where no or not many partitions could be pruned, however, those cases are already slow to plan anyway and the overhead of looping over the Bitmapset would be unmeasurable when compared with the other tasks such as path creation for a large number of partitions. Reviewed-by: Amit Langote, Zhihong Yu Discussion: https://postgr.es/m/CAApHDvqnPx6JnUuPwaf5ao38zczrAb9mxt9gj4U1EKFfd4AqLA@mail.gmail.com
2021-07-25Get rid of artificial restriction on hash table sizes on Windows.Tom Lane
The point of introducing the hash_mem_multiplier GUC was to let users reproduce the old behavior of hash aggregation, i.e. that it could use more than work_mem at need. However, the implementation failed to get the job done on Win64, where work_mem is clamped to 2GB to protect various places that calculate memory sizes using "long int". As written, the same clamp was applied to hash_mem. This resulted in severe performance regressions for queries requiring a bit more than 2GB for hash aggregation, as they now spill to disk and there's no way to stop that. Getting rid of the work_mem restriction seems like a good idea, but it's a big job and could not conceivably be back-patched. However, there's only a fairly small number of places that are concerned with the hash_mem value, and it turns out to be possible to remove the restriction there without too much code churn or any ABI breaks. So, let's do that for now to fix the regression, and leave the larger task for another day. This patch does introduce a bit more infrastructure that should help with the larger task, namely pg_bitutils.h support for working with size_t values. Per gripe from Laurent Hasson. Back-patch to v13 where the behavior change came in. Discussion: https://postgr.es/m/997817.1627074924@sss.pgh.pa.us Discussion: https://postgr.es/m/MN2PR15MB25601E80A9B6D1BA6F592B1985E39@MN2PR15MB2560.namprd15.prod.outlook.com
2021-07-19Use l*_node() family of functions where appropriatePeter Eisentraut
Instead of castNode(…, lfoo(…)) Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://www.postgresql.org/message-id/flat/87eecahraj.fsf@wibble.ilmari.org
2021-07-14Copy a Param's location field when replacing it with a Const.Tom Lane
This allows Param substitution to produce just the same result as writing a constant value literally would have done. While it hardly matters so far as the current core code is concerned, extensions might take more interest in node location fields. Julien Rouhaud Discussion: https://postgr.es/m/20170311220932.GJ15188@nol.local
2021-07-14Change the name of the Result Cache node to MemoizeDavid Rowley
"Result Cache" was never a great name for this node, but nobody managed to come up with another name that anyone liked enough. That was until David Johnston mentioned "Node Memoization", which Tom Lane revised to just "Memoize". People seem to like "Memoize", so let's do the rename. Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/20210708165145.GG1176@momjian.us Backpatch-through: 14, where Result Cache was introduced
2021-07-07Use a hash table to speed up NOT IN(values)David Rowley
Similar to 50e17ad28, which allowed hash tables to be used for IN clauses with a set of constants, here we add the same feature for NOT IN clauses. NOT IN evaluates the same as: WHERE a <> v1 AND a <> v2 AND a <> v3. Obviously, if we're using a hash table we must be exactly equivalent to that and return the same result taking into account that either side of the condition could contain a NULL. This requires a little bit of special handling to make work with the hash table version. When processing NOT IN, the ScalarArrayOpExpr's operator will be the <> operator. To be able to build and lookup a hash table we must use the <>'s negator operator. The planner checks if that exists and is hashable and sets the relevant fields in ScalarArrayOpExpr to instruct the executor to use hashing. Author: David Rowley, James Coleman Reviewed-by: James Coleman, Zhihong Yu Discussion: https://postgr.es/m/CAApHDvoF1mum_FRk6D621edcB6KSHBi2+GAgWmioj5AhOu2vwQ@mail.gmail.com
2021-06-10Reconsider the handling of procedure OUT parameters.Tom Lane
Commit 2453ea142 redefined pg_proc.proargtypes to include the types of OUT parameters, for procedures only. While that had some advantages for implementing the SQL-spec behavior of DROP PROCEDURE, it was pretty disastrous from a number of other perspectives. Notably, since the primary key of pg_proc is name + proargtypes, this made it possible to have multiple procedures with identical names + input arguments and differing output argument types. That would make it impossible to call any one of the procedures by writing just NULL (or "?", or any other data-type-free notation) for the output argument(s). The change also seems likely to cause grave confusion for client applications that examine pg_proc and expect the traditional definition of proargtypes. Hence, revert the definition of proargtypes to what it was, and undo a number of complications that had been added to support that. To support the SQL-spec behavior of DROP PROCEDURE, when there are no argmode markers in the command's parameter list, we perform the lookup both ways (that is, matching against both proargtypes and proallargtypes), succeeding if we get just one unique match. In principle this could result in ambiguous-function failures that would not happen when using only one of the two rules. However, overloading of procedure names is thought to be a pretty rare usage, so this shouldn't cause many problems in practice. Postgres-specific code such as pg_dump can defend against any possibility of such failures by being careful to specify argmodes for all procedure arguments. This also fixes a few other bugs in the area of CALL statements with named parameters, and improves the documentation a little. catversion bump forced because the representation of procedures with OUT arguments changes. Discussion: https://postgr.es/m/3742981.1621533210@sss.pgh.pa.us