summaryrefslogtreecommitdiff
path: root/src/backend/executor
AgeCommit message (Collapse)Author
2022-09-21meson: Add initial version of meson based build systemAndres Freund
Autoconf is showing its age, fewer and fewer contributors know how to wrangle it. Recursive make has a lot of hard to resolve dependency issues and slow incremental rebuilds. Our home-grown MSVC build system is hard to maintain for developers not using Windows and runs tests serially. While these and other issues could individually be addressed with incremental improvements, together they seem best addressed by moving to a more modern build system. After evaluating different build system choices, we chose to use meson, to a good degree based on the adoption by other open source projects. We decided that it's more realistic to commit a relatively early version of the new build system and mature it in tree. This commit adds an initial version of a meson based build system. It supports building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD, Solaris and Windows (however only gcc is supported on aix, solaris). For Windows/MSVC postgres can now be built with ninja (faster, particularly for incremental builds) and msbuild (supporting the visual studio GUI, but building slower). Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM bitcode generation, documentation adjustments) are done in subsequent commits requiring further review. Other aspects (e.g. not installing test-only extensions) are not yet addressed. When building on Windows with msbuild, builds are slower when using a visual studio version older than 2019, because those versions do not support MultiToolTask, required by meson for intra-target parallelism. The plan is to remove the MSVC specific build system in src/tools/msvc soon after reaching feature parity. However, we're not planning to remove the autoconf/make build system in the near future. Likely we're going to keep at least the parts required for PGXS to keep working around until all supported versions build with meson. Some initial help for postgres developers is at https://wiki.postgresql.org/wiki/Meson With contributions from Thomas Munro, John Naylor, Stone Tickle and others. Author: Andres Freund <andres@anarazel.de> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Author: Peter Eisentraut <peter@eisentraut.org> Reviewed-By: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/20211012083721.hvixq4pnh2pixr3j@alap3.anarazel.de
2022-09-19Harmonize parameter names in storage and AM code.Peter Geoghegan
Make sure that function declarations use names that exactly match the corresponding names from function definitions in storage, catalog, access method, executor, and logical replication code, as well as in miscellaneous utility/library code. Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. Later commits will do the same for other parts of the codebase. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
2022-09-20Remove various duplicated wordsDavid Rowley
Author: Justin Pryzby Discussion: https://postgr.es/m/20220919111000.GW31833@telsasoft.com
2022-09-19Future-proof the recursion inside ExecShutdownNode().Tom Lane
The API contract for planstate_tree_walker() callbacks is that they take a PlanState pointer and a context pointer. Somebody figured they could save a couple lines of code by ignoring that, and passing ExecShutdownNode itself as the walker even though it has but one argument. Somewhat remarkably, we've gotten away with that so far. However, it seems clear that the upcoming C2x standard means to forbid such cases, and compilers that actively break such code likely won't be far behind. So spend the extra few lines of code to do it honestly with a separate walker function. In HEAD, we might as well go further and remove ExecShutdownNode's useless return value. I left that as-is in back branches though, to forestall complaints about ABI breakage. Back-patch, with the thought that this might become of practical importance before our stable branches are all out of service. It doesn't seem to be fixing any live bug on any currently known platform, however. Discussion: https://postgr.es/m/208054.1663534665@sss.pgh.pa.us
2022-09-12Assorted examples of expanded type-safer palloc/pg_malloc APIPeter Eisentraut
This adds some uses of the new palloc/pg_malloc variants here and there as a demonstration and test. This is kept separate from the actual API patch, since the latter might be backpatched at some point. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/bb755632-2a43-d523-36f8-a1e7a389a907@enterprisedb.com
2022-09-01Revert SQL/JSON featuresAndrew Dunstan
The reverts the following and makes some associated cleanups: commit f79b803dc: Common SQL/JSON clauses commit f4fb45d15: SQL/JSON constructors commit 5f0adec25: Make STRING an unreserved_keyword. commit 33a377608: IS JSON predicate commit 1a36bc9db: SQL/JSON query functions commit 606948b05: SQL JSON functions commit 49082c2cc: RETURNING clause for JSON() and JSON_SCALAR() commit 4e34747c8: JSON_TABLE commit fadb48b00: PLAN clauses for JSON_TABLE commit 2ef6f11b0: Reduce running time of jsonb_sqljson test commit 14d3f24fa: Further improve jsonb_sqljson parallel test commit a6baa4bad: Documentation for SQL/JSON features commit b46bcf7a4: Improve readability of SQL/JSON documentation. commit 112fdb352: Fix finalization for json_objectagg and friends commit fcdb35c32: Fix transformJsonBehavior commit 4cd8717af: Improve a couple of sql/json error messages commit f7a605f63: Small cleanups in SQL/JSON code commit 9c3d25e17: Fix JSON_OBJECTAGG uniquefying bug commit a79153b7a: Claim SQL standard compliance for SQL/JSON features commit a1e7616d6: Rework SQL/JSON documentation commit 8d9f9634e: Fix errors in copyfuncs/equalfuncs support for JSON node types. commit 3c633f32b: Only allow returning string types or bytea from json_serialize commit 67b26703b: expression eval: Fix EEOP_JSON_CONSTRUCTOR and EEOP_JSONEXPR size. The release notes are also adjusted. Backpatch to release 15. Discussion: https://postgr.es/m/40d2c882-bcac-19a9-754d-4299e1d87ac7@postgresql.org
2022-08-27Doc: add comment about bug fixed in back branches as of 3f7323cbb.Tom Lane
While the bug I just fixed in the back branches doesn't exist in HEAD, the requirement that MULTIEXPR SubPlans not share output parameters still does. Add a comment to memorialize that, because perhaps it could be an issue again someday. Discussion: https://postgr.es/m/17596-c5357f61427a81dc@postgresql.org
2022-08-26More -Wshadow=compatible-local warning fixesDavid Rowley
In a similar effort to f01592f91, here we're targetting fixing the warnings where we've deemed the shadowing variable to serve a close enough purpose to the shadowed variable just to reuse the shadowed version and not declare the shadowing variable at all. By my count, this takes the warning count from 106 down to 71. Author: Justin Pryzby Discussion: https://postgr.es/m/20220825020839.GT2342@telsasoft.com
2022-08-24Further -Wshadow=compatible-local warning fixesDavid Rowley
These should have been included in 421892a19 as these shadowed variable warnings can also be fixed by adjusting the scope of the shadowed variable to put the declaration for it in an inner scope. This is part of the same effort as f01592f91. By my count, this takes the warning count from 114 down to 106. Author: David Rowley and Justin Pryzby Discussion: https://postgr.es/m/CAApHDvrwLGBP%2BYw9vriayyf%3DXR4uPWP5jr6cQhP9au_kaDUhbA%40mail.gmail.com
2022-08-24Further reduce warnings with -Wshadow=compatible-localDavid Rowley
In a similar effort to f01592f91, here we're targetting fixing the warnings that -Wshadow=compatible-local produces that we can fix by moving a variable to an inner scope to stop that variable from being shadowed by another variable declared somewhere later in the function. All of the warnings being fixed here are changing the scope of variables which are being used as an iterator for a "for" loop. In each instance, the fix happens to be changing the for loop to use the C99 type initialization. Much of this code likely pre-dates our use of C99. Reducing the scope of the outer scoped variable seems like the safest way to fix these. Renaming seems more likely to risk patches using the wrong variable. Reducing the scope is more likely to result in a compilation failure after applying some future patch rather than introducing bugs with it. By my count, this takes the warning count from 129 down to 114. Author: Justin Pryzby Discussion: https://postgr.es/m/CAApHDvrwLGBP%2BYw9vriayyf%3DXR4uPWP5jr6cQhP9au_kaDUhbA%40mail.gmail.com
2022-08-17Avoid using list_length() to test for empty list.Tom Lane
The standard way to check for list emptiness is to compare the List pointer to NIL; our list code goes out of its way to ensure that that is the only representation of an empty list. (An acceptable alternative is a plain boolean test for non-null pointer, but explicit mention of NIL is usually preferable.) Various places didn't get that memo and expressed the condition with list_length(), which might not be so bad except that there were such a variety of ways to check it exactly: equal to zero, less than or equal to zero, less than one, yadda yadda. In the name of code readability, let's standardize all those spellings as "list == NIL" or "list != NIL". (There's probably some microscopic efficiency gain too, though few of these look to be at all performance-critical.) A very small number of cases were left as-is because they seemed more consistent with other adjacent list_length tests that way. Peter Smith, with bikeshedding from a number of us Discussion: https://postgr.es/m/CAHut+PtQYe+ENX5KrONMfugf0q6NHg4hR5dAhqEXEc2eefFeig@mail.gmail.com
2022-08-16Fix replica identity check for a partitioned table.Amit Kapila
The current publisher code checks if UPDATE or DELETE can be executed with the replica identity of the table even if it's a partitioned table. We can skip checking the replica identity for partitioned tables because the operations are actually performed on the leaf partitions (not the partitioned table). Reported-by: Brad Nicholson Author: Hou Zhijie Reviewed-by: Peter Smith, Amit Kapila Backpatch-through: 13 Discussion: https://postgr.es/m/CAMMnM%3D8i5DohH%3DYKzV0_wYuYSYvuOJoL9F5nzXTc%2ByzsG1f6rg%40mail.gmail.com
2022-08-13Avoid misbehavior when hash_table_bytes < bucket_size.Tom Lane
It's possible to reach this case when work_mem is very small and tupsize is (relatively) very large. In that case ExecChooseHashTableSize would get an assertion failure, or with asserts off it'd compute nbuckets = 0, which'd likely cause misbehavior later (I've not checked). To fix, clamp the number of buckets to be at least 1. This is due to faulty conversion of old my_log2() coding in 28d936031. Back-patch to v13, as that was. Zhang Mingli Discussion: https://postgr.es/m/beb64ca0-91e2-44ac-bf4a-7ea36275ec02@Spark
2022-08-10Fix handling of R/W expanded datums that are passed to SQL functions.Tom Lane
fmgr_sql must make expanded-datum arguments read-only, because it's possible that the function body will pass the argument to more than one callee function. If one of those functions takes the datum's R/W property as license to scribble on it, then later callees will see an unexpected value, leading to wrong answers. From a performance standpoint, it'd be nice to skip this in the common case that the argument value is passed to only one callee. However, detecting that seems fairly hard, and certainly not something that I care to attempt in a back-patched bug fix. Per report from Adam Mackler. This has been broken since we invented expanded datums, so back-patch to all supported branches. Discussion: https://postgr.es/m/WScDU5qfoZ7PB2gXwNqwGGgDPmWzz08VdydcPFLhOwUKZcdWbblbo-0Lku-qhuEiZoXJ82jpiQU4hOjOcrevYEDeoAvz6nR0IU4IHhXnaCA=@mackler.email Discussion: https://postgr.es/m/187436.1660143060@sss.pgh.pa.us
2022-08-04Fix check_exclusion_or_unique_constraint for UNIQUE NULLS NOT DISTINCT.Tom Lane
Adjusting this function was overlooked in commit 94aa7cc5f. The only visible symptom (so far) is that INSERT ... ON CONFLICT could go into an endless loop when inserting a null that has a conflict. Richard Guo and Tom Lane, per bug #17558 from Andrew Kesper Discussion: https://postgr.es/m/17558-3f6599ffcf52fd4a@postgresql.org
2022-08-04Add CHECK_FOR_INTERRUPTS in ExecInsert's speculative insertion loop.Tom Lane
Ordinarily the functions called in this loop ought to have plenty of CFIs themselves; but we've now seen a case where no such CFI is reached, making the loop uninterruptible. Even though that's from a recently-introduced bug, it seems prudent to install a CFI at the loop level in all branches. Per discussion of bug #17558 from Andrew Kesper (an actual fix for that bug will follow). Discussion: https://postgr.es/m/17558-3f6599ffcf52fd4a@postgresql.org
2022-08-02Improve performance of ORDER BY / DISTINCT aggregatesDavid Rowley
ORDER BY / DISTINCT aggreagtes have, since implemented in Postgres, been executed by always performing a sort in nodeAgg.c to sort the tuples in the current group into the correct order before calling the transition function on the sorted tuples. This was not great as often there might be an index that could have provided pre-sorted input and allowed the transition functions to be called as the rows come in, rather than having to store them in a tuplestore in order to sort them once all the tuples for the group have arrived. Here we change the planner so it requests a path with a sort order which supports the most amount of ORDER BY / DISTINCT aggregate functions and add new code to the executor to allow it to support the processing of ORDER BY / DISTINCT aggregates where the tuples are already sorted in the correct order. Since there can be many ORDER BY / DISTINCT aggregates in any given query level, it's very possible that we can't find an order that suits all of these aggregates. The sort order that the planner chooses is simply the one that suits the most aggregate functions. We take the most strictly sorted variation of each order and see how many aggregate functions can use that, then we try again with the order of the remaining aggregates to see if another order would suit more aggregate functions. For example: SELECT agg(a ORDER BY a),agg2(a ORDER BY a,b) ... would request the sort order to be {a, b} because {a} is a subset of the sort order of {a,b}, but; SELECT agg(a ORDER BY a),agg2(a ORDER BY c) ... would just pick a plan ordered by {a} (we give precedence to aggregates which are earlier in the targetlist). SELECT agg(a ORDER BY a),agg2(a ORDER BY b),agg3(a ORDER BY b) ... would choose to order by {b} since two aggregates suit that vs just one that requires input ordered by {a}. Author: David Rowley Reviewed-by: Ronan Dunklau, James Coleman, Ranier Vilela, Richard Guo, Tom Lane Discussion: https://postgr.es/m/CAApHDvpHzfo92%3DR4W0%2BxVua3BUYCKMckWAmo-2t_KiXN-wYH%3Dw%40mail.gmail.com
2022-08-02Have ExecFindPartition cache the last found partitionDavid Rowley
Here we add code which detects when ExecFindPartition() continually finds the same partition and add a caching layer to improve partition lookup performance for such cases. Both RANGE and LIST partitioned tables traditionally require a binary search for the set of Datums that a partition needs to be found for. This binary search is commonly visible in profiles when bulk loading into a partitioned table. Here we aim to reduce the overhead of bulk-loading into partitioned tables for cases where many consecutive tuples belong to the same partition and make the performance of this operation closer to what it is with a traditional non-partitioned table. When we find the same partition 16 times in a row, the next search will result in us simply just checking if the current set of values belongs to the last found partition. For LIST partitioning we record the index into the PartitionBoundInfo's datum array. This allows us to check if the current Datum is the same as the Datum that was last looked up. This means if any given LIST partition supports storing multiple different Datum values, then the caching only works when we find the same value as we did the last time. For RANGE partitioning we simply check if the given Datums are in the same range as the previously found partition. We store the details of the cached partition in PartitionDesc (i.e. relcache) so that the cached values are maintained over multiple statements. No caching is done for HASH partitions. The majority of the cost in HASH partition lookups are in the hashing function(s), which would also have to be executed if we were to try to do caching for HASH partitioned tables. Since most of the cost is already incurred, we just don't bother. We also don't do any caching for LIST partitions when we continually find the values being looked up belong to the DEFAULT partition. We've no corresponding index in the PartitionBoundInfo's datum array for this case. We also don't cache when we find the given values match to a LIST partitioned table's NULL partition. This is so cheap that there's no point in doing any caching for this. We also don't cache for a RANGE partitioned table's DEFAULT partition. There have been a number of different patches submitted to improve partition lookups. Hou, Zhijie submitted a patch to detect when the value belonging to the partition key column(s) were constant and added code to cache the partition in that case. Amit Langote then implemented an idea suggested by me to remember the last found partition and start to check if the current values work for that partition. The final patch here was written by me and was done by taking many of the ideas I liked from the patches in the thread and redesigning other aspects. Discussion: https://postgr.es/m/OS0PR01MB571649B27E912EA6CC4EEF03942D9%40OS0PR01MB5716.jpnprd01.prod.outlook.com Author: Amit Langote, Hou Zhijie, David Rowley Reviewed-by: Amit Langote, Hou Zhijie
2022-07-18Add another SQL/JSON error codePeter Eisentraut
A code comment said that the standard does not define a number for ERRCODE_SQL_JSON_ITEM_CANNOT_BE_CAST_TO_TARGET_TYPE, but this was fixed in a later draft version of the standard, so use that number now.
2022-07-13Remove useless assertionsPeter Eisentraut
We don't need Assert(IsA(foo, String)) right before running strVal(foo), since strVal() already does the assertion internally (via castNode()).
2022-07-07Remove stray references to lefttree/righttree in the executor.Tom Lane
The general convention in the executor is to refer to child plans and planstates via the outerPlan[State] and innerPlan[State] macros, but a few places didn't do it like that. For consistency and readability, convert all the stragglers to use the macros. (See also commit 40f42d2a3, which did some similar cleanup a few years ago, but missed these cases.) Richard Guo Discussion: https://postgr.es/m/CAMbWs4-vYhh1xsa_veah4PUed2Xq=Ed_YH3=Mqt5A3Y=EgfCEg@mail.gmail.com
2022-07-06Remove size increase in ExprEvalStep caused by hashed saopsDavid Rowley
50e17ad28 increased the size of ExprEvalStep from 64 bytes up to 88 bytes. Lots of effort was spent during the development of the current expression evaluation code to make an instance of this struct as small as possible. Making this struct larger than needed reduces CPU cache efficiency during expression evaluation which causes noticeable slowdowns during query execution. In order to reduce the size of the struct, here we remove the fn_addr field. The values from this field can be obtained via fcinfo, just with some extra pointer dereferencing. The extra indirection does not seem to cause any noticeable slowdowns. Various other fields have been moved into the ScalarArrayOpExprHashTable struct. These fields are only used when the ScalarArrayOpExprHashTable pointer has already been dereferenced, so no additional pointer dereferences occur for these. Here we also make hash_fcinfo_data the last field in ScalarArrayOpExprHashTable so that we can avoid a further pointer dereference to get the FunctionCallInfoBaseData. This also saves a call to palloc(). 50e17ad28 was added in 14, but it's too late to adjust the size of the ExprEvalStep in that version, so here we just backpatch to 15, which is currently in beta. Author: Andres Freund, David Rowley Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Backpatch-through: 15
2022-07-05expression eval: Fix EEOP_JSON_CONSTRUCTOR and EEOP_JSONEXPR size.Andres Freund
The new expression step types increased the size of ExprEvalStep by ~4 for all types of expression steps, slowing down expression evaluation noticeably. Move them out of line. There's other issues with these expression steps, but addressing them is largely independent of this aspect. Author: Andres Freund <andres@anarazel.de> Reviewed-By: Andrew Dunstan <andrew@dunslane.net> Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Backpatch: 15-
2022-07-01Add construct_array_builtin, deconstruct_array_builtinPeter Eisentraut
There were many calls to construct_array() and deconstruct_array() for built-in types, for example, when dealing with system catalog columns. These all hardcoded the type attributes necessary to pass to these functions. To simplify this a bit, add construct_array_builtin(), deconstruct_array_builtin() as wrappers that centralize this hardcoded knowledge. This simplifies many call sites and reduces the amount of hardcoded stuff that is spread around. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/2914356f-9e5f-8c59-2995-5997fc48bcba%40enterprisedb.com
2022-06-10Un-break whole-row Vars referencing domain-over-composite types.Tom Lane
In commit ec62cb0aa, I foolishly replaced ExecEvalWholeRowVar's lookup_rowtype_tupdesc_domain call with just lookup_rowtype_tupdesc, because I didn't see how a domain could be involved there, and there were no regression test cases to jog my memory. But the existing code was correct, so revert that change and add a test case showing why it's necessary. (Note: per comment in struct DatumTupleFields, it is correct to produce an output tuple that's labeled with the base composite type, not the domain; hence just blindly looking through the domain is correct here.) Per bug #17515 from Dan Kubb. Back-patch to v11 where domains over composites became a thing. Discussion: https://postgr.es/m/17515-a24737438363aca0@postgresql.org
2022-06-08Harden Memoization code against broken data typesDavid Rowley
Bug #17512 highlighted that a suitably broken data type could cause the backend to crash if either the hash function or equality function were in someway non-deterministic based on their input values. Such a data type could cause a crash of the backend due to some code which assumes that we'll always find a hash table entry corresponding to an item in the Memoize LRU list. Here we remove the assumption that we'll always find the entry corresponding to the given LRU list item and add run-time checks to verify we have found the given item in the cache. This is not a fix for bug #17512, but it will turn the crash reported by that bug report into an internal ERROR. Reported-by: Ales Zeleny Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CAApHDvpxFSTwvoYWT7kmFVSZ9zLAeHb=S9vrz=RExMgSkQNWqw@mail.gmail.com Backpatch-through: 14, where Memoize was added.
2022-05-21Avoid overflow hazard when clamping group counts to "long int".Tom Lane
Several places in the planner tried to clamp a double value to fit in a "long" by doing (long) Min(x, (double) LONG_MAX); This is subtly incorrect, because it casts LONG_MAX to double and potentially back again. If long is 64 bits then the double value is inexact, and the platform might round it up to LONG_MAX+1 resulting in an overflow and an undesirably negative output. While it's not hard to rewrite the expression into a safe form, let's put it into a common function to reduce the risk of someone doing it wrong in future. In principle this is a bug fix, but since the problem could only manifest with group count estimates exceeding 2^63, it seems unlikely that anyone has actually hit this or will do so anytime soon. We're fixing it mainly to satisfy fuzzer-type tools. That being the case, a HEAD-only fix seems sufficient. Andrey Lepikhov Discussion: https://postgr.es/m/ebbc2efb-7ef9-bf2f-1ada-d6ec48f70e58@postgrespro.ru
2022-05-13Clean up newlines following left parenthesesAlvaro Herrera
Like commit c9d297751959.
2022-05-13Rename JsonIsPredicate.value_type, fix JSON backend/nodes/ infrastructure.Tom Lane
I started out with the intention to rename value_type to item_type to avoid a collision with a typedef name that appears on some platforms. Along the way, I noticed that the adjacent field "format" was not being correctly handled by the backend/nodes/ infrastructure functions: copyfuncs.c erroneously treated it as a scalar, while equalfuncs, outfuncs, and readfuncs omitted handling it at all. This looks like it might be cosmetic at the moment because the field is always NULL after parse analysis; but that's likely a bug in itself, and the code's certainly not very future-proof. Let's fix it while we can still do so without forcing an initdb on beta testers. Further study found a few other inconsistencies in the backend/nodes/ infrastructure for the recently-added JSON node types, so fix those too. catversion bumped because of potential change in stored rules. Discussion: https://postgr.es/m/526703.1652385613@sss.pgh.pa.us
2022-05-12Pre-beta mechanical code beautification.Tom Lane
Run pgindent, pgperltidy, and reformat-dat-files. I manually fixed a couple of comments that pgindent uglified.
2022-05-11Fix typos and grammar in code and test commentsMichael Paquier
This fixes the grammar of some comments in a couple of tests (SQL and TAP), and in some C files. Author: Justin Pryzby Discussion: https://postgr.es/m/20220511020334.GH19626@telsasoft.com
2022-04-27Fix incorrect format placeholdersPeter Eisentraut
2022-04-20Move ModifyTableContext->lockmode to UpdateContextAlvaro Herrera
Should have been done this way to start with, but I failed to notice This way we avoid some pointless initialization, and better contains the variable to exist in the scope where it is really used. Reviewed-by: Michaël Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/202204191345.qerjy3kxi3eb@alvherre.pgsql
2022-04-20ExecModifyTable: use context.planSlot instead of planSlotAlvaro Herrera
There's no reason to keep a separate local variable when we have a place for it elsewhere. This allows to simplify some code. Reviewed-by: Michaël Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/202204191345.qerjy3kxi3eb@alvherre.pgsql
2022-04-13Remove extraneous blank lines before block-closing bracesAlvaro Herrera
These are useless and distracting. We wouldn't have written the code with them to begin with, so there's no reason to keep them. Author: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com Discussion: https://postgr.es/m/attachment/133167/0016-Extraneous-blank-lines.patch
2022-04-12adjust_partition_colnos mustn't be called if not neededAlvaro Herrera
Add an assert to make this very explicit, as well as a code comment. The former should silence Coverity complaining about this. Introduced by 7103ebb7aae8. Reported-by: Ranier Vilela Discussion: https://postgr.es/m/CAEudQAqTTAOzXiYybab+1DQOb3ZUuK99=p_KD+yrRFhcDbd0jg@mail.gmail.com
2022-04-11Fix various typos and spelling mistakes in code commentsDavid Rowley
Author: Justin Pryzby Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com
2022-04-08Track I/O timing for temporary file blocks in EXPLAIN (BUFFERS)Michael Paquier
Previously, the output of EXPLAIN (BUFFERS) option showed only the I/O timing spent reading and writing shared and local buffers. This commit adds on top of that the I/O timing for temporary buffers in the output of EXPLAIN (for spilled external sorts, hashes, materialization. etc). This can be helpful for users in cases where the I/O related to temporary buffers is the bottleneck. Like its cousin, this information is available only when track_io_timing is enabled. Playing the patch, this is showing an extra overhead of up to 1% even when using gettimeofday() as implementation for interval timings, which is slightly within the usual range noise still that's measurable. Author: Masahiko Sawada Reviewed-by: Georgios Kokolatos, Melanie Plageman, Julien Rouhaud, Ranier Vilela Discussion: https://postgr.es/m/CAD21AoAJgotTeP83p6HiAGDhs_9Fw9pZ2J=_tYTsiO5Ob-V5GQ@mail.gmail.com
2022-04-08Teach planner and executor about monotonic window funcsDavid Rowley
Window functions such as row_number() always return a value higher than the previously returned value for tuples in any given window partition. Traditionally queries such as; SELECT * FROM ( SELECT *, row_number() over (order by c) rn FROM t ) t WHERE rn <= 10; were executed fairly inefficiently. Neither the query planner nor the executor knew that once rn made it to 11 that nothing further would match the outer query's WHERE clause. It would blindly continue until all tuples were exhausted from the subquery. Here we implement means to make the above execute more efficiently. This is done by way of adding a pg_proc.prosupport function to various of the built-in window functions and adding supporting code to allow the support function to inform the planner if the window function is monotonically increasing, monotonically decreasing, both or neither. The planner is then able to make use of that information and possibly allow the executor to short-circuit execution by way of adding a "run condition" to the WindowAgg to allow it to determine if some of its execution work can be skipped. This "run condition" is not like a normal filter. These run conditions are only built using quals comparing values to monotonic window functions. For monotonic increasing functions, quals making use of the btree operators for <, <= and = can be used (assuming the window function column is on the left). You can see here that once such a condition becomes false that a monotonic increasing function could never make it subsequently true again. For monotonically decreasing functions the >, >= and = btree operators for the given type can be used for run conditions. The best-case situation for this is when there is a single WindowAgg node without a PARTITION BY clause. Here when the run condition becomes false the WindowAgg node can simply return NULL. No more tuples will ever match the run condition. It's a little more complex when there is a PARTITION BY clause. In this case, we cannot return NULL as we must still process other partitions. To speed this case up we pull tuples from the outer plan to check if they're from the same partition and simply discard them if they are. When we find a tuple belonging to another partition we start processing as normal again until the run condition becomes false or we run out of tuples to process. When there are multiple WindowAgg nodes to evaluate then this complicates the situation. For intermediate WindowAggs we must ensure we always return all tuples to the calling node. Any filtering done could lead to incorrect results in WindowAgg nodes above. For all intermediate nodes, we can still save some work when the run condition becomes false. We've no need to evaluate the WindowFuncs anymore. Other WindowAgg nodes cannot reference the value of these and these tuples will not appear in the final result anyway. The savings here are small in comparison to what can be saved in the top-level WingowAgg, but still worthwhile. Intermediate WindowAgg nodes never filter out tuples, but here we change WindowAgg so that the top-level WindowAgg filters out tuples that don't match the intermediate WindowAgg node's run condition. Such filters appear in the "Filter" clause in EXPLAIN for the top-level WindowAgg node. Here we add prosupport functions to allow the above to work for; row_number(), rank(), dense_rank(), count(*) and count(expr). It appears technically possible to do the same for min() and max(), however, it seems unlikely to be useful enough, so that's not done here. Bump catversion Author: David Rowley Reviewed-by: Andy Fan, Zhihong Yu Discussion: https://postgr.es/m/CAApHDvqvp3At8++yF8ij06sdcoo1S_b2YoaT9D4Nf+MObzsrLQ@mail.gmail.com
2022-04-07Revert "Rewrite some RI code to avoid using SPI"Alvaro Herrera
This reverts commit 99392cdd78b788295e52b9f4942fa11992fd5ba9. We'd rather rewrite ri_triggers.c as a whole rather than piecemeal. Discussion: https://postgr.es/m/E1ncXX2-000mFt-Pe@gemulon.postgresql.org
2022-04-07Rewrite some RI code to avoid using SPIAlvaro Herrera
Modify the subroutines called by RI trigger functions that want to check if a given referenced value exists in the referenced relation to simply scan the foreign key constraint's unique index, instead of using SPI to execute SELECT 1 FROM referenced_relation WHERE ref_key = $1 This saves a lot of work, especially when inserting into or updating a referencing relation. This rewrite allows to fix a PK row visibility bug caused by a partition descriptor hack which requires ActiveSnapshot to be set to come up with the correct set of partitions for the RI query running under REPEATABLE READ isolation. We now set that snapshot indepedently of the snapshot to be used by the PK index scan, so the two no longer interfere. The buggy output in src/test/isolation/expected/fk-snapshot.out of the relevant test case added by commit 00cb86e75d6d has been corrected. (The bug still exists in branch 14, however, but this fix is too invasive to backpatch.) Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Li Japin <japinli@hotmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Discussion: https://postgr.es/m/CA+HiwqGkfJfYdeq5vHPh6eqPKjSbfpDDY+j-kXYFePQedtSLeg@mail.gmail.com
2022-04-07Revert "Logical decoding of sequences"Tomas Vondra
This reverts a sequence of commits, implementing features related to logical decoding and replication of sequences: - 0da92dc530c9251735fc70b20cd004d9630a1266 - 80901b32913ffa59bf157a4d88284b2b3a7511d9 - b779d7d8fdae088d70da5ed9fcd8205035676df3 - d5ed9da41d96988d905b49bebb273a9b2d6e2915 - a180c2b34de0989269fdb819bff241a249bf5380 - 75b1521dae1ff1fde17fda2e30e591f2e5d64b6a - 2d2232933b02d9396113662e44dca5f120d6830e - 002c9dd97a0c874fd1693a570383e2dd38cd40d5 - 05843b1aa49df2ecc9b97c693b755bd1b6f856a9 The implementation has issues, mostly due to combining transactional and non-transactional behavior of sequences. It's not clear how this could be fixed, but it'll require reworking significant part of the patch. Discussion: https://postgr.es/m/95345a19-d508-63d1-860a-f5c2f41e8d40@enterprisedb.com
2022-04-05Refactor and cleanup runtime partition prune code a littleAlvaro Herrera
* Move the execution pruning initialization steps that are common between both ExecInitAppend() and ExecInitMergeAppend() into a new function ExecInitPartitionPruning() defined in execPartition.c. Those steps include creation of a PartitionPruneState to be used for all instances of pruning and determining the minimal set of child subplans that need to be initialized by performing initial pruning if needed, and finally adjusting the subplan_map arrays in the PartitionPruneState to reflect the new set of subplans remaining after initial pruning if it was indeed performed. ExecCreatePartitionPruneState() is no longer exported out of execPartition.c and has been renamed to CreatePartitionPruneState() as a local sub-routine of ExecInitPartitionPruning(). * Likewise, ExecFindInitialMatchingSubPlans() that was in charge of performing initial pruning no longer needs to be exported. In fact, since it would now have the same body as the more generally named ExecFindMatchingSubPlans(), except differing in the value of initial_prune passed to the common subroutine find_matching_subplans_recurse(), it seems better to remove it and add an initial_prune argument to ExecFindMatchingSubPlans(). * Add an ExprContext field to PartitionPruneContext to remove the implicit assumption in the runtime pruning code that the ExprContext to use to compute pruning expressions that need one can always rely on the PlanState providing it. A future patch will allow runtime pruning (at least the initial pruning steps) to be performed without the corresponding PlanState yet having been created, so this will help. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CA+HiwqEYCpEqh2LMDOp9mT+4-QoVe8HgFMKBjntEMCTZLpcCCA@mail.gmail.com
2022-04-04JSON_TABLEAndrew Dunstan
This feature allows jsonb data to be treated as a table and thus used in a FROM clause like other tabular data. Data can be selected from the jsonb using jsonpath expressions, and hoisted out of nested structures in the jsonb to form multiple rows, more or less like an outer join. Nikita Glukhov Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zhihong Yu (whose name I previously misspelled), Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/7e2cb85d-24cf-4abb-30a5-1a33715959bd@postgrespro.ru
2022-04-04Use Generation memory contexts to store tuples in sortsDavid Rowley
The general usage pattern when we store tuples in tuplesort.c is that we store a series of tuples one by one then either perform a sort or spill them to disk. In the common case, there is no pfreeing of already stored tuples. For the common case since we do not individually pfree tuples, we have very little need for aset.c memory allocation behavior which maintains freelists and always rounds allocation sizes up to the next power of 2 size. Here we conditionally use generation.c contexts for storing tuples in tuplesort.c when the sort will never be bounded. Unfortunately, the memory context to store tuples is already created by the time any calls would be made to tuplesort_set_bound(), so here we add a new sort option that allows callers to specify if they're going to need a bounded sort or not. We'll use a standard aset.c allocator when this sort option is not set. Extension authors must ensure that the TUPLESORT_ALLOWBOUNDED flag is used when calling tuplesort_begin_* for any sorts that make a call to tuplesort_set_bound(). Author: David Rowley Reviewed-by: Andy Fan Discussion: https://postgr.es/m/CAApHDvoH4ASzsAOyHcxkuY01Qf++8JJ0paw+03dk+W25tQEcNQ@mail.gmail.com
2022-04-04Adjust tuplesort API to have bitwise option flagsDavid Rowley
This replaces the bool flag for randomAccess. An upcoming patch requires adding another option, so instead of breaking the API for that, then breaking it again one day if we add more options, let's just break it once. Any boolean options we add in the future will just make use of an unused bit in the flags. Any extensions making use of tuplesorts will need to update their code to pass TUPLESORT_RANDOMACCESS instead of true for randomAccess. TUPLESORT_NONE can be used for a set of empty options. Author: David Rowley Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/CAApHDvoH4ASzsAOyHcxkuY01Qf%2B%2B8JJ0paw%2B03dk%2BW25tQEcNQ%40mail.gmail.com
2022-03-30SQL JSON functionsAndrew Dunstan
This Patch introduces three SQL standard JSON functions: JSON() (incorrectly mentioned in my commit message for f4fb45d15c) JSON_SCALAR() JSON_SERIALIZE() JSON() produces json values from text, bytea, json or jsonb values, and has facilitites for handling duplicate keys. JSON_SCALAR() produces a json value from any scalar sql value, including json and jsonb. JSON_SERIALIZE() produces text or bytea from input which containis or represents json or jsonb; For the most part these functions don't add any significant new capabilities, but they will be of use to users wanting standard compliant JSON handling. Nikita Glukhov Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
2022-03-29SQL/JSON query functionsAndrew Dunstan
This introduces the SQL/JSON functions for querying JSON data using jsonpath expressions. The functions are: JSON_EXISTS() JSON_QUERY() JSON_VALUE() All of these functions only operate on jsonb. The workaround for now is to cast the argument to jsonb. JSON_EXISTS() tests if the jsonpath expression applied to the jsonb value yields any values. JSON_VALUE() must return a single value, and an error occurs if it tries to return multiple values. JSON_QUERY() must return a json object or array, and there are various WRAPPER options for handling scalar or multi-value results. Both these functions have options for handling EMPTY and ERROR conditions. Nikita Glukhov Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
2022-03-28IS JSON predicateAndrew Dunstan
This patch intrdocuces the SQL standard IS JSON predicate. It operates on text and bytea values representing JSON as well as on the json and jsonb types. Each test has an IS and IS NOT variant. The tests are: IS JSON [VALUE] IS JSON ARRAY IS JSON OBJECT IS JSON SCALAR IS JSON WITH | WITHOUT UNIQUE KEYS These are mostly self-explanatory, but note that IS JSON WITHOUT UNIQUE KEYS is true whenever IS JSON is true, and IS JSON WITH UNIQUE KEYS is true whenever IS JSON is true except it IS JSON OBJECT is true and there are duplicate keys (which is never the case when applied to jsonb values). Nikita Glukhov Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
2022-03-28Add support for MERGE SQL commandAlvaro Herrera
MERGE performs actions that modify rows in the target table using a source table or query. MERGE provides a single SQL statement that can conditionally INSERT/UPDATE/DELETE rows -- a task that would otherwise require multiple PL statements. For example, MERGE INTO target AS t USING source AS s ON t.tid = s.sid WHEN MATCHED AND t.balance > s.delta THEN UPDATE SET balance = t.balance - s.delta WHEN MATCHED THEN DELETE WHEN NOT MATCHED AND s.delta > 0 THEN INSERT VALUES (s.sid, s.delta) WHEN NOT MATCHED THEN DO NOTHING; MERGE works with regular tables, partitioned tables and inheritance hierarchies, including column and row security enforcement, as well as support for row and statement triggers and transition tables therein. MERGE is optimized for OLTP and is parameterizable, though also useful for large scale ETL/ELT. MERGE is not intended to be used in preference to existing single SQL commands for INSERT, UPDATE or DELETE since there is some overhead. MERGE can be used from PL/pgSQL. MERGE does not support targetting updatable views or foreign tables, and RETURNING clauses are not allowed either. These limitations are likely fixable with sufficient effort. Rewrite rules are also not supported, but it's not clear that we'd want to support them. Author: Pavan Deolasee <pavan.deolasee@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Author: Amit Langote <amitlangote09@gmail.com> Author: Simon Riggs <simon.riggs@enterprisedb.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions) Reviewed-by: Peter Geoghegan <pg@bowt.ie> (earlier versions) Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions) Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Discussion: https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com Discussion: https://postgr.es/m/20201231134736.GA25392@alvherre.pgsql