summaryrefslogtreecommitdiff
path: root/src/include
AgeCommit message (Collapse)Author
2019-03-01Store tuples for EvalPlanQual in slots, rather than as HeapTuples.Andres Freund
For the upcoming pluggable table access methods it's quite inconvenient to store tuples as HeapTuples, as that'd require converting tuples from a their native format into HeapTuples. Instead use slots to manage epq tuples. To fit into that scheme, change the foreign data wrapper callback RefetchForeignRow, to store the tuple in a slot. Insist on using the caller provided slot, so it conveniently can be stored in the corresponding EPQ slot. As there is no in core user of RefetchForeignRow, that change was done blindly, but we plan to test that soon. To avoid duplicating that work for row locks, move row locks to just directly use the EPQ slots - it previously temporarily stored tuples in LockRowsState.lr_curtuples, but that doesn't seem beneficial, given we'd possibly end up with a significant number of additional slots. The behaviour of es_epqTupleSet[rti -1] is now checked by es_epqTupleSlot[rti -1] != NULL, as that is distinguishable from a slot containing an empty tuple. Author: Andres Freund, Haribabu Kommi, Ashutosh Bapat Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-02-28Standardize some more loops that chase down parallel lists.Tom Lane
We have forboth() and forthree() macros that simplify iterating through several parallel lists, but not everyplace that could reasonably use those was doing so. Also invent forfour() and forfive() macros to do the same for four or five parallel lists, and use those where applicable. The immediate motivation for doing this is to reduce the number of ad-hoc lnext() calls, to reduce the footprint of a WIP patch. However, it seems like good cleanup and error-proofing anyway; the places that were combining forthree() with a manually iterated loop seem particularly illegible and bug-prone. There was some speculation about restructuring related parsetree representations to reduce the need for parallel list chasing of this sort. Perhaps that's a win, or perhaps not, but in any case it would be considerably more invasive than this patch; and it's not particularly related to my immediate goal of improving the List infrastructure. So I'll leave that question for another day. Patch by me; thanks to David Rowley for review. Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
2019-02-26Use slots in trigger infrastructure, except for the actual invocation.Andres Freund
In preparation for abstracting table storage, convert trigger.c to track tuples in slots. Which also happens to make code calling triggers simpler. As the calling interface for triggers themselves is not changed in this patch, HeapTuples still are extracted from the slot at that time. But that's handled solely inside trigger.c, not visible to callers. It's quite likely that we'll want to revise the external trigger interface, but that's a separate large project. As part of this work the slots used for old/new/return tuples are moved from EState into ResultRelInfo, as different updated tables might need different slots. The slots are now also now created on-demand, which is good both from an efficiency POV, but also makes the modifying code simpler. Author: Andres Freund, Amit Khandekar and Ashutosh Bapat Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-02-26Store table oid and tuple's tid in tuple slots directly.Andres Freund
After the introduction of tuple table slots all table AMs need to support returning the table oid of the tuple stored in a slot created by said AM. It does not make sense to re-implement that in every AM, therefore move handling of table OIDs into the TupleTableSlot structure itself. It's possible that we, at a later date, might want to get rid of HeapTupleData.t_tableOid entirely, but doing so before the abstractions for table AMs are integrated turns out to be too hard, so delay that for now. Similarly, every AM needs to support the concept of a tuple identifier (tid / item pointer) for its tuples. It's quite possible that we'll generalize the exact form of a tid at a future point (to allow for things like index organized tables), but for now many parts of the code know about tids, so there's not much point in abstracting tids away. Therefore also move into slot (rather than providing API to set/get the tid associated with the tuple in a slot). Once table AM includes insert/updating/deleting tuples, the responsibility to set the correct tid after such an action will move into that. After that change, code doing such modifications, should not have to deal with HeapTuples directly anymore. Author: Andres Freund, Haribabu Kommi and Ashutosh Bapat Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-02-26Allow to use HeapTupleData embedded in [Buffer]HeapTupleTableSlot.Andres Freund
That avoids having to care about the lifetime of the HeapTupleHeaderData passed to ExecStore[Buffer]HeapTuple(). That doesn't make a huge difference for a plain HeapTupleTableSlot, but for BufferHeapTupleTableSlot it can be a significant advantage, avoiding the need to materialize slots where it's inconvenient to provide a HeapTupleData with appropriate lifetime to point to the on-disk tuple. It's quite possible that we'll want to add support functions for constructing HeapTuples using that embedded HeapTupleData, but for now callers do so themselves. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-02-26Add ExecStorePinnedBufferHeapTuple.Andres Freund
This allows to avoid an unnecessary pin/unpin cycle when storing a tuple in an already pinned buffer into a slot, when the pin isn't further needed at the call site. Only a single caller for now (to ensure coverage), but upcoming patches will increase use of the new function. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-02-25Remove unneeded argument from _bt_getstackbuf().Peter Geoghegan
_bt_getstackbuf() is called at exactly two points following commit efada2b8e92 (one call site is concerned with page splits, while the other is concerned with page deletion). The parent buffer returned by _bt_getstackbuf() is write-locked in both cases. Remove the 'access' argument and make _bt_getstackbuf() assume that callers require a write-lock.
2019-02-25Remove unnecessary use of PROCEDURALPeter Eisentraut
Remove some unnecessary, legacy-looking use of the PROCEDURAL keyword before LANGUAGE. We mostly don't use this anymore, so some of these look a bit old. There is still some use in pg_dump, which is harder to remove because it's baked into the archive format, so I'm not touching that. Discussion: https://www.postgresql.org/message-id/2330919b-62d9-29ac-8de3-58c024fdcb96@2ndquadrant.com
2019-02-25Make release of 2PC identifier and locks consistent in COMMIT PREPAREDMichael Paquier
When preparing a transaction in two-phase commit, a dummy PGPROC entry holding the GID used for the transaction is registered, which gets released once COMMIT PREPARED is run. Prior releasing its shared memory state, all the locks taken in the prepared transaction are released using a dedicated set of callbacks (pgstat and multixact having similar callbacks), which may cause the locks to be released before the GID is set free. Hence, there is a small window where lock conflicts could happen, for example: - Transaction A releases its locks, still holding its GID in shared memory. - Transaction B held a lock which conflicted with locks of transaction A. - Transaction B continues its processing, reusing the same GID as transaction A. - Transaction B fails because of a conflicting GID, already in use by transaction A. This commit changes the shared memory state release so as post-commit callbacks and predicate lock cleanup happen consistently with the shared memory state cleanup for the dummy PGPROC entry. The race window is small and 2PC had this issue from the start, so no backpatch is done. On top if that fixes discussed involved ABI breakages, which are not welcome in stable branches. Reported-by: Oleksii Kliukin, Ildar Musin Diagnosed-by: Oleksii Kliukin, Ildar Musin Author: Michael Paquier Reviewed-by: Masahiko Sawada, Oleksii Kliukin Discussion: https://postgr.es/m/BF9B38A4-2BFF-46E8-BA87-A2D00A8047A6@hintbits.com
2019-02-21Move estimate_hashagg_tablesize to selfuncs.c, and widen result to double.Tom Lane
It seems to make more sense for this to be in selfuncs.c, since it's largely a statistical-estimation thing, and it's related to other functions like estimate_hash_bucket_stats that are there. While at it, change the result type from Size to double. Perhaps at one point it was impossible for the result to overflow an integer, but I've got no confidence in that proposition anymore. Nothing's actually done with the result except to compare it to a work_mem-based limit, so as long as we don't get an overflow on the way to that comparison, things should be fine even with very large dNumGroups. Code movement proposed by Antonin Houska, type change by me Discussion: https://postgr.es/m/25767.1549359615@localhost
2019-02-21Move code for managing PartitionDescs into a new file, partdesc.cRobert Haas
This is similar in spirit to the existing partbounds.c file in the same directory, except that there's a lot less code in the new file created by this commit. Pending work in this area proposes to add a bunch more code related to PartitionDescs, though, and this will give us a good place to put it. Discussion: http://postgr.es/m/CA+TgmoZUwPf_uanjF==gTGBMJrn8uCq52XYvAEorNkLrUdoawg@mail.gmail.com
2019-02-20Use an unsigned char for bool if we don't use the native bool.Andrew Gierth
On (rare) platforms where sizeof(bool) > 1, we need to use our own bool, but imported c99 code (such as Ryu) may want to use bool values as array subscripts, which elicits warnings if bool is defined as char. Using unsigned char instead should work just as well for our purposes, and avoid such warnings. Per buildfarm members prariedog and locust.
2019-02-18Fix typo in transam.h for OIDs assigned by genbki.plMichael Paquier
The actual range of reserved OIDs in this case is [11000,11999] and not [11000,12000]. Author: John Naylor Discussion: https://postgr.es/m/CAJVSVGV5StmK-inxbmrf0nLbBGeaAKnjnqxXmk+4ufeav8JMSA@mail.gmail.com
2019-02-17Mark pg_config() stable rather than immutableJoe Conway
pg_config() has been marked immutable since its inception. As part of a larger discussion around the definition of immutable versus stable and related implications for marking functions parallel safe raised by Andres, the consensus was clearly that pg_config() is stable, since it could possibly change output even for the same minor version with a recompile or installation of a new binary. So mark it stable. Theoretically this could/should be backpatched, but it was deemed to be not worth the effort since in practice this is very unlikely to cause problems in the real world. Discussion: https://postgr.es/m/20181126234521.rh3grz7aavx2ubjv@alap3.anarazel.de
2019-02-16Allow user control of CTE materialization, and change the default behavior.Tom Lane
Historically we've always materialized the full output of a CTE query, treating WITH as an optimization fence (so that, for example, restrictions from the outer query cannot be pushed into it). This is appropriate when the CTE query is INSERT/UPDATE/DELETE, or is recursive; but when the CTE query is non-recursive and side-effect-free, there's no hazard of changing the query results by pushing restrictions down. Another argument for materialization is that it can avoid duplicate computation of an expensive WITH query --- but that only applies if the WITH query is called more than once in the outer query. Even then it could still be a net loss, if each call has restrictions that would allow just a small part of the WITH query to be computed. Hence, let's change the behavior for WITH queries that are non-recursive and side-effect-free. By default, we will inline them into the outer query (removing the optimization fence) if they are called just once. If they are called more than once, we will keep the old behavior by default, but the user can override this and force inlining by specifying NOT MATERIALIZED. Lastly, the user can force the old behavior by specifying MATERIALIZED; this would mainly be useful when the query had deliberately been employing WITH as an optimization fence to prevent a poor choice of plan. Andreas Karlsson, Andrew Gierth, David Fetter Discussion: https://postgr.es/m/87sh48ffhb.fsf@news-spur.riddles.org.uk
2019-02-16Fix previous MinGW fix.Andrew Gierth
Definitions required for MinGW need to be outside #if _MSC_VER. Oops.
2019-02-15Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT.Tom Lane
Test for the compiler builtins __builtin_clz, __builtin_ctz, and __builtin_popcount, and make use of these in preference to handwritten C code if they're available. Create src/port infrastructure for "leftmost one", "rightmost one", and "popcount" so as to centralize these decisions. On x86_64, __builtin_popcount generally won't make use of the POPCNT opcode because that's not universally supported yet. Provide code that checks CPUID and then calls POPCNT via asm() if available. This requires indirecting through a function pointer, which is an annoying amount of overhead for a one-instruction operation, but it's probably not worth working harder than this for our current use-cases. I'm not sure we've found all the existing places that could profit from this new infrastructure; but we at least touched all the ones that used copied-and-pasted versions of the bitmapset.c code, and got rid of multiple copies of the associated constant arrays. While at it, replace c-compiler.m4's one-per-builtin-function macros with a single one that can handle all the cases we need to worry about so far. Also, because I'm paranoid, make those checks into AC_LINK checks rather than just AC_COMPILE; the former coding failed to verify that libgcc has support for the builtin, in cases where it's not inline code. David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
2019-02-16Cygwin and Mingw floating-point fixes.Andrew Gierth
Deal with silent-underflow errors in float4 for cygwin and mingw by using our strtof() wrapper; deal with misrounding errors by adding them to the resultmap. Some slight reorganization of declarations was done to avoid duplicating material between cygwin.h and win32_port.h. While here, remove from the resultmap all references to float8-small-is-zero; inspection of cygwin output suggests it's no longer required there, and the freebsd/netbsd/openbsd entries should no longer be necessary (these date back to c. 2000). This commit doesn't remove the file itself nor the documentation references for it; that will happen in a subsequent commit if all goes well.
2019-02-15Revert attempts to use POPCNT etc instructionsAlvaro Herrera
This reverts commits fc6c72747ae6, 109de05cbb03, d0b4663c23b7 and 711bab1e4d19. Somebody will have to try harder before submitting this patch again. I've spent entirely too much time on it already, and the #ifdef maze yet to be written in order for it to build at all got on my nerves. The amount of work needed to get a platform-specific performance improvement that's barely above the noise level is not worth it.
2019-02-15Refactor index cost estimation functions in view of IndexClause changes.Tom Lane
Get rid of deconstruct_indexquals() in favor of just iterating over the IndexClause list directly. The extra services that that function used to provide, such as hiding clause commutation and associating the right index column with each clause, are no longer useful given the new data structure. I'd originally thought that it'd provide a useful amount of abstraction by freeing callers from paying attention to the exact clause type of each indexqual, but that hope proves to have been vain, because few callers can ignore the semantic differences between different clause types. Indeed, removing it results in a net code savings, and probably some cycles shaved by not having to build an extra list-of-structs data structure. Also, export a few formerly-static support functions, with the goal of allowing extension AMs to write functionality equivalent to genericcostestimate() without pointless code duplication. Discussion: https://postgr.es/m/24586.1550106354@sss.pgh.pa.us
2019-02-15Fix compiler builtin usage in new pg_bitutils.cAlvaro Herrera
Split out these new functions in three parts: one in a new file that uses the compiler builtin and gets compiled with the -mpopcnt compiler option if it exists; another one that uses the compiler builtin but not the compiler option; and finally the fallback with open-coded algorithms. Split out the configure logic: in the original commit, it was selecting to use the -mpopcnt compiler switch together with deciding whether to use the compiler builtin, but those two things are really separate. Split them out. Also, expose whether the builtin exists to Makefile.global, so that src/port's Makefile can decide whether to compile the hw-optimized file. Remove CPUID test for CTZ/CLZ. Make pg_{right,left}most_ones use either the compiler intrinsic or open-coded algo; trying to use the HW-optimized version is a waste of time. Make them static inline functions. Discussion: https://postgr.es/m/20190213221719.GA15976@alvherre.pgsql
2019-02-14Simplify the planner's new representation of indexable clauses a little.Tom Lane
In commit 1a8d5afb0, I thought it'd be a good idea to define IndexClause.indexquals as NIL in the most common case where the given clause (IndexClause.rinfo) is usable exactly as-is. It'd be more consistent to define the indexquals in that case as being a one-element list containing IndexClause.rinfo, but I thought saving the palloc overhead for making such a list would be worthwhile. In hindsight, that was a great example of "premature optimization is the root of all evil": it's complicated everyplace that needs to deal with the indexquals, requiring duplicative code to handle both the simple case and the not-simple case. I'd initially found that tolerable but it's getting less so as I mop up some areas that I'd not touched in 1a8d5afb0. In any case, two more pallocs during a planner run are surely at the noise level (a conclusion confirmed by a bit of microbenchmarking). So let's change this decision before it becomes set in stone, and insist that IndexClause.indexquals always be a valid list of the actual index quals for the clause. Discussion: https://postgr.es/m/24586.1550106354@sss.pgh.pa.us
2019-02-14Get rid of another unconstify through API changesPeter Eisentraut
This also makes the code in read_client_first_message() more similar to read_client_final_message(). Reported-by: Mark Dilger <hornschnorter@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/53a28052-f9f3-1808-fed9-460fd43035ab%402ndquadrant.com
2019-02-14Move pattern selectivity code from selfuncs.c to like_support.c.Tom Lane
While at it, refactor patternsel() a bit so that it can be used from the LIKE/regex planner support functions as well. This makes the planner able to deal equally well with either operator or function syntax for these operations. I'm not excited about that as a feature in itself, but it provides a nice model for extensions to follow if they want such behavior for their operations. This change localizes the use of pattern_fixed_prefix() and make_greater_string() so that they no longer need be exported. (We might get pushback from extensions about that, perhaps, in which case I'd be inclined to re-export them in a new header file like_support.h.) This reduces the bulk of selfuncs.c a fair amount, removing ~1370 lines or about one-sixth of that file; it's still too big, but this is progress. Discussion: https://postgr.es/m/24537.1550093915@sss.pgh.pa.us
2019-02-13Fix portability issues in pg_bitutilsAlvaro Herrera
We were using uint64 function arguments as "long int" arguments to compiler builtins, which fails on machines where long ints are 32 bits: the upper half of the uint64 was being ignored. Fix by using the "ll" builtin variants instead, which on those machines take 64 bit arguments. Also, remove configure tests for __builtin_popcountl() (as well as "long" variants for ctz and clz): the theory here is that any compiler version will provide all widths or none, so one test suffices. Were this theory to be wrong, we'd have to add tests for __builtin_popcountll() and friends, which would be tedious. Per failures in buildfarm member lapwing and ensuing discussion.
2019-02-13Add basic support for using the POPCNT and SSE4.2s LZCNT opcodesAlvaro Herrera
These opcodes have been around in the AMD world since 2007, and 2008 in the case of intel. They're supported in GCC and Clang via some __builtin macros. The opcodes may be unavailable during runtime, in which case we fall back on a C-based implementation of the code. In order to get the POPCNT instruction we must pass the -mpopcnt option to the compiler. We do this only for the pg_bitutils.c file. David Rowley (with fragments taken from a patch by Thomas Munro) Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
2019-02-13Change floating-point output format for improved performance.Andrew Gierth
Previously, floating-point output was done by rounding to a specific decimal precision; by default, to 6 or 15 decimal digits (losing information) or as requested using extra_float_digits. Drivers that wanted exact float values, and applications like pg_dump that must preserve values exactly, set extra_float_digits=3 (or sometimes 2 for historical reasons, though this isn't enough for float4). Unfortunately, decimal rounded output is slow enough to become a noticable bottleneck when dealing with large result sets or COPY of large tables when many floating-point values are involved. Floating-point output can be done much faster when the output is not rounded to a specific decimal length, but rather is chosen as the shortest decimal representation that is closer to the original float value than to any other value representable in the same precision. The recently published Ryu algorithm by Ulf Adams is both relatively simple and remarkably fast. Accordingly, change float4out/float8out to output shortest decimal representations if extra_float_digits is greater than 0, and make that the new default. Applications that need rounded output can set extra_float_digits back to 0 or below, and take the resulting performance hit. We make one concession to portability for systems with buggy floating-point input: we do not output decimal values that fall exactly halfway between adjacent representable binary values (which would rely on the reader doing round-to-nearest-even correctly). This is known to be a problem at least for VS2013 on Windows. Our version of the Ryu code originates from https://github.com/ulfjack/ryu/ at commit c9c3fb1979, but with the following (significant) modifications: - Output format is changed to use fixed-point notation for small exponents, as printf would, and also to use lowercase 'e', a minimum of 2 exponent digits, and a mandatory sign on the exponent, to keep the formatting as close as possible to previous output. - The output of exact midpoint values is disabled as noted above. - The integer fast-path code is changed somewhat (since we have fixed-point output and the upstream did not). - Our project style has been largely applied to the code with the exception of C99 declaration-after-statement, which has been retained as an exception to our present policy. - Most of upstream's debugging and conditionals are removed, and we use our own configure tests to determine things like uint128 availability. Changing the float output format obviously affects a number of regression tests. This patch uses an explicit setting of extra_float_digits=0 for test output that is not expected to be exactly reproducible (e.g. due to numerical instability or differing algorithms for transcendental functions). Conversions from floats to numeric are unchanged by this patch. These may appear in index expressions and it is not yet clear whether any change should be made, so that can be left for another day. This patch assumes that the only supported floating point format is now IEEE format, and the documentation is updated to reflect that. Code by me, adapting the work of Ulf Adams and other contributors. References: https://dl.acm.org/citation.cfm?id=3192369 Reviewed-by: Tom Lane, Andres Freund, Donald Dong Discussion: https://postgr.es/m/87r2el1bx6.fsf@news-spur.riddles.org.uk
2019-02-13Use strtof() and not strtod() for float4 input.Andrew Gierth
Using strtod() creates a double-rounding problem; the input decimal value is first rounded to the nearest double; rounding that to the nearest float may then give an incorrect result. An example is that 7.038531e-26 when input via strtod and then rounded to float4 gives 0xAE43FEp-107 instead of the correct 0xAE43FDp-107. Values output by earlier PG versions with extra_float_digits=3 should all be read in with the same values as previously. However, values supplied by other software using shortest representations could be mis-read. On platforms that lack a strtof() entirely, we fall back to the old incorrect rounding behavior. (As strtof() is required by C99, such platforms are considered of primarily historical interest.) On VS2013, some workarounds are used to get correct error handling. The regression tests now test for the correct input values, so platforms that lack strtof() will need resultmap entries. An entry for HP-UX 10 is included (more may be needed). Reviewed-By: Tom Lane Discussion: https://postgr.es/m/871s5emitx.fsf@news-spur.riddles.org.uk Discussion: https://postgr.es/m/87d0owlqpv.fsf@news-spur.riddles.org.uk
2019-02-11Fix header inclusion issue.Tom Lane
partprune.h failed to compile by itself; needs to include partdefs.h. I think I must've broken this in fa2cf164a, though I'd swear I ran the appropriate tests when removing #includes. Anyway, it's very sensible for this file to include partdefs.h, so let's just do that. Per cpluspluscheck.
2019-02-11Allow extensions to generate lossy index conditions.Tom Lane
For a long time, indxpath.c has had the ability to extract derived (lossy) index conditions from certain operators such as LIKE. For just as long, it's been obvious that we really ought to make that capability available to extensions. This commit finally accomplishes that, by adding another API for planner support functions that lets them create derived index conditions for their functions. As proof of concept, the hardwired "special index operator" code formerly present in indxpath.c is pushed out to planner support functions attached to LIKE and other relevant operators. A weak spot in this design is that an extension needs to know OIDs for the operators, datatypes, and opfamilies involved in the transformation it wants to make. The core-code prototypes use hard-wired OID references but extensions don't have that option for their own operators etc. It's usually possible to look up the required info, but that may be slow and inconvenient. However, improving that situation is a separate task. I want to do some additional refactorization around selfuncs.c, but that also seems like a separate task. Discussion: https://postgr.es/m/15193.1548028093@sss.pgh.pa.us
2019-02-12Move max_wal_senders out of max_connections for connection slot handlingMichael Paquier
Since its introduction, max_wal_senders is counted as part of max_connections when it comes to define how many connection slots can be used for replication connections with a WAL sender context. This can lead to confusion for some users, as it could be possible to block a base backup or replication from happening because other backend sessions are already taken for other purposes by an application, and superuser-only connection slots are not a correct solution to handle that case. This commit makes max_wal_senders independent of max_connections for its handling of PGPROC entries in ProcGlobal, meaning that connection slots for WAL senders are handled using their own free queue, like autovacuum workers and bgworkers. One compatibility issue that this change creates is that a standby now requires to have a value of max_wal_senders at least equal to its primary. So, if a standby created enforces the value of max_wal_senders to be lower than that, then this could break failovers. Normally this should not be an issue though, as any settings of a standby are inherited from its primary as postgresql.conf gets normally copied as part of a base backup, so parameters would be consistent. Author: Alexander Kukushkin Reviewed-by: Kyotaro Horiguchi, Petr Jelínek, Masahiko Sawada, Oleksii Kliukin Discussion: https://postgr.es/m/CAFh8B=nBzHQeYAu0b8fjK-AF1X4+_p6GRtwG+cCgs6Vci2uRuQ@mail.gmail.com
2019-02-11Redesign the partition dependency mechanism.Tom Lane
The original setup for dependencies of partitioned objects had serious problems: 1. It did not verify that a drop cascading to a partition-child object also cascaded to at least one of the object's partition parents. Now, normally a child object would share all its dependencies with one or another parent (e.g. a child index's opclass dependencies would be shared with the parent index), so that this oversight is usually harmless. But if some dependency failed to fit this pattern, the child could be dropped while all its parents remain, creating a logically broken situation. (It's easy to construct artificial cases that break it, such as attaching an unrelated extension dependency to the child object and then dropping the extension. I'm not sure if any less-artificial cases exist.) 2. Management of partition dependencies during ATTACH/DETACH PARTITION was complicated and buggy; for example, after detaching a partition table it was possible to create cases where a formerly-child index should be dropped and was not, because the correct set of dependencies had not been reconstructed. Less seriously, because multiple partition relationships were represented identically in pg_depend, there was an order-of-traversal dependency on which partition parent was cited in error messages. We also had some pre-existing order-of-traversal hazards for error messages related to internal and extension dependencies. This is cosmetic to users but causes testing problems. To fix #1, add a check at the end of the partition tree traversal to ensure that at least one partition parent got deleted. To fix #2, establish a new policy that partition dependencies are in addition to, not instead of, a child object's usual dependencies; in this way ATTACH/DETACH PARTITION need not cope with adding or removing the usual dependencies. To fix the cosmetic problem, distinguish between primary and secondary partition dependency entries in pg_depend, by giving them different deptypes. (They behave identically except for having different priorities for being cited in error messages.) This means that the former 'I' dependency type is replaced with new 'P' and 'S' types. This also fixes a longstanding bug that after handling an internal dependency by recursing to the owning object, findDependentObjects did not verify that the current target was now scheduled for deletion, and did not apply the current recursion level's objflags to it. Perhaps that should be back-patched; but in the back branches it would only matter if some concurrent transaction had removed the internal-linkage pg_depend entry before the recursive call found it, or the recursive call somehow failed to find it, both of which seem unlikely. Catversion bump because the contents of pg_depend change for partitioning relationships. Patch HEAD only. It's annoying that we're not fixing #2 in v11, but there seems no practical way to do so given that the problem is exactly a poor choice of what entries to put in pg_depend. We can't really fix that while staying compatible with what's in pg_depend in existing v11 installations. Discussion: https://postgr.es/m/CAH2-Wzkypv1R+teZrr71U23J578NnTBt2X8+Y=Odr4pOdW1rXg@mail.gmail.com
2019-02-11Fix misleading PG_RE_THROW commentaryAlvaro Herrera
The old verbiage indicated that PG_RE_THROW is optional, which is not really true. This has confused many people, so it seems worth fixing. Discussion: https://postgr.es/m/20190206160958.GA22304@alvherre.pgsql
2019-02-09Build out the planner support function infrastructure.Tom Lane
Add support function requests for estimating the selectivity, cost, and number of result rows (if a SRF) of the target function. The lack of a way to estimate selectivity of a boolean-returning function in WHERE has been a recognized deficiency of the planner since Berkeley days. This commit finally fixes it. In addition, non-constant estimates of cost and number of output rows are now possible. We still fall back to looking at procost and prorows if the support function doesn't service the request, of course. To make concrete use of the possibility of estimating output rowcount for SRFs, this commit adds support functions for array_unnest(anyarray) and the integer variants of generate_series; the lack of plausible rowcount estimates for those, even when it's obvious to a human, has been a repeated subject of complaints. Obviously, much more could now be done in this line, but I'm mostly just trying to get the infrastructure in place. Discussion: https://postgr.es/m/15193.1548028093@sss.pgh.pa.us
2019-02-09Create the infrastructure for planner support functions.Tom Lane
Rename/repurpose pg_proc.protransform as "prosupport". The idea is still that it names an internal function that provides knowledge to the planner about the behavior of the function it's attached to; but redesign the API specification so that it's not limited to doing just one thing, but can support an extensible set of requests. The original purpose of simplifying a function call is handled by the first request type to be invented, SupportRequestSimplify. Adjust all the existing transform functions to handle this API, and rename them fron "xxx_transform" to "xxx_support" to reflect the potential generalization of what they do. (Since we never previously provided any way for extensions to add transform functions, this change doesn't create an API break for them.) Also add DDL and pg_dump support for attaching a support function to a user-defined function. Unfortunately, DDL access has to be restricted to superusers, at least for now; but seeing that support functions will pretty much have to be written in C, that limitation is just theoretical. (This support is untested in this patch, but a follow-on patch will add cases that exercise it.) Discussion: https://postgr.es/m/15193.1548028093@sss.pgh.pa.us
2019-02-09Refactor the representation of indexable clauses in IndexPaths.Tom Lane
In place of three separate but interrelated lists (indexclauses, indexquals, and indexqualcols), an IndexPath now has one list "indexclauses" of IndexClause nodes. This holds basically the same information as before, but in a more useful format: in particular, there is now a clear connection between an indexclause (an original restriction clause from WHERE or JOIN/ON) and the indexquals (directly usable index conditions) derived from it. We also change the ground rules a bit by mandating that clause commutation, if needed, be done up-front so that what is stored in the indexquals list is always directly usable as an index condition. This gets rid of repeated re-determination of which side of the clause is the indexkey during costing and plan generation, as well as repeated lookups of the commutator operator. To minimize the added up-front cost, the typical case of commuting a plain OpExpr is handled by a new special-purpose function commute_restrictinfo(). For RowCompareExprs, generating the new clause properly commuted to begin with is not really any more complex than before, it's just different --- and we can save doing that work twice, as the pretty-klugy original implementation did. Tracking the connection between original and derived clauses lets us also track explicitly whether the derived clauses are an exact or lossy translation of the original. This provides a cheap solution to getting rid of unnecessary rechecks of boolean index clauses, which previously seemed like it'd be more expensive than it was worth. Another pleasant (IMO) side-effect is that EXPLAIN now always shows index clauses with the indexkey on the left; this seems less confusing. This commit leaves expand_indexqual_conditions() and some related functions in a slightly messy state. I didn't bother to change them any more than minimally necessary to work with the new data structure, because all that code is going to be refactored out of existence in a follow-on patch. Discussion: https://postgr.es/m/22182.1549124950@sss.pgh.pa.us
2019-02-09Allow to reset execGrouping.c style tuple hashtables.Andres Freund
This has the advantage that the comparator expression, the table's slot, etc do not have to be rebuilt. Additionally the simplehash.h hashtable within the tuple hashtable now keeps its previous size and doesn't need to be reallocated. That both reduces allocator overhead, and improves performance in cases where the input estimation was off by a significant factor. To avoid an API/ABI break, the new parameter is exposed via the new BuildTupleHashTableExt(), and BuildTupleHashTable() now is a wrapper around the former, that continues to allocate the table itself in the tablecxt. Using this fixes performance issues discovered in the two bugs referenced. This commit however has not converted the callers, that's done in a separate commit. Bug: #15592 #15486 Reported-By: Jakub Janeček, Dmitry Marakasov Author: Andres Freund Discussion: https://postgr.es/m/15486-05850f065da42931@postgresql.org https://postgr.es/m/20190114180423.ywhdg2iagzvh43we@alap3.anarazel.de Backpatch: 11, this is a prerequisite for other fixes
2019-02-09simplehash: Add support for resetting a hashtable's contents.Andres Freund
A hashtable reset just reset the hashtable entries, but does not free memory. Author: Andres Freund Discussion: https://postgr.es/m/20190114180423.ywhdg2iagzvh43we@alap3.anarazel.de Bug: #15592 #15486 Backpatch: 11, this is a prerequisite for other fixes
2019-02-08Add pg_partition_root to display top-most parent of a partition treeMichael Paquier
This is useful when looking at partition trees with multiple layers, and combined with pg_partition_tree, it provides the possibility to show up an entire tree by just knowing one member at any level. Author: Michael Paquier Reviewed-by: Álvaro Herrera, Amit Langote Discussion: https://postgr.es/m/20181207014015.GP2407@paquier.xyz
2019-02-07Split create_foreignscan_path() into three functions.Tom Lane
Up to now postgres_fdw has been using create_foreignscan_path() to generate not only base-relation paths, but also paths for foreign joins and foreign upperrels. This is wrong, because create_foreignscan_path() calls get_baserel_parampathinfo() which will only do the right thing for baserels. It accidentally fails to fail for unparameterized paths, which are the only ones postgres_fdw (thought it) was handling, but we really need different APIs for the baserel and join cases. In HEAD, the best thing to do seems to be to split up the baserel, joinrel, and upperrel cases into three functions so that they can have different APIs. I haven't actually given create_foreign_join_path a different API in this commit: we should spend a bit of time thinking about just what we want to do there, since perhaps FDWs would want to do something different from the build-up-a-join-pairwise approach that get_joinrel_parampathinfo expects. In the meantime, since postgres_fdw isn't prepared to generate parameterized joins anyway, just give it a defense against trying to plan joins with lateral refs. In addition (and this is what triggered this whole mess) fix bug #15613 from Srinivasan S A, by teaching file_fdw and postgres_fdw that plain baserel foreign paths still have outer refs if the relation has lateral_relids. Add some assertions in relnode.c to catch future occurrences of the same error --- in particular, to catch other FDWs doing that, but also as backstop against core-code mistakes like the one fixed by commit bdd9a99aa. Bug #15613 also needs to be fixed in the back branches, but the appropriate fix will look quite a bit different there, since we don't want to assume that existing FDWs get the word right away. Discussion: https://postgr.es/m/15613-092be1be9576c728@postgresql.org
2019-02-07Use EXECUTE FUNCTION syntax for triggers morePeter Eisentraut
Change pg_dump and ruleutils.c to use the FUNCTION keyword instead of PROCEDURE in trigger and event trigger definitions. This completes the pieces of the transition started in 0a63f996e018ac508c858e87fa39cc254a5db49f that were kept out of PostgreSQL 11 because of the required catversion change. Discussion: https://www.postgresql.org/message-id/381bef53-f7be-29c8-d977-948e389161d6@2ndquadrant.com
2019-02-06Fix heap_getattr() handling of fast defaults.Andres Freund
Previously heap_getattr() returned NULL for attributes with a fast default value (c.f. 16828d5c0273), as it had no handling whatsoever for that case. A previous fix, 7636e5c60f, attempted to fix issues caused by this oversight, but just expanding OLD tuples for triggers doesn't actually solve the underlying issue. One known consequence of this bug is that the check for HOT updates can return the wrong result, when a previously fast-default'ed column is set to NULL. Which in turn means that an index over a column with fast default'ed columns might be corrupt if the underlying column(s) allow NULLs. Fix by handling fast default columns in heap_getattr(), remove now superfluous expansion in GetTupleForTrigger(). Author: Andres Freund Discussion: https://postgr.es/m/20190201162404.onngi77f26baem4g@alap3.anarazel.de Backpatch: 11, where fast defaults were introduced
2019-02-04Avoid creation of the free space map for small heap relations, take 2.Amit Kapila
Previously, all heaps had FSMs. For very small tables, this means that the FSM took up more space than the heap did. This is wasteful, so now we refrain from creating the FSM for heaps with 4 pages or fewer. If the last known target block has insufficient space, we still try to insert into some other page before giving up and extending the relation, since doing otherwise leads to table bloat. Testing showed that trying every page penalized performance slightly, so we compromise and try every other page. This way, we visit at most two pages. Any pages with wasted free space become visible at next relation extension, so we still control table bloat. As a bonus, directly attempting one or two pages can even be faster than consulting the FSM would have been. Once the FSM is created for a heap we don't remove it even if somebody deletes all the rows from the corresponding relation. We don't think it is a useful optimization as it is quite likely that relation will again grow to the same size. Author: John Naylor, Amit Kapila Reviewed-by: Amit Kapila Tested-by: Mithun C Y Discussion: https://www.postgresql.org/message-id/CAJVSVGWvB13PzpbLEecFuGFc5V2fsO736BsdTakPiPAcdMM5tQ@mail.gmail.com
2019-02-03Add shared_memory_type GUC.Thomas Munro
Since 9.3 we have used anonymous shared mmap for our main shared memory region, except in EXEC_BACKEND builds. Provide a GUC so that users can opt for System V shared memory once again, like in 9.2 and earlier. A later patch proposes to add huge/large page support for AIX, which requires System V shared memory and provided the motivation to revive this possibility. It may also be useful on some BSDs. Author: Andres Freund (revived and documented by Thomas Munro) Discussion: https://postgr.es/m/HE1PR0202MB28126DB4E0B6621CC6A1A91286D90%40HE1PR0202MB2812.eurprd02.prod.outlook.com Discussion: https://postgr.es/m/2AE143D2-87D3-4AD1-AC78-CE2258230C05%40FreeBSD.org
2019-02-01Renaming for new subscripting mechanismAlvaro Herrera
Over at patch https://commitfest.postgresql.org/21/1062/ Dmitry wants to introduce a more generic subscription mechanism, which allows subscripting not only arrays but also other object types such as JSONB. That functionality is introduced in a largish invasive patch, out of which this internal renaming patch was extracted. Author: Dmitry Dolgov Reviewed-by: Tom Lane, Arthur Zakirov Discussion: https://postgr.es/m/CA+q6zcUK4EqPAu7XRRO5CCjMwhz5zvg+rfWuLzVoxp_5sKS6=w@mail.gmail.com
2019-02-01Add more columns to pg_stat_sslPeter Eisentraut
Add columns client_serial and issuer_dn to pg_stat_ssl. These allow uniquely identifying the client certificate. Rename the existing column clientdn to client_dn, to make the naming more consistent and easier to read. Discussion: https://www.postgresql.org/message-id/flat/398754d8-6bb5-c5cf-e7b8-22e5f0983caf@2ndquadrant.com/
2019-01-30Allow RECORD and RECORD[] to be specified in function coldeflists.Tom Lane
We can't allow these pseudo-types to be used as table column types, because storing an anonymous record value in a table would result in data that couldn't be understood by other sessions. However, it seems like there's no harm in allowing the case in a column definition list that's specifying what a function-returning-record returns. The data involved is all local to the current session, so we should be just as able to resolve its actual tuple type as we are for the function-returning-record's top-level tuple output. Elvis Pranskevichus, with cosmetic changes by me Discussion: https://postgr.es/m/11038447.kQ5A9Uj5xi@hammer.magicstack.net
2019-01-29Rename nodes/relation.h to nodes/pathnodes.h.Tom Lane
The old name of this file was never a very good indication of what it was for. Now that there's also access/relation.h, we have a potential confusion hazard as well, so let's rename it to something more apropos. Per discussion, "pathnodes.h" is reasonable, since a good fraction of the file is Path node definitions. While at it, tweak a couple of other headers that were gratuitously importing relation.h into modules that don't need it. Discussion: https://postgr.es/m/7719.1548688728@sss.pgh.pa.us
2019-01-29Refactor planner's header files.Tom Lane
Create a new header optimizer/optimizer.h, which exposes just the planner functions that can be used "at arm's length", without need to access Paths or the other planner-internal data structures defined in nodes/relation.h. This is intended to provide the whole planner API seen by most of the rest of the system; although FDWs still need to use additional stuff, and more thought is also needed about just what selfuncs.c should rely on. The main point of doing this now is to limit the amount of new #include baggage that will be needed by "planner support functions", which I expect to introduce later, and which will be in relevant datatype modules rather than anywhere near the planner. This commit just moves relevant declarations into optimizer.h from other header files (a couple of which go away because everything got moved), and adjusts #include lists to match. There's further cleanup that could be done if we want to decide that some stuff being exposed by optimizer.h doesn't belong in the planner at all, but I'll leave that for another day. Discussion: https://postgr.es/m/11460.1548706639@sss.pgh.pa.us
2019-01-29Make some small planner API cleanups.Tom Lane
Move a few very simple node-creation and node-type-testing functions from the planner's clauses.c to nodes/makefuncs and nodes/nodeFuncs. There's nothing planner-specific about them, as evidenced by the number of other places that were using them. While at it, rename and_clause() etc to is_andclause() etc, to clarify that they are node-type-testing functions not node-creation functions. And use "static inline" implementations for the shortest ones. Also, modify flatten_join_alias_vars() and some subsidiary functions to take a Query not a PlannerInfo to define the join structure that Vars should be translated according to. They were only using the "parse" field of the PlannerInfo anyway, so this just requires removing one level of indirection. The advantage is that now parse_agg.c can use flatten_join_alias_vars() without the horrid kluge of creating an incomplete PlannerInfo, which will allow that file to be decoupled from relation.h in a subsequent patch. Discussion: https://postgr.es/m/11460.1548706639@sss.pgh.pa.us