summaryrefslogtreecommitdiff
path: root/src/include/executor
AgeCommit message (Collapse)Author
2018-09-15Fix failure with initplans used conditionally during EvalPlanQual rechecks.Tom Lane
The EvalPlanQual machinery assumes that any initplans (that is, uncorrelated sub-selects) used during an EPQ recheck would have already been evaluated during the main query; this is implicit in the fact that execPlan pointers are not copied into the EPQ estate's es_param_exec_vals. But it's possible for that assumption to fail, if the initplan is only reached conditionally. For example, a sub-select inside a CASE expression could be reached during a recheck when it had not been previously, if the CASE test depends on a column that was just updated. This bug is old, appearing to date back to my rewrite of EvalPlanQual in commit 9f2ee8f28, but was not detected until Kyle Samson reported a case. To fix, force all not-yet-evaluated initplans used within the EPQ plan subtree to be evaluated at the start of the recheck, before entering the EPQ environment. This could be inefficient, if such an initplan is expensive and goes unused again during the recheck --- but that's piling one layer of improbability atop another. It doesn't seem worth adding more complexity to prevent that, at least not in the back branches. It was convenient to use the new-in-v11 ExecEvalParamExecParams function to implement this, but I didn't like either its name or the specifics of its API, so revise that. Back-patch all the way. Rather than rewrite the patch to avoid depending on bms_next_member() in the oldest branches, I chose to back-patch that function into 9.4 and 9.3. (This isn't the first time back-patches have needed that, and it exhausted my patience.) I also chose to back-patch some test cases added by commits 71404af2a and 342a1ffa2 into 9.4 and 9.3, so that the 9.x versions of eval-plan-qual.spec are all the same. Andrew Gierth diagnosed the problem and contributed the added test cases, though the actual code changes are by me. Discussion: https://postgr.es/m/A033A40A-B234-4324-BE37-272279F7B627@tripadvisor.com
2018-09-07Save/restore SPI's global variables in SPI_connect() and SPI_finish().Tom Lane
This patch removes two sources of interference between nominally independent functions when one SPI-using function calls another, perhaps without knowing that it does so. Chapman Flack pointed out that xml.c's query_to_xml_internal() expects SPI_tuptable and SPI_processed to stay valid across datatype output function calls; but it's possible that such a call could involve re-entrant use of SPI. It seems likely that there are similar hazards elsewhere, if not in the core code then in third-party SPI users. Previously SPI_finish() reset SPI's API globals to zeroes/nulls, which would typically make for a crash in such a situation. Restoring them to the values they had at SPI_connect() seems like a considerably more useful behavior, and it still meets the design goal of not leaving any dangling pointers to tuple tables of the function being exited. Also, cause SPI_connect() to reset these variables to zeroes/nulls after saving them. This prevents interference in the opposite direction: it's possible that a SPI-using function that's only ever been tested standalone contains assumptions that these variables start out as zeroes. That was the case as long as you were the outermost SPI user, but not so much for an inner user. Now it's consistent. Report and fix suggestion by Chapman Flack, actual patch by me. Back-patch to all supported branches. Discussion: https://postgr.es/m/9fa25bef-2e4f-1c32-22a4-3ad0723c4a17@anastigmatix.net
2018-03-17Fix WHERE CURRENT OF when the referenced cursor uses an index-only scan.Tom Lane
"UPDATE/DELETE WHERE CURRENT OF cursor_name" failed, with an error message like "cannot extract system attribute from virtual tuple", if the cursor was using a index-only scan for the target table. Fix it by digging the current TID out of the indexscan state. It seems likely that the same failure could occur for CustomScan plans and perhaps some FDW plan types, so that leaving this to be treated as an internal error with an obscure message isn't as good an idea as it first seemed. Hence, add a bit of heaptuple.c infrastructure to let us deliver a more on-topic message. I chose to make the message match what you get for the case where execCurrentOf can't identify the target scan node at all, "cursor "foo" is not a simply updatable scan of table "bar"". Perhaps it should be different, but we can always adjust that later. In the future, it might be nice to provide hooks that would let custom scan providers and/or FDWs deal with this in other ways; but that's not a suitable topic for a back-patchable bug fix. It's been like this all along, so back-patch to all supported branches. Yugo Nagata and Tom Lane Discussion: https://postgr.es/m/20180201013349.937dfc5f.nagata@sraoss.co.jp
2017-10-06Fix crash when logical decoding is invoked from a PL function.Tom Lane
The logical decoding functions do BeginInternalSubTransaction and RollbackAndReleaseCurrentSubTransaction to clean up after themselves. It turns out that AtEOSubXact_SPI has an unrecognized assumption that we always need to cancel the active SPI operation in the SPI context that surrounds the subtransaction (if there is one). That's true when the RollbackAndReleaseCurrentSubTransaction call is coming from the SPI-using function itself, but not when it's happening inside some unrelated function invoked by a SPI query. In practice the affected callers are the various PLs. To fix, record the current subtransaction ID when we begin a SPI operation, and clean up only if that ID is the subtransaction being canceled. Also, remove AtEOSubXact_SPI's assertion that it must have cleaned up the surrounding SPI context's active tuptable. That's proven wrong by the same test case. Also clarify (or, if you prefer, reinterpret) the calling conventions for _SPI_begin_call and _SPI_end_call. The memory context cleanup in the latter means that these have always had the flavor of a matched resource-management pair, but they weren't documented that way before. Per report from Ben Chobot. Back-patch to 9.4 where logical decoding came in. In principle, the SPI changes should go all the way back, since the problem dates back to commit 7ec1c5a86. But given the lack of field complaints it seems few people are using internal subtransactions in this way. So I don't feel a need to take any risks in 9.2/9.3. Discussion: https://postgr.es/m/73FBA179-C68C-4540-9473-71E865408B15@silentmedia.com
2017-04-15Avoid passing function pointers across process boundaries.Tom Lane
This back-patches commit 32470825d36d99a81347ee36c181d609c952c061 into 9.6, primarily to make buildfarm member culicidae happy. Unlike the HEAD patch, avoid changing the existing API of CreateParallelContext; instead we just switch to using CreateParallelContextForExternalFunction, even for core functions. Petr Jelinek, with a bunch of basically-cosmetic adjustments by me Discussion: https://postgr.es/m/548f9c1d-eafa-e3fa-9da8-f0cc2f654e60@2ndquadrant.com
2016-07-29Fix worst memory leaks in tqueue.c.Tom Lane
TupleQueueReaderNext() leaks like a sieve if it has to do any tuple disassembly/reconstruction. While we could try to clean up its allocations piecemeal, it seems like a better idea just to insist that it should be run in a short-lived memory context, so that any transient space goes away automatically. I chose to have nodeGather.c switch into its existing per-tuple context before the call, rather than inventing a separate context inside tqueue.c. This is sufficient to stop all leakage in the simple case I exhibited earlier today (see link below), but it does not deal with leaks induced in more complex cases by tqueue.c's insistence on using TopMemoryContext for data that it's not actually trying hard to keep track of. That issue is intertwined with another major source of inefficiency, namely failure to cache lookup results across calls, so it seems best to deal with it separately. In passing, improve some comments, and modify gather_readnext's method for deciding when it's visited all the readers so that it's more obviously correct. (I'm not actually convinced that the previous code *is* correct in the case of a reader deletion; it certainly seems fragile.) Discussion: <32763.1469821037@sss.pgh.pa.us>
2016-06-09pgindent run for 9.6Robert Haas
2016-06-03Mark read/write expanded values as read-only in ExecProject().Tom Lane
If a plan node output expression returns an "expanded" datum, and that output column is referenced in more than one place in upper-level plan nodes, we need to ensure that what is returned is a read-only reference not a read/write reference. Otherwise one of the referencing sites could scribble on or even delete the expanded datum before we have evaluated the others. Commit 1dc5ebc9077ab742, which introduced this feature, supposed that it'd be sufficient to make SubqueryScan nodes force their output columns to read-only state. The folly of that was revealed by bug #14174 from Andrew Gierth, and really should have been immediately obvious considering that the planner will happily optimize SubqueryScan nodes out of the plan without any regard for this issue. The safest fix seems to be to make ExecProject() force its results into read-only state; that will cover every case where a plan node returns expression results. Actually we can delegate this to ExecTargetList() since we can recursively assume that plain Vars will not reference read-write datums. That should keep the extra overhead down to something minimal. We no longer need ExecMakeSlotContentsReadOnly(), which was introduced only in support of the idea that just a few plan node types would need to do this. In the future it would be nice to have the planner account for this problem and inject force-to-read-only expression evaluation nodes into only the places where there's a risk of multiple evaluation. That's not a suitable solution for 9.5 or even 9.6 at this point, though. Report: <20160603124628.9932.41279@wrigleys.postgresql.org>
2016-05-23Fix latent crash in do_text_output_multiline().Tom Lane
do_text_output_multiline() would fail (typically with a null pointer dereference crash) if its input string did not end with a newline. Such cases do not arise in our current sources; but it certainly could happen in future, or in extension code's usage of the function, so we should fix it. To fix, replace "eol += len" with "eol = text + len". While at it, make two cosmetic improvements: mark the input string const, and rename the argument from "text" to "txt" to dodge pgindent strangeness (since "text" is a typedef name). Even though this problem is only latent at present, it seems like a good idea to back-patch the fix, since it's a very simple/safe patch and it's not out of the realm of possibility that we might in future back-patch something that expects sane behavior from do_text_output_multiline(). Per report from Hao Lee. Report: <CAGoxFiFPAGyPAJLcFxTB5cGhTW2yOVBDYeqDugYwV4dEd1L_Ag@mail.gmail.com>
2016-03-29Rework custom scans to work more like the new extensible node stuff.Robert Haas
Per discussion, the new extensible node framework is thought to be better designed than the custom path/scan/scanstate stuff we added in PostgreSQL 9.5. Rework the latter to be more like the former. This is not backward-compatible, but we generally don't promise that for C APIs, and there probably aren't many people using this yet anyway. KaiGai Kohei, reviewed by Petr Jelinek and me. Some further cosmetic changes by me.
2016-03-12Widen query numbers-of-tuples-processed counters to uint64.Tom Lane
This patch widens SPI_processed, EState's es_processed field, PortalData's portalPos field, FuncCallContext's call_cntr and max_calls fields, ExecutorRun's count argument, PortalRunFetch's result, and the max number of rows in a SPITupleTable to uint64, and deals with (I hope) all the ensuing fallout. Some of these values were declared uint32 before, and others "long". I also removed PortalData's posOverflow field, since that logic seems pretty useless given that portalPos is now always 64 bits. The user-visible results are that command tags for SELECT etc will correctly report tuple counts larger than 4G, as will plpgsql's GET GET DIAGNOSTICS ... ROW_COUNT command. Queries processing more tuples than that are still not exactly the norm, but they're becoming more common. Most values associated with FETCH/MOVE distances, such as PortalRun's count argument and the count argument of most SPI functions that have one, remain declared as "long". It's not clear whether it would be worth promoting those to int64; but it would definitely be a large dollop of additional API churn on top of this, and it would only help 32-bit platforms which seem relatively less likely to see any benefit. Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-02-03Allow parallel custom and foreign scans.Robert Haas
This patch doesn't put the new infrastructure to use anywhere, and indeed it's not clear how it could ever be used for something like postgres_fdw which has to send an SQL query and wait for a reply, but there might be FDWs or custom scan providers that are CPU-bound, so let's give them a way to join club parallel. KaiGai Kohei, reviewed by me.
2016-01-02Update copyright for 2016Bruce Momjian
Backpatch certain files through 9.1
2015-12-09Allow EXPLAIN (ANALYZE, VERBOSE) to display per-worker statistics.Robert Haas
The original parallel sequential scan commit included only very limited changes to the EXPLAIN output. Aggregated totals from all workers were displayed, but there was no way to see what each individual worker did or to distinguish the effort made by the workers from the effort made by the leader. Per a gripe by Thom Brown (and maybe others). Patch by me, reviewed by Amit Kapila.
2015-11-18Avoid aggregating worker instrumentation multiple times.Robert Haas
Amit Kapila, per design ideas from me.
2015-11-11Make sequential scans parallel-aware.Robert Haas
In addition, this path fills in a number of missing bits and pieces in the parallel infrastructure. Paths and plans now have a parallel_aware flag indicating whether whatever parallel-aware logic they have should be engaged. It is believed that we will need this flag for a number of path/plan types, not just sequential scans, which is why the flag is generic rather than part of the SeqScan structures specifically. Also, execParallel.c now gives parallel nodes a chance to initialize their PlanState nodes from the DSM during parallel worker startup. Amit Kapila, with a fair amount of adjustment by me. Review of previous patch versions by Haribabu Kommi and others.
2015-11-06Modify tqueue infrastructure to support transient record types.Robert Haas
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this mechanism, failed to account for the fact that the RECORD pseudo-type uses transient typmods that are only meaningful within a single backend. Transferring such tuples without modification between two cooperating backends does not work. This commit installs a system for passing the tuple descriptors over the same shm_mq being used to send the tuples themselves. The two sides might not assign the same transient typmod to any given tuple descriptor, so we must also substitute the appropriate receiver-side typmod for the one used by the sender. That adds some CPU overhead, but still seems better than being unable to pass records between cooperating parallel processes. Along the way, move the logic for handling multiple tuple queues from tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader, which reads from a single queue, rather than a TupleQueueFunnel, which potentially reads from multiple queues. This change was suggested previously as a way to make sure that nodeGather.c rather than tqueue.c had policy control over the order in which to read from queues, but it wasn't clear to me until now how good an idea it was. typmod mapping needs to be performed separately for each queue, and it is much simpler if the tqueue.c code handles that and leaves multiplexing multiple queues to higher layers of the stack.
2015-10-30Update parallel executor support to reuse the same DSM.Robert Haas
Commit b0b0d84b3d663a148022e900ebfc164284a95f55 purported to make it possible to relaunch workers using the same parallel context, but it had an unpleasant race condition: we might reinitialize after the workers have sent their last control message but before they have dettached the DSM, leaving to crashes. Repair by introducing a new ParallelContext operation, ReinitializeParallelDSM. Adjust execParallel.c to use this new support, so that we can rescan a Gather node by relaunching workers but without needing to recreate the DSM. Amit Kapila, with some adjustments by me. Extracted from latest parallel sequential scan patch.
2015-10-16Rewrite interaction of parallel mode with parallel executor support.Robert Haas
In the previous coding, before returning from ExecutorRun, we'd shut down all parallel workers. This was dead wrong if ExecutorRun was called with a non-zero tuple count; it had the effect of truncating the query output. To fix, give ExecutePlan control over whether to enter parallel mode, and have it refuse to do so if the tuple count is non-zero. Rewrite the Gather logic so that it can cope with being called outside parallel mode. Commit 7aea8e4f2daa4b39ca9d1309a0c4aadb0f7ed81b is largely to blame for this problem, though this patch modifies some subsequently-committed code which relied on the guarantees it purported to make.
2015-10-04Fix some issues in new hashtable size calculations in nodeHash.c.Tom Lane
Limit the size of the hashtable pointer array to not more than MaxAllocSize, per reports from Kouhei Kaigai and others of "invalid memory alloc request size" failures. There was discussion of allowing the array to get larger than that by using the "huge" palloc API, but so far no proof that that is actually a good idea, and at this point in the 9.5 cycle major changes from old behavior don't seem like the way to go. Fix a rather serious secondary bug in the new code, which was that it didn't ensure nbuckets remained a power of 2 when recomputing it for the multiple-batch case. Clean up sloppy division of labor between ExecHashIncreaseNumBuckets and its sole call site.
2015-09-30Add a Gather executor node.Robert Haas
A Gather executor node runs any number of copies of a plan in an equal number of workers and merges all of the results into a single tuple stream. It can also run the plan itself, if the workers are unavailable or haven't started up yet. It is intended to work with the Partial Seq Scan node which will be added in future commits. It could also be used to implement parallel query of a different sort by itself, without help from Partial Seq Scan, if the single_copy mode is used. In that mode, a worker executes the plan, and the parallel leader does not, merely collecting the worker's results. So, a Gather node could be inserted into a plan to split the execution of that plan across two processes. Nested Gather nodes aren't currently supported, but we might want to add support for that in the future. There's nothing in the planner to actually generate Gather nodes yet, so it's not quite time to break out the champagne. But we're getting close. Amit Kapila. Some designs suggestions were provided by me, and I also reviewed the patch. Single-copy mode, documentation, and other minor changes also by me.
2015-09-28Parallel executor support.Robert Haas
This code provides infrastructure for a parallel leader to start up parallel workers to execute subtrees of the plan tree being executed in the master. User-supplied parameters from ParamListInfo are passed down, but PARAM_EXEC parameters are not. Various other constructs, such as initplans, subplans, and CTEs, are also not currently shared. Nevertheless, there's enough here to support a basic implementation of parallel query, and we can lift some of the current restrictions as needed. Amit Kapila and Robert Haas
2015-09-18Glue layer to connect the executor to the shm_mq mechanism.Robert Haas
The shm_mq mechanism was built to send error (and notice) messages and tuples between backends. However, shm_mq itself only deals in raw bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have had infrastructure for one message to redirect protocol messages to a queue and for another backend to parse them and do useful things with them. This commit introduces a somewhat analogous facility for tuples by adding a new type of DestReceiver, DestTupleQueue, which writes each tuple generated by a query into a shm_mq, and a new TupleQueueFunnel facility which reads raw tuples out of the queue and reconstructs the HeapTuple format expected by the executor. The TupleQueueFunnel abstraction supports reading from multiple tuple streams at the same time, but only in round-robin fashion. Someone could imaginably want other policies, but this should be good enough to meet our short-term needs related to parallel query, and we can always extend it later. This also makes one minor addition to the shm_mq API that didn' seem worth breaking out as a separate patch. Extracted from Amit Kapila's parallel sequential scan patch. This code was originally written by me, and then it was revised by Amit, and then it was revised some more by me.
2015-08-21Remove ExecGetScanType functionAlvaro Herrera
This became unused in a191a169d6d0b9558da4519e66510c4540204a51.
2015-07-25Redesign tablesample method API, and do extensive code review.Tom Lane
The original implementation of TABLESAMPLE modeled the tablesample method API on index access methods, which wasn't a good choice because, without specialized DDL commands, there's no way to build an extension that can implement a TSM. (Raw inserts into system catalogs are not an acceptable thing to do, because we can't undo them during DROP EXTENSION, nor will pg_upgrade behave sanely.) Instead adopt an API more like procedural language handlers or foreign data wrappers, wherein the only SQL-level support object needed is a single handler function identified by having a special return type. This lets us get rid of the supporting catalog altogether, so that no custom DDL support is needed for the feature. Adjust the API so that it can support non-constant tablesample arguments (the original coding assumed we could evaluate the argument expressions at ExecInitSampleScan time, which is undesirable even if it weren't outright unsafe), and discourage sampling methods from looking at invisible tuples. Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable within and across queries, as required by the SQL standard, and deal more honestly with methods that can't support that requirement. Make a full code-review pass over the tablesample additions, and fix assorted bugs, omissions, infelicities, and cosmetic issues (such as failure to put the added code stanzas in a consistent ordering). Improve EXPLAIN's output of tablesample plans, too. Back-patch to 9.5 so that we don't have to support the original API in production.
2015-05-23pgindent run for 9.5Bruce Momjian
2015-05-15TABLESAMPLE, SQL Standard and extensibleSimon Riggs
Add a TABLESAMPLE clause to SELECT statements that allows user to specify random BERNOULLI sampling or block level SYSTEM sampling. Implementation allows for extensible sampling functions to be written, using a standard API. Basic version follows SQLStandard exactly. Usable concrete use cases for the sampling API follow in later commits. Petr Jelinek Reviewed by Michael Paquier and Simon Riggs
2015-05-14Support "expanded" objects, particularly arrays, for better performance.Tom Lane
This patch introduces the ability for complex datatypes to have an in-memory representation that is different from their on-disk format. On-disk formats are typically optimized for minimal size, and in any case they can't contain pointers, so they are often not well-suited for computation. Now a datatype can invent an "expanded" in-memory format that is better suited for its operations, and then pass that around among the C functions that operate on the datatype. There are also provisions (rudimentary as yet) to allow an expanded object to be modified in-place under suitable conditions, so that operations like assignment to an element of an array need not involve copying the entire array. The initial application for this feature is arrays, but it is not hard to foresee using it for other container types like JSON, XML and hstore. I have hopes that it will be useful to PostGIS as well. In this initial implementation, a few heuristics have been hard-wired into plpgsql to improve performance for arrays that are stored in plpgsql variables. We would like to generalize those hacks so that other datatypes can obtain similar improvements, but figuring out some appropriate APIs is left as a task for future work. (The heuristics themselves are probably not optimal yet, either, as they sometimes force expansion of arrays that would be better left alone.) Preliminary performance testing shows impressive speed gains for plpgsql functions that do element-by-element access or update of large arrays. There are other cases that get a little slower, as a result of added array format conversions; but we can hope to improve anything that's annoyingly bad. In any case most applications should see a net win. Tom Lane, reviewed by Andres Freund
2015-05-12Add support for doing late row locking in FDWs.Tom Lane
Previously, FDWs could only do "early row locking", that is lock a row as soon as it's fetched, even though local restriction/join conditions might discard the row later. This patch adds callbacks that allow FDWs to do late locking in the same way that it's done for regular tables. To make use of this feature, an FDW must support the "ctid" column as a unique row identifier. Currently, since ctid has to be of type TID, the feature is of limited use, though in principle it could be used by postgres_fdw. We may eventually allow FDWs to specify another data type for ctid, which would make it possible for more FDWs to use this feature. This commit does not modify postgres_fdw to use late locking. We've tested some prototype code for that, but it's not in committable shape, and besides it's quite unclear whether it actually makes sense to do late locking against a remote server. The extra round trips required are likely to outweigh any benefit from improved concurrency. Etsuro Fujita, reviewed by Ashutosh Bapat, and hacked up a lot by me
2015-05-10Code review for foreign/custom join pushdown patch.Tom Lane
Commit e7cb7ee14555cc9c5773e2c102efd6371f6f2005 included some design decisions that seem pretty questionable to me, and there was quite a lot of stuff not to like about the documentation and comments. Clean up as follows: * Consider foreign joins only between foreign tables on the same server, rather than between any two foreign tables with the same underlying FDW handler function. In most if not all cases, the FDW would simply have had to apply the same-server restriction itself (far more expensively, both for lack of caching and because it would be repeated for each combination of input sub-joins), or else risk nasty bugs. Anyone who's really intent on doing something outside this restriction can always use the set_join_pathlist_hook. * Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist to better reflect what they're for, and allow these custom scan tlists to be used even for base relations. * Change make_foreignscan() API to include passing the fdw_scan_tlist value, since the FDW is required to set that. Backwards compatibility doesn't seem like an adequate reason to expect FDWs to set it in some ad-hoc extra step, and anyway existing FDWs can just pass NIL. * Change the API of path-generating subroutines of add_paths_to_joinrel, and in particular that of GetForeignJoinPaths and set_join_pathlist_hook, so that various less-used parameters are passed in a struct rather than as separate parameter-list entries. The objective here is to reduce the probability that future additions to those parameter lists will result in source-level API breaks for users of these hooks. It's possible that this is even a small win for the core code, since most CPU architectures can't pass more than half a dozen parameters efficiently anyway. I kept root, joinrel, outerrel, innerrel, and jointype as separate parameters to reduce code churn in joinpath.c --- in particular, putting jointype into the struct would have been problematic because of the subroutines' habit of changing their local copies of that variable. * Avoid ad-hocery in ExecAssignScanProjectionInfo. It was probably all right for it to know about IndexOnlyScan, but if the list is to grow we should refactor the knowledge out to the callers. * Restore nodeForeignscan.c's previous use of the relcache to avoid extra GetFdwRoutine lookups for base-relation scans. * Lots of cleanup of documentation and missed comments. Re-order some code additions into more logical places.
2015-05-08Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.Andres Freund
The newly added ON CONFLICT clause allows to specify an alternative to raising a unique or exclusion constraint violation error when inserting. ON CONFLICT refers to constraints that can either be specified using a inference clause (by specifying the columns of a unique constraint) or by naming a unique or exclusion constraint. DO NOTHING avoids the constraint violation, without touching the pre-existing row. DO UPDATE SET ... [WHERE ...] updates the pre-existing tuple, and has access to both the tuple proposed for insertion and the existing tuple; the optional WHERE clause can be used to prevent an update from being executed. The UPDATE SET and WHERE clauses have access to the tuple proposed for insertion using the "magic" EXCLUDED alias, and to the pre-existing tuple using the table name or its alias. This feature is often referred to as upsert. This is implemented using a new infrastructure called "speculative insertion". It is an optimistic variant of regular insertion that first does a pre-check for existing tuples and then attempts an insert. If a violating tuple was inserted concurrently, the speculatively inserted tuple is deleted and a new attempt is made. If the pre-check finds a matching tuple the alternative DO NOTHING or DO UPDATE action is taken. If the insertion succeeds without detecting a conflict, the tuple is deemed inserted. To handle the possible ambiguity between the excluded alias and a table named excluded, and for convenience with long relation names, INSERT INTO now can alias its target table. Bumps catversion as stored rules change. Author: Peter Geoghegan, with significant contributions from Heikki Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes. Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs, Dean Rasheed, Stephen Frost and many others.
2015-04-24Perform RLS WITH CHECK before constraints, etcStephen Frost
The RLS capability is built on top of the WITH CHECK OPTION system which was added for auto-updatable views, however, unlike WCOs on views (which are mandated by the SQL spec to not fire until after all other constraints and checks are done), it makes much more sense for RLS checks to happen earlier than constraint and uniqueness checks. This patch reworks the structure which holds the WCOs a bit to be explicitly either VIEW or RLS checks and the RLS-related checks are done prior to the constraint and uniqueness checks. This also allows better error reporting as we are now reporting when a violation is due to a WITH CHECK OPTION and when it's due to an RLS policy violation, which was independently noted by Craig Ringer as being confusing. The documentation is also updated to include a paragraph about when RLS WITH CHECK handling is performed, as there have been a number of questions regarding that and the documentation was previously silent on the matter. Author: Dean Rasheed, with some kabitzing and comment changes by me.
2015-04-24Move functions related to index maintenance to separate source file.Heikki Linnakangas
There is enough code here to deserve a file of their own, not be buried in the middle of execUtils.c.
2015-04-02Define integer limits independently from the system definitions.Andres Freund
In 83ff1618 we defined integer limits iff they're not provided by the system. That turns out not to be the greatest idea because there's different ways some datatypes can be represented. E.g. on OSX PG's 64bit datatype will be a 'long int', but OSX unconditionally uses 'long long'. That disparity then can lead to warnings, e.g. around printf formats. One way to fix that would be to back int64 using stdint.h's int64_t. While a good idea it's not that easy to implement. We would e.g. need to include stdint.h in our external headers, which we don't today. Also computing the correct int64 printf formats in that case is nontrivial. Instead simply prefix the integer limits with PG_ and define them unconditionally. I've adjusted all the references to them in code, but not the ones in comments; the latter seems unnecessary to me. Discussion: 20150331141423.GK4878@alap3.anarazel.de
2015-03-25Centralize definition of integer limits.Andres Freund
Several submitted and even committed patches have run into the problem that C89, our baseline, does not provide minimum/maximum values for various integer datatypes. C99's stdint.h does, but we can't rely on it. Several parts of the code defined limits locally, so instead centralize the definitions to c.h. This patch also changes the more obvious usages of literal limit values; there's more places that could be changed, but it's less clear whether it's beneficial to change those. Author: Andrew Gierth Discussion: 87619tc5wc.fsf@news-spur.riddles.org.uk
2015-03-15Move LockClauseStrength, LockWaitPolicy into new file nodes/lockoptions.h.Tom Lane
Commit df630b0dd5ea2de52972d456f5978a012436115e moved enum LockWaitPolicy into its very own header file utils/lockwaitpolicy.h, which does not seem like a great idea from here. First, it's still a node-related declaration, and second, a file named like that can never sensibly be used for anything else. I do not think we want to encourage a one-typedef-per-header-file approach. The upcoming foreign table inheritance patch was doubling down on this bad idea by moving enum LockClauseStrength into its *own* can-never-be-used-for-anything-else file. Instead, let's put them both in a file named nodes/lockoptions.h. (They do seem to need a separate header file because we need them in both parsenodes.h and plannodes.h, and we don't want either of those including the other. Past practice might suggest adding them to nodes/nodes.h, but they don't seem sufficiently globally useful to justify that.) Committed separately since there's no functional change here, just some header-file refactoring.
2015-02-20Use FLEXIBLE_ARRAY_MEMBER in a bunch more places.Tom Lane
Replace some bogus "x[1]" declarations with "x[FLEXIBLE_ARRAY_MEMBER]". Aside from being more self-documenting, this should help prevent bogus warnings from static code analyzers and perhaps compiler misoptimizations. This patch is just a down payment on eliminating the whole problem, but it gets rid of a lot of easy-to-fix cases. Note that the main problem with doing this is that one must no longer rely on computing sizeof(the containing struct), since the result would be compiler-dependent. Instead use offsetof(struct, lastfield). Autoconf also warns against spelling that offsetof(struct, lastfield[0]). Michael Paquier, review and additional fixes by me.
2015-01-06Update copyright for 2015Bruce Momjian
Backpatch certain files through 9.0
2014-11-21Rearrange CustomScan API.Tom Lane
Make it work more like FDW plans do: instead of assuming that there are expressions in a CustomScan plan node that the core code doesn't know about, insist that all subexpressions that need planner attention be in a "custom_exprs" list in the Plan representation. (Of course, the custom plugin can break the list apart again at executor initialization.) This lets us revert the parts of the patch that exposed setrefs.c and subselect.c processing to the outside world. Also revert the GetSpecialCustomVar stuff in ruleutils.c; that concept may work in future, but it's far from fully baked right now.
2014-11-20Remove dead code supporting mark/restore in SeqScan, TidScan, ValuesScan.Tom Lane
There seems no prospect that any of this will ever be useful, and indeed it's questionable whether some of it would work if it ever got called; it's certainly not been exercised in a very long time, if ever. So let's get rid of it, and make the comments about mark/restore in execAmi.c less wishy-washy. The mark/restore support for Result nodes is also currently dead code, but that's due to planner limitations not because it's impossible that it could be useful. So I left it in.
2014-11-20Initial code review for CustomScan patch.Tom Lane
Get rid of the pernicious entanglement between planner and executor headers introduced by commit 0b03e5951bf0a1a8868db13f02049cf686a82165. Also, rearrange the CustomFoo struct/typedef definitions so that all the typedef names are seen as used by the compiler. Without this pgindent will mess things up a bit, which is not so important perhaps, but it also removes a bizarre discrepancy between the declaration arrangement used for CustomExecMethods and that used for CustomScanMethods and CustomPathMethods. Clean up the commentary around ExecSupportsMarkRestore to reflect the rather large change in its API. Const-ify register_custom_path_provider's argument. This necessitates casting away const in the function, but that seems better than forcing callers of the function to do so (or else not const-ify their method pointer structs, which was sort of the whole point). De-export fix_expr_common. I don't like the exporting of fix_scan_expr or replace_nestloop_params either, but this one surely has got little excuse.
2014-11-10Ensure that RowExprs and whole-row Vars produce the expected column names.Tom Lane
At one time it wasn't terribly important what column names were associated with the fields of a composite Datum, but since the introduction of operations like row_to_json(), it's important that looking up the rowtype ID embedded in the Datum returns the column names that users would expect. That did not work terribly well before this patch: you could get the column names of the underlying table, or column aliases from any level of the query, depending on minor details of the plan tree. You could even get totally empty field names, which is disastrous for cases like row_to_json(). To fix this for whole-row Vars, look to the RTE referenced by the Var, and make sure its column aliases are applied to the rowtype associated with the result Datums. This is a tad scary because we might have to return a transient RECORD type even though the Var is declared as having some named rowtype. In principle it should be all right because the record type will still be physically compatible with the named rowtype; but I had to weaken one Assert in ExecEvalConvertRowtype, and there might be third-party code containing similar assumptions. Similarly, RowExprs have to be willing to override the column names coming from a named composite result type and produce a RECORD when the column aliases visible at the site of the RowExpr differ from the underlying table's column names. In passing, revert the decision made in commit 398f70ec070fe601 to add an alias-list argument to ExecTypeFromExprList: better to provide that functionality in a separate function. This also reverts most of the code changes in d68581483564ec0f, which we don't need because we're no longer depending on the tupdesc found in the child plan node's result slot to be blessed. Back-patch to 9.4, but not earlier, since this solution changes the results in some cases that users might not have realized were buggy. We'll apply a more restricted form of this patch in older branches.
2014-11-07Introduce custom path and scan providers.Robert Haas
This allows extension modules to define their own methods for scanning a relation, and get the core code to use them. It's unclear as yet how much use this capability will find, but we won't find out if we never commit it. KaiGai Kohei, reviewed at various times and in various levels of detail by Shigeru Hanada, Tom Lane, Andres Freund, Álvaro Herrera, and myself.
2014-10-13Increase number of hash join buckets for underestimate.Kevin Grittner
If we expect batching at the very beginning, we size nbuckets for "full work_mem" (see how many tuples we can get into work_mem, while not breaking NTUP_PER_BUCKET threshold). If we expect to be fine without batching, we start with the 'right' nbuckets and track the optimal nbuckets as we go (without actually resizing the hash table). Once we hit work_mem (considering the optimal nbuckets value), we keep the value. At the end of the first batch, we check whether (nbuckets != nbuckets_optimal) and resize the hash table if needed. Also, we keep this value for all batches (it's OK because it assumes full work_mem, and it makes the batchno evaluation trivial). So the resize happens only once. There could be cases where it would improve performance to allow the NTUP_PER_BUCKET threshold to be exceeded to keep everything in one batch rather than spilling to a second batch, but attempts to generate such a case have so far been unsuccessful; that issue may be addressed with a follow-on patch after further investigation. Tomas Vondra with minor format and comment cleanup by me Reviewed by Robert Haas, Heikki Linnakangas, and Kevin Grittner
2014-10-07Implement SKIP LOCKED for row-level locksAlvaro Herrera
This clause changes the behavior of SELECT locking clauses in the presence of locked rows: instead of causing a process to block waiting for the locks held by other processes (or raise an error, with NOWAIT), SKIP LOCKED makes the new reader skip over such rows. While this is not appropriate behavior for general purposes, there are some cases in which it is useful, such as queue-like tables. Catalog version bumped because this patch changes the representation of stored rules. Reviewed by Craig Ringer (based on a previous attempt at an implementation by Simon Riggs, who also provided input on the syntax used in the current patch), David Rowley, and Álvaro Herrera. Author: Thomas Munro
2014-09-10Pack tuples in a hash join batch densely, to save memory.Heikki Linnakangas
Instead of palloc'ing each HashJoinTuple individually, allocate 32kB chunks and pack the tuples densely in the chunks. This avoids the AllocChunk header overhead, and the space wasted by standard allocator's habit of rounding sizes up to the nearest power of two. This doesn't contain any planner changes, because the planner's estimate of memory usage ignores the palloc overhead. Now that the overhead is smaller, the planner's estimates are in fact more accurate. Tomas Vondra, reviewed by Robert Haas.
2014-08-27Fix FOR UPDATE NOWAIT on updated tuple chainsAlvaro Herrera
If SELECT FOR UPDATE NOWAIT tries to lock a tuple that is concurrently being updated, it might fail to honor its NOWAIT specification and block instead of raising an error. Fix by adding a no-wait flag to EvalPlanQualFetch which it can pass down to heap_lock_tuple; also use it in EvalPlanQualFetch itself to avoid blocking while waiting for a concurrent transaction. Authors: Craig Ringer and Thomas Munro, tweaked by Álvaro http://www.postgresql.org/message-id/51FB6703.9090801@2ndquadrant.com Per Thomas Munro in the course of his SKIP LOCKED feature submission, who also provided one of the isolation test specs. Backpatch to 9.4, because that's as far back as it applies without conflicts (although the bug goes all the way back). To that branch also backpatch Thomas Munro's new NOWAIT test cases, committed in master by Heikki as commit 9ee16b49f0aac819bd4823d9b94485ef608b34e8 .
2014-06-19Avoid leaking memory while evaluating arguments for a table function.Tom Lane
ExecMakeTableFunctionResult evaluated the arguments for a function-in-FROM in the query-lifespan memory context. This is insignificant in simple cases where the function relation is scanned only once; but if the function is in a sub-SELECT or is on the inside of a nested loop, any memory consumed during argument evaluation can add up quickly. (The potential for trouble here had been foreseen long ago, per existing comments; but we'd not previously seen a complaint from the field about it.) To fix, create an additional temporary context just for this purpose. Per an example from MauMau. Back-patch to all active branches.
2014-05-06pgindent run for 9.4Bruce Momjian
This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
2014-01-07Update copyright for 2014Bruce Momjian
Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.