path: root/src/backend/commands

2018-03-31  Fix bug with view locking code.  (Tatsuo Ishii)

LockViewRecurse() obtains the view relation using heap_open() and passes it to get_view_query() to get the view info. It immediately closes the relation, then uses the returned view info by calling LockViewRecurse_walker(). Since get_view_query() returns a pointer within the relcache, the relcache entry should be kept until LockViewRecurse_walker() returns; otherwise the relation could point to a garbage memory area.

The fix is to move the heap_close() call after LockViewRecurse_walker().

Problem reported by Tom Lane (the buildfarm is unhappy, especially prion, since it enables the -DRELCACHE_FORCE_RELEASE cpp flag), fix by me.

2018-03-30  Combine options for RangeVarGetRelidExtended() into a flags argument.  (Andres Freund)

A followup patch will add a SKIP_LOCKED option. To avoid introducing ever more arguments, breaking existing callers each time, introduce a flags argument. This'll no doubt break a few external users...

Also change the MISSING_OK behaviour so a DEBUG1 message is emitted when a relation is not found.

Author: Nathan Bossart
Reviewed-By: Michael Paquier and Andres Freund
Discussion: https://postgr.es/m/20180306005349.b65whmvj7z6hbe2y@alap3.anarazel.de

2018-03-30  Allow to lock views.  (Tatsuo Ishii)

Now all tables used in view definitions can be recursively locked by a LOCK command.

Author: Yugo Nagata
Reviewed by Robert Haas, Thomas Munro and me.
Discussion: https://postgr.es/m/20171011183629.eb2817b3.nagata%40sraoss.co.jp

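For illustration, a minimal sketch of the new behavior (object names are hypothetical):

    CREATE TABLE base (a int);
    CREATE VIEW v AS SELECT * FROM base;

    BEGIN;
    LOCK TABLE v IN ACCESS EXCLUSIVE MODE;  -- "base" is now locked recursively as well
    COMMIT;
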
2018-03-29  C comments: "a" <--> "an" corrections  (Bruce Momjian)

Reported-by: Michael Paquier, Abhijit Menon-Sen
Discussion: https://postgr.es/m/20180305045854.GB2266@paquier.xyz
Author: Michael Paquier, Abhijit Menon-Sen, me

2018-03-29  While vacuuming a large table, update upper-level FSM data every so often.  (Tom Lane)

VACUUM updates leaf-level FSM entries immediately after cleaning the corresponding heap blocks. fsmpage.c updates the intra-page search trees on the leaf-level FSM pages when this happens, but it does not touch the upper-level FSM pages, so that the released space might not actually be findable by searchers. Previously, updating the upper-level pages happened only at the conclusion of the VACUUM run, in a single FreeSpaceMapVacuum() call. This is bad because the VACUUM might get canceled before ever reaching that point, so that from the point of view of searchers no space has been freed at all, leading to table bloat.

We can improve matters by updating the upper pages immediately after each cycle of index-cleaning and heap-cleaning, processing just the FSM pages corresponding to the range of heap blocks we have now fully cleaned. This adds a small amount of extra work, since the FSM pages leading down to each range boundary will be touched twice, but it's pretty negligible compared to everything else going on in a large VACUUM.

If there are no indexes, VACUUM doesn't work in cycles but just cleans each heap page on first visit. In that case we just arbitrarily update upper FSM pages after each 8GB of heap. That maintains the goal of not letting all this work slide until the very end, and it doesn't seem worth expending extra complexity on a case that so seldom occurs in practice.

In either case, the FSM is fully up to date before any attempt is made to truncate the relation, so that the most likely scenario for VACUUM cancellation no longer results in out-of-date upper FSM pages. When we do successfully truncate, adjusting the FSM to reflect that is now fully handled within FreeSpaceMapTruncateRel.

Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional tweaks by me

Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com

2018-03-28  Add EXPLAIN support for JIT.  (Andres Freund)

This just shows a few details about JITing, e.g. how many functions have been JITed, and how long that took. To avoid noise in regression tests with functions sometimes being JITed in --with-llvm builds, disable display when COSTS OFF is specified.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de

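A minimal sketch of the behavior described above, assuming a --with-llvm build with JIT active; the table name is hypothetical and the exact layout of the JIT details may vary:

    -- With costs shown, the plan output gains a section with JIT details:
    EXPLAIN (ANALYZE) SELECT sum(x) FROM big_table;

    -- With COSTS OFF the JIT details are suppressed, keeping regression
    -- test output stable:
    EXPLAIN (ANALYZE, COSTS OFF) SELECT sum(x) FROM big_table;
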
2018-03-28  Fast ALTER TABLE ADD COLUMN with a non-NULL default  (Andrew Dunstan)

Currently adding a column to a table with a non-NULL default results in a rewrite of the table. For large tables this can be both expensive and disruptive. This patch removes the need for the rewrite as long as the default value is not volatile. The default expression is evaluated at the time of the ALTER TABLE and the result stored in a new column (attmissingval) in pg_attribute, and a new column (atthasmissing) is set to true. Any existing row when fetched will be supplied with the attmissingval. New rows will have the supplied value or the default and so will never need the attmissingval.

Any time the table is rewritten all the atthasmissing and attmissingval settings for the attributes are cleared, as they are no longer needed.

The most visible code change from this is in heap_attisnull, which acquires a third TupleDesc argument, allowing it to detect a missing value if there is one. In many cases where it is known that there will not be any (e.g. catalog relations) NULL can be passed for this argument.

Andrew Dunstan, heavily modified from an original patch from Serge Rielau.
Reviewed by Tom Lane, Andres Freund, Tomas Vondra and David Rowley.

Discussion: https://postgr.es/m/31e2e921-7002-4c27-59f5-51f08404c858@2ndQuadrant.com

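For illustration (table name hypothetical), a sketch of which forms avoid the rewrite:

    -- Non-volatile default: evaluated once at ALTER time, stored in
    -- pg_attribute.attmissingval, no table rewrite:
    ALTER TABLE big_table ADD COLUMN active boolean NOT NULL DEFAULT false;

    -- Volatile default: must be evaluated per row, so this still rewrites:
    ALTER TABLE big_table ADD COLUMN created timestamptz DEFAULT clock_timestamp();
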
2018-03-27  Allow memory contexts to have both fixed and variable ident strings.  (Tom Lane)

Originally, we treated memory context names as potentially variable in all cases, and therefore always copied them into the context header. Commit 9fa6f00b1 rethought this a little bit and invented a distinction between fixed and variable names, skipping the copy step for the former. But we can make things both simpler and more useful by instead allowing there to be two parts to a context's identification, a fixed "name" and an optional, variable "ident". The name supplied in the context create call is now required to be a compile-time-constant string in all cases, as it is never copied but just pointed to. The "ident" string, if wanted, is supplied later. This is needed because typically we want the ident to be stored inside the context so that it's cleaned up automatically on context deletion; that means it has to be copied into the context before we can set the pointer.

The cost of this approach is basically just an additional pointer field in struct MemoryContextData, which isn't much overhead, and is bought back entirely in the AllocSet case by not needing a headerSize field anymore, since we no longer have to cope with variable header length. In addition, we can simplify the internal interfaces for memory context creation still further, saving a few cycles there. And it's no longer true that a custom identifier disqualifies a context from participating in aset.c's freelist scheme, so possibly there's some win on that end.

All the places that were using non-compile-time-constant context names are adjusted to put the variable info into the "ident" instead. This allows more effective identification of those contexts in many cases; for example, subsidiary contexts of relcache entries are now identified by both type (e.g. "index info") and relname, where before you got only one or the other. Contexts associated with PL function cache entries are now identified more fully and uniformly, too.

I also arranged for plancache contexts to use the query source string as their identifier. This is basically free for CachedPlanSources, as they contained a copy of that string already. We pay an extra pstrdup to do it for CachedPlans. That could perhaps be avoided, but it would make things more fragile (since the CachedPlanSource is sometimes destroyed first). I suspect future improvements in error reporting will require CachedPlans to have a copy of that string anyway, so it's not clear that it's worth moving mountains to avoid it now.

This also changes the APIs for context statistics routines so that the context-specific routines no longer assume that output goes straight to stderr, nor do they know all details of the output format. This is useful immediately to reduce code duplication, and it also allows for external code to do something with stats output that's different from printing to stderr.

The reason for pushing this now rather than waiting for v12 is that it rethinks some of the API changes made by commit 9fa6f00b1. Seems better for extension authors to endure just one round of API changes not two.

Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com

2018-03-23  Allow FOR EACH ROW triggers on partitioned tables  (Alvaro Herrera)

Previously, FOR EACH ROW triggers were not allowed on partitioned tables. Now we allow AFTER triggers on them, and on trigger creation we cascade to create an identical trigger in each partition. We also clone the triggers to each partition that is created or attached later. This means that deferred unique keys are allowed on partitioned tables, too.

Author: Álvaro Herrera
Reviewed-by: Peter Eisentraut, Simon Riggs, Amit Langote, Robert Haas, Thomas Munro
Discussion: https://postgr.es/m/20171229225319.ajltgss2ojkfd3kp@alvherre.pgsql

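A sketch with hypothetical names:

    CREATE TRIGGER parted_audit
        AFTER INSERT ON measurement          -- a partitioned table
        FOR EACH ROW EXECUTE PROCEDURE audit_insert();
    -- An identical trigger is created in each existing partition, and is
    -- cloned to partitions created or attached later.
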
2018-03-22  Sync up our various ways of estimating pg_class.reltuples.  (Tom Lane)

VACUUM thought that reltuples represents the total number of tuples in the relation, while ANALYZE counted only live tuples. This can cause "flapping" in the value when background vacuums and analyzes happen separately. The planner's use of reltuples essentially assumes that it's the count of live (visible) tuples, so let's standardize on having it mean live tuples.

Another issue is that the definition of "live tuple" isn't totally clear; what should be done with INSERT_IN_PROGRESS or DELETE_IN_PROGRESS tuples? ANALYZE's choices in this regard are made on the assumption that if the originating transaction commits at all, it will happen after ANALYZE finishes, so we should ignore the effects of the in-progress transaction --- unless it is our own transaction, and then we should count it. Let's propagate this definition into VACUUM, too. Likewise propagate this definition into CREATE INDEX, and into contrib/pgstattuple's pgstattuple_approx() function.

Tomas Vondra, reviewed by Haribabu Kommi, some corrections by me

Discussion: https://postgr.es/m/16db4468-edfa-830a-f921-39a50498e77e@2ndquadrant.com

2018-03-22  Improve ANALYZE's strategy for finding MCVs.  (Dean Rasheed)

Previously, a value was included in the MCV list if its frequency was 25% larger than the estimated average frequency of all nonnull values in the table. For uniform distributions, that can lead to values being included in the MCV list and significantly overestimated on the basis of relatively few (sometimes just 2) instances being seen in the sample. For non-uniform distributions, it can lead to too few values being included in the MCV list, since the overall average frequency may be dominated by a small number of very common values, while the remaining values may still have a large spread of frequencies, causing both substantial overestimation and underestimation of the remaining values. Furthermore, increasing the statistics target may have little effect because the overall average frequency will remain relatively unchanged.

Instead, populate the MCV list with the largest set of common values that are statistically significantly more common than the average frequency of the remaining values. This takes into account the variance of the sample counts, which depends on the counts themselves and on the proportion of the table that was sampled. As a result, it constrains the relative standard error of estimates based on the frequencies of values in the list, reducing the chances of too many values being included. At the same time, it allows more values to be included, since the MCVs need only be more common than the remaining non-MCVs, rather than the overall average. Thus it tends to produce fewer MCVs than the previous code for uniform distributions, and more for non-uniform distributions, reducing estimation errors in both cases. In addition, the algorithm responds better to increasing the statistics target, allowing more values to be included in the MCV list when more of the table is sampled.

Jeff Janes, substantially modified by me. Reviewed by John Naylor and Tomas Vondra.

Discussion: https://postgr.es/m/CAMkU=1yvdGvW9TmiLAhz2erFnvnPFYHbOZuO+a=4DVkzpuQ2tw@mail.gmail.com

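For reference, the statistical ingredient is the standard error of an observed frequency under sampling without replacement; a sketch using the textbook estimator (not necessarily the exact expression in the committed code):

    SE(\hat{p}) = \sqrt{ \frac{\hat{p}\,(1 - \hat{p})}{n} \cdot \frac{N - n}{N - 1} }

where \hat{p} is a value's frequency in a sample of n rows from a table of N rows. A value stays in the MCV list only while its frequency exceeds the remaining values' average by a margin that is significant on this scale.
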
2018-03-21  Fix relcache handling of the 'default' partition  (Alvaro Herrera)

My commit 4dba331cb3dc that moved around CommandCounterIncrement calls in partitioning DDL code unearthed a problem with the relcache handling for the 'default' partition: the construction of a correct relcache entry for the partitioned table was at the mercy of lack of CCI calls in non-trivial amounts of code. This was prone to creating problems later on, as the code develops. This was visible as a test failure in a compile with RELCACHE_FORCE_RELEASE (buildfarm member prion).

The problem is that after the mentioned commit it was possible to create a relcache entry that had incomplete information regarding the default partition because I introduced a CCI between adding the catalog entries for the default partition (StorePartitionBound) and the update of the pg_partitioned_table entry for its parent partitioned table (update_default_partition_oid). It seems the best fix is to move the latter so that it occurs inside the former; the purposeful lack of intervening CCI should be more obvious, and harder to break.

I also remove a check in RelationBuildPartitionDesc that returns NULL if the key is not set. I couldn't find any place that needs this hack anymore; probably it was required because of bugs that have since been fixed.

Fix a few typos I noticed while reviewing the code involved.

Discussion: https://postgr.es/m/20180320182659.nyzn3vqtjbbtfgwq@alvherre.pgsql

2018-03-21  Handle heap rewrites even better in logical decoding  (Peter Eisentraut)

Logical decoding should not publish anything about tables created as part of a heap rewrite during DDL. Those tables don't exist externally, so consumers of logical decoding cannot do anything sensible with that information. In ab28feae2bd3d4629bd73ae3548e671c57d785f0, we worked around this for built-in logical replication, but that was a hack.

This is a more proper fix: We mark such transient heaps using the new field pg_class.relwrite, linking to the original relation OID. By default, we ignore them in logical decoding before they get to the output plugin. Optionally, a plugin can register their interest in getting such changes, if they handle DDL specially, in which case the new field will help them get information about the actual table.

Reviewed-by: Craig Ringer <craig@2ndquadrant.com>

2018-03-20  Fix CommandCounterIncrement in partition-related DDL  (Alvaro Herrera)

It makes sense to do the CCIs in the places that do catalog updates, rather than before the places that error out because the former ones fail to do it. In particular, it looks like StorePartitionBound() and IndexSetParentIndex() ought to make their own CCIs.

Per review comments from Peter Eisentraut for row-level triggers on partitioned tables.

Discussion: https://postgr.es/m/20171229225319.ajltgss2ojkfd3kp@alvherre.pgsql

2018-03-19  Fix some corner-case issues in REFRESH MATERIALIZED VIEW CONCURRENTLY.  (Tom Lane)

refresh_by_match_merge() has some issues in the way it builds a SQL query to construct the "diff" table:

1. It doesn't require the selected unique index(es) to be indimmediate.
2. It doesn't pay attention to the particular equality semantics enforced by a given index, but just assumes that they must be those of the column datatype's default btree opclass.
3. It doesn't check that the indexes are btrees.
4. It's insufficiently careful to ensure that the parser will pick the intended operator when parsing the query. (This would have been a security bug before CVE-2018-1058.)
5. It's not careful about indexes on system columns.

The way to fix #4 is to make use of the existing code in ri_triggers.c for generating an arbitrary binary operator clause. I chose to move that to ruleutils.c, since that seems a more reasonable place to be exporting such functionality from than ri_triggers.c.

While #1, #3, and #5 are just latent given existing feature restrictions, and #2 doesn't arise in the core system for lack of alternate opclasses with different equality behaviors, #4 seems like an issue worth back-patching. That's the bulk of the change anyway, so just back-patch the whole thing to 9.4 where this code was introduced.

Discussion: https://postgr.es/m/13836.1521413227@sss.pgh.pa.us

2018-03-19  Fix performance hazard in REFRESH MATERIALIZED VIEW CONCURRENTLY.  (Tom Lane)

Jeff Janes discovered that commit 7ca25b7de made one of the queries run by REFRESH MATERIALIZED VIEW CONCURRENTLY perform badly. The root cause is bad cardinality estimation for correlated quals, but a principled solution to that problem is some way off, especially since the planner lacks any statistics about whole-row variables. Moreover, in non-error cases this query produces no rows, meaning it must be run to completion; but use of LIMIT 1 encourages the planner to pick a fast-start, slow-completion plan, exactly not what we want. Remove the LIMIT clause, and instead rely on the count parameter we pass to SPI_execute() to prevent excess work if the query does return some rows.

While we've heard no field reports of planner misbehavior with this query, it could be that people are having performance issues that haven't reached the level of pain needed to cause a bug report. In any case, that LIMIT clause can't possibly do anything helpful with any existing version of the planner, and it demonstrably can cause bad choices in some cases, so back-patch to 9.4 where the code was introduced.

Thomas Munro

Discussion: https://postgr.es/m/CAMkU=1z-JoGymHneGHar1cru4F1XDfHqJDzxP_CtK5cL3DOfmg@mail.gmail.com

2018-03-19  Fix state reversal after partition tuple routing  (Alvaro Herrera)

We make some changes to ModifyTableState and the EState it uses whenever we route tuples to partitions; but we weren't restoring properly in all cases, possibly causing crashes when partitions with different tuple descriptors are targeted by tuples inserted in the same command. Refactor some code, creating ExecPrepareTupleRouting, to encapsulate the needed state changing logic, and have it invoked one level above its current place (ie. put it in ExecModifyTable instead of ExecInsert); this makes it all more readable.

Add a test case to exercise this.

We don't support having views as partitions; and since only views can have INSTEAD OF triggers, there is no point in testing for INSTEAD OF when processing insertions into a partitioned table. Remove code that appears to support this (but which is actually never relevant).

In passing, fix location of some very confusing comments in ModifyTableState.

Reported-by: Amit Langote
Author: Etsuro Fujita, Amit Langote
Discussion: https://postgr.es/m/0473bf5c-57b1-f1f7-3d58-455c2230bc5f@lab.ntt.co.jp

2018-03-16  Add 'unit' parameter to ExplainProperty{Integer,Float}.  (Andres Freund)

This allows deduplicating some existing code, but mainly avoids some duplication in upcoming commits. In passing, fix variable names indicating the wrong unit (seconds instead of ms).

Author: Andres Freund
Discussion: https://postgr.es/m/20180314002740.cah3mdsonz5mxney@alap3.anarazel.de

2018-03-16  Make ExplainPropertyInteger accept 64bit input, remove *Long variant.  (Andres Freund)

'long' is not a useful type across platforms, as it's 32 bits on 32-bit platforms, and even on some 64-bit platforms (e.g. Windows) it's still only 32 bits wide. As ExplainPropertyInteger should never be performance critical, change it to accept a 64-bit argument and remove ExplainPropertyLong.

Author: Andres Freund
Discussion: https://postgr.es/m/20180314164832.n56wt7zcbpzi6zxe@alap3.anarazel.de

2018-03-16  Rename TransactionChain functions  (Peter Eisentraut)

We call this thing a "transaction block" everywhere except in a few functions, where it is mysteriously called a "transaction chain". In the SQL standard, a transaction chain is something different. So rename these functions to match the common terminology.

Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>

2018-03-15  Fix more format truncation issues  (Peter Eisentraut)

Fix the warnings created by the compiler warning options -Wformat-overflow=2 -Wformat-truncation=2, supported since GCC 7. This is a more aggressive variant of the fixes in 6275f5d28a1577563f53f2171689d4f890a46881, which GCC 7 warned about by default. The issues are all harmless, but some dubious coding patterns are cleaned up.

One issue that is of external interest is that BGW_MAXLEN is increased from 64 to 96. Apparently, the old value would cause the bgw_name of logical replication workers to be truncated in some circumstances.

But this doesn't actually add those warning options. It appears that the warnings depend a bit on compilation and optimization options, so it would be annoying to have to keep up with that. This is more of a once-in-a-while cleanup.

Reviewed-by: Michael Paquier <michael@paquier.xyz>

2018-03-14  Remove pg_class.relhaspkey  (Peter Eisentraut)

It is not used for anything internally, and it cannot be relied on for external uses, so it can just be removed. The recommended way to check for a primary key now is via pg_index.

Discussion: https://www.postgresql.org/message-id/flat/b1a24c6c-6913-f89c-674e-0704f0ed69db@2ndquadrant.com

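For example (table name hypothetical):

    SELECT EXISTS (
        SELECT 1 FROM pg_index
        WHERE indrelid = 'some_table'::regclass AND indisprimary
    ) AS has_primary_key;
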
2018-03-14  Support INOUT arguments in procedures  (Peter Eisentraut)

In a top-level CALL, the values of INOUT arguments will be returned as a result row. In PL/pgSQL, the values are assigned back to the input arguments. In other languages, the same convention as for returning a record from a function is used. That does not require any code changes in the PL implementations.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>

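A minimal sketch (procedure name hypothetical):

    CREATE PROCEDURE double_it(INOUT x int)
    LANGUAGE plpgsql
    AS $$
    BEGIN
        x := x * 2;
    END;
    $$;

    CALL double_it(21);   -- top-level CALL returns a one-row result: x = 42
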
2018-03-13  When updating reltuples after ANALYZE, just extrapolate from our sample.  (Tom Lane)

The existing logic for updating pg_class.reltuples trusted the sampling results only for the pages ANALYZE actually visited, preferring to believe the previous tuple density estimate for all the unvisited pages. While there's some rationale for doing that for VACUUM (first that VACUUM is likely to visit a very nonrandom subset of pages, and second that we know for sure that the unvisited pages did not change), there's no such rationale for ANALYZE: by assumption, it's looked at an unbiased random sample of the table's pages. Furthermore, in a very large table ANALYZE will have examined only a tiny fraction of the table's pages, meaning it cannot slew the overall density estimate very far at all.

In a table that is physically growing, this causes reltuples to increase nearly proportionally to the change in relpages, regardless of what is actually happening in the table. This has been observed to cause reltuples to become so much larger than reality that it effectively shuts off autovacuum, whose threshold for doing anything is a fraction of reltuples. (Getting to the point where that would happen seems to require some additional, not well understood, conditions. But it's undeniable that if reltuples is seriously off in a large table, ANALYZE alone will not fix it in any reasonable number of iterations, especially not if the table is continuing to grow.)

Hence, restrict the use of vac_estimate_reltuples() to VACUUM alone, and in ANALYZE, just extrapolate from the sample pages on the assumption that they provide an accurate model of the whole table. If, by very bad luck, they don't, at least another ANALYZE will fix it; in the old logic a single bad estimate could cause problems indefinitely.

In HEAD, let's remove vac_estimate_reltuples' is_analyze argument altogether; it was never used for anything and now it's totally pointless. But keep it in the back branches, in case any third-party code is calling this function.

Per bug #15005. Back-patch to all supported branches.

David Gould, reviewed by Alexander Kuzmenkov, cosmetic changes by me

Discussion: https://postgr.es/m/20180117164916.3fdcf2e9@engels

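In formula form, the ANALYZE-side rule described above is a simple extrapolation (a sketch of the rule, not the exact committed code):

    \text{reltuples} \approx \text{live\_rows\_in\_sample} \times \frac{\text{relpages}}{\text{pages\_sampled}}
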
2018-03-13  Fix CREATE TABLE / LIKE with bigint identity column  (Peter Eisentraut)

CREATE TABLE / LIKE with a bigint identity column would fail on platforms where long is 32 bits. Copying the sequence values used makeInteger(), which would truncate the 64-bit sequence data to 32 bits. To fix, use makeFloat() instead, like the parser. (This does not actually make use of floats, but stores the values as strings.)

Bug: #15096
Reviewed-by: Michael Paquier <michael@paquier.xyz>

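A sketch of the previously failing case (names hypothetical):

    CREATE TABLE src (id bigint GENERATED ALWAYS AS IDENTITY, payload text);
    -- The bigint sequence bounds no longer get truncated through makeInteger()
    -- on platforms with a 32-bit long:
    CREATE TABLE dst (LIKE src INCLUDING IDENTITY);
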
2018-03-12  Avoid having two PKs in a partition  (Alvaro Herrera)

If a table containing a primary key is attached as a partition to a partitioned table which has a primary key with a different definition, we would happily create a second one in the new partition. Oops.

It turns out that this is because an error check in DefineIndex is executed only if you tell it that it's being run by ALTER TABLE, and the original code here wasn't. Change it so that it does. Also added a couple of test cases for this.

A previously working test started to fail in a different way than before the patch because the new check is called earlier; change the PK to plain UNIQUE so that the new behavior isn't invoked, so that the test continues to verify what we want it to verify.

Reported by: Noriyoshi Shinoda
Discussion: https://postgr.es/m/DF4PR8401MB102060EC2615EC9227CC73F7EEDF0@DF4PR8401MB1020.NAMPRD84.PROD.OUTLOOK.COM

2018-03-11  Fix improper uses of canonicalize_qual().  (Tom Lane)

One of the things canonicalize_qual() does is to remove constant-NULL subexpressions of top-level AND/OR clauses. It does that on the assumption that what it's given is a top-level WHERE clause, so that NULL can be treated like FALSE. Although this is documented down inside a subroutine of canonicalize_qual(), it wasn't mentioned in the documentation of that function itself, and some callers hadn't gotten that memo.

Notably, commit d007a9505 caused get_relation_constraints() to apply canonicalize_qual() to CHECK constraints. That allowed constraint exclusion to misoptimize situations in which a CHECK constraint had a provably-NULL subclause, as seen in the regression test case added here, in which a child table that should be scanned is not. (Although this thinko is ancient, the test case doesn't fail before 9.2, for reasons I've not bothered to track down in detail. There may be related cases that do fail before that.)

More recently, commit f0e44751d added an independent bug by applying canonicalize_qual() to index expressions, which is even sillier since those might not even be boolean. If they are, though, I think this could lead to making incorrect index entries for affected index expressions in v10. I haven't attempted to prove that though.

To fix, add an "is_check" parameter to canonicalize_qual() to specify whether it should assume WHERE or CHECK semantics, and make it perform NULL-elimination accordingly. Adjust the callers to apply the right semantics, or remove the call entirely in cases where it's not known that the expression has one or the other semantics. I also removed the call in some cases involving partition expressions, where it should be a no-op because such expressions should be canonical already ... and was a no-op, independently of whether it could in principle have done something, because it was being handed the qual in implicit-AND format which isn't what it expects. In HEAD, add an Assert to catch that type of mistake in future.

This represents an API break for external callers of canonicalize_qual(). While that's intentional in HEAD to make such callers think about which case applies to them, it seems like something we probably wouldn't be thanked for in released branches. Hence, in released branches, the extra parameter is added to a new function canonicalize_qual_ext(), and canonicalize_qual() is a wrapper that retains its old behavior.

Patch by me with suggestions from Dean Rasheed. Back-patch to all supported branches.

Discussion: https://postgr.es/m/24475.1520635069@sss.pgh.pa.us

2018-03-09  Improve predtest.c's internal docs, and enhance its functionality a bit.  (Tom Lane)

Commit b08df9cab left things rather poorly documented as far as the exact semantics of "clause_is_check" mode went. Also, that mode did not really work correctly for predicate_refuted_by; although given the lack of specification as to what it should do, as well as the lack of any actual use-case, that's perhaps not surprising.

Rename "clause_is_check" to "weak" proof mode, and provide specifications for what it should do. I defined weak refutation as meaning "truth of A implies non-truth of B", which makes it possible to use the mode in the part of relation_excluded_by_constraints that checks for mutually contradictory WHERE clauses. Fix up several places that did things wrong for that definition. (As far as I can see, these errors would only lead to failure-to-prove, not incorrect claims of proof, making them not serious bugs even aside from the fact that v10 contains no use of this mode. So there seems no need for back-patching.)

In addition, teach predicate_refuted_by_recurse that it can use predicate_implied_by_recurse after all when processing a strong NOT-clause, so long as it asks for the correct proof strength. This is an optimization that could have been included in commit b08df9cab, but wasn't.

Also, simplify and generalize the logic that checks for whether nullness of the argument of IS [NOT] NULL would force overall nullness of the predicate or clause. (This results in a change in the partition_prune test's output, as it is now able to prune an all-nulls partition that it did not recognize before.)

In passing, in PartConstraintImpliedByRelConstraint, remove bogus conversion of the constraint list to explicit-AND form and then right back again; that accomplished nothing except forcing a useless extra level of recursion inside predicate_implied_by.

Discussion: https://postgr.es/m/5983.1520487191@sss.pgh.pa.us

2018-03-06  Fix bogus Name assignment in CreateStatistics  (Alvaro Herrera)

Apparently, it doesn't work to use a plain cstring as a Name datum: you may end up having random bytes because of failing to zero the bytes after the terminating \0, as indicated by valgrind. I introduced this bug in 5564c1181548, so backpatch this fix to REL_10_STABLE, like that commit. While at it, fix a slightly misleading comment, pointed out by David Rowley.

2018-03-05  Clone extended stats in CREATE TABLE (LIKE INCLUDING ALL)  (Alvaro Herrera)

The LIKE INCLUDING ALL clause to CREATE TABLE intuitively indicates cloning of extended statistics on the source table, but it failed to do so. Patch it up so that it does. Also add an INCLUDING STATISTICS option to the LIKE clause, so that the behavior can be requested individually, or excluded individually.

While at it, reorder the INCLUDING options, both in code and in docs, in alphabetical order, which makes more sense than the feature-implementation order that was previously used.

Backpatch this to Postgres 10, where extended statistics were introduced, because this is seen as an oversight in a fresh feature which is better to get consistent from the get-go instead of changing only in pg11.

In pg11, comments on statistics objects are cloned too. In pg10 they are not, because I (Álvaro) was too cowardly to change the parse node as required to support it. Also, in pg10 I chose not to renumber the parser symbols for the various INCLUDING options in LIKE, for the same reason. Any corresponding user-visible changes (docs) are backpatched, though.

Reported-by: Stephen Froehlich
Author: David Rowley
Reviewed-by: Álvaro Herrera, Tomas Vondra
Discussion: https://postgr.es/m/CY1PR0601MB1927315B45667A1B679D0FD5E5EF0@CY1PR0601MB1927.namprd06.prod.outlook.com

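For illustration (names hypothetical):

    CREATE TABLE src (a int, b int);
    CREATE STATISTICS src_stats (dependencies) ON a, b FROM src;

    CREATE TABLE copy1 (LIKE src INCLUDING STATISTICS);  -- clones just the statistics object
    CREATE TABLE copy2 (LIKE src INCLUDING ALL);         -- now clones it as well
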
2018-03-02  Add prokind column, replacing proisagg and proiswindow  (Peter Eisentraut)

The new column distinguishes normal functions, procedures, aggregates, and window functions. This replaces the existing columns proisagg and proiswindow, and replaces the convention that procedures are indicated by prorettype == 0. Also change prorettype to be VOIDOID for procedures.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>

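For example, the distinction is now visible directly in the catalog:

    -- prokind: 'f' = normal function, 'p' = procedure,
    --          'a' = aggregate, 'w' = window function
    SELECT proname, prokind FROM pg_proc WHERE proname IN ('sum', 'lower');
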
2018-02-27  Prevent dangling-pointer access when update trigger returns old tuple.  (Tom Lane)

A before-update row trigger may choose to return the "new" or "old" tuple unmodified. ExecBRUpdateTriggers failed to consider the second possibility, and would proceed to free the "old" tuple even if it was the one returned, leading to subsequent access to already-deallocated memory. In debug builds this reliably leads to an "invalid memory alloc request size" failure; in production builds it might accidentally work, but data corruption is also possible.

This is a very old bug. There are probably a couple of reasons it hasn't been noticed up to now. It would be more usual to return NULL if one wanted to suppress the update action; returning "old" is significantly less efficient since the update will occur anyway. Also, none of the standard PLs would ever cause this because they all returned freshly-manufactured tuples even if they were just copying "old". But commit 4b93f5799 changed that for plpgsql, making it possible to see the bug with a plpgsql trigger. Still, this is certainly legal behavior for a trigger function, so it's ExecBRUpdateTriggers's fault not plpgsql's. It seems worth creating a test case that exercises returning "old" directly with a C-language trigger; testing this through plpgsql seems unreliable because its behavior might change again.

Report and fix by Rushabh Lathia; regression test case by me. Back-patch to all supported branches.

Discussion: https://postgr.es/m/CAGPqQf1P4pjiNPrMof=P_16E-DFjt457j+nH2ex3=nBTew7tXw@mail.gmail.com

2018-02-22  Support parameters in CALL  (Peter Eisentraut)

To support parameters in CALL, move the parse analysis of the procedure and arguments into the global transformation phase, so that the parser hooks can be applied. And then at execution time pass the parameters from ProcessUtility on to ExecuteCallStmt.

2018-02-22  Be lazier about partition tuple routing.  (Robert Haas)

It's not necessary to fully initialize the executor data structures for partitions to which no tuples are ever routed. Consider, for example, an INSERT statement that inserts only one row: it only cares about the partition to which that one row is routed. The new function ExecInitPartitionInfo performs the initialization in question only when a particular partition is about to receive a tuple. This includes creating, validating, and saving a pointer to the ResultRelInfo, setting up for speculative insertions, translating WCOs and initializing the resulting expressions, translating returning lists and building the appropriate projection information, and setting up a tuple conversion map.

One thing that's not deferred is locking the child partitions; that seems desirable but would need more thought. Still, testing shows that this makes single-row inserts significantly faster on a table with many partitions without harming the bulk-insert case.

Amit Langote, reviewed by Etsuro Fujita, with a few changes by me

Discussion: http://postgr.es/m/8975331d-d961-cbdd-f862-fdd3d97dc2d0@lab.ntt.co.jp

2018-02-20  Error message improvement  (Peter Eisentraut)

2018-02-19  Allow UNIQUE indexes on partitioned tables  (Alvaro Herrera)

If we restrict unique constraints on partitioned tables so that they must always include the partition key, then our standard approach to unique indexes already works --- each unique key is forced to exist within a single partition, so enforcing the unique restriction in each index individually is enough to have it enforced globally. Therefore we can implement unique indexes on partitions by simply removing a few restrictions (and adding others.)

Discussion: https://postgr.es/m/20171222212921.hi6hg6pem2w2t36z@alvherre.pgsql
Discussion: https://postgr.es/m/20171229230607.3iib6b62fn3uaf47@alvherre.pgsql
Reviewed-by: Simon Riggs, Jesper Pedersen, Peter Eisentraut, Jaime Casanova, Amit Langote

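A sketch with hypothetical names:

    CREATE TABLE orders (
        order_id  bigint NOT NULL,
        placed_on date   NOT NULL,
        PRIMARY KEY (order_id, placed_on)   -- must include the partition key
    ) PARTITION BY RANGE (placed_on);
    -- PRIMARY KEY (order_id) alone would be rejected: it could not be
    -- enforced within a single partition.
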
2018-02-18  Message style fix  (Peter Eisentraut)

2018-02-17  Move function comment to the right place  (Peter Eisentraut)

2018-02-16  Allow tupleslots to have a fixed tupledesc, use in executor nodes.  (Andres Freund)

The reason for doing so is that it will allow expression evaluation to optimize based on the underlying tupledesc. In particular it will allow JITing tuple deforming together with the expression itself. For that, expression initialization needs to be moved after the relevant slots are initialized - mostly unproblematic, except in the case of nodeWorktablescan.c.

After doing so there's no need for ExecAssignResultType() and ExecAssignResultTypeFromTL() anymore, as all former callers have been converted to create a slot with a fixed descriptor.

When creating a slot with a fixed descriptor, tts_values/isnull can be allocated together with the main slot, reducing allocation overhead and increasing cache density a bit.

Author: Andres Freund
Discussion: https://postgr.es/m/20171206093717.vqdxe5icqttpxs3p@alap3.anarazel.de

2018-02-10  Avoid premature free of pass-by-reference CALL arguments.  (Tom Lane)

Prematurely freeing the EState used to evaluate CALL arguments led, in some cases, to passing dangling pointers to the procedure. This was masked in trivial cases because the argument pointers would point to Const nodes in the original expression tree, and in some other cases because the result value would end up in the standalone ExprContext rather than in memory belonging to the EState --- but that wasn't exactly high quality programming either, because the standalone ExprContext was never explicitly freed, breaking assorted API contracts.

In addition, using a separate EState for each argument was just silly. So let's use just one EState, and one ExprContext, and make the latter belong to the former rather than be standalone, and clean up the EState (and hence the ExprContext) post-call.

While at it, improve the function's commentary a bit.

Discussion: https://postgr.es/m/29173.1518282748@sss.pgh.pa.us

2018-02-10  Fix oversight in CALL argument handling, and do some minor cleanup.  (Tom Lane)

CALL statements cannot support sub-SELECTs in the arguments of the called procedure, since they just use ExecEvalExpr to evaluate such arguments. Teach transformSubLink() to reject the case, as it already does for other contexts in which subqueries are not supported.

In passing, s/EXPR_KIND_CALL/EXPR_KIND_CALL_ARGUMENT/ to make that enum symbol line up more closely with the phrasing of the error messages it is associated with. And fix someone's weak grasp of English grammar in the preceding EXPR_KIND_PARTITION_EXPRESSION addition.

Also update an incorrect comment in resolve_unique_index_expr (possibly it was correct when written, but nowadays transformExpr definitely does reject SRFs here).

Per report from Pavel Stehule --- but this resolves only one of the bugs he mentions.

Discussion: https://postgr.es/m/CAFj8pRDxOwPPzpA8i+AQeDQFj7bhVw-dR2==rfWZ3zMGkm568Q@mail.gmail.com

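A sketch of the newly rejected case (procedure name hypothetical; error wording approximate):

    CREATE PROCEDURE p(x int) LANGUAGE plpgsql AS $$ BEGIN NULL; END $$;
    CALL p((SELECT 1));   -- ERROR: cannot use subquery in CALL argument
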
2018-02-08  Avoid listing the same ResultRelInfo in more than one EState list.  (Robert Haas)

Doing so causes EXPLAIN ANALYZE to show trigger statistics multiple times. Commit 2f178441044be430f6b4d626e4dae68a9a6f6cec seems to be to blame for this.

Amit Langote, reviewed by Amit Khandekar, Etsuro Fujita, and me.

2018-02-07  Support all SQL:2011 options for window frame clauses.  (Tom Lane)

This patch adds the ability to use "RANGE offset PRECEDING/FOLLOWING" frame boundaries in window functions. We'd punted on that back in the original patch to add window functions, because it was not clear how to do it in a reasonably data-type-extensible fashion. That problem is resolved here by adding the ability for btree operator classes to provide an "in_range" support function that defines how to add or subtract the RANGE offset value. Factoring it this way also allows the operator class to avoid overflow problems near the ends of the datatype's range, if it wishes to expend effort on that. (In the committed patch, the integer opclasses handle that issue, but it did not seem worth the trouble to avoid overflow failures for datetime types.)

The patch includes in_range support for the integer_ops opfamily (int2/int4/int8) as well as the standard datetime types. Support for other numeric types has been requested, but that seems like suitable material for a follow-on patch.

In addition, the patch adds GROUPS mode which counts the offset in ORDER-BY peer groups rather than rows, and it adds the frame_exclusion options specified by SQL:2011. As far as I can see, we are now fully up to spec on window framing options.

Existing behaviors remain unchanged, except that I changed the errcode for a couple of existing error reports to meet the SQL spec's expectation that negative "offset" values should be reported as SQLSTATE 22013.

Internally and in relevant parts of the documentation, we now consistently use the terminology "offset PRECEDING/FOLLOWING" rather than "value PRECEDING/FOLLOWING", since the term "value" is confusingly vague.

Oliver Ford, reviewed and whacked around some by me

Discussion: https://postgr.es/m/CAGMVOdu9sivPAxbNN0X+q19Sfv9edEPv=HibOJhB14TJv_RCQg@mail.gmail.com

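For illustration, all of these framing options now work:

    SELECT x,
           sum(x) OVER (ORDER BY x
                        RANGE BETWEEN 2 PRECEDING AND CURRENT ROW) AS range_sum,
           count(*) OVER (ORDER BY x
                          GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
                          EXCLUDE CURRENT ROW) AS peer_count
    FROM generate_series(1, 5) AS t(x);
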
2018-02-02  Fix application of identity values in some cases  (Peter Eisentraut)

Investigation of 2d2d06b7e27e3177d5bef0061801c75946871db3 revealed that identity values were not applied in some further cases, including logical replication subscribers, VALUES RTEs, and ALTER TABLE ... ADD COLUMN. To fix all that, apply the identity column expression in build_column_default() instead of repeating the same logic at each call site.

For ALTER TABLE ... ADD COLUMN ... IDENTITY, the previous coding completely ignored that existing rows for the new column should have values filled in from the identity sequence. The coding using build_column_default() fails for this because the sequence ownership isn't registered until after ALTER TABLE, and we can't do it before because we don't have the column in the catalog yet. So we specially remember in ColumnDef the sequence name that we decided on and build a custom NextValueExpr using that.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>

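A sketch of one of the now-handled cases (table name hypothetical):

    -- Existing rows now get values filled in from the identity sequence:
    ALTER TABLE events ADD COLUMN id bigint GENERATED ALWAYS AS IDENTITY;
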
2018-02-02  Support parallel btree index builds.  (Robert Haas)

To make this work, tuplesort.c and logtape.c must also support parallelism, so this patch adds that infrastructure and then applies it to the particular case of parallel btree index builds. Testing to date shows that this can often be 2-3x faster than a serial index build.

The model for deciding how many workers to use is fairly primitive at present, but it's better than not having the feature. We can refine it as we get more experience.

Peter Geoghegan with some help from Rushabh Lathia. While Heikki Linnakangas is not an author of this patch, he wrote other patches without which this feature would not have been possible, and therefore the release notes should possibly credit him as an author of this feature.

Reviewed by Claudio Freire, Heikki Linnakangas, Thomas Munro, Tels, Amit Kapila, me.

Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com

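For illustration (table name hypothetical; assuming the max_parallel_maintenance_workers setting introduced for this feature):

    SET max_parallel_maintenance_workers = 4;
    CREATE INDEX ON big_table (x);   -- may now be built with parallel workers
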
2018-01-29  Silence complaint about dead assignment  (Peter Eisentraut)

The preferred place for "placate compiler" assignments is after elog(ERROR), not before it. Otherwise, scan-build complains about a dead assignment.

2018-01-26  Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers.  (Tom Lane)

We have a lot of code in which option names, which from the user's viewpoint are logically keywords, are passed through the grammar as plain identifiers, and then matched to string literals during command execution. This approach avoids making words into lexer keywords unnecessarily. Some places matched these strings using plain strcmp, some using pg_strcasecmp. But the latter should be unnecessary since identifiers would have been downcased on their way through the parser. Aside from any efficiency concerns (probably not a big factor), the lack of consistency in this area creates a hazard of subtle bugs due to different places coming to different conclusions about whether two option names are the same or different. Hence, standardize on using strcmp() to match any option names that are expected to have been fed through the parser.

This does create a user-visible behavioral change, which is that while formerly all of these would work:

    alter table foo set (fillfactor = 50);
    alter table foo set (FillFactor = 50);
    alter table foo set ("fillfactor" = 50);
    alter table foo set ("FillFactor" = 50);

now the last case will fail because that double-quoted identifier is different from the others. However, none of our documentation says that you can use a quoted identifier in such contexts at all, and we should discourage doing so since it would break if we ever decide to parse such constructs as true lexer keywords rather than poor man's substitutes. So this shouldn't create a significant compatibility issue for users.

Daniel Gustafsson, reviewed by Michael Paquier, small changes by me

Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se

2018-01-26  Remove the obsolete WITH clause of CREATE FUNCTION.  (Tom Lane)

This clause was superseded by SQL-standard syntax back in 7.3. We've kept it around for backwards-compatibility purposes ever since; but 15 years seems like long enough for that, especially seeing that there are undocumented weirdnesses in how it interacts with the SQL-standard syntax for specifying the same options.

Michael Paquier, per an observation by Daniel Gustafsson; some small cosmetic adjustments to nearby code by me.

Discussion: https://postgr.es/m/20180115022748.GB1724@paquier.xyz

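For reference, a sketch of the removed form and its SQL-standard equivalent:

    -- Removed pre-7.3 form:
    --   CREATE FUNCTION add_one(int) RETURNS int AS 'select $1 + 1'
    --       LANGUAGE sql WITH (isstrict, iscachable);

    -- SQL-standard equivalent:
    CREATE FUNCTION add_one(int) RETURNS int AS 'select $1 + 1'
        LANGUAGE sql STRICT IMMUTABLE;
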
2018-01-25  Ignore partitioned indexes where appropriate  (Alvaro Herrera)

get_relation_info() was too optimistic about opening indexes in partitioned tables, which would raise errors when any queries were planned on such tables. Fix by ignoring any indexes of the partitioned kind. CLUSTER (and ALTER TABLE CLUSTER ON) had a similar problem. Fix by disallowing these commands in partitioned tables. Fallout from 8b08f7d4820f.

2018-01-22  Transaction control in PL procedures  (Peter Eisentraut)

In each of the supplied procedural languages (PL/pgSQL, PL/Perl, PL/Python, PL/Tcl), add language-specific commit and rollback functions/commands to control transactions in procedures in that language. Add similar underlying functions to SPI. Some additional cleanup so that transaction commit or abort doesn't blow away data structures still used by the procedure call. Add execution context tracking to CALL and DO statements so that transaction control commands can only be issued in top-level procedure and block calls, not function calls or other procedure or block calls.

- SPI

  Add a new function SPI_connect_ext() that is like SPI_connect() but allows passing option flags. The only option flag right now is SPI_OPT_NONATOMIC. A nonatomic SPI connection can execute transaction control commands, otherwise it's not allowed. This is meant to be passed down from CALL and DO statements which themselves know in which context they are called.

  A nonatomic SPI connection uses different memory management. A normal SPI connection allocates its memory in TopTransactionContext. For nonatomic connections we use PortalContext instead. As the comment in SPI_connect_ext() (previously SPI_connect()) indicates, one could potentially use PortalContext in all cases, but it seems safest to leave the existing uses alone, because this stuff is complicated enough already.

  SPI also gets new functions SPI_start_transaction(), SPI_commit(), and SPI_rollback(), which can be used by PLs to implement their transaction control logic.

- portalmem.c

  Some adjustments were made in the code that cleans up portals at transaction abort. The portal code could already handle a command *committing* a transaction and continuing (e.g., VACUUM), but it was not quite prepared for a command *aborting* a transaction and continuing.

  In AtAbort_Portals(), remove the code that marks an active portal as failed. As the comment there already predicted, this doesn't work if the running command wants to keep running after transaction abort. And it's actually not necessary, because pquery.c is careful to run all portal code in a PG_TRY block and explicitly runs MarkPortalFailed() if there is an exception. So the code in AtAbort_Portals() is never used anyway.

  In AtAbort_Portals() and AtCleanup_Portals(), we need to be careful not to clean up active portals too much. This mirrors similar code in PreCommit_Portals().

- PL/Perl

  Gets new functions spi_commit() and spi_rollback().

- PL/pgSQL

  Gets new commands COMMIT and ROLLBACK (a usage sketch follows at the end of this entry). Update the PL/SQL porting example in the documentation to reflect that transactions are now possible in procedures.

- PL/Python

  Gets new functions plpy.commit and plpy.rollback.

- PL/Tcl

  Gets new commands commit and rollback.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>

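A minimal PL/pgSQL sketch of the new capability, as referenced in the list above (names hypothetical):

    CREATE PROCEDURE insert_batches()
    LANGUAGE plpgsql
    AS $$
    BEGIN
        FOR i IN 1..10 LOOP
            INSERT INTO batch_log VALUES (i);
            COMMIT;   -- legal here: top-level CALL, nonatomic context
        END LOOP;
    END;
    $$;

    CALL insert_batches();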
In each of the supplied procedural languages (PL/pgSQL, PL/Perl, PL/Python, PL/Tcl), add language-specific commit and rollback functions/commands to control transactions in procedures in that language. Add similar underlying functions to SPI. Some additional cleanup so that transaction commit or abort doesn't blow away data structures still used by the procedure call. Add execution context tracking to CALL and DO statements so that transaction control commands can only be issued in top-level procedure and block calls, not function calls or other procedure or block calls. - SPI Add a new function SPI_connect_ext() that is like SPI_connect() but allows passing option flags. The only option flag right now is SPI_OPT_NONATOMIC. A nonatomic SPI connection can execute transaction control commands, otherwise it's not allowed. This is meant to be passed down from CALL and DO statements which themselves know in which context they are called. A nonatomic SPI connection uses different memory management. A normal SPI connection allocates its memory in TopTransactionContext. For nonatomic connections we use PortalContext instead. As the comment in SPI_connect_ext() (previously SPI_connect()) indicates, one could potentially use PortalContext in all cases, but it seems safest to leave the existing uses alone, because this stuff is complicated enough already. SPI also gets new functions SPI_start_transaction(), SPI_commit(), and SPI_rollback(), which can be used by PLs to implement their transaction control logic. - portalmem.c Some adjustments were made in the code that cleans up portals at transaction abort. The portal code could already handle a command *committing* a transaction and continuing (e.g., VACUUM), but it was not quite prepared for a command *aborting* a transaction and continuing. In AtAbort_Portals(), remove the code that marks an active portal as failed. As the comment there already predicted, this doesn't work if the running command wants to keep running after transaction abort. And it's actually not necessary, because pquery.c is careful to run all portal code in a PG_TRY block and explicitly runs MarkPortalFailed() if there is an exception. So the code in AtAbort_Portals() is never used anyway. In AtAbort_Portals() and AtCleanup_Portals(), we need to be careful not to clean up active portals too much. This mirrors similar code in PreCommit_Portals(). - PL/Perl Gets new functions spi_commit() and spi_rollback() - PL/pgSQL Gets new commands COMMIT and ROLLBACK. Update the PL/SQL porting example in the documentation to reflect that transactions are now possible in procedures. - PL/Python Gets new functions plpy.commit and plpy.rollback. - PL/Tcl Gets new commands commit and rollback. Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>