path: root/src/include
Age  Commit message  Author
2016-11-10  Re-allow user_catalog_table option for materialized views.  (Tom Lane)
The reloptions stuff allows this option to be set on a matview. While it's questionable whether that is useful or was really intended, it does work, and we shouldn't change that in minor releases. Commit e3e66d8a9 disabled the option since I didn't realize that it was possible for it to be set on a matview. Tweak the test to re-allow it.

Discussion: <19749.1478711862@sss.pgh.pa.us>
2016-11-07  Band-aid fix for incorrect use of view options as StdRdOptions.  (Tom Lane)
We really ought to make StdRdOptions and the other decoded forms of reloptions self-identifying, but for the moment, assume that only plain relations could possibly be user_catalog_tables. Fixes problem with bogus "ON CONFLICT is not supported on table ... used as a catalog table" error when target is a view with cascade option.

Discussion: <26681.1477940227@sss.pgh.pa.us>
2016-11-05  Remove duplicate macro definition.  (Tom Lane)
Seems to be a copy-and-pasteo. Odd that we heard no reports of compiler warnings about it.

Thomas Munro
2016-10-24  Stamp 9.5.5.  [tag: REL9_5_5]  (Tom Lane)
2016-10-12  Revert addition of PGDLLEXPORT in PG_FUNCTION_INFO_V1 macro.  (Tom Lane)
This turns out not to be as harmless as I thought: MSVC will complain if it sees an "extern" declaration without PGDLLEXPORT and then one with. (Seems fairly silly, given that this can be changed after the fact by the linker, but there you have it.) Therefore, contrib modules that have externs for V1 functions in header files are falling over in the buildfarm, since none of those externs are marked PGDLLEXPORT.

We might or might not conclude that we're willing to plaster those declarations with PGDLLEXPORT in HEAD, but in any case there's no way we're going to ship this change in the back branches. Third-party authors would not thank us for breaking their code in a minor release. Hence, revert the addition of PGDLLEXPORT (but let's keep the extra info in the comment). If we do the other changes we can revert this commit in HEAD.

Per buildfarm.
2016-10-12  Provide DLLEXPORT markers for C functions via PG_FUNCTION_INFO_V1 macro.  (Tom Lane)
This isn't really necessary for our own code, because we use a .DEF file in MSVC builds (see gendef.pl), or --export-all-symbols in MinGW and Cygwin builds, to ensure that all global symbols in loadable modules will be exported on Windows. However, third-party authors might use different build processes that need this marker, and it's harmless enough for our own builds.

To some extent, this is an oversight in commit e7128e8db, so back-patch to 9.4 where that was added.

Laurenz Albe

Discussion: <A737B7A37273E048B164557ADEF4A58B539300BD@ntex2010a.host.magwien.gv.at>
2016-10-12  Fix copy-pasteo in comment.  (Heikki Linnakangas)
Amit Langote
2016-10-07  Fix fallback implementation of pg_atomic_write_u32().  (Andres Freund)
I somehow had assumed that in the spinlock (in turn possibly using semaphores) based fallback atomics implementation 32 bit writes could be done without a lock. As far as the write goes that's correct, since postgres supports only platforms with single-copy atomicity for aligned 32bit writes. But writing without holding the spinlock breaks read-modify-write operations like pg_atomic_compare_exchange_u32(), since they'll potentially "miss" a concurrent write, which can't happen in actual hardware implementations.

In 9.6+ when using the fallback atomics implementation this could lead to buffer header locks not being properly marked as released, and potentially some related state corruption. I don't see a related danger in 9.5 (earliest release with the API), because pg_atomic_write_u32() wasn't used in a concurrent manner there.

The state variables of local buffers were, before this change, manipulated using pg_atomic_write_u32(), to avoid unnecessary synchronization overhead. As that'd not be the case anymore, introduce and use pg_atomic_unlocked_write_u32(), which does not correctly interact with RMW operations.

This bug only caused issues when postgres is compiled on platforms without atomics support (i.e. no common new platform), or when compiled with --disable-atomics, which explains why this wasn't noticed in testing.

Reported-By: Tom Lane
Discussion: <14947.1475690465@sss.pgh.pa.us>
Backpatch: 9.5-, where the atomic operations API was introduced.
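To illustrate the hazard, here is a minimal sketch of a spinlock-based fallback. The names and the spin_lock()/spin_unlock() primitives are illustrative assumptions, not the actual atomics.h code; the point is that a store which bypasses the lock can be silently overwritten by a compare-exchange that already read the old value.

    /* Sketch only; illustrative names, not the real PostgreSQL code. */
    #include <stdbool.h>
    #include <stdint.h>

    extern void spin_lock(int *lock);       /* assumed spinlock primitives */
    extern void spin_unlock(int *lock);

    typedef struct
    {
        int      lock;
        uint32_t value;
    } FallbackAtomicU32;

    /* RMW operations take the spinlock, so they serialize among themselves. */
    bool
    fallback_compare_exchange(FallbackAtomicU32 *a,
                              uint32_t *expected, uint32_t newval)
    {
        bool ok;

        spin_lock(&a->lock);
        ok = (a->value == *expected);
        if (ok)
            a->value = newval;
        else
            *expected = a->value;
        spin_unlock(&a->lock);
        return ok;
    }

    /*
     * Broken as a general-purpose atomic write: a store made here while
     * another backend is inside fallback_compare_exchange() is invisible
     * to that CAS and gets clobbered when the CAS commits its new value.
     * Hence the fallback pg_atomic_write_u32() must take the lock, and the
     * lock-free variant gets the name pg_atomic_unlocked_write_u32().
     */
    void
    broken_unlocked_write(FallbackAtomicU32 *a, uint32_t v)
    {
        a->value = v;
    }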
2016-10-03  Show a sensible value in pg_settings.unit for GUC_UNIT_XSEGS variables.  (Tom Lane)
Commit 88e982302 invented GUC_UNIT_XSEGS for min_wal_size and max_wal_size, but neglected to make it display sensibly in pg_settings.unit (by adding a case to the switch in GetConfigOptionByNum). Fix that, and adjust said switch to throw a run-time error the next time somebody forgets.

In passing, avoid using a static buffer for the output string --- the rest of this function pstrdup's from a local buffer, and I see no very good reason why the units code should do it differently and less safely.

Per report from Otar Shavadze. Back-patch to 9.5 where the new unit type was added.

Report: <CAG-jOyA=iNFhN+yB4vfvqh688B7Tr5SArbYcFUAjZi=0Exp-Lg@mail.gmail.com>
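The shape of the fix, as a sketch. The segment-size arithmetic and the abbreviated case list are assumptions for illustration; the real switch is in GetConfigOptionByNum in guc.c.

    /* Sketch only; other unit cases elided. */
    static const char *
    guc_unit_string_sketch(int flags)
    {
        const int wal_segment_bytes = 16 * 1024 * 1024;     /* assumed */
        char      buf[16];

        switch (flags & GUC_UNIT)       /* GUC_UNIT mask as in guc.h */
        {
            case GUC_UNIT_KB:
                return "kB";
            case GUC_UNIT_XSEGS:
                /* display in terms of the WAL segment size */
                snprintf(buf, sizeof(buf), "%dMB",
                         wal_segment_bytes / (1024 * 1024));
                return pstrdup(buf);    /* local buffer, no static state */
            default:
                /* fail loudly the next time somebody forgets a case */
                elog(ERROR, "unrecognized GUC units value: %d",
                     flags & GUC_UNIT);
                return NULL;            /* keep compiler quiet */
        }
    }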
2016-09-23  Avoid using PostmasterRandom() for DSM control segment ID.  (Tom Lane)
Commits 470d886c3 et al intended to fix the problem that the postmaster selected the same "random" DSM control segment ID on every start. But using PostmasterRandom() for that destroys the intended property that the delay between random_start_time and random_stop_time will be unpredictable. (Said delay is probably already more predictable than we could wish, but that doesn't mean that reducing it by a couple orders of magnitude is OK.) Revert the previous patch and add a comment warning against misuse of PostmasterRandom.

Fix the original problem by calling srandom() early in PostmasterMain, using a low-security seed that will later be overwritten by PostmasterRandom.

Discussion: <20789.1474390434@sss.pgh.pa.us>
2016-09-20  Use PostmasterRandom(), not random(), for DSM control segment ID.  (Robert Haas)
Otherwise, every startup gets the same "random" value, which is definitely not what was intended.
2016-09-20  Fix outdated comments: the GiST search queue is not an RBTree anymore.  (Heikki Linnakangas)
The GiST search queue is implemented as a pairing heap rather than as a Red-Black Tree, since 9.5 (commit e7032610). I neglected these comments in that commit.
2016-09-15  Fix building with LibreSSL.  (Heikki Linnakangas)
LibreSSL defines OPENSSL_VERSION_NUMBER to claim that it is version 2.0.0, but it doesn't have the functions added in OpenSSL 1.1.0. Add autoconf checks for the individual functions we need, and stop relying on OPENSSL_VERSION_NUMBER.

Backport to 9.5 and 9.6, like the patch that broke this. In the back-branches, there are still a few OPENSSL_VERSION_NUMBER checks left, to check for OpenSSL 0.9.8 or 0.9.7. I left them as they were - LibreSSL has all those functions, so they work as intended.

Per buildfarm member curculio.

Discussion: <2442.1473957669@sss.pgh.pa.us>
2016-09-10  Improve unreachability recognition in elog() macro.  (Tom Lane)
Some experimentation with an older version of gcc showed that it is able to determine whether "if (elevel_ >= ERROR)" is compile-time constant if elevel_ is declared "const", but otherwise not so much. We had accounted for that in ereport() but were too miserly with braces to make it so in elog().

I don't know how many currently-interesting compilers have the same quirk, but in case it will save some code space, let's make sure that elog() is on the same footing as ereport() for this purpose.

Back-patch to 9.3 where we introduced pg_unreachable() calls into elog/ereport.
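The pattern being applied, in simplified form. The real macro in elog.h does considerably more; this sketch (with a stand-in elog_internal_sketch) only shows the const-qualified copy of the level that lets the compiler fold the comparison when the call site passes a literal.

    /*
     * Simplified sketch of the elog() shape described above.  Because
     * elevel_ is const, a literal argument makes "elevel_ >= ERROR" a
     * compile-time constant, so the compiler can prove that code after
     * an elog(ERROR, ...) call is unreachable.
     */
    #define elog_sketch(elevel, ...) \
        do { \
            const int elevel_ = (elevel); \
            elog_internal_sketch(elevel_, __VA_ARGS__); \
            if (elevel_ >= ERROR) \
                pg_unreachable(); \
        } while (0)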
2016-08-24  Fix improper repetition of previous results from a hashed aggregate.  (Tom Lane)
ExecReScanAgg's check for whether it could re-use a previously calculated hashtable neglected the possibility that the Agg node might reference PARAM_EXEC Params that are not referenced by its input plan node. That's okay if the Params are in upper tlist or qual expressions; but if one appears in aggregate input expressions, then the hashtable contents need to be recomputed when the Param's value changes.

To avoid unnecessary performance degradation in the case of a Param that isn't within an aggregate input, add logic to the planner to determine which Params are within aggregate inputs. This requires a new field in struct Agg, but fortunately we never write plans to disk, so this isn't an initdb-forcing change.

Per report from Jeevan Chalke. This has been broken since forever, so back-patch to all supported branches.

Andrew Gierth, with minor adjustments by me

Report: <CAM2+6=VY8ykfLT5Q8vb9B6EbeBk-NGuLbT6seaQ+Fq4zXvrDcA@mail.gmail.com>
2016-08-18  In plpgsql, don't try to convert int2vector or oidvector to expanded array.  (Tom Lane)
These types are storage-compatible with real arrays, but they don't support toasting, so of course they can't support expansion either.

Per bug #14289 from Michael Overmeyer. Back-patch to 9.5 where expanded arrays were introduced.

Report: <20160818174414.1529.37913@wrigleys.postgresql.org>
2016-08-17  Fix deletion of speculatively inserted TOAST on conflict  (Andres Freund)
INSERT .. ON CONFLICT runs a pre-check of the possible conflicting constraints before performing the actual speculative insertion. In case the inserted tuple included TOASTed columns the ON CONFLICT condition would be handled correctly in case the conflict was caught by the pre-check, but if two transactions entered the speculative insertion phase at the same time, one would have to re-try, and the code for aborting a speculative insertion did not handle deleting the speculatively inserted TOAST datums correctly.

TOAST deletion would fail with "ERROR: attempted to delete invisible tuple" as we attempted to remove the TOAST tuples using simple_heap_delete, which reasoned that the given tuples should not be visible to the command that wrote them.

This commit updates heap_abort_speculative(), the function that aborts the conflicting tuple, to delete the associated TOAST datums itself, via toast_delete. Like before, the inserted toast rows are not marked as being speculative.

This commit also adds an isolationtester spec test, exercising the relevant code path. Unfortunately 9.5 cannot handle two waiting sessions, and thus cannot execute this test.

Reported-By: Viren Negi, Oskari Saarenmaa
Author: Oskari Saarenmaa, edited a bit by me
Bug: #14150
Discussion: <20160519123338.12513.20271@wrigleys.postgresql.org>
Backpatch: 9.5, where ON CONFLICT was introduced
2016-08-14  Remove bogus dependencies on NUMERIC_MAX_PRECISION.  (Tom Lane)
NUMERIC_MAX_PRECISION is a purely arbitrary constraint on the precision and scale you can write in a numeric typmod. It might once have had something to do with the allowed range of a typmod-less numeric value, but at least since 9.1 we've allowed, and documented that we allowed, any value that would physically fit in the numeric storage format; which is something over 100000 decimal digits, not 1000.

Hence, get rid of numeric_in()'s use of NUMERIC_MAX_PRECISION as a limit on the allowed range of the exponent in scientific-format input. That was especially silly in view of the fact that you can enter larger numbers as long as you don't use 'e' to do it. Just constrain the value enough to avoid localized overflow, and let make_result be the final arbiter of what is too large. Likewise adjust ecpg's equivalent of this code.

Also get rid of numeric_recv()'s use of NUMERIC_MAX_PRECISION to limit the number of base-NBASE digits it would accept. That created a dump/restore hazard for binary COPY without doing anything useful; the wire-format limit on number of digits (65535) is about as tight as we would want.

In HEAD, also get rid of pg_size_bytes()'s unnecessary intimacy with what the numeric range limit is. That code doesn't exist in the back branches.

Per gripe from Aravind Kumar. Back-patch to all supported branches, since they all contain the documentation claim about allowed range of NUMERIC (cf commit cabf5d84b).

Discussion: <2895.1471195721@sss.pgh.pa.us>
2016-08-08  Stamp 9.5.4.  [tag: REL9_5_4]  (Tom Lane)
2016-08-07  Fix misestimation of n_distinct for a nearly-unique column with many nulls.  (Tom Lane)
If ANALYZE found no repeated non-null entries in its sample, it set the column's stadistinct value to -1.0, intending to indicate that the entries are all distinct. But what this value actually means is that the number of distinct values is 100% of the table's rowcount, and thus it was overestimating the number of distinct values by however many nulls there are. This could lead to very poor selectivity estimates, as for example in a recent report from Andreas Joseph Krogh. We should discount the stadistinct value by whatever we've estimated the nulls fraction to be. (That is what will happen if we choose to use a negative stadistinct for a column that does have repeated entries, so this code path was just inconsistent.)

In addition to fixing the stadistinct entries stored by several different ANALYZE code paths, adjust the logic where get_variable_numdistinct() forces an "all distinct" estimate on the basis of finding a relevant unique index. Unique indexes don't reject nulls, so there's no reason to assume that the null fraction doesn't apply.

Back-patch to all supported branches. Back-patching is a bit of a judgment call, but this problem seems to affect only a few users (else we'd have identified it long ago), and it's bad enough when it does happen that destabilizing plan choices in a worse direction seems unlikely.

Patch by me, with documentation wording suggested by Dean Rasheed

Report: <VisenaEmail.26.df42f82acae38a58.156463942b8@tc7-visena>
Discussion: <16143.1470350371@sss.pgh.pa.us>
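The discount in formula terms: a negative stadistinct stores "-(distinct values as a fraction of all rows)", so an all-distinct non-null sample should store minus the non-null fraction, not -1.0. A sketch, with illustrative names:

    /*
     * Sketch of the corrected computation.  If every non-null value in
     * the sample was unique, the distinct count tracks only the non-null
     * rows, i.e. (1 - null fraction) of the table, not 100% of it.
     */
    static double
    all_distinct_stadistinct_sketch(double null_frac)
    {
        return -1.0 * (1.0 - null_frac);    /* e.g. -0.80 for 20% nulls */
    }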
2016-08-02  Sync 9.5 version of access/transam/parallel.c with HEAD.  (Tom Lane)
This back-patches commit a5fe473ad (notably, marking ParallelMessagePending as volatile, which is not particularly optional). I also back-patched some previous cosmetic changes to remove unnecessary diffs between the two branches. I'm unsure how much of this code is actually reachable in 9.5, but to the extent that it is reachable, it needs to be maintained, and minimizing cross-branch diffs will make that easier.
2016-08-01  Don't CHECK_FOR_INTERRUPTS between WaitLatch and ResetLatch.  (Tom Lane)
This coding pattern creates a race condition, because if an interesting interrupt happens after we've checked InterruptPending but before we reset our latch, the latch-setting done by the signal handler would get lost, and then we might block at WaitLatch in the next iteration without ever noticing the interrupt condition. You can put the CHECK_FOR_INTERRUPTS before WaitLatch or after ResetLatch, but not between them. Aside from fixing the bugs, add some explanatory comments to latch.h to perhaps forestall the next person from making the same mistake.

In HEAD, also replace gather_readnext's direct call of HandleParallelMessages with CHECK_FOR_INTERRUPTS. It does not seem clean or useful for this one caller to bypass ProcessInterrupts and go straight to HandleParallelMessages; not least because that fails to consider the InterruptPending flag, resulting in useless work both here (if InterruptPending isn't set) and in the next CHECK_FOR_INTERRUPTS call (if it is).

This thinko seems to have been introduced in the initial coding of storage/ipc/shm_mq.c (commit ec9037df2), and then blindly copied into all the subsequent parallel-query support logic. Back-patch relevant hunks to 9.4 to extirpate the error everywhere.

Discussion: <1661.1469996911@sss.pgh.pa.us>
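The safe shape of the loop, as a sketch. WaitLatch, ResetLatch, and CHECK_FOR_INTERRUPTS are the real primitives; the surrounding loop and work_available() are illustrative.

    extern bool work_available(void);   /* hypothetical wait condition */

    for (;;)
    {
        /*
         * Safe spot: the latch has not been consumed yet, so a signal
         * arriving now leaves it set and WaitLatch returns immediately.
         */
        CHECK_FOR_INTERRUPTS();

        if (work_available())
            break;

        WaitLatch(MyLatch, WL_LATCH_SET, 0);
        ResetLatch(MyLatch);
        /*
         * A CHECK_FOR_INTERRUPTS() here, after ResetLatch, would also be
         * safe; between WaitLatch and ResetLatch it races with the signal
         * handler and can lose a wakeup.
         */
    }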
2016-07-29  Fix pq_putmessage_noblock() to not block.  (Tom Lane)
An evident copy-and-pasteo in commit 2bd9e412f broke the non-blocking aspect of pq_putmessage_noblock(), causing it to behave identically to pq_putmessage(). That function is nowadays used only in walsender.c, so that the net effect was to cause walsenders to hang up waiting for the receiver in situations where they should not.

Kyotaro Horiguchi

Patch: <20160728.185228.58375982.horiguchi.kyotaro@lab.ntt.co.jp>
2016-07-28  Fix assorted fallout from IS [NOT] NULL patch.  (Tom Lane)
Commits 4452000f3 et al established semantics for NullTest.argisrow that are a bit different from its initial conception: rather than being merely a cache of whether we've determined the input to have composite type, the flag now has the further meaning that we should apply field-by-field testing as per the standard's definition of IS [NOT] NULL. If argisrow is false and yet the input has composite type, the construct instead has the semantics of IS [NOT] DISTINCT FROM NULL. Update the comments in primnodes.h to clarify this, and fix ruleutils.c and deparse.c to print such cases correctly. In the case of ruleutils.c, this merely results in cosmetic changes in EXPLAIN output, since the case can't currently arise in stored rules. However, it represents a live bug for deparse.c, which would formerly have sent a remote query that had semantics different from the local behavior. (From the user's standpoint, this means that testing a remote nested-composite column for null-ness could have had unexpected recursive behavior much like that fixed in 4452000f3.)

In a related but somewhat independent fix, make plancat.c set argisrow to false in all NullTest expressions constructed to represent "attnotnull" constructs. Since attnotnull is actually enforced as a simple null-value check, this is a more accurate representation of the semantics; we were previously overpromising what it meant for composite columns, which might possibly lead to incorrect planner optimizations. (It seems that what the SQL spec expects a NOT NULL constraint to mean is an IS NOT NULL test, so arguably we are violating the spec and should fix attnotnull to do the other thing. If we ever do, this part should get reverted.)

Back-patch, same as the previous commit.

Discussion: <10682.1469566308@sss.pgh.pa.us>
2016-07-14  Fix GiST index build for NaN values in geometric types.  (Tom Lane)
GiST index build could go into an infinite loop when presented with boxes (or points, circles or polygons) containing NaN component values. This happened essentially because the code assumed that x == x is true for any "double" value x; but it's not true for NaNs. The looping behavior was not the only problem though: we also attempted to sort the items using simple double comparisons. Since NaNs violate the trichotomy law, qsort could (in principle at least) get arbitrarily confused and mess up the sorting of ordinary values as well as NaNs. And we based splitting choices on box size calculations that could produce NaNs, again resulting in undesirable behavior.

To fix, replace all comparisons of doubles in this logic with float8_cmp_internal, which is NaN-aware and is careful to sort NaNs consistently, higher than any non-NaN. Also rearrange the box size calculation to not produce NaNs; instead it should produce an infinity for a box with NaN on one side and not-NaN on the other.

I don't by any means claim that this solves all problems with NaNs in geometric values, but it should at least make GiST index insertion work reliably with such data. It's likely that the index search side of things still needs some work, and probably regular geometric operations too. But with this patch we're laying down a convention for how such cases ought to behave.

Per bug #14238 from Guang-Dih Lei. Back-patch to 9.2; the code used before commit 7f3bd86843e5aad8 is quite different and doesn't lock up on my simple test case, nor on the submitter's dataset.

Report: <20160708151747.1426.60150@wrigleys.postgresql.org>
Discussion: <28685.1468246504@sss.pgh.pa.us>
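A sketch of a NaN-aware comparator of the kind described, treating NaN as equal to NaN and greater than every non-NaN so that qsort() sees a consistent total order (float8_cmp_internal in the backend behaves this way):

    #include <math.h>

    /* Total-order comparison for doubles: NaNs sort consistently, last. */
    static int
    float8_cmp_sketch(double a, double b)
    {
        if (isnan(a))
            return isnan(b) ? 0 : 1;    /* NaN == NaN, NaN > non-NaN */
        if (isnan(b))
            return -1;
        if (a < b)
            return -1;
        if (a > b)
            return 1;
        return 0;
    }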
2016-06-27  Fix CREATE MATVIEW/CREATE TABLE AS ... WITH NO DATA to not plan the query.  (Tom Lane)
Previously, these commands always planned the given query and went through executor startup before deciding not to actually run the query if WITH NO DATA is specified. This behavior is problematic for pg_dump because it may cause errors to be raised that we would rather not see before a REFRESH MATERIALIZED VIEW command is issued. See for example bug #13907 from Marian Krucina. This change is not sufficient to fix that particular bug, because we also need to tweak pg_dump to issue the REFRESH later, but it's a necessary step on the way.

A user-visible side effect of doing things this way is that the returned command tag for WITH NO DATA cases will now be "CREATE MATERIALIZED VIEW" or "CREATE TABLE AS", not "SELECT 0". We could preserve the old behavior but it would take more code, and arguably that was just an implementation artifact not intended behavior anyhow.

In 9.5 and HEAD, also get rid of the static variable CreateAsReladdr, which was trouble waiting to happen; there is not any prohibition on nested CREATE commands.

Back-patch to 9.3 where CREATE MATERIALIZED VIEW was introduced.

Michael Paquier and Tom Lane

Report: <20160202161407.2778.24659@wrigleys.postgresql.org>
2016-06-24  Fix handling of multixacts predating pg_upgrade  (Alvaro Herrera)
After pg_upgrade, it is possible that some tuples' Xmax have multixacts corresponding to the old installation; such multixacts cannot have running members anymore. In many code sites we already know not to read them and clobber them silently, but at least when VACUUM tries to freeze a multixact or determine whether one needs freezing, there's an attempt to resolve it to its member transactions by calling GetMultiXactIdMembers, and if the multixact value is "in the future" with regards to the current valid multixact range, an error like this is raised:

    ERROR: MultiXactId 123 has not been created yet -- apparent wraparound

and vacuuming fails. Per discussion with Andrew Gierth, it is completely bogus to try to resolve multixacts coming from before a pg_upgrade, regardless of where they stand with regards to the current valid multixact range.

It's possible to get from under this problem by doing SELECT FOR UPDATE of the problem tuples, but if tables are large, this is slow and tedious, so a more thorough solution is desirable.

To fix, we realize that multixacts in xmax created in 9.2 and previous have a specific bit pattern that is never used in 9.3 and later (we already knew this, per comments and infomask tests sprinkled in various places, but we weren't leveraging this knowledge appropriately). Whenever the infomask of the tuple matches that bit pattern, we just ignore the multixact completely as if Xmax wasn't set; or, in the case of tuple freezing, we act as if an unwanted value is set and clobber it without decoding. This guarantees that no errors will be raised, and that the values will be progressively removed until all tables are clean. Most callers of GetMultiXactIdMembers are patched to recognize directly that the value is a removable "empty" multixact and avoid calling GetMultiXactIdMembers altogether.

To avoid changing the signature of GetMultiXactIdMembers() in back branches, we keep the "allow_old" boolean flag but rename it to "from_pgupgrade"; if the flag is true, we always return an empty set instead of looking up the multixact. (I suppose we could remove the argument in the master branch, but I chose not to do so in this commit).

This was broken all along, but the error-facing message appeared first because of commit 8e9a16ab8f7f and was partially fixed in a25c2b7c4db3. This fix, backpatched all the way back to 9.3, goes approximately in the same direction as a25c2b7c4db3 but should cover all cases.

Bug analysis by Andrew Gierth and Álvaro Herrera.

A number of public reports match this bug:
https://www.postgresql.org/message-id/20140330040029.GY4582@tamriel.snowman.net
https://www.postgresql.org/message-id/538F3D70.6080902@publicrelay.com
https://www.postgresql.org/message-id/556439CF.7070109@pscs.co.uk
https://www.postgresql.org/message-id/SG2PR06MB0760098A111C88E31BD4D96FB3540@SG2PR06MB0760.apcprd06.prod.outlook.com
https://www.postgresql.org/message-id/20160615203829.5798.4594@wrigleys.postgresql.org
2016-06-24  Fix building of large (bigger than shared_buffers) hash indexes.  (Tom Lane)
When the index is predicted to need more than NBuffers buckets, CREATE INDEX attempts to sort the index entries by hash key before insertion, so as to reduce thrashing. This code path got broken by commit 9f03ca915196dfc8, which overlooked that _hash_form_tuple() is not just an alias for index_form_tuple(). The index got built anyway, but with garbage data, so that searches for pre-existing tuples always failed. Fix by refactoring to separate construction of the indexable data from calling index_form_tuple().

Per bug #14210 from Daniel Newman. Back-patch to 9.5 where the bug was introduced.

Report: <20160623162507.17237.39471@wrigleys.postgresql.org>
2016-06-09  Clarify documentation of ceil/ceiling/floor functions.  (Tom Lane)
Document these as "nearest integer >= argument" and "nearest integer <= argument", which will hopefully be less confusing than the old formulation. New wording is from Matlab via Dean Rasheed.

I changed the pg_description entries as well as the SGML docs. In the back branches, this will only affect installations initdb'd in the future, but it should be harmless otherwise.

Discussion: <CAEZATCW3yzJo-NMSiQs5jXNFbTsCEftZS-Og8=FvFdiU+kYuSA@mail.gmail.com>
2016-06-06  Don't reset changes_since_analyze after a selective-columns ANALYZE.  (Tom Lane)
If we ANALYZE only selected columns of a table, we should not postpone auto-analyze because of that; other columns may well still need stats updates. As committed, the counter is left alone if a column list is given, whether or not it includes all analyzable columns of the table.

Per complaint from Tomasz Ostrowski. It's been like this a long time, so back-patch to all supported branches.

Report: <ef99c1bd-ff60-5f32-2733-c7b504eb960c@ato.waw.pl>
2016-06-03  Mark read/write expanded values as read-only in ExecProject().  (Tom Lane)
If a plan node output expression returns an "expanded" datum, and that output column is referenced in more than one place in upper-level plan nodes, we need to ensure that what is returned is a read-only reference not a read/write reference. Otherwise one of the referencing sites could scribble on or even delete the expanded datum before we have evaluated the others. Commit 1dc5ebc9077ab742, which introduced this feature, supposed that it'd be sufficient to make SubqueryScan nodes force their output columns to read-only state. The folly of that was revealed by bug #14174 from Andrew Gierth, and really should have been immediately obvious considering that the planner will happily optimize SubqueryScan nodes out of the plan without any regard for this issue.

The safest fix seems to be to make ExecProject() force its results into read-only state; that will cover every case where a plan node returns expression results. Actually we can delegate this to ExecTargetList() since we can recursively assume that plain Vars will not reference read-write datums. That should keep the extra overhead down to something minimal. We no longer need ExecMakeSlotContentsReadOnly(), which was introduced only in support of the idea that just a few plan node types would need to do this.

In the future it would be nice to have the planner account for this problem and inject force-to-read-only expression evaluation nodes into only the places where there's a risk of multiple evaluation. That's not a suitable solution for 9.5 or even 9.6 at this point, though.

Report: <20160603124628.9932.41279@wrigleys.postgresql.org>
2016-05-31  Avoid useless closely-spaced writes of statistics files.  (Tom Lane)
The original intent in the stats collector was that we should not write out stats data oftener than every PGSTAT_STAT_INTERVAL msec. Backends will not make requests at all if they see the existing data is newer than that, and the stats collector is supposed to disregard requests having a cutoff_time older than its most recently written data, so that close-together requests don't result in multiple writes. But the latter part of that got broken in commit 187492b6c2e8cafc, so that if two backends concurrently decide the existing stats are too old, the collector would write the data twice. (In principle the collector's logic would still merge requests as long as the second one arrives before we've actually written data ... but since the message collection loop would write data immediately after processing a single inquiry message, that never happened in practice, and in any case the window in which it might work would be much shorter than PGSTAT_STAT_INTERVAL.)

To fix, improve pgstat_recv_inquiry so that it checks whether the cutoff time is too old, and doesn't add a request to the queue if so. This means that we do not need DBWriteRequest.request_time, because the decision is taken before making a queue entry. And that means that we don't really need the DBWriteRequest data structure at all; an OID list of database OIDs will serve and allow removal of some rather verbose and crufty code.

In passing, improve the comments in this area, which have been rather neglected. Also change backend_read_statsfile so that it's not silently relying on MyDatabaseId to have some particular value in the autovacuum launcher process. It accidentally worked as desired because MyDatabaseId is zero in that process; but that does not seem like a dependency we want, especially with no documentation about it.

Although this patch is mine, it turns out I'd rediscovered a known bug, for which Tomas Vondra had already submitted a patch that's functionally equivalent to the non-cosmetic aspects of this patch. Thanks to Tomas for reviewing this version.

Back-patch to 9.3 where the bug was introduced.

Prior-Discussion: <1718942738eb65c8407fcd864883f4c8@fuzzy.cz>
Patch: <4625.1464202586@sss.pgh.pa.us>
2016-05-30  Fix PageAddItem BRIN bug  (Alvaro Herrera)
BRIN was relying on the ability to remove a tuple from an index page, then putting another tuple in the same line pointer. But PageAddItem refuses to add a tuple beyond the first free item past the last used item, and in particular, it rejects an attempt to add an item to an empty page anywhere other than the first line pointer. PageAddItem issues a WARNING and indicates to the caller that it failed, which in turn causes the BRIN calling code to issue a PANIC, so the whole sequence looks like this:

    WARNING: specified item offset is too large
    PANIC: failed to add BRIN tuple

To fix, create a new function PageAddItemExtended which is like PageAddItem except that the two boolean arguments become a flags bitmap; the "overwrite" and "is_heap" boolean flags in PageAddItem become PAI_OVERWRITE and PAI_IS_HEAP flags in the new function, and a new flag PAI_ALLOW_FAR_OFFSET enables the behavior required by BRIN. PageAddItem() retains its original signature, for compatibility with third-party modules (other callers in core code are not modified, either).

Also, in the belt-and-suspenders spirit, I added a new sanity check in brinGetTupleForHeapBlock to raise an error if a TID found in the revmap is not marked as live by the page header. This causes it to react with "ERROR: corrupted BRIN index" to the bug at hand, rather than a hard crash.

Backpatch to 9.5.

Bug reported by Andreas Seltenreich as detected by his handy sqlsmith fuzzer.

Discussion: https://www.postgresql.org/message-id/87mvni77jh.fsf@elite.ansel.ydns.eu
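The API change in sketch form. The flag values shown are assumptions for illustration; see bufpage.h for the real definitions.

    /* Flags replacing PageAddItem's two booleans; values illustrative. */
    #define PAI_OVERWRITE        (1 << 0)   /* was the "overwrite" bool */
    #define PAI_IS_HEAP          (1 << 1)   /* was the "is_heap" bool */
    #define PAI_ALLOW_FAR_OFFSET (1 << 2)   /* new: let BRIN reuse a line
                                             * pointer beyond the last
                                             * currently-used item */

    /* The compatibility wrapper keeps the old signature (sketch). */
    OffsetNumber
    PageAddItem(Page page, Item item, Size size, OffsetNumber offsetNumber,
                bool overwrite, bool is_heap)
    {
        return PageAddItemExtended(page, item, size, offsetNumber,
                                   (overwrite ? PAI_OVERWRITE : 0) |
                                   (is_heap ? PAI_IS_HEAP : 0));
    }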
2016-05-27  Be more predictable about reporting "lock timeout" vs "statement timeout".  (Tom Lane)
If both timeout indicators are set when we arrive at ProcessInterrupts, we've historically just reported "lock timeout". However, some buildfarm members have been observed to fail isolationtester's timeouts test by reporting "lock timeout" when the statement timeout was expected to fire first. The cause seems to be that the process is allowed to sleep longer than expected (probably due to heavy machine load) so that the lock timeout happens before we reach the point of reporting the error, and then this arbitrary tiebreak rule does the wrong thing. We can improve matters by comparing the scheduled timeout times to decide which error to report.

I had originally proposed greatly reducing the 1-second window between the two timeouts in the test cases. On reflection that is a bad idea, at least for the case where the lock timeout is expected to fire first, because that would assume that it takes negligible time to get from statement start to the beginning of the lock wait. Thus, this patch doesn't completely remove the risk of test failures on slow machines. Empirically, however, the case this handles is the one we are seeing in the buildfarm. The explanation may be that the other case requires the scheduler to take the CPU away from a busy process, whereas the case fixed here only requires the scheduler to not give the CPU back right away to a process that has been woken from a multi-second sleep (and, perhaps, has been swapped out meanwhile).

Back-patch to 9.3 where the isolationtester timeouts test was added.

Discussion: <8693.1464314819@sss.pgh.pa.us>
2016-05-23  Fix latent crash in do_text_output_multiline().  (Tom Lane)
do_text_output_multiline() would fail (typically with a null pointer dereference crash) if its input string did not end with a newline. Such cases do not arise in our current sources; but it certainly could happen in future, or in extension code's usage of the function, so we should fix it. To fix, replace "eol += len" with "eol = text + len".

While at it, make two cosmetic improvements: mark the input string const, and rename the argument from "text" to "txt" to dodge pgindent strangeness (since "text" is a typedef name).

Even though this problem is only latent at present, it seems like a good idea to back-patch the fix, since it's a very simple/safe patch and it's not out of the realm of possibility that we might in future back-patch something that expects sane behavior from do_text_output_multiline().

Per report from Hao Lee.

Report: <CAGoxFiFPAGyPAJLcFxTB5cGhTW2yOVBDYeqDugYwV4dEd1L_Ag@mail.gmail.com>
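A self-contained sketch of the corrected scan. The real function emits tuples rather than printing; the interesting part is the else branch, where the old code did "eol += len" while eol was still NULL.

    #include <stdio.h>
    #include <string.h>

    /* Emit one line per iteration, tolerating a missing final newline. */
    static void
    emit_lines(const char *txt)
    {
        while (*txt)
        {
            const char *eol = strchr(txt, '\n');
            int         len;

            if (eol)
            {
                len = (int) (eol - txt);
                eol++;                  /* step over the newline */
            }
            else
            {
                len = (int) strlen(txt);
                eol = txt + len;        /* the fix: old code did "eol += len"
                                         * with eol == NULL here */
            }
            printf("%.*s\n", len, txt);
            txt = eol;
        }
    }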
2016-05-09  Stamp 9.5.3.  [tag: REL9_5_3]  (Tom Lane)
2016-05-06  Fix pg_upgrade to not fail when new-cluster TOAST rules differ from old.  (Tom Lane)
This patch essentially reverts commit 4c6780fd17aa43ed, in favor of a much simpler solution for the case where the new cluster would choose to create a TOAST table but the old cluster doesn't have one: just don't create a TOAST table.

The existing code failed in at least two different ways if the situation arose: (1) ALTER TABLE RESET didn't grab an exclusive lock, so that the lock sanity check in create_toast_table failed; (2) pg_upgrade did not provide a pg_type OID for the new toast table, so that the crosscheck in TypeCreate failed. While both these problems were introduced by later patches, they show that the hack being used to cause TOAST table creation is overwhelmingly fragile (and untested). I also note that before the TypeCreate crosscheck was added, the code would have resulted in assigning an indeterminate pg_type OID to the toast table, possibly causing a later OID conflict in that catalog; so that it didn't really work even when committed.

If we simply don't create a TOAST table, there will only be a problem if the code tries to store a tuple that's wider than a page, and field compression isn't sufficient to get it under a page. Given that the TOAST creation threshold is intended to be about a quarter of a page, it's very hard to believe that cross-version differences in the do-we-need-a-toast-table heuristic could result in an observable problem. So let's just follow the old version's conclusion about whether a TOAST table is needed. (If we ever do change needs_toast_table() so much that this conclusion doesn't apply, we can devise a solution at that time, and hopefully do it in a less klugy way than 4c6780fd17aa43ed did.)

Back-patch to 9.3, like the previous patch.

Discussion: <8110.1462291671@sss.pgh.pa.us>
2016-04-29  Fix mishandling of equivalence-class tests in parameterized plans.  (Tom Lane)
Given a three-or-more-way equivalence class, such as X.X = Y.Y = Z.Z, it was possible for the planner to omit one of the quals needed to enforce that all members of the equivalence class are actually equal. This only happened in the case of a parameterized join node for two of the relations, that is a plan tree like

    Nested Loop
      ->  Scan X
      ->  Nested Loop
            ->  Scan Y
            ->  Scan Z
                  Filter: Z.Z = X.X

The eclass machinery normally expects to apply X.X = Y.Y when those two relations are joined, but in this shape of plan tree they aren't joined until the top node --- and, if the lower nested loop is marked as parameterized by X, the top node will assume that the relevant eclass condition(s) got pushed down into the lower node. On the other hand, the scan of Z assumes that it's only responsible for constraining Z.Z to match any one of the other eclass members. So one or another of the required quals sometimes fell between the cracks, depending on whether consideration of the eclass in get_joinrel_parampathinfo() for the lower nested loop chanced to generate X.X = Y.Y or X.X = Z.Z as the appropriate constraint there. If it generated the latter, it'd erroneously suppose that the Z scan would take care of matters. To fix, force X.X = Y.Y to be generated and applied at that join node when this case occurs.

This is *extremely* hard to hit in practice, because various planner behaviors conspire to mask the problem; starting with the fact that the planner doesn't really like to generate a parameterized plan of the above shape. (It might have been impossible to hit it before we tweaked things to allow this plan shape for star-schema cases.) Many thanks to Alexander Kirkouski for submitting a reproducible test case.

The bug can be demonstrated in all branches back to 9.2 where parameterized paths were introduced, so back-patch that far.
2016-04-29  Fix comment whitespace in VS2015 patch  (Andrew Dunstan)
per gripe from Michael Paquier.
2016-04-29  Support building with Visual Studio 2015  (Andrew Dunstan)
Adjust the way we detect the locale. As a result the minimum Windows version supported by VS2015 and later is Windows Vista. Add some tweaks to remove new compiler warnings. Remove documentation references to the now obsolete msysGit.

Michael Paquier, somewhat edited by me, reviewed by Christian Ullrich.

Backpatch to 9.5
2016-04-28  Adjust DatumGetBool macro, this time for sure.  (Tom Lane)
Commit 23a41573c attempted to fix the DatumGetBool macro to ignore bits in a Datum that are to the left of the actual bool value. But it did that by casting the Datum to bool; and on compilers that use C99 semantics for bool, that ends up being a whole-word test, not a 1-byte test. This seems to be the true explanation for contrib/seg failing in VS2015. To fix, use GET_1_BYTE() explicitly.

I think in the previous patch, I'd had some idea of not having to commit to bool being exactly 1 byte wide, but regardless of what the compiler's bool is, boolean columns and Datums are certainly 1 byte wide.

The previous fix was (eventually) back-patched into all active versions, so do likewise with this one.
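The distinction in sketch form. Datum and GET_1_BYTE follow the postgres.h definitions; the "bad" variant shows the whole-word hazard under C99 _Bool.

    #include <stdbool.h>
    #include <stdint.h>

    typedef uintptr_t Datum;    /* as in postgres.h */
    #define GET_1_BYTE(datum)   (((Datum) (datum)) & 0x000000ff)

    /*
     * Broken under C99 _Bool: the cast tests the whole word, so garbage
     * in the Datum's high-order bytes can make a stored "false" read as
     * true.
     */
    #define DatumGetBoolBad(X)   ((bool) (X))

    /* Fixed: test only the byte where the bool value actually lives. */
    #define DatumGetBoolFixed(X) ((bool) (GET_1_BYTE(X) != 0))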
2016-04-26  Impose a full barrier in generic-xlc.h atomics functions.  (Noah Misch)
pg_atomic_compare_exchange_*_impl() were providing only the semantics of an acquire barrier. Buildfarm members hornet and mandrill revealed this deficit beginning with commit 008608b9d51061b1f598c197477b3dc7be9c4a64. While we have no report of symptoms in 9.5, we can't rule out the possibility of certain compilers, hardware, or extension code relying on these functions' specified barrier semantics. Back-patch to 9.5, where commit b64d92f1a5602c55ee8b27a7ac474f03b7aee340 introduced atomics.

Reviewed by Andres Freund.
2016-04-21  Improve TranslateSocketError() to handle more Windows error codes.  (Tom Lane)
The coverage was rather lean for the error codes that bind() or listen() might return. Add entries for everything that there's a direct equivalent for in the set of Unix errnos that elog.c has heard of.
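A sketch of the kind of mapping being extended. The entries shown are plausible bind()-related cases chosen for illustration; the real table lives in src/backend/port/win32/socket.c.

    /* Windows-only sketch: map Winsock errors onto Unix errnos. */
    #include <winsock2.h>
    #include <errno.h>

    static void
    translate_socket_error_sketch(int wsa_error)
    {
        switch (wsa_error)
        {
            case WSAEADDRINUSE:         /* bind() */
                errno = EADDRINUSE;
                break;
            case WSAEADDRNOTAVAIL:      /* bind() */
                errno = EADDRNOTAVAIL;
                break;
            case WSAEACCES:             /* bind() to a privileged port */
                errno = EACCES;
                break;
            default:
                errno = EINVAL;
                break;
        }
    }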
2016-04-21  Remove dead code in win32.h.  (Tom Lane)
There's no longer a need for the MSVC-version-specific code stanza that forcibly redefines errno code symbols, because since commit 73838b52 we're unconditionally redefining them in the stanza before this one anyway. Now it's merely confusing and ugly, so get rid of it; and improve the comment that explains what's going on here. Although this is just cosmetic, back-patch anyway since I'm intending to back-patch some less-cosmetic changes in this same hunk of code.
2016-04-21  Provide errno-translation wrappers around bind() and listen() on Windows.  (Tom Lane)
Fix Windows builds to report something useful rather than "could not bind IPv4 socket: No error" when bind() fails. Back-patch of commits d1b7d4877b9a71f4 and 22989a8e34168f57.

Discussion: <4065.1452450340@sss.pgh.pa.us>
2016-04-20  Fix memory leak and other bugs in ginPlaceToPage() & subroutines.  (Tom Lane)
Commit 36a35c550ac114ca turned the interface between ginPlaceToPage and its subroutines in gindatapage.c and ginentrypage.c into a royal mess: page-update critical sections were started in one place and finished in another place not even in the same file, and the very same subroutine might return having started a critical section or not. Subsequent patches band-aided over some of the problems with this design by making things even messier.

One user-visible resulting problem is memory leaks caused by the need for the subroutines to allocate storage that would survive until ginPlaceToPage calls XLogInsert (as reported by Julien Rouhaud). This would not typically be noticeable during retail index updates. It could be visible in a GIN index build, in the form of memory consumption swelling to several times the commanded maintenance_work_mem.

Another rather nasty problem is that in the internal-page-splitting code path, we would clear the child page's GIN_INCOMPLETE_SPLIT flag well before entering the critical section that it's supposed to be cleared in; a failure in between would leave the index in a corrupt state. There were also assorted coding-rule violations with little immediate consequence but possible long-term hazards, such as beginning an XLogInsert sequence before entering a critical section, or calling elog(DEBUG) inside a critical section.

To fix, redefine the API between ginPlaceToPage() and its subroutines by splitting the subroutines into two parts. The "beginPlaceToPage" subroutine does what can be done outside a critical section, including full computation of the result pages into temporary storage when we're going to split the target page. The "execPlaceToPage" subroutine is called within a critical section established by ginPlaceToPage(), and it handles the actual page update in the non-split code path. The critical section, as well as the XLOG insertion call sequence, are both now always started and finished in ginPlaceToPage(). Also, make ginPlaceToPage() create and work in a short-lived memory context to eliminate the leakage problem. (Since a short-lived memory context had been getting created in the most common code path in the subroutines, this shouldn't cause any noticeable performance penalty; we're just moving the overhead up one call level.)

In passing, fix a bunch of comments that had gone unmaintained throughout all this klugery.

Report: <571276DD.5050303@dalibo.com>
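The resulting control flow, sketched with hypothetical signatures. beginPlaceToPage and execPlaceToPage are the two subroutine halves named above; the state type and argument lists are illustrative assumptions.

    typedef struct PlaceState PlaceState;           /* hypothetical */

    extern PlaceState *beginPlaceToPage(void);      /* may palloc freely */
    extern void execPlaceToPage(PlaceState *st);    /* page update only */

    static void
    ginPlaceToPage_sketch(void)
    {
        /*
         * Phase 1, outside any critical section: all allocation and, for
         * a split, full computation of the result pages into temporary
         * storage.
         */
        PlaceState *st = beginPlaceToPage();

        /*
         * The critical section and the XLOG insertion sequence are both
         * started and finished here, in one place.
         */
        START_CRIT_SECTION();
        execPlaceToPage(st);    /* phase 2: apply the precomputed update */
        /* XLogInsert(...) goes here, still inside the critical section */
        END_CRIT_SECTION();
    }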
2016-04-18  Further reduce the number of semaphores used under --disable-spinlocks.  (Tom Lane)
Per discussion, there doesn't seem to be much value in having NUM_SPINLOCK_SEMAPHORES set to 1024: under any scenario where you are running more than a few backends concurrently, you really had better have a real spinlock implementation if you want tolerable performance. And 1024 semaphores is a sizable fraction of the system-wide SysV semaphore limit on many platforms. Therefore, reduce this setting's default value to 128 to make it less likely to cause out-of-semaphores problems.
2016-04-14  Remove trailing commas in enums.  (Andres Freund)
These aren't valid C89. Found thanks to gcc's -Wc90-c99-compat. These exist in differing places in most supported branches.
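For reference, the construct at issue:

    enum gram_flags
    {
        FLAG_A,
        FLAG_B,
        FLAG_C,     /* trailing comma: legal in C99, a syntax error in C89 */
    };

    enum gram_flags_c89
    {
        FLAG_X,
        FLAG_Y,
        FLAG_Z      /* no trailing comma: portable to C89 */
    };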
2016-03-28  Stamp 9.5.2.  [tag: REL9_5_2]  (Tom Lane)
2016-03-27  Change various Gin*Is* macros to return 0/1.  (Andres Freund)
Returning the direct result of bit arithmetic, in a macro intended to be used in a boolean manner, can be problematic if the return value is stored in a variable of type 'bool'. If bool is implemented using C99's _Bool, that can lead to comparison failures if the variable is then compared again with the expression (see ginStepRight() for an example that fails), as _Bool forces the result to be 0/1. That happens in some configurations of newer MSVC compilers. It's also problematic when storing the result of such an expression in a narrower type.

Several gin macros have been declared in that style since gin's initial commit in 8a3631f8d86. There are a lot more macros like this, but this is the only one causing regression test failures; and I don't want to commit and backpatch a larger patch with lots of conflicts just before the next set of minor releases.

Discussion: 20150811154237.GD17575@awork2.anarazel.de
Backpatch: All supported branches
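The hazard and the fix, sketched with an illustrative flag macro (the real macros operate on GIN page opaque data):

    #include <stdbool.h>

    #define SKETCH_FLAG_LEAF 0x0004

    /*
     * Problematic: evaluates to 0 or 4.  Stored into a C99 _Bool it is
     * coerced to 0 or 1, so "stored_bool == PageIsLeafBad(flags)" can be
     * false even when both sides are logically true.
     */
    #define PageIsLeafBad(flags)    ((flags) & SKETCH_FLAG_LEAF)

    /* Fixed: normalize the bit test to 0/1. */
    #define PageIsLeafFixed(flags)  (((flags) & SKETCH_FLAG_LEAF) != 0)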