path: root/src/backend
2014-12-11  Fix planning of SELECT FOR UPDATE on child table with partial index.  (Tom Lane)
Ordinarily we can omit checking of a WHERE condition that matches a partial index's condition, when we are using an indexscan on that partial index. However, in SELECT FOR UPDATE we must include the "redundant" filter condition in the plan so that it gets checked properly in an EvalPlanQual recheck. The planner got this mostly right, but improperly omitted the filter condition if the index in question was on an inheritance child table. In READ COMMITTED mode, this could result in incorrectly returning just-updated rows that no longer satisfy the filter condition.

The cause of the error is using get_parse_rowmark() when get_plan_rowmark() is what should be used during planning. In 9.3 and up, also fix the same mistake in contrib/postgres_fdw. It's currently harmless there (for lack of inheritance support) but wrong is wrong, and the incorrect code might get copied to someplace where it's more significant.

Report and fix by Kyotaro Horiguchi. Back-patch to all supported branches.
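A minimal sketch of the correct lookup, assuming a planner context where root->rowMarks and the relation's RT index are at hand (the surrounding logic is illustrative, not the committed code):

    /* During planning, consult the PlanRowMark list built by the
     * planner -- it covers inheritance children -- rather than the
     * raw parse-time rowmarks, which do not. */
    PlanRowMark *rc = get_plan_rowmark(root->rowMarks, baserel->relid);

    if (rc != NULL)
    {
        /* FOR UPDATE/SHARE applies here: keep the "redundant"
         * partial-index qual as an executable filter so an
         * EvalPlanQual recheck re-verifies it on the updated row. */
    }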
2014-12-11  Fix corner case where SELECT FOR UPDATE could return a row twice.  (Tom Lane)
In READ COMMITTED mode, if a SELECT FOR UPDATE discovers it has to redo WHERE-clause checking on rows that have been updated since the SELECT's snapshot, it invokes EvalPlanQual processing to do that. If this first occurs within a non-first child table of an inheritance tree, the previous coding could accidentally re-return a matching row from an earlier, already-scanned child table. (And, to add insult to injury, I think this could make it miss returning a row that should have been returned, if the updated row that this happens on should still have passed the WHERE qual.)

Per report from Kyotaro Horiguchi; the added isolation test is based on his test case. This has been broken for quite a while, so back-patch to all supported branches.
2014-12-11  Fix assorted confusion between Oid and int32.  (Tom Lane)
In passing, also make some debugging elog's in pgstat.c a bit more consistently worded. Back-patch as far as applicable (9.3 or 9.4; none of these mistakes are really old). Mark Dilger identified and patched the type violations; the message rewordings are mine.
2014-12-10  Fix minor thinko in convertToJsonb().  (Tom Lane)
The amount of space to reserve for the value's varlena header is VARHDRSZ, not sizeof(VARHDRSZ). The latter coding accidentally failed to fail because of the way the VARHDRSZ macro is currently defined; but if we ever change it to return size_t (as one might reasonably expect it to do), convertToJsonb() would have failed. Spotted by Mark Dilger.
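A standalone illustration of the thinko (the macro below mirrors the backend's current definition, reproduced here only so the example compiles on its own):

    #include <stdio.h>
    #include <stdint.h>

    /* Mirrors the backend: the byte count of a 4-byte varlena header. */
    #define VARHDRSZ ((int32_t) sizeof(int32_t))

    int main(void)
    {
        size_t right = VARHDRSZ;         /* 4: the header size itself */
        size_t wrong = sizeof(VARHDRSZ); /* size of the macro's result
                                          * TYPE; 4 only by accident.
                                          * If VARHDRSZ were changed to
                                          * yield size_t, this would
                                          * silently become 8. */
        printf("right=%zu wrong=%zu\n", right, wrong);
        return 0;
    }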
2014-12-05  Print wal_log_hints in the rm_desc routine of a parameter-change record.  (Heikki Linnakangas)
It was an oversight in the original commit. Also note in the sample config file that changing wal_log_hints requires a restart. Michael Paquier. Backpatch to 9.4, where wal_log_hints was added.
2014-12-03  Fix typos  (Alvaro Herrera)
2014-12-02  Improve error messages for malformed array input strings.  (Tom Lane)
Make the error messages issued by array_in() uniformly follow the style

    ERROR: malformed array literal: "actual input string"
    DETAIL: specific complaint here

and rewrite many of the specific complaints to be clearer.

The immediate motivation for doing this is a complaint from Josh Berkus that json_to_record() produced an unintelligible error message when dealing with an array item, because it tries to feed the JSON-format array value to array_in(). Really it ought to be smart enough to perform JSON-to-Postgres array conversion, but that's a future feature not a bug fix.

In the meantime, this change is something we agreed we could back-patch into 9.4, and it should help de-confuse things a bit.
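In backend terms the style amounts to something like this (a hedged sketch; the DETAIL text and the origStr variable are illustrative, not taken from the commit):

    ereport(ERROR,
            (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
             errmsg("malformed array literal: \"%s\"", origStr),
             errdetail("Unexpected end of input.")));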
2014-12-02  Don't skip SQL backends in logical decoding for visibility computation.  (Andres Freund)
The logical decoding patchset introduced the PROC_IN_LOGICAL_DECODING PGXACT flag, which allows such backends to be skipped when computing the xmin horizon/snapshots. That's fine and sensible for walsenders streaming out logical changes, but not at all fine for SQL backends doing logical decoding. If the latter set that flag, any changes they have performed outside of logical decoding will not be regarded as visible, which can e.g. lead to those changes being vacuumed away.

Note that not setting the flag for SQL backends isn't particularly bothersome: the SQL backend doesn't do streaming, so it only runs for a limited amount of time.

Per buildfarm member 'tick' and Alvaro. Backpatch to 9.4, where logical decoding was introduced.
2014-12-02  Fix JSON aggregates to work properly when final function is re-executed.  (Tom Lane)
Davide S. reported that json_agg() sometimes produced multiple trailing right brackets. This turns out to be because json_agg_finalfn() attaches the final right bracket, and was doing so by modifying the aggregate state in-place. That's verboten, though unfortunately it seems there's no way for nodeAgg.c to check for such mistakes.

Fix that back to 9.3 where the broken code was introduced. In 9.4 and HEAD, likewise fix json_object_agg(), which had copied the erroneous logic. Make some cosmetic cleanups as well.
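The safe shape of such a final function builds the result in a copy of the state (a minimal sketch under that assumption; close_json_array is a hypothetical helper, not the commit's code):

    static text *
    close_json_array(StringInfo state)
    {
        StringInfoData buf;

        /* Append the bracket to a copy, never to 'state' itself:
         * nodeAgg may call the final function again on the same
         * state. */
        initStringInfo(&buf);
        appendBinaryStringInfo(&buf, state->data, state->len);
        appendStringInfoChar(&buf, ']');
        return cstring_to_text_with_len(buf.data, buf.len);
    }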
2014-12-01  Guard against bad "dscale" values in numeric_recv().  (Tom Lane)
We were not checking to see if the supplied dscale was valid for the given digit array when receiving binary-format numeric values. While dscale can validly be more than the number of nonzero fractional digits, it shouldn't be less; that case causes fractional digits to be hidden on display even though they're there and participate in arithmetic.

Bug #12053 from Tommaso Sala indicates that there's at least one broken client library out there that sometimes supplies an incorrect dscale value, leading to strange behavior. This suggests that simply throwing an error might not be the best response; it would lead to failures in applications that might seem to be working fine today. What seems the least risky fix is to truncate away any digits that would be hidden by dscale. This preserves the existing behavior in terms of what will be printed for the transmitted value, while preventing subsequent arithmetic from producing results inconsistent with that.

In passing, throw a specific error for the case of dscale being outside the range that will fit into a numeric's header. Before you got "value overflows numeric format", which is a bit misleading.

Back-patch to all supported branches.
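The added header-range check plausibly looks like this (a sketch, not the committed code; NUMERIC_DSCALE_MASK is numeric.c's bound on what the header can store):

    /* Reject a dscale the numeric header cannot represent at all. */
    if (dscale < 0 || dscale > NUMERIC_DSCALE_MASK)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
                 errmsg("invalid scale in external \"numeric\" value")));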
2014-12-01  Fix hstore_to_json_loose's detection of valid JSON number values.  (Andrew Dunstan)
We expose a function IsValidJsonNumber that internally calls the lexer for json numbers. That allows us to use the same test everywhere, instead of inventing a broken test for hstore conversions. The new function is also used in datum_to_json, replacing the code that is now moved to the new function. Backpatch to 9.3 where hstore_to_json_loose was introduced.
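Typical usage on the hstore side (a sketch; dst and src are illustrative names):

    /* Emit the value as a bare JSON number only if the JSON lexer
     * accepts it as one; otherwise quote-and-escape it. */
    if (IsValidJsonNumber(src, strlen(src)))
        appendStringInfoString(&dst, src);
    else
        escape_json(&dst, src);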
2014-11-28  Update transaction README for persistent multixacts  (Alvaro Herrera)
Multixacts are now maintained during recovery, but the README didn't get the memo. Backpatch to 9.3, where the divergence was introduced.
2014-11-22  Fix mishandling of system columns in FDW queries.  (Tom Lane)
postgres_fdw would send query conditions involving system columns to the remote server, even though it makes no effort to ensure that system columns other than CTID match what the remote side thinks. tableoid, in particular, probably won't match and might have some use in queries. Hence, prevent sending conditions that include non-CTID system columns.

Also, create_foreignscan_plan neglected to check local restriction conditions while determining whether to set fsSystemCol for a foreign scan plan node. This again would bollix the results for queries that test a foreign table's tableoid.

Back-patch the first fix to 9.3 where postgres_fdw was introduced. Back-patch the second to 9.2. The code is probably broken in 9.1 as well, but the patch doesn't apply cleanly there; given the weak state of support for FDWs in 9.1, it doesn't seem worth fixing.

Etsuro Fujita, reviewed by Ashutosh Bapat, and somewhat modified by me
2014-11-17  Fix WAL-logging of B-tree "unlink halfdead page" operation.  (Heikki Linnakangas)
There was some confusion on how to record the case that the operation unlinks the last non-leaf page in the branch being deleted. _bt_unlink_halfdead_page set the "topdead" field in the WAL record to the leaf page, but the redo routine assumed that it would be an invalid block number in that case. This commit fixes _bt_unlink_halfdead_page to do what the redo routine expected. This code is new in 9.4, so backpatch there.
2014-11-16  Translation updates  (Peter Eisentraut)
2014-11-15  Sync unlogged relations to disk after they have been reset.  (Andres Freund)
Unlogged relations are only reset when performing an unclean restart. That means they have to be synced to disk during clean shutdowns. During normal processing that's achieved by registering a buffer's file to be fsynced at the next checkpoint when flushed. But ResetUnloggedRelations() doesn't go through the buffer manager, so nothing will force the reset relations to disk before the next shutdown checkpoint.

So just make ResetUnloggedRelations() fsync the newly created main forks to disk.

Discussion: 20140912112246.GA4984@alap3.anarazel.de

Backpatch to 9.1 where unlogged tables were introduced.

Abhijit Menon-Sen and Andres Freund
2014-11-15  Ensure unlogged tables are reset even if crash recovery errors out.  (Andres Freund)
Unlogged relations are reset at the end of crash recovery as they're only synced to disk during a proper shutdown. Unfortunately that and later steps can fail, e.g. due to running out of space. This reset was, up to now, performed after marking the database as having finished crash recovery successfully. As out-of-space errors trigger a crash restart, that could lead to a situation where not all unlogged relations are reset. Once that happened, usage of unlogged relations could yield errors like "could not open file "...": No such file or directory". Luckily clusters that show the problem can be fixed by performing an immediate shutdown and starting the database again.

To fix, just call ResetUnloggedRelations(UNLOGGED_RELATION_INIT) earlier, before marking the database as having successfully recovered.

Discussion: 20140912112246.GA4984@alap3.anarazel.de

Backpatch to 9.1 where unlogged tables were introduced.

Abhijit Menon-Sen and Andres Freund
2014-11-14  Allow interrupting GetMultiXactIdMembers  (Alvaro Herrera)
This function has a loop which can lead to uninterruptible process "stalls" (actually infinite loops) when some bugs are triggered. Avoid that unpleasant situation by adding a check for interrupts in a place that shouldn't degrade performance in the normal case. Backpatch to 9.3. Older branches have an identical loop here, but the aforementioned bugs are only a problem starting in 9.3 so there doesn't seem to be any point in backpatching any further.
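The standard backend idiom for making such a loop cancellable (a minimal sketch, not the function's actual body):

    for (;;)
    {
        /* Lets pg_cancel_backend() or a statement timeout break us
         * out even when a bug would otherwise keep the loop
         * spinning forever. */
        CHECK_FOR_INTERRUPTS();

        /* ... fetch or retry the next multixact member here ... */
        if (done)
            break;
    }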
2014-11-13  Improve logical decoding log messages  (Peter Eisentraut)
suggestions from Robert Haas
2014-11-13  Fix and improve cache invalidation logic for logical decoding.  (Andres Freund)
There are basically three situations in which logical decoding needs to perform cache invalidation: during/after replaying a transaction with catalog changes, when skipping an uninteresting transaction that performed catalog changes, and when erroring out while replaying a transaction. Unfortunately these three cases were all handled slightly differently - partially because 8de3e410fa, which greatly simplifies matters, got committed in the midst of the development of logical decoding.

The actually problematic case was when logical decoding skipped transaction commits (and thus processed invalidations). When used via the SQL interface, cache invalidation could access the catalog - bad, because we didn't set up enough state to allow that correctly. It'd not be hard to set up sufficient state, but the simpler solution is to always perform cache invalidation outside a valid transaction.

Also make the different cache invalidation cases look as similar as possible, to ease code review.

This fixes the assertion failure reported by Antonin Houska in 53EE02D9.7040702@gmail.com. The presented testcase has been expanded into a regression test.

Backpatch to 9.4, where logical decoding was introduced.
2014-11-13  Fix xmin/xmax horizon computation during logical decoding initialization.  (Andres Freund)
When building the initial historic catalog snapshot there were scenarios where snapbuild.c would use incorrect xmin/xmax values when starting from an xl_running_xacts record. The values used were always a bit suspect, but happened to be correct in the easy-to-test cases. Notably the values used when the initial snapshot was computed while no other transactions were running were correct.

This is likely to be the cause of the occasional buildfarm failures on animals markhor and tick; but it's quite possible to reproduce problems without CLOBBER_CACHE_ALWAYS.

Backpatch to 9.4, where logical decoding was introduced.
2014-11-13  Fix race condition between hot standby and restoring a full-page image.  (Heikki Linnakangas)
There was a window in RestoreBackupBlock where a page would be zeroed out, but not yet locked. If a backend pinned and locked the page in that window, it saw the zeroed page instead of the old page or new page contents, which could lead to missing rows in a result set, or errors.

To fix, replace RBM_ZERO with RBM_ZERO_AND_LOCK, which atomically pins, zeroes, and locks the page, if it's not in the buffer cache already.

In stable branches, the old RBM_ZERO constant is renamed to RBM_DO_NOT_USE, to avoid breaking any 3rd party extensions that might use RBM_ZERO. More importantly, this avoids renumbering the other enum values, which would cause even bigger confusion in extensions that use ReadBufferExtended, but haven't been recompiled.

Backpatch to all supported versions; this has been racy since hot standby was introduced.
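Redo code then reads the page like this (a sketch assuming the 9.4-era XLogReadBufferExtended; rnode, forknum, and blkno come from the WAL record):

    /* Pin, zero, and exclusive-lock in one step, so no other backend
     * can observe the zeroed-but-unlocked intermediate state. */
    buffer = XLogReadBufferExtended(rnode, forknum, blkno,
                                    RBM_ZERO_AND_LOCK);
    page = (Page) BufferGetPage(buffer);
    /* ... copy the full-page image into 'page' and mark it dirty ... */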
2014-11-12  Explicitly support the case that a plancache's raw_parse_tree is NULL.  (Tom Lane)
This only happens if a client issues a Parse message with an empty query string, which is a bit odd; but since it is explicitly called out as legal by our FE/BE protocol spec, we'd probably better continue to allow it.

Fix by adding tests everywhere that the raw_parse_tree field is passed to functions that don't or shouldn't accept NULL. Also make it clear in the relevant comments that NULL is an expected case.

This reverts commits a73c9dbab0165b3395dfe8a44a7dfd16166963c4 and 2e9650cbcff8c8fb0d9ef807c73a44f241822eee, which fixed specific crash symptoms by hacking things at what now seems to be the wrong end, ie the callee functions. Making the callees allow NULL is superficially more robust, but it's not always true that there is a defensible thing for the callee to do in such cases. The caller has more context and is better able to decide what the empty-query case ought to do.

Per followup discussion of bug #11335. Back-patch to 9.2. The code before that is sufficiently different that it would require development of a separate patch, which doesn't seem worthwhile for what is believed to be an essentially cosmetic change.
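The caller-side tests take roughly this shape (a sketch; the branch bodies are illustrative):

    if (plansource->raw_parse_tree == NULL)
    {
        /* Empty query: an explicitly expected case; nothing to
         * analyze or plan. */
    }
    else if (analyze_requires_snapshot(plansource->raw_parse_tree))
    {
        /* Normal case: set up a snapshot before parse analysis. */
    }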
2014-11-12  Fix several weaknesses in slot and logical replication on-disk serialization.  (Andres Freund)
Heikki noticed in 544E23C0.8090605@vmware.com that slot.c and snapbuild.c were missing the FIN_CRC32 call when computing/checking checksums of on-disk files. That doesn't lower the error-detection capabilities of the checksum, but is inconsistent with other usages.

In a followup mail Heikki also noticed that, contrary to a comment, the 'version' and 'length' struct fields of a replication slot's on-disk data were not covered by the checksum. That's not likely to lead to actually missed corruption as those fields are cross-checked with the expected version and the actual file length. But it's wrong nonetheless.

As fixing these issues makes existing on-disk files unreadable, bump the expected versions of on-disk files for both slots and logical decoding historic catalog snapshots. This means that loading old files will fail with

    ERROR: "replication slot file ... has unsupported version 1"

and

    ERROR: "snapbuild state file ... has unsupported version 1 instead of 2"

respectively. Given the low likelihood of anybody already using these new features in a production setup that seems acceptable.

Fixing these issues made me notice that there's no regression test covering the loading of historic snapshot from disk - so add one.

Backpatch to 9.4 where these features were introduced.
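The complete checksum sequence, including the finalization that was missing (a sketch using the 9.4-era CRC macros):

    pg_crc32 checksum;

    INIT_CRC32(checksum);
    /* Cover the whole on-disk struct, 'version' and 'length' included. */
    COMP_CRC32(checksum, data, len);
    FIN_CRC32(checksum);    /* the call slot.c and snapbuild.c omitted */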
2014-11-12  Use just one database connection in the "tablespace" test.  (Noah Misch)
On Windows, DROP TABLESPACE has a race condition when run concurrently with other processes having opened files in the tablespace. This led to a rare failure on buildfarm member frogmouth. Back-patch to 9.4, where the reconnection was introduced.
2014-11-11  Message improvements  (Peter Eisentraut)
2014-11-11  Fix dependency searching for case where column is visited before table.  (Tom Lane)
When the recursive search in dependency.c visits a column and then later visits the whole table containing the column, it needs to propagate the drop-context flags for the table to the existing target-object entry for the column. Otherwise we might refuse the DROP (if not CASCADE) on the incorrect grounds that there was no automatic drop pathway to the column. Remarkably, this has not been reported before, though it's possible at least when an extension creates both a datatype and a table using that datatype.

Rather than just marking the column as allowed to be dropped, it might seem good to skip the DROP COLUMN step altogether, since the later DROP of the table will surely get the job done. The problem with that is that the datatype would then be dropped before the table (since the whole situation occurred because we visited the datatype, and then recursed to the dependent column, before visiting the table). That seems pretty risky, and the case is rare enough that it doesn't seem worth expending a lot of effort or risk to make the drops happen in a safe order. So we just play dumb and delete the column separately according to the existing drop ordering rules.

Per report from Petr Jelinek, though this is different from his proposed patch.

Back-patch to 9.1, where extensions were introduced. There's currently no evidence that such cases can arise before 9.1, and in any case we would also need to back-patch cb5c2ba2d82688d29b5902d86b993a54355cad4d to 9.0 if we wanted to back-patch this.
2014-11-10  Ensure that RowExprs and whole-row Vars produce the expected column names.  (Tom Lane)
At one time it wasn't terribly important what column names were associated with the fields of a composite Datum, but since the introduction of operations like row_to_json(), it's important that looking up the rowtype ID embedded in the Datum returns the column names that users would expect. That did not work terribly well before this patch: you could get the column names of the underlying table, or column aliases from any level of the query, depending on minor details of the plan tree. You could even get totally empty field names, which is disastrous for cases like row_to_json().

To fix this for whole-row Vars, look to the RTE referenced by the Var, and make sure its column aliases are applied to the rowtype associated with the result Datums. This is a tad scary because we might have to return a transient RECORD type even though the Var is declared as having some named rowtype. In principle it should be all right because the record type will still be physically compatible with the named rowtype; but I had to weaken one Assert in ExecEvalConvertRowtype, and there might be third-party code containing similar assumptions.

Similarly, RowExprs have to be willing to override the column names coming from a named composite result type and produce a RECORD when the column aliases visible at the site of the RowExpr differ from the underlying table's column names.

In passing, revert the decision made in commit 398f70ec070fe601 to add an alias-list argument to ExecTypeFromExprList: better to provide that functionality in a separate function. This also reverts most of the code changes in d68581483564ec0f, which we don't need because we're no longer depending on the tupdesc found in the child plan node's result slot to be blessed.

Back-patch to 9.4, but not earlier, since this solution changes the results in some cases that users might not have realized were buggy. We'll apply a more restricted form of this patch in older branches.
2014-11-07  Fix generation of SP-GiST vacuum WAL records.  (Heikki Linnakangas)
I broke these in 8776faa81cb651322b8993422bdd4633f1f6a487. Backpatch to 9.4, where that was done.
2014-11-06  Cope with more than 64K phrases in a thesaurus dictionary.  (Tom Lane)
dict_thesaurus stored phrase IDs in uint16 fields, so it would get confused and even crash if there were more than 64K entries in the configuration file. It turns out to be basically free to widen the phrase IDs to uint32, so let's just do so.

This was complained of some time ago by David Boutin (in bug #7793); he later submitted an informal patch but it was never acted on. We now have another complaint (bug #11901 from Luc Ouellette) so it's time to make something happen. This is basically Boutin's patch, but for future-proofing I also added a defense against too many words per phrase. Note that we don't need any explicit defense against overflow of the uint32 counters, since before that happens we'd hit array allocation sizes that repalloc rejects.

Back-patch to all supported branches because of the crash risk.
2014-11-06  Fix normalization of numeric values in JSONB GIN indexes.  (Tom Lane)
The default JSONB GIN opclass (jsonb_ops) converts numeric data values to strings for storage in the index. It must ensure that numeric values that would compare equal (such as 12 and 12.00) produce identical strings, else index searches would have behavior different from regular JSONB comparisons. Unfortunately the function charged with doing this was completely wrong: it could reduce distinct numeric values to the same string, or reduce equivalent numeric values to different strings. The former type of error would only lead to search inefficiency, but the latter type of error would cause index entries that should be found by a search to not be found.

Repairing this bug therefore means that it will be necessary for 9.4 beta testers to reindex GIN jsonb_ops indexes, if they care about getting correct results from index searches involving numeric data values within the comparison JSONB object.

Per report from Thomas Fanghaenel.
2014-11-06  Prevent unnecessary creation of a .ready file for the timeline history file.  (Fujii Masao)
Previously a .ready file was created for the timeline history file at the end of an archive recovery even when WAL archiving was not enabled. This creation is unnecessary and causes the .ready file to remain forever.

This commit changes archive recovery so that it creates the .ready file for the timeline history file only when WAL archiving is enabled.

Backpatch to all supported versions.
2014-11-04  Drop no-longer-needed buffers during ALTER DATABASE SET TABLESPACE.  (Tom Lane)
The previous coding assumed that we could just let buffers for the database's old tablespace age out of the buffer arena naturally. The folly of that is exposed by bug #11867 from Marc Munro: the user could later move the database back to its original tablespace, after which any still-surviving buffers would match lookups again and appear to contain valid data. But they'd be missing any changes applied while the database was in the new tablespace. This has been broken since ALTER SET TABLESPACE was introduced, so back-patch to all supported branches.
2014-10-30  Test IsInTransactionChain, not IsTransactionBlock, in vac_update_relstats.  (Tom Lane)
As noted by Noah Misch, my initial cut at fixing bug #11638 didn't cover all cases where ANALYZE might be invoked in an unsafe context. We need to test the result of IsInTransactionChain not IsTransactionBlock; which is notationally a pain because IsInTransactionChain requires an isTopLevel flag, which would have to be passed down through several levels of callers. I chose to pass in_outer_xact (ie, the result of IsInTransactionChain) rather than isTopLevel per se, as that seemed marginally more apropos for the intermediate functions to know about.
2014-10-30"Pin", rather than "keep", dynamic shared memory mappings and segments.Robert Haas
Nobody seemed concerned about this naming when it originally went in, but there's a pending patch that implements the opposite of dsm_keep_mapping, and the term "unkeep" was judged unpalatable. "unpin" has existing precedent in the PostgreSQL code base, and the English language, so use this terminology instead. Per discussion, back-patch to 9.4.
2014-10-29  Avoid corrupting tables when ANALYZE inside a transaction is rolled back.  (Tom Lane)
VACUUM and ANALYZE update the target table's pg_class row in-place, that is nontransactionally. This is OK, more or less, for the statistical columns, which are mostly nontransactional anyhow. It's not so OK for the DDL hint flags (relhasindex etc), which might get changed in response to transactional changes that could still be rolled back. This isn't a problem for VACUUM, since it can't be run inside a transaction block nor in parallel with DDL on the table. However, we allow ANALYZE inside a transaction block, so if the transaction had earlier removed the last index, rule, or trigger from the table, and then we roll back the transaction after ANALYZE, the table would be left in a corrupted state with the hint flags not set though they should be.

To fix, suppress the hint-flag updates if we are InTransactionBlock(). This is safe enough because it's always OK to postpone hint maintenance some more; the worst-case consequence is a few extra searches of pg_index et al. There was discussion of instead using a transactional update, but that would change the behavior in ways that are not all desirable: in most scenarios we're better off keeping ANALYZE's statistical values even if the ANALYZE itself rolls back. In any case we probably don't want to change this behavior in back branches.

Per bug #11638 from Casey Shobe. This has been broken for a good long time, so back-patch to all supported branches.

Tom Lane and Michael Paquier, initial diagnosis by Andres Freund
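The guard amounts to something like this (a sketch; in the committed fix the caller computes the in-transaction state and passes it down, and 'hasindex'/'dirty' are illustrative locals):

    /* Skip the nontransactional DDL hint-flag updates when inside a
     * transaction block: a later ROLLBACK could not undo the in-place
     * pg_class change. Postponing hint maintenance is always safe. */
    if (!in_outer_xact)
    {
        if (pgcform->relhasindex != hasindex)
        {
            pgcform->relhasindex = hasindex;
            dirty = true;
        }
    }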
2014-10-28  Remove obsolete commentary.  (Tom Lane)
Since we got rid of non-MVCC catalog scans, the fourth reason given for using a non-transactional update in index_update_stats() is obsolete. The other three are still good, so we're not going to change the code, but fix the comment.
2014-10-28  Remove unnecessary assignment.  (Heikki Linnakangas)
Reported by MauMau.
2014-10-27  Fix two bugs in tsquery @> operator.  (Heikki Linnakangas)
1. The comparison for matching terms used only the CRC to decide if there's a match. Two different terms with the same CRC gave a match.

2. It assumed that if the second operand has more terms than the first, it's never a match. That assumption is bogus, because there can be duplicate terms in either operand.

Rewrite the implementation in a way that doesn't have those bugs. Backpatch to all supported versions.
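For the first bug, a robust term comparison treats the CRC as a fast path only (a sketch; the lexeme pointers are illustrative rather than tsquery's exact accessors):

    /* Equal CRCs are only a hint -- distinct lexemes can collide.
     * Confirm with the actual bytes before declaring a match. */
    if (a->valcrc == b->valcrc &&
        a->length == b->length &&
        memcmp(lexeme_a, lexeme_b, a->length) == 0)
        return true;            /* genuine term match */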
2014-10-26  Improve planning of btree index scans using ScalarArrayOpExpr quals.  (Tom Lane)
Since we taught btree to handle ScalarArrayOpExpr quals natively (commit 9e8da0f75731aaa7605cf4656c21ea09e84d2eb1), the planner has always included ScalarArrayOpExpr quals in index conditions if possible. However, if the qual is for a non-first index column, this could result in an inferior plan because we can no longer take advantage of index ordering (cf. commit 807a40c551dd30c8dd5a0b3bd82f5bbb1e7fd285). It can be better to omit the ScalarArrayOpExpr qual from the index condition and let it be done as a filter, so that the output doesn't need to get sorted. Indeed, this is true for the query introduced as a test case by the latter commit.

To fix, restructure get_index_paths and build_index_paths so that we consider paths both with and without ScalarArrayOpExpr quals in non-first index columns. Redesign the API of build_index_paths so that it reports what it found, saving useless second or third calls.

Report and patch by Andrew Gierth (though rather heavily modified by me). Back-patch to 9.2 where this code was introduced, since the issue can result in significant performance regressions compared to plans produced by 9.1 and earlier.
2014-10-23  Improve ispell dictionary's defenses against bad affix files.  (Tom Lane)
Don't crash if an ispell dictionary definition contains flags but not any compound affixes. (This isn't a security issue since only superusers can install affix files, but still it's a bad thing.) Also, be more careful about detecting whether an affix-file FLAG command is old-format (ispell) or new-format (myspell/hunspell). And change the error message about mixed old-format and new-format commands into something intelligible. Per bug #11770 from Emre Hasegeli. Back-patch to all supported branches.
2014-10-23  Prevent the already-archived WAL file from being archived again.  (Fujii Masao)
Previously archive recovery always created a .ready file for the last WAL file of the old timeline at the end of recovery, even when the file was restored from the archive and had a .done file. That is, the WAL file could end up with both .ready and .done files, which caused the already-archived WAL file to be archived again.

This commit prevents archive recovery from creating a .ready file for the last WAL file if it has a .done file, in order to prevent it from being archived again.

This bug was added when the cascading replication feature was introduced, i.e., commit 5286105800c7d5902f98f32e11b209c471c0c69c. So, back-patch to 9.2, where cascading replication was added.

Reviewed by Michael Paquier
2014-10-20  Flush unlogged table's buffers when copying or moving databases.  (Andres Freund)
CREATE DATABASE and ALTER DATABASE .. SET TABLESPACE copy the source database directory on the filesystem level. To ensure the on-disk state is consistent they block out users of the affected database and force a checkpoint to flush out all data to disk. Unfortunately, up to now, that checkpoint didn't flush out dirty buffers from unlogged relations.

That bug means there could be leftover dirty buffers in either the template database, or the database in its old location, leading to problems when accessing relations in an inconsistent state, and to possible problems during shutdown in the SET TABLESPACE case because buffers belonging to files that don't exist anymore are flushed.

This was reported in bug #10675 by Maxim Boguk.

Fix by Pavan Deolasee, modified somewhat by me. Reviewed by MauMau and Fujii Masao. Backpatch to 9.1 where unlogged tables were introduced.
2014-10-20  Fix mishandling of FieldSelect-on-whole-row-Var in nested lateral queries.  (Tom Lane)
If an inline-able SQL function taking a composite argument is used in a LATERAL subselect, and the composite argument is a lateral reference, the planner could fail with "variable not found in subplan target list", as seen in bug #11703 from Karl Bartel. (The outer function call used in the bug report and in the committed regression test is not really necessary to provoke the bug --- you can get it if you manually expand the outer function into "LATERAL (SELECT inner_function(outer_relation))", too.)

The cause of this is that we generate the reltargetlist for the referenced relation before doing eval_const_expressions() on the lateral sub-select's expressions (cf find_lateral_references()), so what's scheduled to be emitted by the referenced relation is a whole-row Var, not the simplified single-column Var produced by optimizing the function's FieldSelect on the whole-row Var. Then setrefs.c fails to match up that lateral reference to what's available from the outer scan.

Preserving the FieldSelect optimization in such cases would require either major planner restructuring (to recursively do expression simplification on sub-selects much earlier) or some amazingly ugly kluge to change the reltargetlist of a possibly-already-planned relation. It seems better just to skip the optimization when the Var is from an upper query level; the case is not so common that it's likely anyone will notice a few wasted cycles.

AFAICT this problem only occurs for uplevel LATERAL references, so back-patch to 9.3 where LATERAL was added.
2014-10-17  Avoid core dump in _outPathInfo() for Path without a parent RelOptInfo.  (Tom Lane)
Nearly all Paths have parents, but a ResultPath representing an empty FROM clause does not. Avoid a core dump in such cases. I believe this is only a hazard for debugging usage, not for production, else we'd have heard about it before. Nonetheless, back-patch to 9.1 where the troublesome code was introduced. Noted while poking at bug #11703.
2014-10-16  Support timezone abbreviations that sometimes change.  (Tom Lane)
Up to now, PG has assumed that any given timezone abbreviation (such as "EDT") represents a constant GMT offset in the usage of any particular region; we had a way to configure what that offset was, but not for it to be changeable over time. But, as with most things horological, this view of the world is too simplistic: there are numerous regions that have at one time or another switched to a different GMT offset but kept using the same timezone abbreviation. Almost the entire Russian Federation did that a few years ago, and later this month they're going to do it again. And there are similar examples all over the world.

To cope with this, invent the notion of a "dynamic timezone abbreviation", which is one that is referenced to a particular underlying timezone (as defined in the IANA timezone database) and means whatever it currently means in that zone. For zones that use or have used daylight-savings time, the standard and DST abbreviations continue to have the property that you can specify standard or DST time and get that time offset whether or not DST was theoretically in effect at the time. However, the abbreviations mean what they meant at the time in question (or most recently before that time) rather than being absolutely fixed.

The standard abbreviation-list files have been changed to use this behavior for abbreviations that have actually varied in meaning since 1970. The old simple-numeric definitions are kept for abbreviations that have not changed, since they are a bit faster to resolve.

While this is clearly a new feature, it seems necessary to back-patch it into all active branches, because otherwise use of Russian zone abbreviations is going to become even more problematic than it already was. This change supersedes the changes in commit 513d06ded et al to modify the fixed meanings of the Russian abbreviations; since we've not shipped that yet, this will avoid an undesirably incompatible (not to mention incorrect) change in behavior for timestamps between 2011 and 2014.

This patch makes some cosmetic changes in ecpglib to keep its usage of datetime lookup tables as similar as possible to the backend code, but doesn't do anything about the increasingly obsolete set of timezone abbreviation definitions that are hard-wired into ecpglib. Whatever we do about that will likely not be appropriate material for back-patching. Also, a potential free() of a garbage pointer after an out-of-memory failure in ecpglib has been fixed.

This patch also fixes pre-existing bugs in DetermineTimeZoneOffset() that caused it to produce unexpected results near a timezone transition, if both the "before" and "after" states are marked as standard time. We'd only ever thought about or tested transitions between standard and DST time, but that's not what's happening when a zone simply redefines their base GMT offset.

In passing, update the SGML documentation to refer to the Olson/zoneinfo/zic timezone database as the "IANA" database, since it's now being maintained under the auspices of IANA.
2014-10-15  Print planning time only in EXPLAIN ANALYZE, not plain EXPLAIN.  (Tom Lane)
We've gotten enough push-back on that change to make it clear that it wasn't an especially good idea to do it like that. Revert plain EXPLAIN to its previous behavior, but keep the extra output in EXPLAIN ANALYZE. Per discussion. Internally, I set this up as a separate flag ExplainState.summary that controls printing of planning time and execution time. For now it's just copied from the ANALYZE option, but we could consider exposing it to users.
2014-10-14  Don't let protected variable access be reordered after spinlock release.  (Heikki Linnakangas)
LWLockAcquireWithVar needs to set the protected variable while holding the spinlock. Need to use a volatile pointer to make sure it doesn't get reordered by the compiler. The other functions that accessed the protected variable already got this right. 9.4 only. Earlier releases didn't have this code, and in master, spinlock release acts as a compiler barrier.
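The pattern being fixed looks roughly like this (a sketch; the field names follow the 9.4-era LWLock layout, the rest is illustrative):

    volatile LWLock *lock = l;
    volatile uint64 *valp = valptr;

    SpinLockAcquire(&lock->mutex);
    /* Store the protected value while still holding the spinlock; the
     * volatile qualifiers keep the compiler from sinking the store
     * past the release below. */
    *valp = val;
    SpinLockRelease(&lock->mutex);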
2014-10-14  Fix deadlock with LWLockAcquireWithVar and LWLockWaitForVar.  (Heikki Linnakangas)
LWLockRelease should release all backends waiting with LWLockWaitForVar, even when another backend has already been woken up to acquire the lock, i.e. when releaseOK is false. LWLockWaitForVar can return as soon as the protected value changes, even if the other backend will acquire the lock. Fix that by resetting releaseOK to true in LWLockWaitForVar, whenever adding itself to the wait queue. This should fix the bug reported by MauMau, where the system occasionally hangs when there is a lot of concurrent WAL activity and a checkpoint. Backpatch to 9.4, where this code was added.
2014-10-12  Message improvements  (Peter Eisentraut)