summaryrefslogtreecommitdiff
path: root/src/backend
AgeCommit message (Collapse)Author
2013-06-23Ensure no xid gaps during Hot Standby startupSimon Riggs
In some cases with higher numbers of subtransactions it was possible for us to incorrectly initialize subtrans leading to complaints of missing pages. Bug report by Sergey Konoplev Analysis and fix by Andres Freund
2013-06-13Only install a portal's ResourceOwner if it actually has one.Tom Lane
In most scenarios a portal without a ResourceOwner is dead and not subject to any further execution, but a portal for a cursor WITH HOLD remains in existence with no ResourceOwner after the creating transaction is over. In this situation, if we attempt to "execute" the portal directly to fetch data from it, we were setting CurrentResourceOwner to NULL, leading to a segfault if the datatype output code did anything that required a resource owner (such as trying to fetch system catalog entries that weren't already cached). The case appears to be impossible to provoke with stock libpq, but psqlODBC at least is able to cause it when working with held cursors. Simplest fix is to just skip the assignment to CurrentResourceOwner, so that any resources used by the data output operations will be managed by the transaction-level resource owner instead. For consistency I changed all the places that install a portal's resowner as current, even though some of them are probably not reachable with a held cursor's portal. Per report from Joshua Berry (with thanks to Hiroshi Inoue for developing a self-contained test case). Back-patch to all supported versions.
2013-06-09Remove unnecessary restrictions about RowExprs in transformAExprIn().Tom Lane
When the existing code here was written, it made sense to special-case RowExprs because that was the only way that we could handle row comparisons at all. Now that we have record_eq() and arrays of composites, the generic logic for "scalar" types will in fact work on RowExprs too, so there's no reason to throw error for combinations of RowExprs and other ways of forming composite values, nor to ignore the possibility of using a ScalarArrayOpExpr. But keep using the old logic when comparing two RowExprs, for consistency with the main transformAExprOp() logic. (This allows some cases with not-quite-identical rowtypes to succeed, so we might get push-back if we removed it.) Per bug #8198 from Rafal Rzepecki. Back-patch to all supported branches, since this works fine as far back as 8.4. Rafal Rzepecki and Tom Lane
2013-06-09Remove ALTER DEFAULT PRIVILEGES' requirement of schema CREATE permissions.Tom Lane
Per discussion, this restriction isn't needed for any real security reason, and it seems to confuse people more often than it helps them. It could also result in some database states being unrestorable. So just drop it. Back-patch to 9.0, where ALTER DEFAULT PRIVILEGES was introduced.
2013-06-08Don't downcase non-ascii identifier chars in multi-byte encodings.Andrew Dunstan
Long-standing code has called tolower() on identifier character bytes with the high bit set. This is clearly an error and produces junk output when the encoding is multi-byte. This patch therefore restricts this activity to cases where there is a character with the high bit set AND the encoding is single-byte. There have been numerous gripes about this, most recently from Martin Schäfer. Backpatch to all live releases.
2013-06-06Fix typo in comment.Heikki Linnakangas
2013-06-05Prevent pushing down WHERE clauses into unsafe UNION/INTERSECT nests.Tom Lane
The planner is aware that it mustn't push down upper-level quals into subqueries if the quals reference subquery output columns that contain set-returning functions or volatile functions, or are non-DISTINCT outputs of a DISTINCT ON subquery. However, it missed making this check when there were one or more levels of UNION or INTERSECT above the dangerous expression. This could lead to "set-valued function called in context that cannot accept a set" errors, as seen in bug #8213 from Eric Soroos, or to silently wrong answers in the other cases. To fix, refactor the checks so that we make the column-is-unsafe checks during subquery_is_pushdown_safe(), which already has to recursively inspect all arms of a set-operation tree. This makes qual_is_pushdown_safe() considerably simpler, at the cost that we will spend some cycles checking output columns that possibly aren't referenced in any upper qual. But the cases where this code gets executed at all are already nontrivial queries, so it's unlikely anybody will notice any slowdown of planning. This has been broken since commit 05f916e6add9726bf4ee046e4060c1b03c9961f2, which makes the bug over ten years old. A bit surprising nobody noticed it before now.
2013-06-05Put analyze_keyword back in explain_option_name production.Tom Lane
In commit 2c92edad48796119c83d7dbe6c33425d1924626d, I broke "EXPLAIN (ANALYZE)" syntax, because I mistakenly thought that ANALYZE/ANALYSE were only partially reserved and thus would be included in NonReservedWord; but actually they're fully reserved so they still need to be called out here. A nicer solution would be to demote these words to type_func_name_keyword status (they can't be less than that because of "VACUUM [ANALYZE] ColId"). While that works fine so far as the core grammar is concerned, it breaks ECPG's grammar for reasons I don't have time to isolate at the moment. So do this for the time being. Per report from Kevin Grittner. Back-patch to 9.0, like the previous commit.
2013-06-04Fix memory leak in LogStandbySnapshot().Tom Lane
The array allocated by GetRunningTransactionLocks() needs to be pfree'd when we're done with it. Otherwise we leak some memory during each checkpoint, if wal_level = hot_standby. This manifests as memory bloat in the checkpointer process, or in bgwriter in versions before we made the checkpointer separate. Reported and fixed by Naoya Anzai. Back-patch to 9.0 where the issue was introduced. In passing, improve comments for GetRunningTransactionLocks(), and add an Assert that we didn't overrun the palloc'd array.
2013-06-02Allow type_func_name_keywords in some places where they weren't before.Tom Lane
This change makes type_func_name_keywords less reserved than they were before, by allowing them for role names, language names, EXPLAIN and COPY options, and SET values for GUCs; which are all places where few if any actual keywords could appear instead, so no new ambiguities are introduced. The main driver for this change is to allow "COPY ... (FORMAT BINARY)" to work without quoting the word "binary". That is an inconsistency that has been complained of repeatedly over the years (at least by Pavel Golub, Kurt Lidl, and Simon Riggs); but we hadn't thought of any non-ugly solution until now. Back-patch to 9.0 where the COPY (FORMAT BINARY) syntax was introduced.
2013-05-16Fix fd.c to preserve errno where needed.Tom Lane
PathNameOpenFile failed to ensure that the correct value of errno was returned to its caller after a failure (because it incorrectly supposed that free() can never change errno). In some cases this would result in a user-visible failure because an expected ENOENT errno was replaced with something else. Bogus EINVAL failures have been observed on OS X, for example. There were also a couple of places that could mangle an important value of errno if FDDEBUG was defined. While the usefulness of that debug support is highly debatable, we might as well make it safe to use, so add errno save/restore logic to the DO_DB macro. Per bug #8167 from Nelson Minar, diagnosed by RhodiumToad. Back-patch to all supported branches.
2013-05-13Fix handling of OID wraparound while in standalone mode.Tom Lane
If OID wraparound should occur while in standalone mode (unlikely but possible), we want to advance the counter to FirstNormalObjectId not FirstBootstrapObjectId. Otherwise, user objects might be created with OIDs in the system-reserved range. That isn't immediately harmful but it poses a risk of conflicts during future pg_upgrade operations. Noted by Andres Freund. Back-patch to all supported branches, since all of them are supported sources for pg_upgrade operations.
2013-05-10Guard against input_rows == 0 in estimate_num_groups().Tom Lane
This case doesn't normally happen, because the planner usually clamps all row estimates to at least one row; but I found that it can arise when dealing with relations excluded by constraints. Without a defense, estimate_num_groups() can return zero, which leads to divisions by zero inside the planner as well as assertion failures in the executor. An alternative fix would be to change set_dummy_rel_pathlist() to make the size estimate for a dummy relation 1 row instead of 0, but that seemed pretty ugly; and probably someday we'll want to drop the convention that the minimum rowcount estimate is 1 row. Back-patch to 8.4, as the problem can be demonstrated that far back.
2013-05-02Fix thinko in comment.Heikki Linnakangas
WAL segment means a 16 MB physical WAL file; this comment meant a logical 4 GB log file. Amit Langote. Apply to backbranches only, as the comment is gone in master.
2013-04-29Ensure ANALYZE phase is not skipped because of canceled truncate.Kevin Grittner
Patch b19e4250b45e91c9cbdd18d35ea6391ab5961c8d attempted to preserve existing behavior regarding statistics generation in the case that a truncation attempt was canceled due to lock conflicts. It failed to do this accurately in two regards: (1) autovacuum had previously generated statistics if the truncate attempt failed to initially get the lock rather than having started the attempt, and (2) the VACUUM ANALYZE command had always generated statistics. Both of these changes were unintended, and are reverted by this patch. On review, there seems to be consensus that the previous failure to generate statistics when the truncate was terminated was more an unfortunate consequence of how that effort was previously terminated than a feature we want to keep; so this patch generates statistics even when an autovacuum truncation attempt terminates early. Another unintended change which is kept on the basis that it is an improvement is that when a VACUUM command is truncating, it will the new heuristic for avoiding blocking other processes, rather than keeping an AccessExclusiveLock on the table for however long the truncation takes. Per multiple reports, with some renaming per patch by Jeff Janes. Backpatch to 9.0, where problem was created.
2013-04-25Avoid deadlock between concurrent CREATE INDEX CONCURRENTLY commands.Tom Lane
There was a high probability of two or more concurrent C.I.C. commands deadlocking just before completion, because each would wait for the others to release their reference snapshots. Fix by releasing the snapshot before waiting for other snapshots to go away. Per report from Paul Hinze. Back-patch to all active branches.
2013-04-25Fix typo in comment.Heikki Linnakangas
Peter Geoghegan
2013-04-20Fix longstanding race condition in plancache.c.Tom Lane
When creating or manipulating a cached plan for a transaction control command (particularly ROLLBACK), we must not perform any catalog accesses, since we might be in an aborted transaction. However, plancache.c busily saved or examined the search_path for every cached plan. If we were unlucky enough to do this at a moment where the path's expansion into schema OIDs wasn't already cached, we'd do some catalog accesses; and with some more bad luck such as an ill-timed signal arrival, that could lead to crashes or Assert failures, as exhibited in bug #8095 from Nachiket Vaidya. Fortunately, there's no real need to consider the search path for such commands, so we can just skip the relevant steps when the subject statement is a TransactionStmt. This is somewhat related to bug #5269, though the failure happens during initial cached-plan creation rather than revalidation. This bug has been there since the plan cache was invented, so back-patch to all supported branches.
2013-04-04Fix crash on compiling a regular expression with more than 32k colors.Heikki Linnakangas
Throw an error instead. Backpatch to all supported branches.
2013-04-04Calculate # of semaphores correctly with --disable-spinlocks.Heikki Linnakangas
The old formula didn't take into account that each WAL sender process needs a spinlock. We had also already exceeded the fixed number of spinlocks reserved for misc purposes (10). Bump that to 30. Backpatch to 9.0, where WAL senders were introduced. If I counted correctly, 9.0 had exactly 10 predefined spinlocks, and 9.1 exceeded that, but bump the limit in 9.0 too because 10 is uncomfortably close to the edge.
2013-04-01Fix insecure parsing of server command-line switches.Tom Lane
An oversight in commit e710b65c1c56ca7b91f662c63d37ff2e72862a94 allowed database names beginning with "-" to be treated as though they were secure command-line switches; and this switch processing occurs before client authentication, so that even an unprivileged remote attacker could exploit the bug, needing only connectivity to the postmaster's port. Assorted exploits for this are possible, some requiring a valid database login, some not. The worst known problem is that the "-r" switch can be invoked to redirect the process's stderr output, so that subsequent error messages will be appended to any file the server can write. This can for example be used to corrupt the server's configuration files, so that it will fail when next restarted. Complete destruction of database tables is also possible. Fix by keeping the database name extracted from a startup packet fully separate from command-line switches, as had already been done with the user name field. The Postgres project thanks Mitsumasa Kondo for discovering this bug, Kyotaro Horiguchi for drafting the fix, and Noah Misch for recognizing the full extent of the danger. Security: CVE-2013-1899
2013-03-31Translation updatesAlvaro Herrera
2013-03-27Reset OpenSSL randomness state in each postmaster child process.Tom Lane
Previously, if the postmaster initialized OpenSSL's PRNG (which it will do when ssl=on in postgresql.conf), the same pseudo-random state would be inherited by each forked child process. The problem is masked to a considerable extent if the incoming connection uses SSL encryption, but when it does not, identical pseudo-random state is made available to functions like contrib/pgcrypto. The process's PID does get mixed into any requested random output, but on most systems that still only results in 32K or so distinct random sequences available across all Postgres sessions. This might allow an attacker who has database access to guess the results of "secure" operations happening in another session. To fix, forcibly reset the PRNG after fork(). Each child process that has need for random numbers from OpenSSL's generator will thereby be forced to go through OpenSSL's normal initialization sequence, which should provide much greater variability of the sequences. There are other ways we might do this that would be slightly cheaper, but this approach seems the most future-proof against SSL-related code changes. This has been assigned CVE-2013-1900, but since the issue and the patch have already been publicized on pgsql-hackers, there's no point in trying to hide this commit. Back-patch to all supported branches. Marko Kreen
2013-03-27Fix buffer pin leak in heap update redo routine.Heikki Linnakangas
In a heap update, if the old and new tuple were on different pages, and the new page no longer existed (because it was subsequently truncated away by vacuum), heap_xlog_update forgot to release the pin on the old buffer. This bug was introduced by the "Fix multiple problems in WAL replay" patch, commit 3bbf668de9f1bc172371681e80a4e769b6d014c8 (on master branch). With full_page_writes=off, this triggered an "incorrect local pin count" error later in replay, if the old page was vacuumed. This fixes bug #7969, reported by Yunong Xiao. Backpatch to 9.0, like the commit that introduced this bug.
2013-03-10Fix race condition in DELETE RETURNING.Tom Lane
When RETURNING is specified, ExecDelete would return a virtual-tuple slot that could contain pointers into an already-unpinned disk buffer. Another process could change the buffer contents before we get around to using the data, resulting in garbage results or even a crash. This seems of fairly low probability, which may explain why there are no known field reports of the problem, but it's definitely possible. Fix by forcing the result slot to be "materialized" before we release pin on the disk buffer. Back-patch to 9.0; in earlier branches there is no bug because ExecProcessReturning sent the tuple to the destination immediately. Also, this is already fixed in HEAD as part of the writable-foreign-tables patch (where the fix is necessary for DELETE RETURNING to work at all with postgres_fdw).
2013-03-07Fix infinite-loop risk in fixempties() stage of regex compilation.Tom Lane
The previous coding of this function could get into situations where it would never terminate, because successive passes would re-add EMPTY arcs that had been removed by the previous pass. Rewrite the function completely using a new algorithm that is guaranteed to terminate, and also seems to be usually faster than the old one. Per Tcl bugs 3604074 and 3606683. Tom Lane and Don Porter
2013-03-05Fix to_char() to use ASCII-only case-folding rules where appropriate.Tom Lane
formatting.c used locale-dependent case folding rules in some code paths where the result isn't supposed to be locale-dependent, for example to_char(timestamp, 'DAY'). Since the source data is always just ASCII in these cases, that usually didn't matter ... but it does matter in Turkish locales, which have unusual treatment of "i" and "I". To confuse matters even more, the misbehavior was only visible in UTF8 encoding, because in single-byte encodings we used pg_toupper/pg_tolower which don't have locale-specific behavior for ASCII characters. Fix by providing intentionally ASCII-only case-folding functions and using these where appropriate. Per bug #7913 from Adnan Dursun. Back-patch to all active branches, since it's been like this for a long time.
2013-03-04Fix overflow check in tm2timestamp (this time for sure).Tom Lane
I fixed this code back in commit 841b4a2d5, but didn't think carefully enough about the behavior near zero, which meant it improperly rejected 1999-12-31 24:00:00. Per report from Magnus Hagander.
2013-02-27Add missing error check in regexp parser.Tom Lane
parseqatom() failed to check for an error return (NULL result) from its recursive call to parsebranch(), and in consequence could crash with a null-pointer dereference after an error return. This bug has been there since day one, but wasn't noticed before, probably because most error cases in parsebranch() didn't actually lead to returning NULL. Add the missing error check, and also tweak parsebranch() to exit in a less indirect fashion after a call to parseqatom() fails. Report by Tomasz Karlik, fix by me.
2013-02-23Correct tense in log messagePeter Eisentraut
2013-02-13Fix bogus when-to-deregister-from-listener-array logic.Tom Lane
Since a backend adds itself to the global listener array during Exec_ListenPreCommit, it's inappropriate for it to remove itself during Exec_UnlistenCommit or Exec_UnlistenAllCommit --- that leads to failure when committing a transaction that did UNLISTEN then LISTEN, since we end up not registered though we should be. (This leads to missing later notifications, or to Assert failures in assert-enabled builds.) Instead deal with deregistering at the bottom of AtCommit_Notify, when we know the final state of the listenChannels list. Also, simplify the representation of registration status by replacing the transient backendHasExecutedInitialListen flag with an amRegisteredListener flag. Per report from Greg Sabino Mullane. Back-patch to 9.0, where the problem was introduced during the LISTEN/NOTIFY rewrite.
2013-02-10Further cleanup of gistsplit.c.Tom Lane
After further reflection I was unconvinced that the existing coding is guaranteed to return valid union datums in every code path for multi-column indexes. Fix that by forcing a gistunionsubkey() call at the end of the recursion. Having done that, we can remove some clearly-redundant calls elsewhere. This should be a little faster for multi-column indexes (since the previous coding would uselessly do such a call for each column while unwinding the recursion), as well as much harder to break. Also, simplify the handling of cases where one side or the other of a primary split contains only don't-care tuples. The previous coding used a very ugly hack in removeDontCares() that essentially forced one random tuple to be treated as non-don't-care, providing a random initial choice of seed datum for the secondary split. It seems unlikely that that method will give better-than-random splits. Instead, treat such a split as degenerate and just let the next column determine the split, the same way that we handle fully degenerate cases where the two sides produce identical union datums.
2013-02-10Remove useless picksplit-doesn't-support-secondary-split log spam.Tom Lane
This LOG message was put in over five years ago with the evident expectation that we'd make all GiST opclasses support secondary split directly. However, no such thing ever happened, and indeed the number of opclasses supporting it decreased to zero in 9.2. The reason is that improving on the default implementation isn't that easy --- the opclass-specific code that did exist, before 9.2, doesn't appear to have been any improvement over the default. Hence, remove the message altogether. There's certainly no point in nagging users about this in released branches, but I doubt that we'll ever implement complete opclass-specific support anyway.
2013-02-10Document and clean up gistsplit.c.Tom Lane
Improve comments, rename some variables and functions, slightly simplify a couple of APIs, in an attempt to make this code readable by people other than its original author. Even though this is essentially just cosmetic, back-patch to all active branches, because otherwise it's going to make back-patching future fixes in this file very painful.
2013-02-08Fix gist_box_same and gist_point_consistent to handle fuzziness correctly.Tom Lane
While there's considerable doubt that we want fuzzy behavior in the geometric operators at all (let alone as currently implemented), nobody is stepping forward to redesign that stuff. In the meantime it behooves us to make sure that index searches agree with the behavior of the underlying operators. This patch fixes two problems in this area. First, gist_box_same was using fuzzy equality, but it really needs to use exact equality to prevent not-quite-identical upper index keys from being treated as identical, which for example would prevent an existing upper key from being extended by an amount less than epsilon. This would result in inconsistent indexes. (The next release notes will need to recommend that users reindex GiST indexes on boxes, polygons, circles, and points, since all four opclasses use gist_box_same.) Second, gist_point_consistent used exact comparisons for upper-page comparisons in ~= searches, when it needs to use fuzzy comparisons to ensure it finds all matches; and it used fuzzy comparisons for point <@ box searches, when it needs to use exact comparisons because that's what the <@ operator (rather inconsistently) does. The added regression test cases illustrate all three misbehaviors. Back-patch to all active branches. (8.4 did not have GiST point_ops, but it still seems prudent to apply the gist_box_same patch to it.) Alexander Korotkov, reviewed by Noah Misch
2013-02-07Repair bugs in GiST page splitting code for multi-column indexes.Tom Lane
When considering a non-last column in a multi-column GiST index, gistsplit.c tries to improve on the split chosen by the opclass-specific pickSplit function by considering penalties for the next column. However, there were two bugs in this code: it failed to recompute the union keys for the leftmost index columns, even though these might well change after reassigning tuples; and it included the old union keys in the recomputation for the columns it did recompute, so that those keys couldn't get smaller even if they should. The first problem could result in an invalid index in which searches wouldn't find index entries that are in fact present; the second would make the index less efficient to search. Both of these errors were caused by misuse of gistMakeUnionItVec, whose API was designed in a way that just begged such errors to be made. There is no situation in which it's safe or useful to compute the union keys for a subset of the index columns, and there is no caller that wants any previous union keys to be included in the computation; so the undocumented choice to treat the union keys as in/out rather than pure output parameters is a waste of code as well as being dangerous. Hence, rather than just making a minimal patch, I've changed the API of gistMakeUnionItVec to remove the "startkey" parameter (it now always processes all index columns) and treat the attr/isnull arrays as purely output parameters. In passing, also get rid of a couple of unnecessary and dangerous uses of static variables in gistutil.c. It's remarkable that the one in gistMakeUnionKey hasn't given us portability troubles before now, because in addition to posing a re-entrancy hazard, it was unsafely assuming that a static char[] array would have at least Datum alignment. Per investigation of a trouble report from Tomas Vondra. (There are also some bugs in contrib/btree_gist to be fixed, but that seems like material for a separate patch.) Back-patch to all supported branches.
2013-02-07Fix possible failure to send final transaction counts to stats collector.Tom Lane
Normally, we suppress sending a tabstats message to the collector unless there were some actual table stats to send. However, during backend exit we should force out the message if there are any transaction commit/abort counts to send, else the session's last few commit/abort counts will never get reported at all. We had logic for this, but the short-circuit test at the top of pgstat_report_stat() ignored the "force" flag, with the consequence that session-ending transactions that touched no database-local tables would not get counted. Seems to be an oversight in my commit 641912b4d17fd214a5e5bae4e7bb9ddbc28b144b, which added the "force" flag. That was back in 8.3, so back-patch to all supported versions.
2013-02-04Prevent execution of enum_recv() from SQL.Tom Lane
This function was misdeclared to take cstring when it should take internal. This at least allows crashing the server, and in principle an attacker might be able to use the function to examine the contents of server memory. The correct fix is to adjust the system catalog contents (and fix the regression tests that should have caught this but failed to). However, asking users to correct the catalog contents in existing installations is a pain, so as a band-aid fix for the back branches, install a check in enum_recv() to make it throw error if called with a cstring argument. We will later revert this in HEAD in favor of correcting the catalogs. Our thanks to Sumit Soni (via Secunia SVCRP) for reporting this issue. Security: CVE-2013-0255
2013-02-04Reset vacuum_defer_cleanup_age to PGC_SIGHUP.Simon Riggs
Revert commit 84725aa5efe11688633b553e58113efce4181f2e
2013-02-03Translation updatesPeter Eisentraut
2013-02-02Mark vacuum_defer_cleanup_age as PGC_POSTMASTER.Simon Riggs
Following bug analysis of #7819 by Tom Lane
2013-02-01Fix typo in freeze_table_age implementationAlvaro Herrera
The original code used freeze_min_age instead of freeze_table_age. The main consequence of this mistake is that lowering freeze_min_age would cause full-table scans to occur much more frequently, which causes serious issues because the number of writes required is much larger. That feature (freeze_min_age) is supposed to affect only how soon tuples are frozen; some pages should still be skipped due to the visibility map. Backpatch to 8.4, where the freeze_table_age feature was introduced. Report and patch from Andres Freund
2013-01-30Fix grammar for subscripting or field selection from a sub-SELECT result.Tom Lane
Such cases should work, but the grammar failed to accept them because of our ancient precedence hacks to convince bison that extra parentheses around a sub-SELECT in an expression are unambiguous. (Formally, they *are* ambiguous, but we don't especially care whether they're treated as part of the sub-SELECT or part of the expression. Bison cares, though.) Fix by adding a redundant-looking production for this case. This is a fine example of why fixing shift/reduce conflicts via precedence declarations is more dangerous than it looks: you can easily cause the parser to reject cases that should work. This has been wrong since commit 3db4056e22b0c6b2adc92543baf8408d2894fe91 or maybe before, and apparently some people have been working around it by inserting no-op casts. That method introduces a dump/reload hazard, as illustrated in bug #7838 from Jan Mate. Hence, back-patch to all active branches.
2013-01-28DROP OWNED: don't try to drop tablespaces/databasesAlvaro Herrera
My "fix" for bugs #7578 and #6116 on DROP OWNED at fe3b5eb08a1 not only misstated that it applied to REASSIGN OWNED (which it did not affect), but it also failed to fix the problems fully, because I didn't test the case of owned shared objects. Thus I created a new bug, reported by Thomas Kellerer as #7748, which would cause DROP OWNED to fail with a not-for-user-consumption error message. The code would attempt to drop the database, which not only fails to work because the underlying code does not support that, but is a pretty dangerous and undesirable thing to be doing as well. This patch fixes that bug by having DROP OWNED only attempt to process shared objects when grants on them are found, ignoring ownership. Backpatch to 8.3, which is as far as the previous bug was backpatched.
2013-01-24Fix SPI documentation for new handling of ExecutorRun's count parameter.Tom Lane
Since 9.0, the count parameter has only limited the number of tuples actually returned by the executor. It doesn't affect the behavior of INSERT/UPDATE/DELETE unless RETURNING is specified, because without RETURNING, the ModifyTable plan node doesn't return control to execMain.c for each tuple. And we only check the limit at the top level. While this behavioral change was unintentional at the time, discussion of bug #6572 led us to the conclusion that we prefer the new behavior anyway, and so we should just adjust the docs to match rather than change the code. Accordingly, do that. Back-patch as far as 9.0 so that the docs match the code in each branch.
2013-01-24Fix rare missing cancellations in Hot Standby.Simon Riggs
The machinery around XLOG_HEAP2_CLEANUP_INFO failed to correctly pass through the necessary information on latestRemovedXid, avoiding cancellations in some infrequent concurrent update/cleanup scenarios. Backpatchable fix to 9.0 Detailed bug report and fix by Noah Misch, backpatchable version by me.
2013-01-23Fix performance problems with autovacuum truncation in busy workloads.Kevin Grittner
In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. Backpatched to 9.0 from initial master commit at b19e4250b45e91c9cbdd18d35ea6391ab5961c8d -- before that the differences are too large to be clearly safe. Jan Wieck
2013-01-18Protect against SnapshotNow race conditions in pg_tablespace scans.Tom Lane
Use of SnapshotNow is known to expose us to race conditions if the tuple(s) being sought could be updated by concurrently-committing transactions. CREATE DATABASE and DROP DATABASE are particularly exposed because they do heavyweight filesystem operations during their scans of pg_tablespace, so that the scans run for a very long time compared to most. Furthermore, the potential consequences of a missed or twice-visited row are nastier than average: * createdb() could fail with a bogus "file already exists" error, or silently fail to copy one or more tablespace's worth of files into the new database. * remove_dbtablespaces() could miss one or more tablespaces, thus failing to free filesystem space for the dropped database. * check_db_file_conflict() could likewise miss a tablespace, leading to an OID conflict that could result in data loss either immediately or in future operations. (This seems of very low probability, though, since a duplicate database OID would be unlikely to start with.) Hence, it seems worth fixing these three places to use MVCC snapshots, even though this will someday be superseded by a generic solution to SnapshotNow race conditions. Back-patch to all active branches. Stephen Frost and Tom Lane
2013-01-14Reject out-of-range dates in to_date().Tom Lane
Dates outside the supported range could be entered, but would not print reasonably, and operations such as conversion to timestamp wouldn't behave sanely either. Since this has the potential to result in undumpable table data, it seems worth back-patching. Hitoshi Harada
2012-12-23Prevent failure when RowExpr or XmlExpr is parse-analyzed twice.Tom Lane
transformExpr() is required to cope with already-transformed expression trees, for various ugly-but-not-quite-worth-cleaning-up reasons. However, some of its newer subroutines hadn't gotten the memo. This accounts for bug #7763 from Norbert Buchmuller: transformRowExpr() was overwriting the previously determined type of a RowExpr during CREATE TABLE LIKE INCLUDING INDEXES. Additional investigation showed that transformXmlExpr had the same kind of problem, but all the other cases seem to be safe. Andres Freund and Tom Lane