summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2012-06-07Wake WALSender to reduce data loss at failover for async commit.Simon Riggs
WALSender now woken up after each background flush by WALwriter, avoiding multi-second replication delay for an all-async commit workload. Replication delay reduced from 7s with default settings to 200ms, allowing significantly reduced data loss at failover. Andres Freund and Simon Riggs
2012-06-01Avoid early reuse of btree pages, causing incorrect query results.Simon Riggs
When we allowed read-only transactions to skip assigning XIDs we introduced the possibility that a fully deleted btree page could be reused. This broke the index link sequence which could then lead to indexscans silently returning fewer rows than would have been correct. The actual incidence of silent errors from this is thought to be very low because of the exact workload required and locking pre-conditions. Fix is to remove pages only if index page opaque->btpo.xact precedes RecentGlobalXmin. Noah Misch, reviewed by Simon Riggs
2012-05-31Stamp 9.0.8.REL9_0_8Tom Lane
2012-05-31Translation updatesPeter Eisentraut
2012-05-31Revert back-branch changes in behavior of age(xid).Tom Lane
Per discussion, it does not seem like a good idea to change the behavior of age(xid) in a minor release, even though the old definition causes the function to fail on hot standby slaves. Therefore, revert commit 5829387381d2e4edf84652bb5a712f6185860670 and follow-on commits in the back branches only.
2012-05-31Update time zone data files to tzdata release 2012c.Tom Lane
DST law changes in Antarctica, Armenia, Chile, Cuba, Falkland Islands, Gaza, Haiti, Hebron, Morocco, Syria, Tokelau Islands. Historical corrections for Canada.
2012-05-30Ignore SECURITY DEFINER and SET attributes for a PL's call handler.Tom Lane
It's not very sensible to set such attributes on a handler function; but if one were to do so, fmgr.c went into infinite recursion because it would call fmgr_security_definer instead of the handler function proper. There is no way for fmgr_security_definer to know that it ought to call the handler and not the original function referenced by the FmgrInfo's fn_oid, so it tries to do the latter, causing the whole process to start over again. Ordinarily such misconfiguration of a procedural language's handler could be written off as superuser error. However, because we allow non-superuser database owners to create procedural languages and the handler for such a language becomes owned by the database owner, it is possible for a database owner to crash the backend, which ideally shouldn't be possible without superuser privileges. In 9.2 and up we will adjust things so that the handler functions are always owned by superusers, but in existing branches this is a minor security fix. Problem noted by Noah Misch (after several of us had failed to detect it :-(). This is CVE-2012-2655.
2012-05-30Expand the allowed range of timezone offsets to +/-15:59:59 from Greenwich.Tom Lane
We used to only allow offsets less than +/-13 hours, then it was +/14, then it was +/-15. That's still not good enough though, as per today's bug report from Patric Bechtel. This time I actually looked through the Olson timezone database to find the largest offsets used anywhere. The winners are Asia/Manila, at -15:56:00 until 1844, and America/Metlakatla, at +15:13:42 until 1867. So we'd better allow offsets less than +/-16 hours. Given the history, we are way overdue to have some greppable #define symbols controlling this, so make some ... and also remove an obsolete comment that didn't get fixed the last time. Back-patch to all supported branches.
2012-05-28Teach AbortOutOfAnyTransaction to clean up partially-started transactions.Tom Lane
AbortOutOfAnyTransaction failed to do anything if the state it saw on entry corresponded to failing partway through StartTransaction. I fixed AbortCurrentTransaction to cope with that case way back in commit 60b2444cc3ba037630c9b940c3c9ef01b954b87b, but evidently overlooked that AbortOutOfAnyTransaction should do likewise. Back-patch to all supported branches. It's not clear that this omission has any more-than-cosmetic consequences, but it's also not clear that it doesn't, so back-patching seems the least risky choice.
2012-05-26Prevent synchronized scanning when systable_beginscan chooses a heapscan.Tom Lane
The only interesting-for-performance case wherein we force heapscan here is when we're rebuilding the relcache init file, and the only such case that is likely to be examining a catalog big enough to be syncscanned is RelationBuildTupleDesc. But the early-exit optimization in that code gets broken if we start the scan at a random place within the catalog, so that allowing syncscan is actually a big deoptimization if pg_attribute is large (at least for the normal case where the rows for core system catalogs have never been changed since initdb). Hence, prevent syncscan here. Per my testing pursuant to complaints from Jeff Frost and Greg Sabino Mullane, though neither of them seem to have actually hit this specific problem. Back-patch to 8.3, where syncscan was introduced.
2012-05-25Fix string truncation to be multibyte-aware in text_name and bpchar_name.Tom Lane
Previously, casts to name could generate invalidly-encoded results. Also, make these functions match namein() more exactly, by consistently using palloc0() instead of ad-hoc zeroing code. Back-patch to all supported branches. Karl Schnaitter and Tom Lane
2012-05-25Use binary search instead of brute-force scan in findNamespace().Tom Lane
The previous coding presented a significant bottleneck when dumping databases containing many thousands of schemas, since the total time spent searching would increase roughly as O(N^2) in the number of objects. Noted by Jeff Janes, though I rewrote his proposed patch to use the existing findObjectByOid infrastructure. Since this is a longstanding performance bug, backpatch to all supported versions.
2012-05-22Ensure that seqscans check for interrupts at least once per page.Tom Lane
If a seqscan encounters many consecutive pages containing only dead tuples, it can remain in the loop in heapgettup for a long time, and there was no CHECK_FOR_INTERRUPTS anywhere in that loop. This meant there were real-world situations where a query would be effectively uncancelable for long stretches. Add a check placed to occur once per page, which should be enough to provide reasonable response time without adding any measurable overhead. Report and patch by Merlin Moncure (though I tweaked it a bit). Back-patch to all supported branches.
2012-05-15Fix bug in to_tsquery().Heikki Linnakangas
We were using memcpy() to copy to a possibly overlapping memory region, which is a no-no. Use memmove() instead.
2012-05-13Fix DROP TABLESPACE to unlink symlink when directory is not there.Tom Lane
If the tablespace directory is missing entirely, we allow DROP TABLESPACE to go through, on the grounds that it should be possible to clean up the catalog entry in such a situation. However, we forgot that the pg_tblspc symlink might still be there. We should try to remove the symlink too (but not fail if it's no longer there), since not doing so can lead to weird behavior subsequently, as per report from Michael Nolan. There was some discussion of adding dependency links to prevent DROP TABLESPACE when the catalogs still contain references to the tablespace. That might be worth doing too, but it's an orthogonal question, and in any case wouldn't be back-patchable. Back-patch to 9.0, which is as far back as the logic looks like this. We could possibly do something similar in 8.x, but given the lack of reports I'm not sure it's worth the trouble, and anyway the case could not arise in the form the logic is meant to cover (namely, a post-DROP transaction rollback having resurrected the pg_tablespace entry after some or all of the filesystem infrastructure is gone).
2012-05-12Ensure backwards compatibility for GetStableLatestTransactionId()Simon Riggs
2012-05-11Remove extraneous #include "storage/proc.h"Simon Riggs
2012-05-11Ensure age() returns a stable value rather than the latest valueSimon Riggs
2012-05-10Fix Windows implementation of PGSemaphoreLock.Tom Lane
The original coding failed to reset ImmediateInterruptOK before returning, which would potentially allow a subsequent query-cancel interrupt to be accepted at an unsafe point. This is a really nasty bug since it's so hard to predict the consequences, but they could be unpleasant. Also, ensure that signal handlers are serviced before this function returns, even if the semaphore is already set. This should make the behavior more like Unix. Back-patch to all supported versions.
2012-05-09PL/pgSQL RETURN NEXT was leaking converted tuples, causingJoe Conway
out of memory when looping through large numbers of rows. Flag the converted tuples to be freed. Complaint and patch by Joe.
2012-05-09Avoid xid error from age() function when run on Hot StandbySimon Riggs
2012-04-27Fix printing of whole-row Vars at top level of a SELECT targetlist.Tom Lane
Normally whole-row Vars are printed as "tabname.*". However, that does not work at top level of a targetlist, because per SQL standard the parser will think that the "*" should result in column-by-column expansion; which is not at all what a whole-row Var implies. We used to just print the table name in such cases, which works most of the time; but it fails if the table name matches a column name available anywhere in the FROM clause. This could lead for instance to a view being interpreted differently after dump and reload. Adding parentheses doesn't fix it, but there is a reasonably simple kluge we can use instead: attach a no-op cast, so that the "*" isn't syntactically at top level anymore. This makes the printing of such whole-row Vars a lot more consistent with other Vars, and may indeed fix more cases than just the reported one; I'm suspicious that cases involving schema qualification probably didn't work properly before, either. Per bug report and fix proposal from Abbas Butt, though this patch is quite different in detail from his. Back-patch to all supported versions.
2012-04-27Fix syslogger's rotation disable/re-enable logic.Tom Lane
If it fails to open a new log file, the syslogger assumes there's something wrong with its parameters (such as log_directory), and stops attempting automatic time-based or size-based log file rotations. Sending it SIGHUP is supposed to start that up again. However, the original coding for that was really bogus, involving clobbering a couple of GUC variables and hoping that SIGHUP processing would restore them. Get rid of that technique in favor of maintaining a separate flag showing we've turned rotation off. Per report from Mark Kirkwood. Also, the syslogger will automatically attempt to create the log_directory directory if it doesn't exist, but that was only happening at startup. For consistency and ease of use, it should do the same whenever the value of log_directory is changed by SIGHUP. Back-patch to all supported branches.
2012-04-25Fix edge-case behavior of pg_next_dst_boundary().Tom Lane
Due to rather sloppy thinking (on my part, I'm afraid) about the appropriate behavior for boundary conditions, pg_next_dst_boundary() gave undefined, platform-dependent results when the input time is exactly the last recorded DST transition time for the specified time zone, as a result of fetching values one past the end of its data arrays. Change its specification to be that it always finds the next DST boundary *after* the input time, and adjust code to match that. The sole existing caller, DetermineTimeZoneOffset, doesn't actually care about this distinction, since it always uses a probe time earlier than the instant that it does care about. So it seemed best to me to change the API to make the result=1 and result=0 cases more consistent, specifically to ensure that the "before" outputs always describe the state at the given time, rather than hacking the code to obey the previous API comment exactly. Per bug #6605 from Sergey Burladyan. Back-patch to all supported versions.
2012-04-18Revert recent commit re positional arguments.Andrew Dunstan
2012-04-18Fix copyfuncs/equalfuncs support for ReassignOwnedStmt.Robert Haas
Noah Misch
2012-04-17Don't override arguments set via options with positional arguments.Andrew Dunstan
A number of utility programs were rather careless about paremeters that can be set via both an option argument and a positional argument. This leads to results which can violate the Principal Of Least Astonishment. These changes refuse to use positional arguments to override settings that have been made via positional arguments. The changes are backpatched to all live branches.
2012-04-11Clamp indexscan filter condition cost estimate to be not less than zero.Tom Lane
cost_index tries to estimate the per-tuple costs of evaluating filter conditions (a/k/a qpquals) by subtracting the estimated cost of the indexqual conditions from that of the baserestrictinfo conditions. This is correct so long as the indexquals list is a subset of the baserestrictinfo list. However, in the presence of derived indexable conditions it's completely wrong, leading to bogus or even negative scan cost estimates, as seen for example in bug #6579 from Istvan Endredy. In practice the problem isn't severe except in the specific case of a LIKE optimization on a functional index containing a very expensive function. A proper fix for this might change cost estimates by more than people would like for stable branches, so in the back branches let's just clamp the cost difference to be not less than zero. That will at least prevent completely insane behavior, while not changing the results normally.
2012-04-09Fix an Assert that turns out to be reachable after all.Tom Lane
estimate_num_groups() gets unhappy with create table empty(); select * from empty except select * from empty e2; I can't see any actual use-case for such a query (and the table is illegal per SQL spec), but it seems like a good idea that it not cause an assert failure.
2012-04-08set_stack_base() no longer needs to be called in PostgresMain.Heikki Linnakangas
This was a thinko in previous commit. Now that stack base pointer is now set in PostmasterMain and SubPostmasterMain, it doesn't need to be set in PostgresMain anymore.
2012-04-08Do stack-depth checking in all postmaster children.Heikki Linnakangas
We used to only initialize the stack base pointer when starting up a regular backend, not in other processes. In particular, autovacuum workers can run arbitrary user code, and without stack-depth checking, infinite recursion in e.g an index expression will bring down the whole cluster. The comment about PL/Java using set_stack_base() is not yet true. As the code stands, PL/java still modifies the stack_base_ptr variable directly. However, it's been discussed in the PL/Java mailing list that it should be changed to use the function, because PL/Java is currently oblivious to the register stack used on Itanium. There's another issues with PL/Java, namely that the stack base pointer it sets is not really the base of the stack, it could be something close to the bottom of the stack. That's a separate issue that might need some further changes to this code, but that's a different story. Backpatch to all supported releases.
2012-04-06Fix misleading output from gin_desc().Tom Lane
XLOG_GIN_UPDATE_META_PAGE and XLOG_GIN_DELETE_LISTPAGE records were printed with a list link field labeled as "blkno", which was confusing, especially when the link was empty (InvalidBlockNumber). Print the metapage block number instead, since that's what's actually being updated. We could include the link values too as a separate field, but not clear it's worth the trouble. Back-patch to 8.4 where the dubious code was added.
2012-04-04Fix syslogger to not lose log coherency under high load.Tom Lane
The original coding of the syslogger had an arbitrary limit of 20 large messages concurrently in progress, after which it would just punt and dump message fragments to the output file separately. Our ambitions are a bit higher than that now, so allow the data structure to expand as necessary. Reported and patched by Andrew Dunstan; some editing by Tom
2012-03-31Fix O(N^2) behavior in pg_dump when many objects are in dependency loops.Tom Lane
Combining the loop workspace with the record of already-processed objects might have been a cute trick, but it behaves horridly if there are many dependency loops to repair: the time spent in the first step of findLoop() grows as O(N^2). Instead use a separate flag array indexed by dump ID, which we can check in constant time. The length of the workspace array is now never more than the actual length of a dependency chain, which should be reasonably short in all cases of practical interest. The code is noticeably easier to understand this way, too. Per gripe from Mike Roest. Since this is a longstanding performance bug, backpatch to all supported versions.
2012-03-31Fix O(N^2) behavior in pg_dump for large numbers of owned sequences.Tom Lane
The loop that matched owned sequences to their owning tables required time proportional to number of owned sequences times number of tables; although this work was only expended in selective-dump situations, which is probably why the issue wasn't recognized long since. Refactor slightly so that we can perform this work after the index array for findTableByOid has been set up, reducing the time to O(M log N). Per gripe from Mike Roest. Since this is a longstanding performance bug, backpatch to all supported versions.
2012-03-29Correct epoch of txid_current() when executed on a Hot Standby server.Simon Riggs
Initialise ckptXidEpoch from starting checkpoint and maintain the correct value as we roll forwards. This allows GetNextXidAndEpoch() to return the correct epoch when executed during recovery. Backpatch to 9.0 when the problem is first observable by a user. Bug report from Daniel Farina
2012-03-25Fix COPY FROM for null marker strings that correspond to invalid encoding.Tom Lane
The COPY documentation says "COPY FROM matches the input against the null string before removing backslashes". It is therefore reasonable to presume that null markers like E'\\0' will work ... and they did, until someone put the tests in the wrong order during microoptimization-driven rewrites. Since then, we've been failing if the null marker is something that would de-escape to an invalidly-encoded string. Since null markers generally need to be something that can't appear in the data, this represents a nontrivial loss of functionality; surprising nobody noticed it earlier. Per report from Jeff Davis. Backpatch to 8.4 where this got broken.
2012-03-24Fix planner's handling of outer PlaceHolderVars within subqueries.Tom Lane
For some reason, in the original coding of the PlaceHolderVar mechanism I had supposed that PlaceHolderVars couldn't propagate into subqueries. That is of course entirely possible. When it happens, we need to treat an outer-level PlaceHolderVar much like an outer Var or Aggref, that is SS_replace_correlation_vars() needs to replace the PlaceHolderVar with a Param, and then when building the finished SubPlan we have to provide the PlaceHolderVar expression as an actual parameter for the SubPlan. The handling of the contained expression is a bit delicate but it can be treated exactly like an Aggref's expression. In addition to the missing logic in subselect.c, prepjointree.c was failing to search subqueries for PlaceHolderVars that need their relids adjusted during subquery pullup. It looks like everyplace else that touches PlaceHolderVars got it right, though. Per report from Mark Murawski. In 9.1 and HEAD, queries affected by this oversight would fail with "ERROR: Upper-level PlaceHolderVar found where not expected". But in 9.0 and 8.4, you'd silently get possibly-wrong answers, since the value transmitted into the subquery wouldn't go to null when it should.
2012-03-22Fix GET DIAGNOSTICS for case of assignment to function's first variable.Tom Lane
An incorrect and entirely unnecessary "safety check" in exec_stmt_getdiag() caused the code to treat an assignment to a variable with dno zero as a no-op. Unfortunately, that's a perfectly valid dno. This has been broken since GET DIAGNOSTICS was invented. It's not terribly surprising that the bug went unnoticed for so long, since in most cases you probably wouldn't use the function's first-created variable (normally its first parameter) as a GET DIAGNOSTICS target. Nonetheless, it's broken. Per bug #6551 from Adam Buraczewski.
2012-03-21Don't allow CREATE TABLE AS to put relations in pg_global.Robert Haas
This was never intended to be allowed, and is blocked for an ordinary CREATE TABLE, but CREATE TABLE AS slipped through the cracks. This commit won't do anything to fix existing cases where this has loophole has been exploited, but it still seems prudent to lock it down going forward. Back-branch commit only, as this problem has been refactored away on the master branch. Andres Freund
2012-03-17Honor inputdir and outputdir when converting regression files.Andrew Dunstan
When converting source files, pg_regress' inputdir and outputdir options were ignored when computing the locations of the destination files. In consequence, these options were effectively unusable when the regression inputs need to be adjusted by pg_regress. This patch makes pg_regress put the converted files in the same place that these options specify non-converted input or results files are to be found. Backpatched to all live branches.
2012-03-11ecpg: Fix off-by-one error in memory copyingPeter Eisentraut
In a rare case, one byte past the end of memory belonging to the sqlca_t structure would be written to. found by Coverity
2012-03-11ecpg: Fix rare memory leaksPeter Eisentraut
found by Coverity
2012-03-11psql: Fix invalid memory accessPeter Eisentraut
Due to an apparent thinko, when printing a table in expanded mode (\x), space would be allocated for 1 slot plus 1 byte per line, instead of 1 slot per line plus 1 slot for the NULL terminator. When the line count is small, reading or writing the terminator would therefore access memory beyond what was allocated.
2012-02-26Fix some more bugs in GIN's WAL replay logic.Tom Lane
In commit 4016bdef8aded77b4903c457050622a5a1815c16 I fixed a bunch of ginxlog.c bugs having to do with not handling XLogReadBuffer failures correctly. However, in ginRedoUpdateMetapage and ginRedoDeleteListPages, I unaccountably thought that failure to read the metapage would be impossible and just put in an elog(PANIC) call. This is of course wrong: failure is exactly what will happen if the index got dropped (or rebuilt) between creation of the WAL record and the crash we're trying to recover from. I believe this explains Nicholas Wilson's recent report of these errors getting reached. Also, fix memory leak in forgetIncompleteSplit. This wasn't of much concern when the code was written, but in a long-running standby server page split records could be expected to accumulate indefinitely. Back-patch to 8.4 --- before that, GIN didn't have a metapage.
2012-02-23Stamp 9.0.7.REL9_0_7Tom Lane
2012-02-23Convert newlines to spaces in names written in pg_dump comments.Tom Lane
pg_dump was incautious about sanitizing object names that are emitted within SQL comments in its output script. A name containing a newline would at least render the script syntactically incorrect. Maliciously crafted object names could present a SQL injection risk when the script is reloaded. Reported by Heikki Linnakangas, patch by Robert Haas Security: CVE-2012-0868
2012-02-23Remove arbitrary limitation on length of common name in SSL certificates.Tom Lane
Both libpq and the backend would truncate a common name extracted from a certificate at 32 bytes. Replace that fixed-size buffer with dynamically allocated string so that there is no hard limit. While at it, remove the code for extracting peer_dn, which we weren't using for anything; and don't bother to store peer_cn longer than we need it in libpq. This limit was not so terribly unreasonable when the code was written, because we weren't using the result for anything critical, just logging it. But now that there are options for checking the common name against the server host name (in libpq) or using it as the user's name (in the server), this could result in undesirable failures. In the worst case it even seems possible to spoof a server name or user name, if the correct name is exactly 32 bytes and the attacker can persuade a trusted CA to issue a certificate in which that string is a prefix of the certificate's common name. (To exploit this for a server name, he'd also have to send the connection astray via phony DNS data or some such.) The case that this is a realistic security threat is a bit thin, but nonetheless we'll treat it as one. Back-patch to 8.4. Older releases contain the faulty code, but it's not a security problem because the common name wasn't used for anything interesting. Reported and patched by Heikki Linnakangas Security: CVE-2012-0867
2012-02-23Require execute permission on the trigger function for CREATE TRIGGER.Tom Lane
This check was overlooked when we added function execute permissions to the system years ago. For an ordinary trigger function it's not a big deal, since trigger functions execute with the permissions of the table owner, so they couldn't do anything the user issuing the CREATE TRIGGER couldn't have done anyway. However, if a trigger function is SECURITY DEFINER, that is not the case. The lack of checking would allow another user to install it on his own table and then invoke it with, essentially, forged input data; which the trigger function is unlikely to realize, so it might do something undesirable, for instance insert false entries in an audit log table. Reported by Dinesh Kumar, patch by Robert Haas Security: CVE-2012-0866
2012-02-23Translation updatesPeter Eisentraut