summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2015-06-27Avoid hot standby cancels from VAC FREEZESimon Riggs
VACUUM FREEZE generated false cancelations of standby queries on an otherwise idle master. Caused by an off-by-one error on cutoff_xid which goes back to original commit. Backpatch to all versions 9.0+ Analysis and report by Marco Nenciarini Bug fix by Simon Riggs
2015-06-25Allow background workers to connect to no particular database.Robert Haas
The documentation claims that this is supported, but it didn't actually work. Fix that. Reported by Pavel Stehule; patch by me.
2015-06-25Fix the logic for putting relations into the relcache init file.Tom Lane
Commit f3b5565dd4e59576be4c772da364704863e6a835 was a couple of bricks shy of a load; specifically, it missed putting pg_trigger_tgrelid_tgname_index into the relcache init file, because that index is not used by any syscache. However, we have historically nailed that index into cache for performance reasons. The upshot was that load_relcache_init_file always decided that the init file was busted and silently ignored it, resulting in a significant hit to backend startup speed. To fix, reinstantiate RelationIdIsInInitFile() as a wrapper around RelationSupportsSysCache(), which can know about additional relations that should be in the init file despite being unknown to syscache.c. Also install some guards against future mistakes of this type: make write_relcache_init_file Assert that all nailed relations get written to the init file, and make load_relcache_init_file emit a WARNING if it takes the "wrong number of nailed relations" exit path. Now that we remove the init files during postmaster startup, that case should never occur in the field, even if we are starting a minor-version update that added or removed rels from the nailed set. So the warning shouldn't ever be seen by end users, but it will show up in the regression tests if somebody breaks this logic. Back-patch to all supported branches, like the previous commit.
2015-06-22Improve inheritance_planner()'s performance for large inheritance sets.Tom Lane
Commit c03ad5602f529787968fa3201b35c119bbc6d782 introduced a planner performance regression for UPDATE/DELETE on large inheritance sets. It required copying the append_rel_list (which is of size proportional to the number of inherited tables) once for each inherited table, thus resulting in O(N^2) time and memory consumption. While it's difficult to avoid that in general, the extra work only has to be done for append_rel_list entries that actually reference subquery RTEs, which inheritance-set entries will not. So we can buy back essentially all of the loss in cases without subqueries in FROM; and even for those, the added work is mainly proportional to the number of UNION ALL subqueries. Back-patch to 9.2, like the previous commit. Tom Lane and Dean Rasheed, per a complaint from Thomas Munro.
2015-06-21Truncate strings in tarCreateHeader() with strlcpy(), not sprintf().Noah Misch
This supplements the GNU libc bug #6530 workarounds introduced in commit 54cd4f04576833abc394e131288bf3dd7dcf4806. On affected systems, a tar-format pg_basebackup failed when some filename beneath the data directory was not valid character data in the postmaster/walsender locale. Back-patch to 9.1, where pg_basebackup was introduced. Extant, bug-prone conversion specifications receive only ASCII bytes or involve low-importance messages.
2015-06-21Improve multixact emergency autovacuum logic.Andres Freund
Previously autovacuum was not necessarily triggered if space in the members slru got tight. The first problem was that the signalling was tied to values in the offsets slru, but members can advance much faster. Thats especially a problem if old sessions had been around that previously prevented the multixact horizon to increase. Secondly the skipping logic doesn't work if the database was restarted after autovacuum was triggered - that knowledge is not preserved across restart. This is especially a problem because it's a common panic-reaction to restart the database if it gets slow to anti-wraparound vacuums. Fix the first problem by separating the logic for members from offsets. Trigger autovacuum whenever a multixact crosses a segment boundary, as the current member offset increases in irregular values, so we can't use a simple modulo logic as for offsets. Add a stopgap for the second problem, by signalling autovacuum whenver ERRORing out because of boundaries. Discussion: 20150608163707.GD20772@alap3.anarazel.de Backpatch into 9.3, where it became more likely that multixacts wrap around.
2015-06-20Fix thinko in comment (launcher -> worker)Alvaro Herrera
2015-06-19Clamp autovacuum launcher sleep time to 5 minutesAlvaro Herrera
This avoids the problem that it might go to sleep for an unreasonable amount of time in unusual conditions like the server clock moving backwards an unreasonable amount of time. (Simply moving the server clock forward again doesn't solve the problem unless you wake up the autovacuum launcher manually, say by sending it SIGHUP). Per trouble report from Prakash Itnal in https://www.postgresql.org/message-id/CAHC5u79-UqbapAABH2t4Rh2eYdyge0Zid-X=Xz-ZWZCBK42S0Q@mail.gmail.com Analyzed independently by Haribabu Kommi and Tom Lane.
2015-06-19Fix corner case in autovacuum-forcing logic for multixact wraparound.Robert Haas
Since find_multixact_start() relies on SimpleLruDoesPhysicalPageExist(), and that function looks only at the on-disk state, it's possible for it to fail to find a page that exists in the in-memory SLRU that has not been written yet. If that happens, SetOffsetVacuumLimit() will erroneously decide to force emergency autovacuuming immediately. We should probably fix find_multixact_start() to consider the data cached in memory as well as on the on-disk state, but that's no excuse for SetOffsetVacuumLimit() to be stupid about the case where it can no longer read the value after having previously succeeded in doing so. Report by Andres Freund.
2015-06-15Check for out of memory when allocating sqlca.Michael Meskes
Patch by Michael Paquier
2015-06-15Fix memory leak in ecpglib's connect function.Michael Meskes
Patch by Michael Paquier
2015-06-13Fixed some memory leaks in ECPG.Michael Meskes
Patch by Michael Paquier Conflicts: src/interfaces/ecpg/preproc/variable.c
2015-06-13Fix intoasc() in Informix compat lib. This function used to be a noop.Michael Meskes
Patch by Michael Paquier
2015-06-12Improve error message and hint for ALTER COLUMN TYPE can't-cast failure.Tom Lane
We already tried to improve this once, but the "improved" text was rather off-target if you had provided a USING clause. Also, it seems helpful to provide the exact text of a suggested USING clause, so users can just copy-and-paste it when needed. Per complaint from Keith Rarick and a suggestion from Merlin Moncure. Back-patch to 9.2 where the current wording was adopted.
2015-06-09Stamp 9.3.9.REL9_3_9Tom Lane
2015-06-09Report more information if pg_perm_setlocale() fails at startup.Tom Lane
We don't know why a few Windows users have seen this fail, but the taciturnity of the error message certainly isn't helping debug it. Let's at least find out which LC category isn't working.
2015-06-08Allow HotStandbyActiveInReplay() to be called in single user mode.Andres Freund
HotStandbyActiveInReplay, introduced in 061b079f, only allowed WAL replay to happen in the startup process, missing the single user case. This buglet is fairly harmless as it only causes problems when single user mode in an assertion enabled build is used to replay a btree vacuum record. Backpatch to 9.2. 061b079f was backpatched further, but the assertion was not.
2015-06-07Use a safer method for determining whether relcache init file is stale.Tom Lane
When we invalidate the relcache entry for a system catalog or index, we must also delete the relcache "init file" if the init file contains a copy of that rel's entry. The old way of doing this relied on a specially maintained list of the OIDs of relations present in the init file: we made the list either when reading the file in, or when writing the file out. The problem is that when writing the file out, we included only rels present in our local relcache, which might have already suffered some deletions due to relcache inval events. In such cases we correctly decided not to overwrite the real init file with incomplete data --- but we still used the incomplete initFileRelationIds list for the rest of the current session. This could result in wrong decisions about whether the session's own actions require deletion of the init file, potentially allowing an init file created by some other concurrent session to be left around even though it's been made stale. Since we don't support changing the schema of a system catalog at runtime, the only likely scenario in which this would cause a problem in the field involves a "vacuum full" on a catalog concurrently with other activity, and even then it's far from easy to provoke. Remarkably, this has been broken since 2002 (in commit 786340441706ac1957a031f11ad1c2e5b6e18314), but we had never seen a reproducible test case until recently. If it did happen in the field, the symptoms would probably involve unexpected "cache lookup failed" errors to begin with, then "could not open file" failures after the next checkpoint, as all accesses to the affected catalog stopped working. Recovery would require manually removing the stale "pg_internal.init" file. To fix, get rid of the initFileRelationIds list, and instead consult syscache.c's list of relations used in catalog caches to decide whether a relation is included in the init file. This should be a tad more efficient anyway, since we're replacing linear search of a list with ~100 entries with a binary search. It's a bit ugly that the init file contents are now so directly tied to the catalog caches, but in practice that won't make much difference. Back-patch to all supported branches.
2015-06-05Fix incorrect order of database-locking operations in InitPostgres().Tom Lane
We should set MyProc->databaseId after acquiring the per-database lock, not beforehand. The old way risked deadlock against processes trying to copy or delete the target database, since they would first acquire the lock and then wait for processes with matching databaseId to exit; that left a window wherein an incoming process could set its databaseId and then block on the lock, while the other process had the lock and waited in vain for the incoming process to exit. CountOtherDBBackends() would time out and fail after 5 seconds, so this just resulted in an unexpected failure not a permanent lockup, but it's still annoying when it happens. A real-world example of a use-case is that short-duration connections to a template database should not cause CREATE DATABASE to fail. Doing it in the other order should be fine since the contract has always been that processes searching the ProcArray for a database ID must hold the relevant per-database lock while searching. Thus, this actually removes the former race condition that required an assumption that storing to MyProc->databaseId is atomic. It's been like this for a long time, so back-patch to all active branches.
2015-06-05Cope with possible failure of the oldest MultiXact to exist.Robert Haas
Recent commits, mainly b69bf30b9bfacafc733a9ba77c9587cf54d06c0c and 53bb309d2d5a9432d2602c93ed18e58bd2924e15, introduced mechanisms to protect against wraparound of the MultiXact member space: the number of multixacts that can exist at one time is limited to 2^32, but the total number of members in those multixacts is also limited to 2^32, and older code did not take care to enforce the second limit, potentially allowing old data to be overwritten while it was still needed. Unfortunately, these new mechanisms failed to account for the fact that the code paths in which they run might be executed during recovery or while the cluster was in an inconsistent state. Also, they failed to account for the fact that users who used pg_upgrade to upgrade a PostgreSQL version between 9.3.0 and 9.3.4 might have might oldestMultiXid = 1 in the control file despite the true value being larger. To fix these problems, first, avoid unnecessarily examining the mmembers of MultiXacts when the cluster is not known to be consistent. TruncateMultiXact has done this for a long time, and this patch does not fix that. But the new calls used to prevent member wraparound are not needed until we reach normal running, so avoid calling them earlier. (SetMultiXactIdLimit is actually called before InRecovery is set, so we can't rely on that; we invent our own multixact-specific flag instead.) Second, make failure to look up the members of a MultiXact a non-fatal error. Instead, if we're unable to determine the member offset at which wraparound would occur, postpone arming the member wraparound defenses until we are able to do so. If we're unable to determine the member offset that should force autovacuum, force it continuously until we are able to do so. If we're unable to deterine the member offset at which we should truncate the members SLRU, log a message and skip truncation. An important consequence of these changes is that anyone who does have a bogus oldestMultiXid = 1 value in pg_control will experience immediate emergency autovacuuming when upgrading to a release that contains this fix. The release notes should highlight this fact. If a user has no pg_multixact/offsets/0000 file, but has oldestMultiXid = 1 in the control file, they may wish to vacuum any tables with relminmxid = 1 prior to upgrading in order to avoid an immediate emergency autovacuum after the upgrade. This must be done with a PostgreSQL version 9.3.5 or newer and with vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age set to 0. This patch also adds an additional log message at each database server startup, indicating either that protections against member wraparound have been engaged, or that they have not. In the latter case, once autovacuum has advanced oldestMultiXid to a sane value, the message indicating that the guards have been engaged will appear at the next checkpoint. A few additional messages have also been added at the DEBUG1 level so that the correct operation of this code can be properly audited. Along the way, this patch fixes another, related bug in TruncateMultiXact that has existed since PostgreSQL 9.3.0: when no MultiXacts exist at all, the truncation code looks up NextMultiXactId, which doesn't exist yet. This can lead to TruncateMultiXact removing every file in pg_multixact/offsets instead of keeping one around, as it should. This in turn will cause the database server to refuse to start afterwards. Patch by me. Review by Álvaro Herrera, Andres Freund, Noah Misch, and Thomas Munro.
2015-06-04pgindent run on access/transam/multixact.cAlvaro Herrera
This file has been patched over and over, and the differences to master caused by pgindent are annoying enough that it seems saner to make the older branches look the same. Backpatch to 9.3, which is as far back as backpatching of bugfixes is necessary.
2015-06-03Fix planner's cost estimation for SEMI/ANTI joins with inner indexscans.Tom Lane
When the inner side of a nestloop SEMI or ANTI join is an indexscan that uses all the join clauses as indexquals, it can be presumed that both matched and unmatched outer rows will be processed very quickly: for matched rows, we'll stop after fetching one row from the indexscan, while for unmatched rows we'll have an indexscan that finds no matching index entries, which should also be quick. The planner already knew about this, but it was nonetheless charging for at least one full run of the inner indexscan, as a consequence of concerns about the behavior of materialized inner scans --- but those concerns don't apply in the fast case. If the inner side has low cardinality (many matching rows) this could make an indexscan plan look far more expensive than it actually is. To fix, rearrange the work in initial_cost_nestloop/final_cost_nestloop so that we don't add the inner scan cost until we've inspected the indexquals, and then we can add either the full-run cost or just the first tuple's cost as appropriate. Experimentation with this fix uncovered another problem: add_path and friends were coded to disregard cheap startup cost when considering parameterized paths. That's usually okay (and desirable, because it thins the path herd faster); but in this fast case for SEMI/ANTI joins, it could result in throwing away the desired plain indexscan path in favor of a bitmap scan path before we ever get to the join costing logic. In the many-matching-rows cases of interest here, a bitmap scan will do a lot more work than required, so this is a problem. To fix, add a per-relation flag consider_param_startup that works like the existing consider_startup flag, but applies to parameterized paths, and set it for relations that are the inside of a SEMI or ANTI join. To make this patch reasonably safe to back-patch, care has been taken to avoid changing the planner's behavior except in the very narrow case of SEMI/ANTI joins with inner indexscans. There are places in compare_path_costs_fuzzily and add_path_precheck that are not terribly consistent with the new approach, but changing them will affect planner decisions at the margins in other cases, so we'll leave that for a HEAD-only fix. Back-patch to 9.3; before that, the consider_startup flag didn't exist, meaning that the second aspect of the patch would be too invasive. Per a complaint from Peter Holzer and analysis by Tomas Vondra.
2015-06-01Stamp 9.3.8.REL9_3_8Tom Lane
2015-05-29initdb -S should now have an explicit check that $PGDATA is valid.Tom Lane
The fsync code from the backend essentially assumes that somebody's already validated PGDATA, at least to the extent of it being a readable directory. That's safe enough for initdb's normal code path too, but "initdb -S" doesn't have any other processing at all that touches the target directory. To have reasonable error-case behavior, add a pg_check_dir call. Per gripe from Peter E.
2015-05-29Remove special cases for ETXTBSY from new fsync'ing logic.Tom Lane
The argument that this is a sufficiently-expected case to be silently ignored seems pretty thin. Andres had brought it up back when we were still considering that most fsync failures should be hard errors, and it probably would be legit not to fail hard for ETXTBSY --- but the same is true for EROFS and other cases, which is why we gave up on hard failures. ETXTBSY is surely not a normal case, so logging the failure seems fine from here.
2015-05-29Adjust initdb to also not consider fsync'ing failures fatal.Tom Lane
Make initdb's version of this logic look as much like the backend's as possible. This is much less critical than in the backend since not so many people use "initdb -S", but we want the same corner-case error handling in both cases. Back-patch to 9.3 where initdb -S option was introduced. Before that, initdb only had to deal with freshly-created data directories, wherein no failures should be expected. Abhijit Menon-Sen
2015-05-28Fix fsync-at-startup code to not treat errors as fatal.Tom Lane
Commit 2ce439f3379aed857517c8ce207485655000fc8e introduced a rather serious regression, namely that if its scan of the data directory came across any un-fsync-able files, it would fail and thereby prevent database startup. Worse yet, symlinks to such files also caused the problem, which meant that crash restart was guaranteed to fail on certain common installations such as older Debian. After discussion, we agreed that (1) failure to start is worse than any consequence of not fsync'ing is likely to be, therefore treat all errors in this code as nonfatal; (2) we should not chase symlinks other than those that are expected to exist, namely pg_xlog/ and tablespace links under pg_tblspc/. The latter restriction avoids possibly fsync'ing a much larger part of the filesystem than intended, if the user has left random symlinks hanging about in the data directory. This commit takes care of that and also does some code beautification, mainly moving the relevant code into fd.c, which seems a much better place for it than xlog.c, and making sure that the conditional compilation for the pre_sync_fname pass has something to do with whether pg_flush_data works. I also relocated the call site in xlog.c down a few lines; it seems a bit silly to be doing this before ValidateXLOGDirectoryStructure(). The similar logic in initdb.c ought to be made to match this, but that change is noncritical and will be dealt with separately. Back-patch to all active branches, like the prior commit. Abhijit Menon-Sen and Tom Lane
2015-05-28Fix pg_get_functiondef() to print a function's LEAKPROOF property.Tom Lane
Seems to have been an oversight in the original leakproofness patch. Per report and patch from Jeevan Chalke. In passing, prettify some awkward leakproof-related code in AlterFunction.
2015-05-27Fix portability issue in isolationtester grammar.Tom Lane
specparse.y and specscanner.l used "string" as a token name. Now, bison likes to define each token name as a macro for the token code it assigns, which means those names are basically off-limits for any other use within the grammar file or included headers. So names as generic as "string" are dangerous. This is what was causing the recent failures on protosciurus: some versions of Solaris' sys/kstat.h use "string" as a field name. With late-model bison we don't see this problem because the token macros aren't defined till later (that is why castoroides didn't show the problem even though it's on the same machine). But protosciurus uses bison 1.875 which defines the token macros up front. This land mine has been there from day one; we'd have found it sooner except that protosciurus wasn't trying to run the isolation tests till recently. To fix, rename the token to "string_literal" which is hopefully less likely to collide with names used by system headers. Back-patch to all branches containing the isolation tests.
2015-05-25Update README.tuplockAlvaro Herrera
Multixact truncation is now handled differently, and this file hadn't gotten the memo. Per note from Amit Langote. I didn't use his patch, though. Also update the description of infomask bits, which weren't completely up to date either. This commit also propagates b01a4f6838 back to 9.3 and 9.4, which apparently I failed to do back then.
2015-05-24Rename pg_shdepend.c's typedef "objectType" to SharedDependencyObjectType.Tom Lane
The name objectType is widely used as a field name, and it's pure luck that this conflict has not caused pgindent to go crazy before. It messed up pg_audit.c pretty good though. Since pg_shdepend.c doesn't export this typedef and only uses it in three places, changing that seems saner than changing the field usages. Back-patch because we're contemplating using the union of all branch typedefs for future pgindent runs, so this won't fix anything if it stays the same in back branches.
2015-05-21Back-patch libpq support for TLS versions beyond v1.Tom Lane
Since 7.3.2, libpq has been coded in such a way that the only SSL protocol it would allow was TLS v1. That approach is looking increasingly obsolete. In commit 820f08cabdcbb899 we fixed it to allow TLS >= v1, but did not back-patch the change at the time, partly out of caution and partly because the question was confused by a contemporary server-side change to reject the now-obsolete SSL protocol v3. 9.4 has now been out long enough that it seems safe to assume the change is OK; hence, back-patch into 9.0-9.3. (I also chose to back-patch some relevant comments added by commit 326e1d73c476a0b5, but did *not* change the server behavior; hence, pre-9.4 servers will continue to allow SSL v3, even though no remotely modern client will request it.) Per gripe from Jan Bilek.
2015-05-19Revert error-throwing wrappers for the printf family of functions.Tom Lane
This reverts commit 16304a013432931e61e623c8d85e9fe24709d9ba, except for its changes in src/port/snprintf.c; as well as commit cac18a76bb6b08f1ecc2a85e46c9d2ab82dd9d23 which is no longer needed. Fujii Masao reported that the previous commit caused failures in psql on OS X, since if one exits the pager program early while viewing a query result, psql sees an EPIPE error from fprintf --- and the wrapper function thought that was reason to panic. (It's a bit surprising that the same does not happen on Linux.) Further discussion among the security list concluded that the risk of other such failures was far too great, and that the one-size-fits-all approach to error handling embodied in the previous patch is unlikely to be workable. This leaves us again exposed to the possibility of the type of failure envisioned in CVE-2015-3166. However, that failure mode is strictly hypothetical at this point: there is no concrete reason to believe that an attacker could trigger information disclosure through the supposed mechanism. In the first place, the attack surface is fairly limited, since so much of what the backend does with format strings goes through stringinfo.c or psprintf(), and those already had adequate defenses. In the second place, even granting that an unprivileged attacker could control the occurrence of ENOMEM with some precision, it's a stretch to believe that he could induce it just where the target buffer contains some valuable information. So we concluded that the risk of non-hypothetical problems induced by the patch greatly outweighs the security risks. We will therefore revert, and instead undertake closer analysis to identify specific calls that may need hardening, rather than attempt a universal solution. We have kept the portion of the previous patch that improved snprintf.c's handling of errors when it calls the platform's sprintf(). That seems to be an unalloyed improvement. Security: CVE-2015-3166
2015-05-19Fix off-by-one error in Assertion.Heikki Linnakangas
The point of the assertion is to ensure that the arrays allocated in stack are large enough, but the check was one item short. This won't matter in practice because MaxIndexTuplesPerPage is an overestimate, so you can't have that many items on a page in reality. But let's be tidy. Spotted by Anastasia Lubennikova. Backpatch to all supported versions, like the patch that added the assertion.
2015-05-18Stamp 9.3.7.Tom Lane
2015-05-18Fix error message in pre_sync_fname.Robert Haas
The old one didn't include %m anywhere, and required extra translation. Report by Peter Eisentraut. Fix by me. Review by Tom Lane.
2015-05-18Check return values of sensitive system library calls.Noah Misch
PostgreSQL already checked the vast majority of these, missing this handful that nearly cannot fail. If putenv() failed with ENOMEM in pg_GSS_recvauth(), authentication would proceed with the wrong keytab file. If strftime() returned zero in cache_locale_time(), using the unspecified buffer contents could lead to information exposure or a crash. Back-patch to 9.0 (all supported versions). Other unchecked calls to these functions, especially those in frontend code, pose negligible security concern. This patch does not address them. Nonetheless, it is always better to check return values whose specification provides for indicating an error. In passing, fix an off-by-one error in strftime_win32()'s invocation of WideCharToMultiByte(). Upon retrieving a value of exactly MAX_L10N_DATA bytes, strftime_win32() would overrun the caller's buffer by one byte. MAX_L10N_DATA is chosen to exceed the length of every possible value, so the vulnerable scenario probably does not arise. Security: CVE-2015-3166
2015-05-18Add error-throwing wrappers for the printf family of functions.Noah Misch
All known standard library implementations of these functions can fail with ENOMEM. A caller neglecting to check for failure would experience missing output, information exposure, or a crash. Check return values within wrappers and code, currently just snprintf.c, that bypasses the wrappers. The wrappers do not return after an error, so their callers need not check. Back-patch to 9.0 (all supported versions). Popular free software standard library implementations do take pains to bypass malloc() in simple cases, but they risk ENOMEM for floating point numbers, positional arguments, large field widths, and large precisions. No specification demands such caution, so this commit regards every call to a printf family function as a potential threat. Injecting the wrappers implicitly is a compromise between patch scope and design goals. I would prefer to edit each call site to name a wrapper explicitly. libpq and the ECPG libraries would, ideally, convey errors to the caller rather than abort(). All that would be painfully invasive for a back-patched security fix, hence this compromise. Security: CVE-2015-3166
2015-05-18Permit use of vsprintf() in PostgreSQL code.Noah Misch
The next commit needs it. Back-patch to 9.0 (all supported versions).
2015-05-18Prevent a double free by not reentering be_tls_close().Noah Misch
Reentering this function with the right timing caused a double free, typically crashing the backend. By synchronizing a disconnection with the authentication timeout, an unauthenticated attacker could achieve this somewhat consistently. Call be_tls_close() solely from within proc_exit_prepare(). Back-patch to 9.0 (all supported versions). Benkocs Norbert Attila Security: CVE-2015-3165
2015-05-18Translation updatesPeter Eisentraut
Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: 3ce9e5ca72c3948b4c592e82a5ddb9b69b97d14b
2015-05-17Fix typosPeter Eisentraut
2015-05-15Update time zone data files to tzdata release 2015d.Tom Lane
DST law changes in Egypt, Mongolia, Palestine. Historical corrections for Canada and Chile. Revised zone abbreviation for America/Adak (HST/HDT not HAST/HADT).
2015-05-13Fix RBM_ZERO_AND_LOCK mode to not acquire lock on local buffers.Heikki Linnakangas
Commit 81c45081 introduced a new RBM_ZERO_AND_LOCK mode to ReadBuffer, which takes a lock on the buffer before zeroing it. However, you cannot take a lock on a local buffer, and you got a segfault instead. The version of that patch committed to master included a check for !isLocalBuf, and therefore didn't crash, but oddly I missed that in the back-patched versions. This patch adds that check to the back-branches too. RBM_ZERO_AND_LOCK mode is only used during WAL replay, and in hash indexes. WAL replay only deals with shared buffers, so the only way to trigger the bug is with a temporary hash index. Reported by Artem Ignatyev, analysis by Tom Lane.
2015-05-11Fix incorrect checking of deferred exclusion constraint after a HOT update.Tom Lane
If a row that potentially violates a deferred exclusion constraint is HOT-updated later in the same transaction, the exclusion constraint would be reported as violated when the check finally occurs, even if the row(s) the new row originally conflicted with have since been removed. This happened because the wrong TID was passed to check_exclusion_constraint(), causing the live HOT-updated row to be seen as a conflicting row rather than recognized as the row-under-test. Per bug #13148 from Evan Martin. It's been broken since exclusion constraints were invented, so back-patch to all supported branches.
2015-05-11Increase threshold for multixact member emergency autovac to 50%.Robert Haas
Analysis by Noah Misch shows that the 25% threshold set by commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 is lower than any other, similar autovac threshold. While we don't know exactly what value will be optimal for all users, it is better to err a little on the high side than on the low side. A higher value increases the risk that users might exhaust the available space and start seeing errors before autovacuum can clean things up sufficiently, but a user who hits that problem can compensate for it by reducing autovacuum_multixact_freeze_max_age to a value dependent on their average multixact size. On the flip side, if the emergency cap imposed by that patch kicks in too early, the user will experience excessive wraparound scanning and will be unable to mitigate that problem by configuration. The new value will hopefully reduce the risk of such bad experiences while still providing enough headroom to avoid multixact member exhaustion for most users. Along the way, adjust the documentation to reflect the effects of commit 04e6d3b877e060d8445eb653b7ea26b1ee5cec6b, which taught autovacuum to run for multixact wraparound even when autovacuum is configured off.
2015-05-11Even when autovacuum=off, force it for members as we do in other cases.Robert Haas
Thomas Munro, with some adjustments by me.
2015-05-10Advance the stop point for multixact offset creation only at checkpoint.Robert Haas
Commit b69bf30b9bfacafc733a9ba77c9587cf54d06c0c advanced the stop point at vacuum time, but this has subsequently been shown to be unsafe as a result of analysis by myself and Thomas Munro and testing by Thomas Munro. The crux of the problem is that the SLRU deletion logic may get confused about what to remove if, at exactly the right time during the checkpoint process, the head of the SLRU crosses what used to be the tail. This patch, by me, fixes the problem by advancing the stop point only following a checkpoint. This has the additional advantage of making the removal logic work during recovery more like the way it works during normal running, which is probably good. At least one of the calls to DetermineSafeOldestOffset which this patch removes was already dead, because MultiXactAdvanceOldest is called only during recovery and DetermineSafeOldestOffset was set up to do nothing during recovery. That, however, is inconsistent with the principle that recovery and normal running should work similarly, and was confusing to boot. Along the way, fix some comments that previous patches in this area neglected to update. It's not clear to me whether there's any concrete basis for the decision to use only half of the multixact ID space, but it's neither necessary nor sufficient to prevent multixact member wraparound, so the comments should not say otherwise.
2015-05-10Fix DetermineSafeOldestOffset for the case where there are no mxacts.Robert Haas
Commit b69bf30b9bfacafc733a9ba77c9587cf54d06c0c failed to take into account the possibility that there might be no multixacts in existence at all. Report by Thomas Munro; patch by me.
2015-05-08Teach autovacuum about multixact member wraparound.Robert Haas
The logic introduced in commit b69bf30b9bfacafc733a9ba77c9587cf54d06c0c and repaired in commits 669c7d20e6374850593cb430d332e11a3992bbcf and 7be47c56af3d3013955c91c2877c08f2a0e3e6a2 helps to ensure that we don't overwrite old multixact member information while it is still needed, but a user who creates many large multixacts can still exhaust the member space (and thus start getting errors) while autovacuum stands idly by. To fix this, progressively ramp down the effective value (but not the actual contents) of autovacuum_multixact_freeze_max_age as member space utilization increases. This makes autovacuum more aggressive and also reduces the threshold for a manual VACUUM to perform a full-table scan. This patch leaves unsolved the problem of ensuring that emergency autovacuums are triggered even when autovacuum=off. We'll need to fix that via a separate patch. Thomas Munro and Robert Haas