path: root/src
Age | Commit message | Author
2015-01-29 | Properly terminate the array returned by GetLockConflicts(). | Andres Freund
GetLockConflicts() has for a long time not properly terminated the returned array. During normal processing the returned array is zero initialized which, while not pretty, is sufficient to be recognized as an invalid virtual transaction id. But the HotStandby case is more than aesthetically broken: The allocated (and reused) array is neither zeroed upon allocation, nor reinitialized, nor terminated. Not having a terminating element means that the end of the array will not be recognized and that recovery conflict handling will thus read ahead into adjacent memory, only terminating when hitting memory content that looks like an invalid virtual transaction id. Luckily this seems so far not to have caused significant problems, besides making recovery conflict handling more expensive. Discussion: 20150127142713.GD29457@awork2.anarazel.de Backpatch into all supported branches.
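The failure mode described above can be sketched in a few lines of C. This is a minimal illustration, not the PostgreSQL code: `DemoVxid`, `demo_fill_conflicts` and `demo_count_conflicts` are hypothetical names, and the "invalid" encoding (backend id -1, local xid 0) is an assumption made for the example. The point is that a reused, non-zeroed buffer is only safe to scan if the producer writes an explicit terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtual transaction id (not the real type). */
typedef struct
{
    int      backend_id;   /* -1 marks an invalid entry (assumed encoding) */
    unsigned local_xid;    /* 0 marks an invalid entry (assumed encoding) */
} DemoVxid;

static int
demo_vxid_is_valid(DemoVxid v)
{
    return v.backend_id != -1 && v.local_xid != 0;
}

/*
 * Copy n conflicting ids into a caller-supplied buffer that may be reused
 * and may therefore still hold stale entries. The buffer must have room
 * for n + 1 elements so that the terminator always fits.
 */
static void
demo_fill_conflicts(DemoVxid *out, size_t capacity,
                    const DemoVxid *src, size_t n)
{
    assert(capacity >= n + 1);
    for (size_t i = 0; i < n; i++)
        out[i] = src[i];

    /* Explicit terminator: do not rely on the buffer being zeroed. */
    out[n].backend_id = -1;
    out[n].local_xid = 0;
}

/* Consumers scan until they hit the invalid (terminating) element. */
static size_t
demo_count_conflicts(const DemoVxid *list)
{
    size_t n = 0;

    while (demo_vxid_is_valid(list[n]))
        n++;
    return n;
}
```

Without the two terminator assignments, `demo_count_conflicts` would keep walking over whatever stale entries the buffer held from a previous call, and possibly past the end of the allocation.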
2015-01-29 | Align buffer descriptors to cache line boundaries. | Andres Freund
Benchmarks have shown that aligning the buffer descriptor array to cache lines is important for scalability; especially on bigger, multi-socket, machines. Currently the array sometimes already happens to be aligned by happenstance, depending on how large previous shared memory allocations were. That can lead to wildly varying performance results after minor configuration changes. In addition to aligning the start of the descriptor array, also force the size of individual descriptors to be of a common cache line size (64 bytes). That happens to already be the case on 64bit platforms, but this way we can change the struct BufferDesc more easily. As the alignment primarily matters in highly concurrent workloads which probably all are 64bit these days, and the space wastage of element alignment would be a bit more noticeable on 32bit systems, we don't force the stride to be cacheline sized on 32bit platforms for now. If somebody does actual performance testing, we can reevaluate that decision by changing the definition of BUFFERDESC_PADDED_SIZE. Discussion: 20140202151319.GD32123@awork2.anarazel.de Per discussion with Bruce Momjian, Tom Lane, Robert Haas, and Peter Geoghegan.
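The two ingredients named above (a cache-line-aligned array start and a cache-line-sized stride) can be sketched as follows. This is an illustration under assumptions, not the committed code: `DemoDesc`, `DemoDescPadded`, `demo_cacheline_align` and `demo_descriptor_array` are hypothetical names, and the 64-byte line size is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE_SIZE 64     /* common cache line size, per the message */

/* Hypothetical descriptor; the real struct has different fields. */
typedef struct
{
    uint32_t tag;
    uint32_t flags;
    int32_t  refcount;
} DemoDesc;

/* Padding via a union forces the array stride to one full cache line. */
typedef union
{
    DemoDesc desc;
    char     pad[DEMO_CACHE_LINE_SIZE];
} DemoDescPadded;

/* Round an address up to the next multiple of the cache line size. */
static uintptr_t
demo_cacheline_align(uintptr_t addr)
{
    return (addr + DEMO_CACHE_LINE_SIZE - 1)
        & ~((uintptr_t) DEMO_CACHE_LINE_SIZE - 1);
}

/*
 * Carve an aligned descriptor array out of a raw allocation. The caller
 * must over-allocate by up to DEMO_CACHE_LINE_SIZE - 1 bytes.
 */
static DemoDescPadded *
demo_descriptor_array(void *raw)
{
    return (DemoDescPadded *) demo_cacheline_align((uintptr_t) raw);
}
```

With both pieces in place, no two descriptors share a cache line regardless of how large the preceding allocations happened to be, which is what removes the configuration-dependent variance the message describes.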
2015-01-29 | Fix #ifdef'ed out code to compile again. | Andres Freund
2015-01-29 | Fix bug where GIN scan keys were not initialized with gin_fuzzy_search_limit. | Heikki Linnakangas
When gin_fuzzy_search_limit was used, we could jump out of startScan() without calling startScanKey(). That was harmless in 9.3 and below, because startScanKey() didn't do anything interesting, but in 9.4 it initializes information needed for skipping entries (aka GIN fast scans), and you readily get a segfault if it's not done. Nevertheless, it was clearly wrong all along, so backpatch all the way to 9.1 where the early return was introduced. (AFAICS startScanKey() did nothing useful in 9.3 and below, because the fields it initialized were already initialized in ginFillScanKey(), but I don't dare to change that in a minor release. ginFillScanKey() is always called in gingetbitmap() even though there's a check there to see if the scan keys have already been initialized, because they never are; ginrescan() frees them.) In passing, remove unnecessary if-check from the second inner loop in startScan(). We already check in the first loop that the condition is true for all entries. Reported by Olaf Gawenda, bug #12694. Backpatch to 9.1 and above, although AFAICS it causes a live bug only in 9.4.
2015-01-29 | Move out-of-memory error checks from aset.c to mcxt.c | Robert Haas
This potentially allows us to add mcxt.c interfaces that do something other than throw an error when memory cannot be allocated. We'll handle adding those interfaces in a separate commit. Michael Paquier, with minor changes by me
2015-01-28 | Add usebypassrls to pg_user and pg_shadow | Stephen Frost
The row level security patches didn't add the 'usebypassrls' columns to the pg_user and pg_shadow views on the belief that they were deprecated, but we haven't actually said they are and therefore we should include it. This patch corrects that, adds missing documentation for rolbypassrls into the system catalog page for pg_authid, along with the entries for pg_user and pg_shadow, and cleans up a few other uses of 'row-level' cases to be 'row level' in the docs. Pointed out by Amit Kapila. Catalog version bump due to system view changes.
2015-01-28 | Clean up range-table building in copy.c | Stephen Frost
Commit 804b6b6db4dcfc590a468e7be390738f9f7755fb added the build of a range table in copy.c to initialize the EState es_range_table since it can be needed in error paths. Unfortunately, that commit didn't appreciate that some code paths might end up not initializing the rte which is used to build the range table. Fix that and clean up a couple of other things along the way: build it only once and don't explicitly set it on the !is_from path as it doesn't make any sense there (cstate is palloc0'd, so this isn't an issue from an initializing standpoint either). The prior commit went back to 9.0, but this only goes back to 9.1 as prior to that the range table build happens immediately after building the RTE and therefore doesn't suffer from this issue. Pointed out by Robert.
2015-01-28 | Fix column-privilege leak in error-message paths | Stephen Frost
While building error messages to return to the user, BuildIndexValueDescription, ExecBuildSlotValueDescription and ri_ReportViolation would happily include the entire key or entire row in the result returned to the user, even if the user didn't have access to view all of the columns being included. Instead, include only those columns which the user is providing or which the user has select rights on. If the user does not have any rights to view the table or any of the columns involved then no detail is provided and a NULL value is returned from BuildIndexValueDescription and ExecBuildSlotValueDescription. Note that, for key cases, the user must have access to all of the columns for the key to be shown; a partial key will not be returned. Further, in master only, do not return any data for cases where row security is enabled on the relation and row security should be applied for the user. This required a bit of refactoring and moving of things around related to RLS- note the addition of utils/misc/rls.c. Back-patch all the way, as column-level privileges are now in all supported versions. This has been assigned CVE-2014-8161, but since the issue and the patch have already been publicized on pgsql-hackers, there's no point in trying to hide this commit.
2015-01-28 | Fix typo in comment. | Heikki Linnakangas
2015-01-28 | Remove dead NULL-pointer checks in GiST code. | Heikki Linnakangas
gist_poly_compress() and gist_circle_compress() checked for a NULL-pointer key argument, but that was dead code; the gist code never passes a NULL-pointer to the "compress" method. This commit also removes a documentation note added in commit a0a3883, about doing NULL-pointer checks in the "compress" method. It was added based on the fact that some implementations were doing NULL-pointer checks, but those checks were unnecessary in the first place. The NULL-pointer check in gbt_var_same() function was also unnecessary. The arguments to the "same" method come from the "compress", "union", or "picksplit" methods, but none of them return a NULL pointer. None of this is to be confused with SQL NULL values. Those are dealt with by the gist machinery, and are never passed to the GiST opclass methods. Michael Paquier
2015-01-27 | Fix NUMERIC field access macros to treat NaNs consistently. | Tom Lane
Commit 145343534c153d1e6c3cff1fa1855787684d9a38 arranged to store numeric NaN values as short-header numerics, but the field access macros did not get the memo: they thought only "SHORT" numerics have short headers. Most of the time this makes no difference because we don't access the weight or dscale of a NaN; but numeric_send does that. As pointed out by Andrew Gierth, this led to fetching uninitialized bytes. AFAICS this could not have any worse consequences than that; in particular, an unaligned stored numeric would have been detoasted by PG_GETARG_NUMERIC, so that there's no risk of a fetch off the end of memory. Still, the code is wrong on its own terms, and it's not hard to foresee future changes that might expose us to real risks. So back-patch to all affected branches.
2015-01-26 | Add a note to PG_TRY's documentation about volatile safety. | Tom Lane
We had better memorialize what the actual requirements are for this.
2015-01-26 | Re-enable abbreviated keys on Windows. | Robert Haas
Commit 1be4eb1b2d436d1375899c74e4c74486890d8777 disabled this, but I think the real problem here was fixed by commit b181a91981203f6ec9403115a2917bd3f9473707 and commit d060e07fa919e0eb681e2fa2cfbe63d6c40eb2cf. So let's try re-enabling it now and see what happens.
2015-01-26 | Fix volatile-safety issue in pltcl_SPI_execute_plan(). | Tom Lane
The "callargs" variable is modified within PG_TRY and then referenced within PG_CATCH, which is exactly the coding pattern we've now found to be unsafe. Marking "callargs" volatile would be problematic because it is passed by reference to some Tcl functions, so fix the problem by not modifying it within PG_TRY. We can just postpone the free() till we exit the PG_TRY construct, as is already done elsewhere in this same file. Also, fix failure to free(callargs) when exiting on too-many-arguments error. This is only a minor memory leak, but a leak nonetheless. In passing, remove some unnecessary "volatile" markings in the same function. Those doubtless are there because gcc 2.95.3 whinged about them, but we now know that its algorithm for complaining is many bricks shy of a load. This is certainly a live bug with compilers that optimize similarly to current gcc, so back-patch to all active branches.
2015-01-26 | Fix volatile-safety issue in asyncQueueReadAllNotifications(). | Tom Lane
The "pos" variable is modified within PG_TRY and then referenced within PG_CATCH, so for strict POSIX conformance it must be marked volatile. Superficially the code looked safe because pos's address was taken, which was sufficient to force it into memory ... but it's not sufficient to ensure that the compiler applies updates exactly where the program text says to. The volatility marking has to extend into a couple of subroutines too, but I think that's probably a good thing because the risk of out-of-order updates is mostly in those subroutines not asyncQueueReadAllNotifications() itself. In principle the compiler could have re-ordered operations such that an error could be thrown while "pos" had an incorrect value. It's unclear how real the risk is here, but for safety back-patch to all active branches.
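The volatile-safety rule behind this series of fixes comes from the C rules for setjmp()/longjmp(): an automatic variable that is modified after setjmp() and read after longjmp() has an indeterminate value unless it is declared volatile. The sketch below uses plain setjmp/longjmp rather than the PG_TRY/PG_CATCH macros, and the names `demo_env`, `demo_fail` and `demo_progress_seen_after_error` are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf demo_env;

/* Simulates an error being thrown from somewhere inside the "try" part. */
static void
demo_fail(void)
{
    longjmp(demo_env, 1);
}

/*
 * Returns the value of "progress" as observed in the error handler.
 * Because "progress" is modified after setjmp() and read after longjmp(),
 * it must be volatile for the read to be well defined.
 */
static int
demo_progress_seen_after_error(void)
{
    volatile int progress = 0;

    if (setjmp(demo_env) == 0)
    {
        /* the "try" part */
        progress = 1;
        progress = 2;
        demo_fail();
        progress = 3;           /* never reached */
    }
    else
    {
        /* the "catch" part: sees the last value stored before the jump */
        return progress;
    }
    return -1;
}
```

Dropping the `volatile` qualifier may still appear to work at low optimization levels, which is why this class of bug tends to surface only with optimizing compilers. The alternative used in several of the commits above is to avoid modifying the variable inside the "try" part at all.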
2015-01-25 | Further cleanup of ReorderBufferCommit(). | Tom Lane
On closer inspection, we can remove the "volatile" qualifier on "using_subtxn" so long as we initialize that before the PG_TRY block, which there's no particularly good reason not to do. Also, push the "change" variable inside the PG_TRY so as to remove all question of whether it needs "volatile", and remove useless early initializations of "snapshot_now" and "using_subtxn".
2015-01-25 | Clean up assorted issues in ALTER SYSTEM coding. | Tom Lane
Fix unsafe use of a non-volatile variable in PG_TRY/PG_CATCH in AlterSystemSetConfigFile(). While at it, clean up a bundle of other infelicities and outright bugs, including corner-case-incorrect linked list manipulation, a poorly designed and worse documented parse-and-validate function (which even included some randomly chosen hard-wired substitutes for the specified elevel in one code path ... wtf?), direct use of open() instead of fd.c's facilities, inadequate checking of write()'s return value, and generally poorly written commentary.
2015-01-24 | Clean up some mess in row-security patches. | Tom Lane
Fix unsafe coding around PG_TRY in RelationBuildRowSecurity: can't change a variable inside PG_TRY and then use it in PG_CATCH without marking it "volatile". In this case though it seems saner to avoid that by doing a single assignment before entering the TRY block. I started out just intending to fix that, but the more I looked at the row-security code the more distressed I got. This patch also fixes incorrect construction of the RowSecurityPolicy cache entries (there was not sufficient care taken to copy pass-by-ref data into the cache memory context) and a whole bunch of sloppiness around the definition and use of pg_policy.polcmd. You can't use nulls in that column because initdb will mark it NOT NULL --- and I see no particular reason why a null entry would be a good idea anyway, so changing initdb's behavior is not the right answer. The internal value of '\0' wouldn't be suitable in a "char" column either, so after a bit of thought I settled on using '*' to represent ALL. Chasing those changes down also revealed that somebody wasn't paying attention to what the underlying values of ACL_UPDATE_CHR etc really were, and there was a great deal of lackadaisicalness in the catalogs.sgml documentation for pg_policy and pg_policies too. This doesn't pretend to be a complete code review for the row-security stuff, it just fixes the things that were in my face while dealing with the bugs in RelationBuildRowSecurity.
2015-01-24 | Fix unsafe coding in ReorderBufferCommit(). | Tom Lane
"iterstate" must be marked volatile since it's changed inside the PG_TRY block and then used in the PG_CATCH stanza. Noted by Mark Wilding of Salesforce. (We really need to see if we can't get the C compiler to warn about this.) Also, reset iterstate to NULL after the mainline ReorderBufferIterTXNFinish call, to ensure the PG_CATCH block doesn't try to do that a second time.
2015-01-24 | Replace a bunch more uses of strncpy() with safer coding. | Tom Lane
strncpy() has a well-deserved reputation for being unsafe, so make an effort to get rid of nearly all occurrences in HEAD. A large fraction of the remaining uses were passing length less than or equal to the known strlen() of the source, in which case no null-padding can occur and the behavior is equivalent to memcpy(), though doubtless slower and certainly harder to reason about. So just use memcpy() in these cases. In other cases, use either StrNCpy() or strlcpy() as appropriate (depending on whether padding to the full length of the destination buffer seems useful). I left a few strncpy() calls alone in the src/timezone/ code, to keep it in sync with upstream (the IANA tzcode distribution). There are also a few such calls in ecpg that could possibly do with more analysis. AFAICT, none of these changes are more than cosmetic, except for the four occurrences in fe-secure-openssl.c, which are in fact buggy: an overlength source leads to a non-null-terminated destination buffer and ensuing misbehavior. These don't seem like security issues, first because no stack clobber is possible and second because if your values of sslcert etc are coming from untrusted sources then you've got problems way worse than this. Still, it's undesirable to have unpredictable behavior for overlength inputs, so back-patch those four changes to all active branches.
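The hazard described above is that strncpy() writes no terminating NUL when the source is at least as long as the given length, so an overlength source leaves the destination unterminated. A minimal sketch of a copy that always terminates is shown below. `demo_copy_terminated` and `demo_is_terminated` are hypothetical helpers written for this illustration; the first behaves along the lines of strlcpy(), but it is not the PostgreSQL StrNCpy() macro or the project's strlcpy() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, writing at most dstsize bytes including the
 * terminating NUL. Returns strlen(src), so a result >= dstsize tells
 * the caller that truncation occurred.
 */
static size_t
demo_copy_terminated(char *dst, const char *src, size_t dstsize)
{
    size_t srclen = strlen(src);

    if (dstsize > 0)
    {
        size_t n = (srclen < dstsize - 1) ? srclen : dstsize - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';          /* always terminate, even when truncating */
    }
    return srclen;
}

/* Returns 1 if buf[0 .. size) contains a NUL byte, 0 otherwise. */
static int
demo_is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

When the source length is known to fit, a plain memcpy() of exactly that many bytes is the simpler choice, which is the replacement the message describes for the majority of call sites.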
2015-01-24 | Remove no-longer-referenced src/port/gethostname.c. | Tom Lane
This file hasn't been part of any build since 2005, and even before that wasn't used unless you configured --with-krb4 (and had a machine without gethostname(2), obviously). What's more, we haven't actually called gethostname anywhere since then, either (except in thread_test.c, whose testing of this function is probably pointless). So we don't need it.
2015-01-24 | Fix assignment operator thinko | Alvaro Herrera
Pointed out by Michael Paquier
2015-01-23 | Fix typos, update README. | Robert Haas
Peter Geoghegan
2015-01-23 | vacuumdb: enable parallel mode | Alvaro Herrera
This mode allows vacuumdb to open several server connections to vacuum or analyze several tables simultaneously. Author: Dilip Kumar. Some reworking by Álvaro Herrera. Reviewed by: Jeff Janes, Amit Kapila, Magnus Hagander, Andres Freund
2015-01-23 | Don't use abbreviated keys for the final merge pass. | Robert Haas
When we write tuples out to disk and read them back in, the abbreviated keys become non-abbreviated, because the readtup routines don't know anything about abbreviation. But without this fix, the rest of the code still thinks the abbreviation-aware comparator should be used, so chaos ensues. Report by Andrew Gierth; patch by Peter Geoghegan.
2015-01-23 | Add an explicit cast to Size to hyperloglog.c | Robert Haas
MSVC generates a warning here; we hope this will make it happy. Report by Michael Paquier. Patch by David Rowley.
2015-01-22 | Prevent duplicate escape-string warnings when using pg_stat_statements. | Tom Lane
contrib/pg_stat_statements will sometimes run the core lexer a second time on submitted statements. Formerly, if you had standard_conforming_strings turned off, this led to sometimes getting two copies of any warnings enabled by escape_string_warning. While this is probably no longer a big deal in the field, it's a pain for regression testing. To fix, change the lexer so it doesn't consult the escape_string_warning GUC variable directly, but looks at a copy in the core_yy_extra_type state struct. Then, pg_stat_statements can change that copy to disable warnings while it's redoing the lexing. It seemed like a good idea to make this happen for all three of the GUCs consulted by the lexer, not just escape_string_warning. There's not an immediate use-case for callers to adjust the other two AFAIK, but making it possible is easy enough and seems like good future-proofing. Arguably this is a bug fix, but there doesn't seem to be enough interest to justify a back-patch. We'd not be able to back-patch exactly as-is anyway, for fear of breaking ABI compatibility of the struct. (We could perhaps back-patch the addition of only escape_string_warning by adding it at the end of the struct, where there's currently alignment padding space.)
2015-01-22 | Fix whitespace | Peter Eisentraut
2015-01-22 | Tweak BRIN minmax operator class | Alvaro Herrera
In the union support proc, we were not checking the hasnulls flag of value A early enough, so it could be skipped if the "allnulls" flag in value B is set. Also, a check on the allnulls flag of value "B" was redundant, so remove it. Also change inet_minmax_ops to not be the default opclass for type inet, as a future inclusion operator class would be more useful and it's pretty difficult to change default opclass for a datatype later on. (There is no catversion bump for this catalog change; this shouldn't be a problem.) Extracted from a larger patch to add an "inclusion" operator class. Author: Emre Hasegeli
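The ordering issue in the union support procedure can be illustrated with a simplified summary struct. This is a sketch loosely modeled on the description above, not the actual BRIN code: `DemoSummary` and `demo_union` are hypothetical, and integer min/max stand in for the datatype-specific comparisons. The key point is that the hasnulls flag is propagated before the early return taken when B has no non-null values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical min/max summary of a range of values. */
typedef struct
{
    bool hasnulls;   /* at least one summarized value is NULL */
    bool allnulls;   /* no non-NULL value has been summarized */
    int  min;        /* meaningful only when !allnulls */
    int  max;        /* meaningful only when !allnulls */
} DemoSummary;

/* Merge summary b into summary a. */
static void
demo_union(DemoSummary *a, const DemoSummary *b)
{
    /* Propagate hasnulls first, so the early return below cannot skip it. */
    if (b->hasnulls)
        a->hasnulls = true;

    /* If B holds no non-NULL values, there is nothing left to merge. */
    if (b->allnulls)
        return;

    /* If A holds no non-NULL values yet, adopt B's bounds wholesale. */
    if (a->allnulls)
    {
        a->allnulls = false;
        a->min = b->min;
        a->max = b->max;
        return;
    }

    /* Otherwise widen A's bounds to cover B's. */
    if (b->min < a->min)
        a->min = b->min;
    if (b->max > a->max)
        a->max = b->max;
}
```

If the hasnulls update were placed after the `b->allnulls` check, merging a summary that contains only NULLs would leave A claiming to contain no NULLs at all.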
2015-01-22 | Repair brain fade in commit b181a91981203f6ec9403115a2917bd3f9473707. | Robert Haas
The split between which things need to happen in the C-locale case and which needed to happen in the locale-aware case was a few bricks short of a load. Try to fix that.
2015-01-22 | adjust ACL owners for REASSIGN and ALTER OWNER TO | Bruce Momjian
When REASSIGN and ALTER OWNER TO are used, both the object owner and ACL list should be changed from the old owner to the new owner. This patch fixes types, foreign data wrappers, and foreign servers to change their ACL list properly; they already changed owners properly. BACKWARD INCOMPATIBILITY? Report by Alexey Bashtanov
2015-01-22 | More fixes for abbreviated keys infrastructure. | Robert Haas
First, when LC_COLLATE = C, bttext_abbrev_convert should use memcpy() rather than strxfrm() to construct the abbreviated key, because the authoritative comparator uses memcpy(). If we do anything else here, we might get inconsistent answers, and the buildfarm says this risk is not theoretical. It should be faster this way, too. Second, while I'm looking at bttext_abbrev_convert, convert a needless use of goto into the actual loop it's trying to implement. Both of the above problems date to the original commit of abbreviated keys, commit 4ea51cdfe85ceef8afabceb03c446574daa0ac23. Third, fix a bogus assignment to tss->locale before tss is set up. That's a new goof in commit b529b65d1bf8537ca7fa024760a9782d7c8b66e5.
2015-01-22 | Heavily refactor btsortsupport_worker. | Robert Haas
Prior to commit 4ea51cdfe85ceef8afabceb03c446574daa0ac23, this function only had one job, which was to decide whether we could avoid trampolining through the fmgr layer when performing sort comparisons. As of that commit, it has a second job, which is to decide whether we can use abbreviated keys. Unfortunately, those two tasks are somewhat intertwined in the existing coding, which is likely why neither Peter Geoghegan nor I noticed prior to commit that this calls pg_newlocale_from_collation() in cases where it didn't previously. The buildfarm noticed, though. To fix, rewrite the logic so that the decision as to which comparator to use is more cleanly separated from the decision about abbreviation.
2015-01-22 | reinit.h: Fix typo in identification comment | Alvaro Herrera
Author: Sawada Masahiko
2015-01-20 | Disable abbreviated keys on Windows. | Robert Haas
Most of the Windows buildfarm members (bowerbird, hamerkop, currawong, jacana, brolga) are unhappy with yesterday's abbreviated keys patch, although there are some (narwhal, frogmouth) that seem OK with it. Since there's no obvious pattern to explain why some are working and others are failing, just disable this across-the-board on Windows for now. This is a bit unfortunate since the optimization will be a big win in some cases, but we can't leave the buildfarm broken.
2015-01-20 | tools/ccsym: update for modern versions of gcc | Bruce Momjian
This dumps the predefined preprocessor macros
2015-01-20 | Add strxfrm_l to list of functions where Windows adds an underscore. | Robert Haas
Per buildfarm failure on bowerbird after last night's commit 4ea51cdfe85ceef8afabceb03c446574daa0ac23. Peter Geoghegan
2015-01-19 | In pg_regress, remove the temporary installation upon successful exit. | Tom Lane
This results in a very substantial reduction in disk space usage during "make check-world", since that sequence involves creation of numerous temporary installations. It should also help a bit in the buildfarm, even though the buildfarm script doesn't create as many temp installations, because the current script misses deleting some of them; and anyway it seems better to do this once in one place rather than expecting that script to get it right every time. In 9.4 and HEAD, also undo the unwise choice in commit b1aebbb6a86e96d7 to report strerror(errno) after a rmtree() failure. rmtree has already reported that, possibly for multiple failures with distinct errnos; and what's more, by the time it returns there is no good reason to assume that errno still reflects the last reportable error. So reporting errno here is at best redundant and at worst badly misleading. Back-patch to all supported branches, so that future revisions of the buildfarm script can rely on this behavior.
2015-01-19 | Adjust "pgstat wait timeout" message to be a translatable LOG message. | Tom Lane
Per discussion, change the log level of this message to be LOG not WARNING. The main point of this change is to avoid causing buildfarm run failures when the stats collector is exceptionally slow to respond, which it not infrequently is on some of the smaller/slower buildfarm members. This change does lose notice to an interactive user when his stats query is looking at out-of-date stats, but the majority opinion (not necessarily that of yours truly) is that WARNING messages would probably not get noticed anyway on heavily loaded production systems. A LOG message at least ensures that the problem is recorded somewhere where bulk auditing for the issue is possible. Also, instead of an untranslated "pgstat wait timeout" message, provide a translatable and hopefully more understandable message "using stale statistics instead of current ones because stats collector is not responding". The original text was written hastily under the assumption that it would never really happen in practice, which we now know to be unduly optimistic. Back-patch to all active branches, since we've seen the buildfarm issue in all branches.
2015-01-19 | Fix various shortcomings of the new PrivateRefCount infrastructure. | Andres Freund
As noted by Tom Lane the improvements in 4b4b680c3d6 had the problem that in some situations we searched, entered and modified entries in the private refcount hash while holding a spinlock. I had tried to keep the logic entirely local to PinBuffer_Locked(), but that's not really possible given it's called with a spinlock held... Besides being disadvantageous from a performance point of view, this also has problems with error handling safety. If we failed inserting an entry into the hashtable due to an out of memory error, we'd error out with a held spinlock. Not good. Change the way private refcounts are manipulated: Before a buffer can be tracked an entry has to be reserved using ReservePrivateRefCountEntry(); then, if an entry is not found using GetPrivateRefCountEntry(), it can be entered with NewPrivateRefCountEntry(). Also take advantage of the fact that PinBuffer_Locked() currently is never called for buffers that already have been pinned by the current backend and don't search the private refcount entries for preexisting local pins. That results in a small, but measurable, performance improvement. Additionally make ReleaseBuffer() always call UnpinBuffer() for shared buffers. That avoids duplicating work in an eventual UnpinBuffer() call that already has been done in ReleaseBuffer() and also saves some code. Per discussion with Tom Lane. Discussion: 15028.1418772313@sss.pgh.pa.us
2015-01-19 | Use abbreviated keys for faster sorting of text datums. | Robert Haas
This commit extends the SortSupport infrastructure to allow operator classes the option to provide abbreviated representations of Datums; in the case of text, we abbreviate by taking the first few characters of the strxfrm() blob. If the abbreviated comparison is insufficient to resolve the comparison, we fall back on the normal comparator. This can be much faster than the old way of doing sorting if the first few bytes of the string are usually sufficient to resolve the comparison. There is the potential for a performance regression if all of the strings to be sorted are identical for the first 8+ characters and differ only in later positions; therefore, the SortSupport machinery now provides an infrastructure to abort the use of abbreviation if it appears that abbreviation is producing comparatively few distinct keys. HyperLogLog, a streaming cardinality estimator, is included in this commit and used to make that determination for text. Peter Geoghegan, reviewed by me.
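The abbreviate-then-fall-back idea can be sketched for the simplest case, bytewise (C-locale-like) ordering. This is an illustration under assumptions rather than the committed implementation: `demo_abbreviate` and `demo_compare` are hypothetical names, the key is built from the raw string bytes instead of a strxfrm() blob, and strcmp() stands in for the authoritative comparator.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Pack the first 8 bytes of s into an integer, most significant byte
 * first, padding short strings with zero bytes. Unsigned comparison of
 * two such keys then agrees with bytewise comparison of those prefixes.
 */
static uint64_t
demo_abbreviate(const char *s)
{
    uint64_t key = 0;
    size_t   len = strlen(s);

    for (size_t i = 0; i < sizeof(key); i++)
    {
        unsigned char c = (i < len) ? (unsigned char) s[i] : 0;

        key = (key << 8) | c;
    }
    return key;
}

/*
 * Compare using the cheap abbreviated keys first; only when they tie
 * is the full (authoritative) comparison needed.
 */
static int
demo_compare(const char *a, uint64_t akey, const char *b, uint64_t bkey)
{
    if (akey != bkey)
        return (akey < bkey) ? -1 : 1;
    return strcmp(a, b);
}
```

In a sort, the keys would be computed once per string and stored alongside it, so most comparisons become a single integer comparison. Strings sharing their first 8 bytes always tie on the key and pay for both steps, which is the regression case that the abort logic mentioned above is meant to detect.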
2015-01-19 | Typo fix. | Robert Haas
Etsuro Fujita
2015-01-19 | BRIN typo fix. | Robert Haas
Amit Langote
2015-01-18 | Install shared libraries also in bin on cygwin, mingw | Peter Eisentraut
This was previously only done for libpq, now it's done for all shared libraries. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2015-01-18 | Fix ancient thinko in default table rowcount estimation. | Tom Lane
The code used sizeof(ItemPointerData) where sizeof(ItemIdData) is correct, since we're trying to account for a tuple's line pointer. Spotted by Tomonari Katsumata (bug #12584). Although this mistake is of very long standing, no back-patch, since it's a relatively harmless error and changing it would risk changing default planner behavior in stable branches. (I don't see any change in regression test outputs here, but the buildfarm may think differently.)
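To see why the mistake is "relatively harmless", it helps to work through the arithmetic. The sketch below is a simplified model, not the planner code: `demo_tuples_per_page` is a hypothetical function, and the constants are assumptions for a typical build (8192-byte blocks, a 24-byte page header, a 24-byte aligned tuple header, a 4-byte line pointer for ItemIdData and 6 bytes for ItemPointerData).

```c
#include <assert.h>

/* Assumed values for a typical build; labeled DEMO_ to mark them as such. */
#define DEMO_BLCKSZ             8192
#define DEMO_PAGE_HEADER_SIZE   24
#define DEMO_TUPLE_HEADER_SIZE  24
#define DEMO_LINE_POINTER_SIZE  4   /* assumed sizeof(ItemIdData) */
#define DEMO_ITEM_POINTER_SIZE  6   /* assumed sizeof(ItemPointerData) */

/*
 * Estimate how many tuples fit on one page, given the average data width
 * and the per-tuple pointer overhead charged to each tuple.
 */
static int
demo_tuples_per_page(int data_width, int per_tuple_pointer_size)
{
    int tuple_width = data_width + DEMO_TUPLE_HEADER_SIZE
        + per_tuple_pointer_size;

    return (DEMO_BLCKSZ - DEMO_PAGE_HEADER_SIZE) / tuple_width;
}
```

Under these assumptions, a 32-byte average data width gives 136 tuples per page with the correct 4-byte overhead and 131 with the mistaken 6-byte one, a difference of a few percent. That is small, but still enough to shift default estimates, which is consistent with the decision not to back-patch.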
2015-01-18 | Activate low-volume optional logging during regression test runs. | Noah Misch
Elaborated from an idea by Andres Freund.
2015-01-18 | Fix use of already freed memory when dumping a database's security label. | Andres Freund
pg_dump.c:dumpDatabase() called ArchiveEntry() with the results of a query that was PQclear()ed a couple lines earlier. Backpatch to 9.2 where security labels for shared objects were introduced.
2015-01-17 | Replace walsender's latch with the general shared latch. | Andres Freund
Relying on the normal shared latch simplifies interrupt/signal handling because we can rely on all signal handlers setting the proc latch. That in turn allows us to avoid the use of ImmediateInterruptOK, which arguably isn't correct because WaitLatchOrSocket isn't declared to be immediately interruptible. Also change sections that wait on the walsender's latch to notice interrupts quicker/more reliably and make them more consistent with each other. This is part of a larger "get rid of ImmediateInterruptOK" series. Discussion: 20150115020335.GZ5245@awork2.anarazel.de
2015-01-16 | Show sort ordering options in EXPLAIN output. | Tom Lane
Up to now, EXPLAIN has contented itself with printing the sort expressions in a Sort or Merge Append plan node. This patch improves that by annotating the sort keys with COLLATE, DESC, USING, and/or NULLS FIRST/LAST whenever nondefault sort ordering options are used. The output is now a reasonably close approximation of an ORDER BY clause equivalent to the plan's ordering. Marius Timmer, Lukas Kreft, and Arne Scheffer; reviewed by Mike Blackwell. Some additional hacking by me.
2015-01-17 | Advance backend's advertised xmin more aggressively. | Heikki Linnakangas
Currently, a backend will reset its PGXACT->xmin value when it doesn't have any registered snapshots left. That covered the common case that a transaction in read committed mode runs several queries, one after another, as there would be no snapshots active between those queries. However, if you hold cursors across those queries, we didn't get a chance to reset xmin. To make that better, keep all the registered snapshots in a pairing heap, ordered by xmin so that it's always quick to find the snapshot with the smallest xmin. That allows us to advance PGXACT->xmin whenever the oldest snapshot is deregistered, even if there are others still active. Per discussion originally started by Jeff Davis back in 2009 and more recently by Robert Haas.