summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2015-08-18Fix a few bogus statement type names in plpgsql error messages.Tom Lane
plpgsql's error location context messages ("PL/pgSQL function fn-name line line-no at stmt-type") would misreport a CONTINUE statement as being an EXIT, and misreport a MOVE statement as being a FETCH. These are clear bugs that have been there a long time, so back-patch to all supported branches. In addition, in 9.5 and HEAD, change the description of EXECUTE from "EXECUTE statement" to just plain EXECUTE; there seems no good reason why this statement type should be described differently from others that have a well-defined head keyword. And distinguish GET STACKED DIAGNOSTICS from plain GET DIAGNOSTICS. These are a bit more of a judgment call, and also affect existing regression-test outputs, so I did not back-patch into stable branches. Pavel Stehule and Tom Lane
2015-08-15Don't use 'bool' as a struct member name in help_config.c.Andres Freund
Doing so doesn't work if bool is a macro rather than a typedef. Although c.h spends some effort to support configurations where bool is a preexisting macro, help_config.c has existed this way since 2003 (b700a6), and there have not been any reports of problems. Backpatch anyway since this is as riskless as it gets. Discussion: 20150812084351.GD8470@awork2.anarazel.de Backpatch: 9.0-master
2015-08-15Use the correct type for TableInfo->relreplident.Andres Freund
Mistakenly relreplident was stored as a bool. That works today as c.h typedefs bool to a char, but isn't very future proof. Discussion: 20150812084351.GD8470@awork2.anarazel.de Backpatch: 9.4 where replica identity was introduced.
2015-08-14Encoding PG_UHC is code page 949.Noah Misch
This fixes presentation of non-ASCII messages to the Windows event log and console in rare cases involving Korean locale. Processes like the postmaster and checkpointer, but not processes attached to databases, were affected. Back-patch to 9.4, where MessageEncoding was introduced. The problem exists in all supported versions, but this change has no effect in the absence of the code recognizing PG_UHC MessageEncoding. Noticed while investigating bug #13427 from Dmitri Bourlatchkov.
2015-08-14Restore old pgwin32_message_to_UTF16() behavior outside transactions.Noah Misch
Commit 49c817eab78c6f0ce8c3bf46766b73d6cf3190b7 replaced with a hard error the dubious pg_do_encoding_conversion() behavior when outside a transaction. Reintroduce the historic soft failure locally within pgwin32_message_to_UTF16(). This fixes errors when writing messages in less-common encodings to the Windows event log or console. Back-patch to 9.4, where the aforementioned commit first appeared. Per bug #13427 from Dmitri Bourlatchkov.
2015-08-13Improve regression test case to avoid depending on system catalog stats.Tom Lane
In commit 95f4e59c32866716 I added a regression test case that examined the plan of a query on system catalogs. That isn't a terribly great idea because the catalogs tend to change from version to version, or even within a version if someone makes an unrelated regression-test change that populates the catalogs a bit differently. Usually I try to make planner test cases rely on test tables that have not changed since Berkeley days, but I got sloppy in this case because the submitted crasher example queried the catalogs and I didn't spend enough time on rewriting it. But it was a problem waiting to happen, as I was rudely reminded when I tried to port that patch into Salesforce's Postgres variant :-(. So spend a little more effort and rewrite the query to not use any system catalogs. I verified that this version still provokes the Assert if 95f4e59c32866716's code fix is reverted. I also removed the EXPLAIN output from the test, as it turns out that the assertion occurs while considering a plan that isn't the one ultimately selected anyway; so there's no value in risking any cross-platform variation in that printout. Back-patch to 9.2, like the previous patch.
2015-08-13Fix declaration of isarray variable.Michael Meskes
Found and fixed by Andres Freund.
2015-08-12Undo mistaken tightening in join_is_legal().Tom Lane
One of the changes I made in commit 8703059c6b55c427 turns out not to have been such a good idea: we still need the exception in join_is_legal() that allows a join if both inputs already overlap the RHS of the special join we're checking. Otherwise we can miss valid plans, and might indeed fail to find a plan at all, as in recent report from Andreas Seltenreich. That code was added way back in commit c17117649b9ae23d, but I failed to include a regression test case then; my bad. Put it back with a better explanation, and a test this time. The logic does end up a bit different than before though: I now believe it's appropriate to make this check first, thereby allowing such a case whether or not we'd consider the previous SJ(s) to commute with this one. (Presumably, we already decided they did; but it was confusing to have this consideration in the middle of the code that was handling the other case.) Back-patch to all active branches, like the previous patch.
2015-08-12This routine was calling ecpg_alloc to allocate to memory but did notMichael Meskes
actually check the returned pointer allocated, potentially NULL which could be the result of a malloc call. Issue noted by Coverity, fixed by Michael Paquier <michael@otacoo.com>
2015-08-12Fix some possible low-memory failures in regexp compilation.Tom Lane
newnfa() failed to set the regex error state when malloc() fails. Several places in regcomp.c failed to check for an error after calling subre(). Each of these mistakes could lead to null-pointer-dereference crashes in memory-starved backends. Report and patch by Andreas Seltenreich. Back-patch to all branches.
2015-08-11Minor cleanups in slot related code.Andres Freund
Fix a bunch of typos, and remove two superflous includes. Author: Gurjeet Singh Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com Backpatch: 9.4
2015-08-10Fix privilege dumping from servers too old to have that type of privilege.Tom Lane
pg_dump produced fairly silly GRANT/REVOKE commands when dumping types from pre-9.2 servers, and when dumping functions or procedural languages from pre-7.3 servers. Those server versions lack the typacl, proacl, and/or lanacl columns respectively, and pg_dump substituted default values that were in fact incorrect. We ended up revoking all the owner's own privileges for the object while granting all privileges to PUBLIC. Of course the owner would then have those privileges again via PUBLIC, so long as she did not try to revoke PUBLIC's privileges; which may explain the lack of field reports. Nonetheless this is pretty silly behavior. The stakes were raised by my recent patch to make pg_dump dump shell types, because 9.2 and up pg_dump would proceed to emit bogus GRANT/REVOKE commands for a shell type if dumping from a pre-9.2 server; and the server will not accept GRANT/REVOKE commands for a shell type. (Perhaps it should, but that's a topic for another day.) So the resulting dump script wouldn't load without errors. The right thing to do is to act as though these objects have default privileges (null ACL entries), which causes pg_dump to print no GRANT/REVOKE commands at all for them. That fixes the silly results and also dodges the problem with shell types. In passing, modify getProcLangs() to be less creatively different about how to handle missing columns when dumping from older server versions. Every other data-acquisition function in pg_dump does that by substituting appropriate default values in the version-specific SQL commands, and I see no reason why this one should march to its own drummer. Its use of "SELECT *" was likewise not conformant with anyplace else, not to mention it's not considered good SQL style for production queries. Back-patch to all supported versions. Although 9.0 and 9.1 pg_dump don't have the issue with typacl, they are more likely than newer versions to be used to dump from ancient servers, so we ought to fix the proacl/lanacl issues all the way back.
2015-08-10Accept alternate spellings of __sparcv7 and __sparcv8.Tom Lane
Apparently some versions of gcc prefer __sparc_v7__ and __sparc_v8__. Per report from Waldemar Brodkorb.
2015-08-10Further mucking with PlaceHolderVar-related restrictions on join order.Tom Lane
Commit 85e5e222b1dd02f135a8c3bf387d0d6d88e669bd turns out not to have taken care of all cases of the partially-evaluatable-PlaceHolderVar problem found by Andreas Seltenreich's fuzz testing. I had set it up to check for risky PHVs only in the event that we were making a star-schema-based exception to the param_source_rels join ordering heuristic. However, it turns out that the problem can occur even in joins that satisfy the param_source_rels heuristic, in which case allow_star_schema_join() isn't consulted. Refactor so that we check for risky PHVs whenever the proposed join has any remaining parameterization. Back-patch to 9.2, like the previous patch (except for the regression test case, which only works back to 9.3 because it uses LATERAL). Note that this discovery implies that problems of this sort could've occurred in 9.2 and up even before the star-schema patch; though I've not tried to prove that experimentally.
2015-08-10Fix copy & paste mistake in pg_get_replication_slots().Andres Freund
XLogRecPtr was compared with InvalidTransactionId instead of InvalidXLogRecPtr. As both are defined to the same value this doesn't cause any actual problems, but it's still wrong. Backpatch: 9.4-master, bug was introduced in 9.4
2015-08-07Further adjustments to PlaceHolderVar removal.Tom Lane
A new test case from Andreas Seltenreich showed that we were still a bit confused about removing PlaceHolderVars during join removal. Specifically, remove_rel_from_query would remove a PHV that was used only underneath the removable join, even if the place where it's used was the join partner relation and not the join clause being deleted. This would lead to a "too late to create a new PlaceHolderInfo" error later on. We can defend against that by checking ph_eval_at to see if the PHV could possibly be getting used at some partner rel. Also improve some nearby LATERAL-related logic. I decided that the check on ph_lateral needed to take precedence over the check on ph_needed, in case there's a lateral reference underneath the join being considered. (That may be impossible, but I'm not convinced of it, and it's easy enough to defend against the case.) Also, I realized that remove_rel_from_query's logic for updating LateralJoinInfos is dead code, because we don't build those at all until after join removal. Back-patch to 9.3. Previous versions didn't have the LATERAL issues, of course, and they also didn't attempt to remove PlaceHolderInfos during join removal. (I'm starting to wonder if changing that was really such a great idea.)
2015-08-07Fix attach-related race condition in shm_mq_send_bytes.Robert Haas
Spotted by Antonin Houska.
2015-08-06Fix old oversight in join removal logic.Tom Lane
Commit 9e7e29c75ad441450f9b8287bd51c13521641e3b introduced an Assert that join removal didn't reduce the eval_at set of any PlaceHolderVar to empty. At first glance it looks like join_is_removable ensures that's true --- but actually, the loop in join_is_removable skips PlaceHolderVars that are not referenced above the join due to be removed. So, if we don't want any empty eval_at sets, the right thing to do is to delete any now-unreferenced PlaceHolderVars from the data structure entirely. Per fuzz testing by Andreas Seltenreich. Back-patch to 9.3 where the aforesaid Assert was added.
2015-08-06Fix eclass_useful_for_merging to give valid results for appendrel children.Tom Lane
Formerly, this function would always return "true" for an appendrel child relation, because it would think that the appendrel parent was a potential join target for the child. In principle that should only lead to some inefficiency in planning, but fuzz testing by Andreas Seltenreich disclosed that it could lead to "could not find pathkey item to sort" planner errors in odd corner cases. Specifically, we would think that all columns of a child table's multicolumn index were interesting pathkeys, causing us to generate a MergeAppend path that sorts by all the columns. However, if any of those columns weren't actually used above the level of the appendrel, they would not get added to that rel's targetlist, which would result in being unable to resolve the MergeAppend's sort keys against its targetlist during createplan.c. Backpatch to 9.3. In older versions, columns of an appendrel get added to its targetlist even if they're not mentioned above the scan level, so that the failure doesn't occur. It might be worth back-patching this fix to older versions anyway, but I'll refrain for the moment.
2015-08-06Further fixes for degenerate outer join clauses.Tom Lane
Further testing revealed that commit f69b4b9495269cc4 was still a few bricks shy of a load: minor tweaking of the previous test cases resulted in the same wrong-outer-join-order problem coming back. After study I concluded that my previous changes in make_outerjoininfo() were just accidentally masking the problem, and should be reverted in favor of forcing syntactic join order whenever an upper outer join's predicate doesn't mention a lower outer join's LHS. This still allows the chained-outer-joins style that is the normally optimizable case. I also tightened things up some more in join_is_legal(). It seems to me on review that what's really happening in the exception case where we ignore a mismatched special join is that we're allowing the proposed join to associate into the RHS of the outer join we're comparing it to. As such, we should *always* insist that the proposed join be a left join, which eliminates a bunch of rather dubious argumentation. The case where we weren't enforcing that was the one that was already known buggy anyway (it had a violatable Assert before the aforesaid commit) so it hardly deserves a lot of deference. Back-patch to all active branches, like the previous patch. The added regression test case failed in all branches back to 9.1, and I think it's only an unrelated change in costing calculations that kept 9.0 from choosing a broken plan.
2015-08-06Fix incorrect calculation in shm_mq_receive.Robert Haas
If some, but not all, of the length word has already been read, and the next attempt to read sees exactly the number of bytes needed to complete the length word, or fewer, then we'll incorrectly read less than all of the available data. Antonin Houska
2015-08-06Fix `make installcheck` for serializable transactions.Kevin Grittner
Commit e5550d5fec66aa74caad1f79b79826ec64898688 added some new tests for ALTER TABLE which involved table scans. When default_transaction_isolation = 'serializable' these acquire relation-level SIReadLocks. The test results didn't cope with that. Add SIReadLock as the minimum lock level for purposes of these tests. This could also be fixed by excluding this type of lock from the my_locks view, but it would be a bug for SIReadLock to show up for a relation which was not otherwise locked, so do it this way to allow that sort of condition to cause a regression test failure. There is some question whether we could avoid taking SIReadLocks during these operations, but confirming the safety of that and figuring out how to avoid the locks is not trivial, and would be a separate patch. Backpatch to 9.4 where the new tests were added.
2015-08-05Make real sure we don't reassociate joins into or out of SEMI/ANTI joins.Tom Lane
Per the discussion in optimizer/README, it's unsafe to reassociate anything into or out of the RHS of a SEMI or ANTI join. An example from Piotr Stefaniak showed that join_is_legal() wasn't sufficiently enforcing this rule, so lock it down a little harder. I couldn't find a reasonably simple example of the optimizer trying to do this, so no new regression test. (Piotr's example involved the random search in GEQO accidentally trying an invalid case and triggering a sanity check way downstream in clause selectivity estimation, which did not seem like a sequence of events that would be useful to memorialize in a regression test as-is.) Back-patch to all active branches.
2015-08-04Fix pg_dump to dump shell types.Tom Lane
Per discussion, it really ought to do this. The original choice to exclude shell types was probably made in the dark ages before we made it harder to accidentally create shell types; but that was in 7.3. Also, cause the standard regression tests to leave a shell type behind, for convenience in testing the case in pg_dump and pg_upgrade. Back-patch to all supported branches.
2015-08-04Fix bogus "out of memory" reports in tuplestore.c.Tom Lane
The tuplesort/tuplestore memory management logic assumed that the chunk allocation overhead for its memtuples array could not increase when increasing the array size. This is and always was true for tuplesort, but we (I, I think) blindly copied that logic into tuplestore.c without noticing that the assumption failed to hold for the much smaller array elements used by tuplestore. Given rather small work_mem, this could result in an improper complaint about "unexpected out-of-memory situation", as reported by Brent DeSpain in bug #13530. The easiest way to fix this is just to increase tuplestore's initial array size so that the assumption holds. Rather than relying on magic constants, though, let's export a #define from aset.c that represents the safe allocation threshold, and make tuplestore's calculation depend on that. Do the same in tuplesort.c to keep the logic looking parallel, even though tuplesort.c isn't actually at risk at present. This will keep us from breaking it if we ever muck with the allocation parameters in aset.c. Back-patch to all supported versions. The error message doesn't occur pre-9.3, not so much because the problem can't happen as because the pre-9.3 tuplestore code neglected to check for it. (The chance of trouble is a great deal larger as of 9.3, though, due to changes in the array-size-increasing strategy.) However, allowing LACKMEM() to become true unexpectedly could still result in less-than-desirable behavior, so let's patch it all the way back.
2015-08-04Fix a PlaceHolderVar-related oversight in star-schema planning patch.Tom Lane
In commit b514a7460d9127ddda6598307272c701cbb133b7, I changed the planner so that it would allow nestloop paths to remain partially parameterized, ie the inner relation might need parameters from both the current outer relation and some upper-level outer relation. That's fine so long as we're talking about distinct parameters; but the patch also allowed creation of nestloop paths for cases where the inner relation's parameter was a PlaceHolderVar whose eval_at set included the current outer relation and some upper-level one. That does *not* work. In principle we could allow such a PlaceHolderVar to be evaluated at the lower join node using values passed down from the upper relation along with values from the join's own outer relation. However, nodeNestloop.c only supports simple Vars not arbitrary expressions as nestloop parameters. createplan.c is also a few bricks shy of being able to handle such cases; it misplaces the PlaceHolderVar parameters in the plan tree, which is why the visible symptoms of this bug are "plan should not reference subplan's variable" and "failed to assign all NestLoopParams to plan nodes" planner errors. Adding the necessary complexity to make this work doesn't seem like it would be repaid in significantly better plans, because in cases where such a PHV exists, there is probably a corresponding join order constraint that would allow a good plan to be found without using the star-schema exception. Furthermore, adding complexity to nodeNestloop.c would create a run-time penalty even for plans where this whole consideration is irrelevant. So let's just reject such paths instead. Per fuzz testing by Andreas Seltenreich; the added regression test is based on his example query. Back-patch to 9.2, like the previous patch.
2015-08-04Cap wal_buffers to avoid a server crash when it's set very large.Robert Haas
It must be possible to multiply wal_buffers by XLOG_BLCKSZ without overflowing int, or calculations in StartupXLOG will go badly wrong and crash the server. Avoid that by imposing a maximum value on wal_buffers. This will be just under 2GB, assuming the usual value for XLOG_BLCKSZ. Josh Berkus, per an analysis by Andrew Gierth.
2015-08-02Fix incorrect order of lock file removal and failure to close() sockets.Tom Lane
Commit c9b0cbe98bd783e24a8c4d8d8ac472a494b81292 accidentally broke the order of operations during postmaster shutdown: it resulted in removing the per-socket lockfiles after, not before, postmaster.pid. This creates a race-condition hazard for a new postmaster that's started immediately after observing that postmaster.pid has disappeared; if it sees the socket lockfile still present, it will quite properly refuse to start. This error appears to be the explanation for at least some of the intermittent buildfarm failures we've seen in the pg_upgrade test. Another problem, which has been there all along, is that the postmaster has never bothered to close() its listen sockets, but has just allowed them to close at process death. This creates a different race condition for an incoming postmaster: it might be unable to bind to the desired listen address because the old postmaster is still incumbent. This might explain some odd failures we've seen in the past, too. (Note: this is not related to the fact that individual backends don't close their client communication sockets. That behavior is intentional and is not changed by this patch.) Fix by adding an on_proc_exit function that closes the postmaster's ports explicitly, and (in 9.3 and up) reshuffling the responsibility for where to unlink the Unix socket files. Lock file unlinking can stay where it is, but teach it to unlink the lock files in reverse order of creation.
2015-08-02Fix race condition that lead to WALInsertLock deadlock with commit_delay.Heikki Linnakangas
If a call to WaitForXLogInsertionsToFinish() returned a value in the middle of a page, and another backend then started to insert a record to the same page, and then you called WaitXLogInsertionsToFinish() again, the second call might return a smaller value than the first call. The problem was in GetXLogBuffer(), which always updated the insertingAt value to the beginning of the requested page, not the actual requested location. Because of that, the second call might return a xlog pointer to the beginning of the page, while the first one returned a later position on the same page. XLogFlush() performs two calls to WaitXLogInsertionsToFinish() in succession, and holds WALWriteLock on the second call, which can deadlock if the second call to WaitXLogInsertionsToFinish() blocks. Reported by Spiros Ioannou. Backpatch to 9.4, where the more scalable WALInsertLock mechanism, and this bug, was introduced.
2015-08-01Fix some planner issues with degenerate outer join clauses.Tom Lane
An outer join clause that didn't actually reference the RHS (perhaps only after constant-folding) could confuse the join order enforcement logic, leading to wrong query results. Also, nested occurrences of such things could trigger an Assertion that on reflection seems incorrect. Per fuzz testing by Andreas Seltenreich. The practical use of such cases seems thin enough that it's not too surprising we've not heard field reports about it. This has been broken for a long time, so back-patch to all active branches.
2015-07-31Fix an oversight in checking whether a join with LATERAL refs is legal.Tom Lane
In many cases, we can implement a semijoin as a plain innerjoin by first passing the righthand-side relation through a unique-ification step. However, one of the cases where this does NOT work is where the RHS has a LATERAL reference to the LHS; that makes the RHS dependent on the LHS so that unique-ification is meaningless. joinpath.c understood this, and so would not generate any join paths of this kind ... but join_is_legal neglected to check for the case, so it would think that we could do it. The upshot would be a "could not devise a query plan for the given query" failure once we had failed to generate any join paths at all for the bogus join pair. Back-patch to 9.3 where LATERAL was added.
2015-07-30Consolidate makefile code for setting top_srcdir, srcdir and VPATH.Noah Misch
Responsibility was formerly split between Makefile.global and pgxs.mk. As a result of commit b58233c71b93a32fcab7219585cafc25a27eb769, in the PGXS case, these variables were unset while parsing Makefile.global and callees. Inclusion of Makefile.custom did not work from PGXS, and the subtle difference seemed like a recipe for future bugs. Back-patch to 9.4, where that commit first appeared.
2015-07-30Avoid some zero-divide hazards in the planner.Tom Lane
Although I think on all modern machines floating division by zero results in Infinity not SIGFPE, we still don't want infinities running around in the planner's costing estimates; too much risk of that leading to insane behavior. grouping_planner() failed to consider the possibility that final_rel might be known dummy and hence have zero rowcount. (I wonder if it would be better to set a rows estimate of 1 for dummy relations? But at least in the back branches, changing this convention seems like a bad idea, so I'll leave that for another day.) Make certain that get_variable_numdistinct() produces a nonzero result. The case that can be shown to be broken is with stadistinct < 0.0 and small ntuples; we did not prevent the result from rounding to zero. For good luck I applied clamp_row_est() to all the nonconstant return values. In ExecChooseHashTableSize(), Assert that we compute positive nbuckets and nbatch. I know of no reason to think this isn't the case, but it seems like a good safety check. Per reports from Piotr Stefaniak. Back-patch to all active branches.
2015-07-28Reduce chatter from signaling of autovacuum workers.Tom Lane
Don't print a WARNING if we get ESRCH from a kill() that's attempting to cancel an autovacuum worker. It's possible (and has been seen in the buildfarm) that the worker is already gone by the time we are able to execute the kill, in which case the failure is harmless. About the only plausible reason for reporting such cases would be to help debug corrupted lock table contents, but this is hardly likely to be the most important symptom if that happens. Moreover issuing a WARNING might scare users more than is warranted. Also, since sending a signal to an autovacuum worker is now entirely a routine thing, and the worker will log the query cancel on its end anyway, reduce the message saying we're doing that from LOG to DEBUG1 level. Very minor cosmetic cleanup as well. Since the main practical reason for doing this is to avoid unnecessary buildfarm failures, back-patch to all active branches.
2015-07-28Disable ssl renegotiation by default.Andres Freund
While postgres' use of SSL renegotiation is a good idea in theory, it turned out to not work well in practice. The specification and openssl's implementation of it have lead to several security issues. Postgres' use of renegotiation also had its share of bugs. Additionally OpenSSL has a bunch of bugs around renegotiation, reported and open for years, that regularly lead to connections breaking with obscure error messages. We tried increasingly complex workarounds to get around these bugs, but we didn't find anything complete. Since these connection breakages often lead to hard to debug problems, e.g. spuriously failing base backups and significant latency spikes when synchronous replication is used, we have decided to change the default setting for ssl renegotiation to 0 (disabled) in the released backbranches and remove it entirely in 9.5 and master.. Author: Michael Paquier, with changes by me Discussion: 20150624144148.GQ4797@alap3.anarazel.de Backpatch: 9.0-9.4; 9.5 and master get a different patch
2015-07-28Make tap tests store postmaster logs and handle vpaths correctlyAndrew Dunstan
Given this it is possible that the buildfarm animals running these tests will be able to capture adequate logging to allow diagnosis of failures.
2015-07-28Remove an unsafe Assert, and explain join_clause_is_movable_into() better.Tom Lane
join_clause_is_movable_into() is approximate, in the sense that it might sometimes return "false" when actually it would be valid to push the given join clause down to the specified level. This is okay ... but there was an Assert in get_joinrel_parampathinfo() that's only safe if the answers are always exact. Comment out the Assert, and add a bunch of commentary to clarify what's going on. Per fuzz testing by Andreas Seltenreich. The added regression test is a pretty silly query, but it's based on his crasher example. Back-patch to 9.2 where the faulty logic was introduced.
2015-07-28Improve logging of TAP tests.Andrew Dunstan
Create a log file for each test run. Stdout and stderr of the test script, as well as any subprocesses run as part of the test, are redirected to the log file. This makes it a lot easier to debug test failures. Also print the test output (ok 12 - ... messages) to the log file, and the command line of any external programs executed with the system_or_bail and run_log functions. This makes it a lot easier to debug failing tests. Modify some of the pg_ctl and other command invocations to not use 'silent' or 'quiet' options, and don't redirect output to /dev/null, so that you get all the information in the log instead. In the passing, construct some command lines in a way that works if $tempdir contains quote-characters. I haven't systematically gone through all of them or tested that, so I don't know if this is enough to make that work. pg_rewind tests had a custom mechanism for creating a similar log file. Use the new generic facility instead. Michael Paquier and Heikki Linnakangas. This os a backpatch of Heikki's commit 1ea06203b82b98b5098808667f6ba652181ef5b2 modified by me to suit 9.4
2015-07-27Don't assume that PageIsEmpty() returns true on an all-zeros page.Heikki Linnakangas
It does currently, and I don't see us changing that any time soon, but we don't make that assumption anywhere else. Per Tom Lane's suggestion. Backpatch to 9.2, like the previous patch that added this assumption.
2015-07-27Reuse all-zero pages in GIN.Heikki Linnakangas
In GIN, an all-zeros page would be leaked forever, and never reused. Just add them to the FSM in vacuum, and they will be reinitialized when grabbed from the FSM. On master and 9.5, attempting to access the page's opaque struct also caused an assertion failure, although that was otherwise harmless. Reported by Jeff Janes. Backpatch to all supported versions.
2015-07-27Fix handling of all-zero pages in SP-GiST vacuum.Heikki Linnakangas
SP-GiST initialized an all-zeros page at vacuum, but that was not WAL-logged, which is not safe. You might get a torn page write, when it gets flushed to disk, and end-up with a half-initialized index page. To fix, leave it in the all-zeros state, and add it to the FSM. It will be initialized when reused. Also don't set the page-deleted flag when recycling an empty page. That was also not WAL-logged, and a torn write of that would cause the page to have an invalid checksum. Backpatch to 9.2, where SP-GiST indexes were added.
2015-07-26Make entirely-dummy appendrels get marked as such in set_append_rel_size.Tom Lane
The planner generally expects that the estimated rowcount of any relation is at least one row, *unless* it has been proven empty by constraint exclusion or similar mechanisms, which is marked by installing a dummy path as the rel's cheapest path (cf. IS_DUMMY_REL). When I split up allpaths.c's processing of base rels into separate set_base_rel_sizes and set_base_rel_pathlists steps, the intention was that dummy rels would get marked as such during the "set size" step; this is what justifies an Assert in indxpath.c's get_loop_count that other relations should either be dummy or have positive rowcount. Unfortunately I didn't get that quite right for append relations: if all the child rels have been proven empty then set_append_rel_size would come up with a rowcount of zero, which is correct, but it didn't then do set_dummy_rel_pathlist. (We would have ended up with the right state after set_append_rel_pathlist, but that's too late, if we generate indexpaths for some other rel first.) In addition to fixing the actual bug, I installed an Assert enforcing this convention in set_rel_size; that then allows simplification of a couple of now-redundant tests for zero rowcount in set_append_rel_size. Also, to cover the possibility that third-party FDWs have been careless about not returning a zero rowcount estimate, apply clamp_row_est to whatever an FDW comes up with as the rows estimate. Per report from Andreas Seltenreich. Back-patch to 9.2. Earlier branches did not have the separation between set_base_rel_sizes and set_base_rel_pathlists steps, so there was no intermediate state where an appendrel would have had inconsistent rowcount and pathlist. It's possible that adding the Assert to set_rel_size would be a good idea in older branches too; but since they're not under development any more, it's likely not worth the trouble.
2015-07-25Restore use of zlib default compression in pg_dump directory mode.Andrew Dunstan
This was broken by commit 0e7e355f27302b62af3e1add93853ccd45678443 and friends, which ignored the fact that gzopen() will treat "-1" in the mode argument as an invalid character, which it ignores, and a flag for compression level 1. Now, when this value is encountered no compression level flag is passed to gzopen, leaving it to use the zlib default. Also, enforce the documented allowed range for pg_dump's -Z option, namely 0 .. 9, and remove some consequently dead code from pg_backup_tar.c. Problem reported by Marc Mamin. Backpatch to 9.1, like the patch that introduced the bug.
2015-07-23Fix off-by-one error in calculating subtrans/multixact truncation point.Heikki Linnakangas
If there were no subtransactions (or multixacts) active, we would calculate the oldestxid == next xid. That's correct, but if next XID happens to be on the next pg_subtrans (pg_multixact) page, the page does not exist yet, and SimpleLruTruncate will produce an "apparent wraparound" warning. The warning is harmless in this case, but looks very alarming to users. Backpatch to all supported versions. Patch and analysis by Thomas Munro.
2015-07-21Fix add_rte_to_flat_rtable() for recent feature additions.Tom Lane
The TABLESAMPLE and row security patches each overlooked this function, though their errors of omission were opposite: RLS failed to zero out the securityQuals field, leading to wasteful copying of useless expression trees in finished plans, while TABLESAMPLE neglected to add a comment saying that it intentionally *isn't* deleting the tablesample subtree. There probably should be a similar comment about ctename, too. Back-patch as appropriate.
2015-07-20Fix (some of) pltcl memory usageAlvaro Herrera
As reported by Bill Parker, PL/Tcl did not validate some malloc() calls against NULL return. Fix by using palloc() in a new long-lived memory context instead. This allows us to simplify error handling too, by simply deleting the memory context instead of doing retail frees. There's still a lot that could be done to improve PL/Tcl's memory handling ... This is pretty ancient, so backpatch all the way back. Author: Michael Paquier and Álvaro Herrera Discussion: https://www.postgresql.org/message-id/CAFrbyQwyLDYXfBOhPfoBGqnvuZO_Y90YgqFM11T2jvnxjLFmqw@mail.gmail.com
2015-07-18Make WaitLatchOrSocket's timeout detection more robust.Tom Lane
In the previous coding, timeout would be noticed and reported only when poll() or socket() returned zero (or the equivalent behavior on Windows). Ordinarily that should work well enough, but it seems conceivable that we could get into a state where poll() always returns a nonzero value --- for example, if it is noticing a condition on one of the file descriptors that we do not think is reason to exit the loop. If that happened, we'd be in a busy-wait loop that would fail to terminate even when the timeout expires. We can make this more robust at essentially no cost, by deciding to exit of our own accord if we compute a zero or negative time-remaining-to-wait. Previously the code noted this but just clamped the time-remaining to zero, expecting that we'd detect timeout on the next loop iteration. Back-patch to 9.2. While 9.1 had a version of WaitLatchOrSocket, it was primitive compared to later versions, and did not guarantee reliable detection of timeouts anyway. (Essentially, this is a refinement of commit 3e7fdcffd6f77187, which was back-patched only as far as 9.2.)
2015-07-17AIX: Test the -qlonglong option before use.Noah Misch
xlc provides "long long" unconditionally at C99-compatible language levels, and this option provokes a warning. The warning interferes with "configure" tests that fail in response to any warning. Notably, before commit 85a2a8903f7e9151793308d0638621003aded5ae, it interfered with the test for -qnoansialias. Back-patch to 9.0 (all supported versions).
2015-07-16Fix a low-probability crash in our qsort implementation.Tom Lane
It's standard for quicksort implementations, after having partitioned the input into two subgroups, to recurse to process the smaller partition and then handle the larger partition by iterating. This method guarantees that no more than log2(N) levels of recursion can be needed. However, Bentley and McIlroy argued that checking to see which partition is smaller isn't worth the cycles, and so their code doesn't do that but just always recurses on the left partition. In most cases that's fine; but with worst-case input we might need O(N) levels of recursion, and that means that qsort could be driven to stack overflow. Such an overflow seems to be the only explanation for today's report from Yiqing Jin of a SIGSEGV in med3_tuple while creating an index of a couple billion entries with a very large maintenance_work_mem setting. Therefore, let's spend the few additional cycles and lines of code needed to choose the smaller partition for recursion. Also, fix up the qsort code so that it properly uses size_t not int for some intermediate values representing numbers of items. This would only be a live risk when sorting more than INT_MAX bytes (in qsort/qsort_arg) or tuples (in qsort_tuple), which I believe would never happen with any caller in the current core code --- but perhaps it could happen with call sites in third-party modules? In any case, this is trouble waiting to happen, and the corrected code is probably if anything shorter and faster than before, since it removes sign-extension steps that had to happen when converting between int and size_t. In passing, move a couple of CHECK_FOR_INTERRUPTS() calls so that it's not necessary to preserve the value of "r" across them, and prettify the output of gen_qsort_tuple.pl a little. Back-patch to all supported branches. The odds of hitting this issue are probably higher in 9.4 and up than before, due to the new ability to allocate sort workspaces exceeding 1GB, but there's no good reason to believe that it's impossible to crash older branches this way.
2015-07-16Fix spelling errorMagnus Hagander
David Rowley