summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2020-08-15Move new LOCKTAG_DATABASE_FROZEN_IDS to end of enum LockTagType.Noah Misch
Several PGXN modules reference LockTagType values; renumbering would force a recompile of those modules. Oversight in back-patch of today's commit 566372b3d6435639e4cc4476d79b8505a0297c87. Back-patch to released branches, v12 through 9.5. Reported by Tom Lane. Discussion: https://postgr.es/m/921383.1597523945@sss.pgh.pa.us
2020-08-15Prevent concurrent SimpleLruTruncate() for any given SLRU.Noah Misch
The SimpleLruTruncate() header comment states the new coding rule. To achieve this, add locktype "frozenid" and two LWLocks. This closes a rare opportunity for data loss, which manifested as "apparent wraparound" or "could not access status of transaction" errors. Data loss is more likely in pg_multixact, due to released branches' thin margin between multiStopLimit and multiWrapLimit. If a user's physical replication primary logged ": apparent wraparound" messages, the user should rebuild standbys of that primary regardless of symptoms. At less risk is a cluster having emitted "not accepting commands" errors or "must be vacuumed" warnings at some point. One can test a cluster for this data loss by running VACUUM FREEZE in every database. Back-patch to 9.5 (all supported versions). Discussion: https://postgr.es/m/20190218073103.GA1434723@rfd.leadboat.com
2020-08-14Be more careful about the shape of hashable subplan clauses.Tom Lane
nodeSubplan.c expects that the testexpr for a hashable ANY SubPlan has the form of one or more OpExprs whose LHS is an expression of the outer query's, while the RHS is an expression over Params representing output columns of the subquery. However, the planner only went as far as verifying that the clauses were all binary OpExprs. This works 99.99% of the time, because the clauses have the right shape when emitted by the parser --- but it's possible for function inlining to break that, as reported by PegoraroF10. To fix, teach the planner to check that the LHS and RHS contain the right things, or more accurately don't contain the wrong things. Given that this has been broken for years without anyone noticing, it seems sufficient to just give up hashing when it happens, rather than go to the trouble of commuting the clauses back again (which wouldn't necessarily work anyway). While poking at that, I also noticed that nodeSubplan.c had a baked-in assumption that the number of hash clauses is identical to the number of subquery output columns. Again, that's fine as far as parser output goes, but it's not hard to break it via function inlining. There seems little reason for that assumption though --- AFAICS, the only thing it's buying us is not having to store the number of hash clauses explicitly. Adding code to the planner to reject such cases would take more code than getting nodeSubplan.c to cope, so I fixed it that way. This has been broken for as long as we've had hashable SubPlans, so back-patch to all supported branches. Discussion: https://postgr.es/m/1549209182255-0.post@n3.nabble.com
2020-08-14Fix postmaster's behavior during smart shutdown.Tom Lane
Up to now, upon receipt of a SIGTERM ("smart shutdown" command), the postmaster has immediately killed all "optional" background processes, and subsequently refused to launch new ones while it's waiting for foreground client processes to exit. No doubt this seemed like an OK policy at some point; but it's a pretty bad one now, because it makes for a seriously degraded environment for the remaining clients: * Parallel queries are killed, and new ones fail to launch. (And our parallel-query infrastructure utterly fails to deal with the case in a reasonable way --- it just hangs waiting for workers that are not going to arrive. There is more work needed in that area IMO.) * Autovacuum ceases to function. We can tolerate that for awhile, but if bulk-update queries continue to run in the surviving client sessions, there's eventually going to be a mess. In the worst case the system could reach a forced shutdown to prevent XID wraparound. * The bgwriter and walwriter are also stopped immediately, likely resulting in performance degradation. Hence, let's rearrange things so that the only immediate change in behavior is refusing to let in new normal connections. Once the last normal connection is gone, shut everything down as though we'd received a "fast" shutdown. To implement this, remove the PM_WAIT_BACKUP and PM_WAIT_READONLY states, instead staying in PM_RUN or PM_HOT_STANDBY while normal connections remain. A subsidiary state variable tracks whether or not we're letting in new connections in those states. This also allows having just one copy of the logic for killing child processes in smart and fast shutdown modes. I moved that logic into PostmasterStateMachine() by inventing a new state PM_STOP_BACKENDS. Back-patch to 9.6 where parallel query was added. In principle this'd be a good idea in 9.5 as well, but the risk/reward ratio is not as good there, since lack of autovacuum is not a problem during typical uses of smart shutdown. Per report from Bharath Rupireddy. Patch by me, reviewed by Thomas Munro Discussion: https://postgr.es/m/CALj2ACXAZ5vKxT9P7P89D87i3MDO9bfS+_bjMHgnWJs8uwUOOw@mail.gmail.com
2020-08-13Handle new HOT chains in index-build table scansAlvaro Herrera
When a table is scanned by heapam_index_build_range_scan (née IndexBuildHeapScan) and the table lock being held allows concurrent data changes, it is possible for new HOT chains to sprout in a page that were unknown when the scan of a page happened. This leads to an error such as ERROR: failed to find parent tuple for heap-only tuple at (X,Y) in table "tbl" because the root tuple was not present when we first obtained the list of the page's root tuples. This can be fixed by re-obtaining the list of root tuples, if we see that a heap-only tuple appears to point to a non-existing root. This was reported by Anastasia as occurring for BRIN summarization (which exists since 9.5), but I think it could theoretically also happen with CREATE INDEX CONCURRENTLY (much older) or REINDEX CONCURRENTLY (very recent). It seems a happy coincidence that BRIN forces us to backpatch this all the way to 9.5. Reported-by: Anastasia Lubennikova <a.lubennikova@postgrespro.ru> Diagnosed-by: Anastasia Lubennikova <a.lubennikova@postgrespro.ru> Co-authored-by: Anastasia Lubennikova <a.lubennikova@postgrespro.ru> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/602d8487-f0b2-5486-0088-0f372b2549fa@postgrespro.ru Backpatch: 9.5 - master
2020-08-12BRIN: Handle concurrent desummarization properlyAlvaro Herrera
If a page range is desummarized at just the right time concurrently with an index walk, BRIN would raise an error indicating index corruption. This is scary and unhelpful; silently returning that the page range is not summarized is sufficient reaction. This bug was introduced by commit 975ad4e602ff as additional protection against a bug whose actual fix was elsewhere. Backpatch equally. Reported-By: Anastasia Lubennikova <a.lubennikova@postgrespro.ru> Diagnosed-By: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/2588667e-d07d-7e10-74e2-7e1e46194491@postgrespro.ru Backpatch: 9.5 - master
2020-08-10Stamp 9.6.19.REL9_6_19Tom Lane
2020-08-10Make contrib modules' installation scripts more secure.Tom Lane
Hostile objects located within the installation-time search_path could capture references in an extension's installation or upgrade script. If the extension is being installed with superuser privileges, this opens the door to privilege escalation. While such hazards have existed all along, their urgency increases with the v13 "trusted extensions" feature, because that lets a non-superuser control the installation path for a superuser-privileged script. Therefore, make a number of changes to make such situations more secure: * Tweak the construction of the installation-time search_path to ensure that references to objects in pg_catalog can't be subverted; and explicitly add pg_temp to the end of the path to prevent attacks using temporary objects. * Disable check_function_bodies within installation/upgrade scripts, so that any security gaps in SQL-language or PL-language function bodies cannot create a risk of unwanted installation-time code execution. * Adjust lookup of type input/receive functions and join estimator functions to complain if there are multiple candidate functions. This prevents capture of references to functions whose signature is not the first one checked; and it's arguably more user-friendly anyway. * Modify various contrib upgrade scripts to ensure that catalog modification queries are executed with secure search paths. (These are in-place modifications with no extension version changes, since it is the update process itself that is at issue, not the end result.) Extensions that depend on other extensions cannot be made fully secure by these methods alone; therefore, revert the "trusted" marking that commit eb67623c9 applied to earthdistance and hstore_plperl, pending some better solution to that set of issues. Also add documentation around these issues, to help extension authors write secure installation scripts. Patch by me, following an observation by Andres Freund; thanks to Noah Misch for review. Security: CVE-2020-14350
2020-08-10Translation updatesPeter Eisentraut
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: f7a31187f2c0d0988e21cb6f3d788957c5d59fd8
2020-08-09Check for fseeko() failure in pg_dump's _tarAddFile().Tom Lane
Coverity pointed out, not unreasonably, that we checked fseeko's result at every other call site but these. Failure to seek in the temp file (note this is NOT pg_dump's output file) seems quite unlikely, and even if it did happen the file length cross-check further down would probably detect the problem. Still, that's a poor excuse for not checking the result of a system call.
2020-08-08walsnd: Don't set waiting_for_ping_response spuriouslyAlvaro Herrera
Ashutosh Bapat noticed that when logical walsender needs to wait for WAL, and it realizes that it must send a keepalive message to walreceiver to update the sent-LSN, which *does not* request a reply from walreceiver, it wrongly sets the flag that it's going to wait for that reply. That means that any future would-be sender of feedback messages ends up not sending a feedback message, because they all believe that a reply is expected. With built-in logical replication there's not much harm in this, because WalReceiverMain will send a ping-back every wal_receiver_timeout/2 anyway; but with other logical replication systems (e.g. pglogical) it can cause significant pain. This problem was introduced in commit 41d5f8ad734, where the request-reply flag was changed from true to false to WalSndKeepalive, without at the same time removing the line that sets waiting_for_ping_response. Just removing that line would be a sufficient fix, but it seems better to shift the responsibility of setting the flag to WalSndKeepalive itself instead of requiring caller to do it; this is clearly less error-prone. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reported-by: Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com> Backpatch: 9.5 and up Discussion: https://postgr.es/m/20200806225558.GA22401@alvherre.pgsql
2020-08-04Increase hard-wired timeout values in ecpg regression tests.Tom Lane
A couple of test cases had connect_timeout=14, a value that seems to have been plucked from a hat. While it's more than sufficient for normal cases, slow/overloaded buildfarm machines can get a timeout failure here, as per recent report from "sungazer". Increase to 180 seconds, which is in line with our typical timeouts elsewhere in the regression tests. Back-patch to 9.6; the code looks different in 9.5, and this doesn't seem to be quite worth the effort to adapt to that. Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2020-08-04%2007%3A12%3A22
2020-07-31Fix recently-introduced performance problem in ts_headline().Tom Lane
The new hlCover() algorithm that I introduced in commit c9b0c678d turns out to potentially take O(N^2) or worse time on long documents, if there are many occurrences of individual query words but few or no substrings that actually satisfy the query. (One way to hit this behavior is with a "common_word & rare_word" type of query.) This seems unavoidable given the original goal of checking every substring of the document, so we have to back off that idea. Fortunately, it seems unlikely that anyone would really want headlines spanning all of a long document, so we can avoid the worse-than-linear behavior by imposing a maximum length of substring that we'll consider. For now, just hard-wire that maximum length as a multiple of max_words times max_fragments. Perhaps at some point somebody will argue for exposing it as a ts_headline parameter, but I'm hesitant to make such a feature addition in a back-patched bug fix. I also noted that the hlFirstIndex() function I'd added in that commit was unnecessarily stupid: it really only needs to check whether a HeadlineWordEntry's item pointer is null or not. This wouldn't make all that much difference in typical cases with queries having just a few terms, but a cycle shaved is a cycle earned. In addition, add a CHECK_FOR_INTERRUPTS call in TS_execute_recurse. This ensures that hlCover's loop is cancellable if it manages to take a long time, and it may protect some other TS_execute callers as well. Back-patch to 9.6 as the previous commit was. I also chose to add the CHECK_FOR_INTERRUPTS call to 9.5. The old hlCover() algorithm seems to avoid the O(N^2) behavior, at least on the test case I tried, but nonetheless it's not very quick on a long document. Per report from Stephen Frost. Discussion: https://postgr.es/m/20200724160535.GW12375@tamriel.snowman.net
2020-07-29Backpatch tuplesort.c assertion.Peter Geoghegan
Backpatch an assertion (that was originally added to Postgres 12 by commit dd299df8189) that seems broadly useful. The assertion can detect violations of the HOT invariant (i.e. no two index tuples can point to the same heap TID) when CREATE INDEX somehow incorrectly allows that to take place. For example, a IndexBuildHeapScan/heapam_index_build_range_scan bug might result in two tuples that both point to the same heap TID. If these two tuples also happen to be duplicates, the assertion will fail. Discussion: https://postgr.es/m/CAH2-WzmBxu4o=pMsniur+bwHqCGCmV_AOLkuK6BuU7ngA6evqw@mail.gmail.com Backpatch: 9.5-11 only
2020-07-23Fix error message.Thomas Munro
Remove extra space. Back-patch to all releases, like commit 7897e3bb. Author: Lu, Chenyang <lucy.fnst@cn.fujitsu.com> Discussion: https://postgr.es/m/795d03c6129844d3803e7eea48f5af0d%40G08CNEXMBPEKD04.g08.fujitsu.local
2020-07-17Ensure that distributed timezone abbreviation files are plain ASCII.Tom Lane
We had two occurrences of "Mitteleuropäische Zeit" in Europe.txt, though the corresponding entries in Default were spelled "Mitteleuropaeische Zeit". Standardize on the latter spelling to avoid questions of which encoding to use. While here, correct a couple of other trivial inconsistencies between the Default file and the supposedly-matching entries in the *.txt files, as exposed by some checking with comm(1). Also, add BDST to the Europe.txt file; it previously was only listed in Default. None of this has any direct functional effect. Per complaint from Christoph Berg. As usual for timezone data patches, apply to all branches. Discussion: https://postgr.es/m/20200716100743.GE3534683@msg.df7cb.de
2020-07-16Switch pg_test_fsync to use binary mode on WindowsMichael Paquier
pg_test_fsync has always opened files using the text mode on Windows, as this is the default mode used if not enforced by _setmode(). This fixes a failure when running pg_test_fsync down to 12 because O_DSYNC and the text mode are not able to work together nicely. We fixed the handling of O_DSYNC in 12~ for the tool by switching to the concurrent-safe version of fopen() in src/port/ with 0ba06e0. And 40cfe86, by enforcing the text mode for compatibility reasons if O_TEXT or O_BINARY are not specified by the caller, broke pg_test_fsync. For all versions, this avoids any translation overhead, and pg_test_fsync should test binary writes, so it is a gain in all cases. Note that O_DSYNC is still not handled correctly in ~11, leading to pg_test_fsync to show insanely high numbers for open_datasync() (using this property it is easy to notice that the binary mode is much faster). This would require a backpatch of 0ba06e0 and 40cfe86, which could potentially break existing applications, so this is left out. There are no TAP tests for this tool yet, so I have checked all builds manually using MSVC. We could invent a new option to run a single transaction instead of using a duration of 1s to make the tests a maximum short, but this is left as future work. Thanks to Bruce Momjian for the discussion. Reported-by: Jeff Janes Author: Michael Paquier Discussion: https://postgr.es/m/16526-279ded30a230d275@postgresql.org Backpatch-through: 9.5
2020-07-15Replace use of sys_siglist[] with strsignal().Tom Lane
This commit back-patches the v12-era commits a73d08319, cc92cca43, and 7570df0f3 into supported pre-v12 branches. The net effect is to eliminate our former dependency on the never-standard sys_siglist[] array, instead using POSIX-standard strsignal(3). What motivates doing this now is that glibc just removed sys_siglist[] from the set of symbols available to newly-built programs. While our code can survive without sys_siglist[], it then fails to print any description of the signal that killed a child process, which is a non-negligible loss of friendliness. We can expect that people will be wanting to build the back branches on platforms that include this change, so we need to do something. Since strsignal(3) has existed for quite a long time, and we've not had any trouble with these patches so far in v12, it seems safe to back-patch into older branches. Discussion: https://postgr.es/m/3179114.1594853308@sss.pgh.pa.us
2020-07-15Fix handling of missing files when using pg_rewind with online sourceMichael Paquier
When working with an online source cluster, pg_rewind gets a list of all the files in the source data directory using a WITH RECURSIVE query, returning a NULL result for a file's metadata if it gets removed between the moment it is listed in a directory and the moment its metadata is obtained with pg_stat_file() (say a recycled WAL segment). The query result was processed in such a way that for each tuple we checked only that the first file's metadata was NULL. This could have two consequences, both resulting in a failure of the rewind: - If the first tuple referred to a removed file, all files from the source would be ignored. - Any file actually missing would not be considered as such. While on it, rework slightly the code so as no values are saved if we know that a file is going to be skipped. Issue introduced by b36805f, so backpatch down to 9.5. Author: Justin Pryzby, Michael Paquier Reviewed-by: Daniel Gustafsson, Masahiko Sawada Discussion: https://postgr.es/m/20200713061010.GC23581@telsasoft.com Backpatch-through: 9.5
2020-07-14Fix timing issue with ALTER TABLE's validate constraintDavid Rowley
An ALTER TABLE to validate a foreign key in which another subcommand already caused a pending table rewrite could fail due to ALTER TABLE attempting to validate the foreign key before the actual table rewrite takes place. This situation could result in an error such as: ERROR: could not read block 0 in file "base/nnnnn/nnnnn": read only 0 of 8192 bytes The failure here was due to the SPI call which validates the foreign key trying to access an index which is yet to be rebuilt. Similarly, we also incorrectly tried to validate CHECK constraints before the heap had been rewritten. The fix for both is to delay constraint validation until phase 3, after the table has been rewritten. For CHECK constraints this means a slight behavioral change. Previously ALTER TABLE VALIDATE CONSTRAINT on inheritance tables would be validated from the bottom up. This was different from the order of evaluation when a new CHECK constraint was added. The changes made here aligns the VALIDATE CONSTRAINT evaluation order for inheritance tables to be the same as ADD CONSTRAINT, which is generally top-down. Reported-by: Nazli Ugur Koyluoglu, using SQLancer Discussion: https://postgr.es/m/CAApHDvp%3DZXv8wiRyk_0rWr00skhGkt8vXDrHJYXRMft3TjkxCA%40mail.gmail.com Backpatch-through: 9.5 (all supported versions)
2020-07-13Cope with lateral references in the quals of a subquery RTE.Tom Lane
The qual pushdown logic assumed that all Vars in a restriction clause must be Vars referencing subquery outputs; but since we introduced LATERAL, it's possible for such a Var to be a lateral reference instead. This led to an assertion failure in debug builds. In a non-debug build, there might be no ill effects (if qual_is_pushdown_safe decided the qual was unsafe anyway), or we could get failures later due to construction of an invalid plan. I've not gone to much length to characterize the possible failures, but at least segfaults in the executor have been observed. Given that this has been busted since 9.3 and it took this long for anybody to notice, I judge that the case isn't worth going to great lengths to optimize. Hence, fix by just teaching qual_is_pushdown_safe that such quals are unsafe to push down, matching the previous behavior when it accidentally didn't fail. Per report from Tom Ellis. Back-patch to all supported branches. Discussion: https://postgr.es/m/20200713175124.GQ8220@cloudinit-builder
2020-07-11Avoid trying to restore table ACLs and per-column ACLs in parallel.Tom Lane
Parallel pg_restore has always supposed that ACL items for different objects are independent and can be restored in parallel without conflicts. However, there is one case where this fails: because REVOKE on a table is defined to also revoke the privilege(s) at column level, we can't restore per-column ACLs till after we restore any table-level privileges on their table. Failure to honor this restriction can lead to "tuple concurrently updated" errors during parallel restore, or even to the per-column ACLs silently disappearing because the table-level REVOKE is executed afterwards. To fix, add a dependency from each column-level ACL item to its table's ACL item, if there is one. Note that this doesn't fix the hazard for pre-existing archive files, only for ones made with a corrected pg_dump. Given that the bug's been there quite awhile without field reports, I think this is acceptable. This requires changing the API of pg_dump's dumpACL() function. To keep its argument list from getting even longer, I removed the "CatalogId objCatId" argument, which has been unused for ages. Per report from Justin Pryzby. Back-patch to all supported branches. Discussion: https://postgr.es/m/20200706050129.GW4107@telsasoft.com
2020-07-09Tighten up Windows CRLF conversion in our TAP test scripts.Tom Lane
Back-patch commits 91bdf499b and ffb4cee43, so that all branches agree on when and how to do Windows CRLF conversion. This should close the referenced thread. Thanks to Andrew Dunstan for discussion/review. Discussion: https://postgr.es/m/412ae8da-76bb-640f-039a-f3513499e53d@gmx.net
2020-07-03Clamp total-tuples estimates for foreign tables to ensure planner sanity.Tom Lane
After running GetForeignRelSize for a foreign table, adjust rel->tuples to be at least as large as rel->rows. This prevents bizarre behavior in estimate_num_groups() and perhaps other places, especially in the scenario where rel->tuples is zero because pg_class.reltuples is (suggesting that ANALYZE has never been run for the table). As things stood, we'd end up estimating one group out of any GROUP BY on such a table, whereas the default group-count estimate is more likely to result in a sane plan. Also, clarify in the documentation that GetForeignRelSize has the option to override the rel->tuples value if it has a better idea of what to use than what is in pg_class.reltuples. Per report from Jeff Janes. Back-patch to all supported branches. Patch by me; thanks to Etsuro Fujita for review Discussion: https://postgr.es/m/CAMkU=1xNo9cnan+Npxgz0eK7394xmjmKg-QEm8wYG9P5-CcaqQ@mail.gmail.com
2020-07-03Fix temporary tablespaces for shared filesets some more.Tom Lane
Commit ecd9e9f0b fixed the problem in the wrong place, causing unwanted side-effects on the behavior of GetNextTempTableSpace(). Instead, let's make SharedFileSetInit() responsible for subbing in the value of MyDatabaseTableSpace when the default tablespace is called for. The convention about what is in the tempTableSpaces[] array is evidently insufficiently documented, so try to improve that. It also looks like SharedFileSetInit() is doing the wrong thing in the case where temp_tablespaces is empty. It was hard-wiring use of the pg_default tablespace, but it seems like using MyDatabaseTableSpace is more consistent with what happens for other temp files. Back-patch the reversion of PrepareTempTablespaces()'s behavior to 9.5, as ecd9e9f0b was. The changes in SharedFileSetInit() go back to v11 where that was introduced. (Note there is net zero code change before v11 from these two patch sets, so nothing to release-note.) Magnus Hagander and Tom Lane Discussion: https://postgr.es/m/CABUevExg5YEsOvqMxrjoNvb3ApVyH+9jggWGKwTDFyFCVWczGQ@mail.gmail.com
2020-07-03Fix temporary tablespaces for shared filesetsMagnus Hagander
A likely copy/paste error in 98e8b480532 from back in 2004 would cause temp tablespace to be reset to InvalidOid if temp_tablespaces was set to the same value as the primary tablespace in the database. This would cause shared filesets (such as for parallel hash joins) to ignore them, putting the temporary files in the default tablespace instead of the configured one. The bug is in the old code, but it appears to have been exposed only once we had shared filesets. Reviewed-By: Daniel Gustafsson Discussion: https://postgr.es/m/CABUevExg5YEsOvqMxrjoNvb3ApVyH+9jggWGKwTDFyFCVWczGQ@mail.gmail.com Backpatch-through: 9.5
2020-06-24Fix compiler warning induced by commit d8b15eeb8.Tom Lane
I forgot that INT64_FORMAT can't be used with sscanf on Windows. Use the same trick of sscanf'ing into a temp variable as we do in some other places in zic.c. The upstream IANA code avoids the portability problem by relying on <inttypes.h>'s SCNdFAST64 macro. Once we're requiring C99 in all branches, we should do likewise and drop this set of diffs from upstream. For now, though, a hack seems fine, since we do not actually care about leapseconds anyway. Discussion: https://postgr.es/m/4e5d1a5b-143e-e70e-a99d-a3b01c1ae7c3@2ndquadrant.com
2020-06-22Undo double-quoting of index names in non-text EXPLAIN output formats.Tom Lane
explain_get_index_name() applied quote_identifier() to the index name. This is fine for text output, but the non-text output formats all have their own quoting conventions and would much rather start from the actual index name. For example in JSON you'd get something like "Index Name": "\"My Index\"", which is surely not desirable, especially when the same does not happen for table names. Hence, move the responsibility for applying quoting out to the callers, where it can go into already-existing special code paths for text format. This changes the API spec for users of explain_get_index_name_hook: before, they were supposed to apply quote_identifier() if necessary, now they should not. Research suggests that the only publicly available user of the hook is hypopg, and it actually forgot to apply quoting anyway, so it's fine. (In any case, there's no behavioral change for the output of a hook as seen in non-text EXPLAIN formats, so this won't break any case that programs should be relying on.) Digging in the commit logs, it appears that quoting was included in explain_get_index_name's duties when commit 604ffd280 invented it; and that was fine at the time because we only had text output format. This should have been rethought when non-text formats were invented, but it wasn't. This is a fairly clear bug for users of non-text EXPLAIN formats, so back-patch to all supported branches. Per bug #16502 from Maciek Sakrejda. Patch by me (based on investigation by Euler Taveira); thanks to Julien Rouhaud for review. Discussion: https://postgr.es/m/16502-57bd1c9f913ed1d1@postgresql.org
2020-06-19Ensure write failure reports no-disk-spaceAlvaro Herrera
A few places calling fwrite and gzwrite were not setting errno to ENOSPC when reporting errors, as is customary; this led to some failures being reported as "could not write file: Success" which makes us look silly. Make a few of these places in pg_dump and pg_basebackup use our customary pattern. Backpatch-to: 9.5 Author: Justin Pryzby <pryzby@telsasoft.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20200611153753.GU14879@telsasoft.com
2020-06-19Future-proof regression tests against possibly-missing posixrules file.Tom Lane
The IANA time zone folk have deprecated use of a "posixrules" file in the tz database. While for now it's our choice whether to keep supplying one in our own builds, installations built with --with-system-tzdata will soon be needing to cope with that file not being present, at least on some platforms. This causes a problem for the horology test, which expected the nonstandard POSIX zone spec "CST7CDT" to apply pre-2007 US daylight savings rules. That does happen if the posixrules file supplies such information, but otherwise the test produces undesired results. To fix, add an explicit transition date rule that matches 2005 practice. (We could alternatively have switched the test to use some real time zone, but it seems useful to have coverage of this type of zone spec.) While at it, update a documentation example that also relied on "CST7CDT"; use a real-world zone name instead. Also, document why the zone names EST5EDT, CST6CDT, MST7MDT, PST8PDT aren't subject to similar failures when "posixrules" is missing. Back-patch to all supported branches, since the hazard is the same for all. Discussion: https://postgr.es/m/1665379.1592581287@sss.pgh.pa.us
2020-06-18Fix C99isms introduced when backpatching atomics / spinlock tests.Andres Freund
2020-06-18Fix deadlock danger when atomic ops are done under spinlock.Andres Freund
This was a danger only for --disable-spinlocks in combination with atomic operations unsupported by the current platform. While atomics.c was careful to signal that a separate semaphore ought to be used when spinlock emulation is active, spin.c didn't actually implement that mechanism. That's my (Andres') fault, it seems to have gotten lost during the development of the atomic operations support. Fix that issue and add test for nesting atomic operations inside a spinlock. Author: Andres Freund Discussion: https://postgr.es/m/20200605023302.g6v3ydozy5txifji@alap3.anarazel.de Backpatch: 9.5-
2020-06-18Add basic spinlock tests to regression tests.Andres Freund
As s_lock_test, the already existing test for spinlocks, isn't run in an automated fashion (and doesn't test a normal backend environment), adding tests that are run as part of a normal regression run is a good idea. Particularly in light of several recent and upcoming spinlock related fixes. Currently the new tests are run as part of the pre-existing test_atomic_ops() test. That perhaps can be quibbled about, but for now seems ok. The only operations that s_lock_test tests but the new tests don't are the detection of a stuck spinlock and S_LOCK_FREE (which is otherwise unused, not implemented on all platforms, and will be removed). This currently contains a test for more than INT_MAX spinlocks (only run with --disable-spinlocks), to ensure the recent commit fixing a bug with more than INT_MAX spinlock initializations is correct. That test is somewhat slow, so we might want to disable it after a few days. It might be worth retiring s_lock_test after this. The added coverage of a stuck spinlock probably isn't worth the added complexity? Author: Andres Freund Discussion: https://postgr.es/m/20200606023103.avzrctgv7476xj7i@alap3.anarazel.de
2020-06-17Sync our copy of the timezone library with IANA release tzcode2020a.Tom Lane
This absorbs a leap-second-related bug fix in localtime.c, and teaches zic to handle an expiration marker in the leapseconds file. Neither are of any interest to us (for the foreseeable future anyway), but we need to stay more or less in sync with upstream. Also adjust some over-eager changes in the README from commit 957338418. I have no intention of making changes that require C99 in this code, until such time as all the live back branches require C99. Otherwise back-patching will get too exciting. For the same reason, absorb assorted whitespace and other cosmetic changes from HEAD into the back branches; mostly this reflects use of improved versions of pgindent. All in all then, quite a boring update. But I figured I'd get it done while I was looking at this code.
2020-06-17spinlock emulation: Fix bug when more than INT_MAX spinlocks are initialized.Andres Freund
Once the counter goes negative we ended up with spinlocks that errored out on first use (due to check in tas_sema). Author: Andres Freund Reviewed-By: Robert Haas Discussion: https://postgr.es/m/20200606023103.avzrctgv7476xj7i@alap3.anarazel.de Backpatch: 9.5-
2020-06-16Fix buffile.c error handling.Thomas Munro
Convert buffile.c error handling to use ereport. This fixes cases where I/O errors were indistinguishable from EOF or not reported. Also remove "%m" from error messages where errno would be bogus. While we're modifying those strings, add block numbers and short read byte counts where appropriate. Back-patch to all supported releases. Reported-by: Amit Khandekar <amitdkhan.pg@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CA%2BhUKGJE04G%3D8TLK0DLypT_27D9dR8F1RQgNp0jK6qR0tZGWOw%40mail.gmail.com
2020-06-15pg_upgrade: set vacuum_defer_cleanup_age to zeroBruce Momjian
Non-zero vacuum_defer_cleanup_age values cause pg_upgrade freezing of the system catalogs to be incomplete, or do nothing. This will cause the upgrade to fail in confusing ways. Reported-by: Laurenz Albe Discussion: https://postgr.es/m/7d6f6c22ba05ce0c526e9e8b7bfa8105e7da45e6.camel@cybertec.at Backpatch-through: 9.5
2020-06-11Fix mishandling of NaN counts in numeric_[avg_]combine.Tom Lane
When merging two NumericAggStates, the code missed adding the new state's NaNcount unless its N was also nonzero; since those counts are independent, this is wrong. This would only have visible effect if some partial aggregate scans found only NaNs while earlier ones found only non-NaNs; then we could end up falsely deciding that there were no NaNs and fail to return a NaN final result as expected. That's pretty improbable, so it's no surprise this hasn't been reported from the field. Still, it's a bug. I didn't try to produce a regression test that would show the bug, but I did notice that these functions weren't being reached at all in our regression tests, so I improved the tests to at least exercise them. With these additions, I see pretty complete code coverage on the aggregation-related functions in numeric.c. Back-patch to 9.6 where this code was introduced. (I only added the improved test case as far back as v10, though, since the relevant part of aggregates.sql isn't there at all in 9.6.)
2020-06-11Avoid update conflict out serialization anomalies.Peter Geoghegan
SSI's HeapCheckForSerializableConflictOut() test failed to correctly handle conditions involving a concurrently inserted tuple which is later concurrently updated by a separate transaction . A SELECT statement that called HeapCheckForSerializableConflictOut() could end up using the same XID (updater's XID) for both the original tuple, and the successor tuple, missing the XID of the xact that created the original tuple entirely. This only happened when neither tuple from the chain was visible to the transaction's MVCC snapshot. The observable symptoms of this bug were subtle. A pair of transactions could commit, with the later transaction failing to observe the effects of the earlier transaction (because of the confusion created by the update to the non-visible row). This bug dates all the way back to commit dafaa3ef, which added SSI. To fix, make sure that we check the xmin of concurrently inserted tuples that happen to also have been updated concurrently. Author: Peter Geoghegan Reported-By: Kyle Kingsbury Reviewed-By: Thomas Munro Discussion: https://postgr.es/m/db7b729d-0226-d162-a126-8a8ab2dc4443@jepsen.io Backpatch: All supported versions
2020-06-11Fix typos.Amit Kapila
Reported-by: John Naylor Author: John Naylor Backpatch-through: 9.5 Discussion: https://postgr.es/m/CACPNZCtRuvs6G+EYqejhVJgBq2AKeZdXRVJsbX4syhO9gn5SNQ@mail.gmail.com
2020-06-08Avoid need for valgrind suppressions for pg_atomic_init_u64 on some platforms.Andres Freund
Previously we used pg_atomic_write_64_impl inside pg_atomic_init_u64. That works correctly, but on platforms without 64bit single copy atomicity it could trigger spurious valgrind errors about uninitialized memory, because we use compare_and_swap for atomic writes on such platforms. I previously suppressed one instance of this problem (6c878edc1df), but as Tom reports that wasn't enough. As the atomic variable cannot yet be concurrently accessible during initialization, it seems better to have pg_atomic_init_64_impl set the value directly. Change pg_atomic_init_u32_impl for symmetry. Reported-By: Tom Lane Author: Andres Freund Discussion: https://postgr.es/m/1714601.1591503815@sss.pgh.pa.us Backpatch: 9.5-
2020-06-08Fix locking bugs that could corrupt pg_control.Thomas Munro
The redo routines for XLOG_CHECKPOINT_{ONLINE,SHUTDOWN} must acquire ControlFileLock before modifying ControlFile->checkPointCopy, or the checkpointer could write out a control file with a bad checksum. Likewise, XLogReportParameters() must acquire ControlFileLock before modifying ControlFile and calling UpdateControlFile(). Back-patch to all supported releases. Author: Nathan Bossart <bossartn@amazon.com> Author: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/70BF24D6-DC51-443F-B55A-95735803842A%40amazon.com
2020-06-07MSVC: Avoid warning when testing a TAP suite without PROVE_FLAGS.Noah Misch
Commit 7be5d8df1f74b78620167d3abf32ee607e728919 surfaced the logic error, which had no functional implications, by adding "use warnings". The buildfarm always customizes PROVE_FLAGS, so the warning did not appear there. Back-patch to 9.5 (all supported versions).
2020-06-05Refresh function name in CRC-associated Valgrind suppressions.Noah Misch
Back-patch to 9.5, where commit 4f700bcd20c087f60346cb8aefd0e269be8e2157 first appeared. Reviewed by Tom Lane. Reported by Andrew Dunstan. Discussion: https://postgr.es/m/4dfabec2-a3ad-0546-2d62-f816c97edd0c@2ndQuadrant.com
2020-06-04Fix instance of elog() called while holding a spinlockMichael Paquier
This broke the project rule to not call any complex code while a spinlock is held. Issue introduced by b89e151. Discussion: https://postgr.es/m/20200602.161518.1399689010416646074.horikyota.ntt@gmail.com Backpatch-through: 9.5
2020-06-01Fix use-after-release mistake in currtid() and currtid2() for viewsMichael Paquier
This issue has been present since the introduction of this code as of a3519a2 from 2002, and has been found by buildfarm member prion that uses RELCACHE_FORCE_RELEASE via the tests introduced recently in e786be5. Discussion: https://postgr.es/m/20200601022055.GB4121@paquier.xyz Backpatch-through: 9.5
2020-05-31Make install-tests target work with vpath buildsAndrew Dunstan
Also add a top-level install-tests target. Backpatch to all live branches. Craig Ringer, tweaked by me.
2020-05-28Add CHECK_FOR_INTERRUPTS() to the repeat() functionJoe Conway
The repeat() function loops for potentially a long time without ever checking for interrupts. This prevents, for example, a query cancel from interrupting until the work is all done. Fix by inserting a CHECK_FOR_INTERRUPTS() into the loop. Backpatch to all supported versions. Discussion: https://www.postgresql.org/message-id/flat/8692553c-7fe8-17d9-cbc1-7cddb758f4c6%40joeconway.com
2020-05-21Fix MSVC installations with multiple "configure" files detectedMichael Paquier
When installing binaries and libraries using the MSVC installation routines, the operation gets done after moving to the root folder, whose location is detected by checking if "configure" exists two times in a row. So, calling the installation script from src/tools/msvc/ with an extra "configure" file four levels up the root path of the code tree causes the execution to go further up, leading to a failure in finding the builds. This commit fixes the issue by moving to the root folder of the code tree only once, when necessary. Author: Arnold Müller Reviewed-by: Daniel Gustafsson Discussion: https://postgr.es/m/16343-f638f67e7e52b86c@postgresql.org Backpatch-through: 9.5
2020-05-18Fix comment in slot.c.Amit Kapila
Reported-by: Sawada Masahiko Author: Sawada Masahiko Reviewed-by: Amit Kapila Backpatch-through: 9.5 Discussion: https://postgr.es/m/CA+fd4k4Ws7M7YQ8PqSym5WB1y75dZeBTd1sZJUQdfe0KJQ-iSA@mail.gmail.com