summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2021-07-20Fix some issues with WAL segment opening for pg_receivewal --compressMichael Paquier
The logic handling the opening of new WAL segments was fuzzy when using --compress if a partial, non-compressed, segment with the same base name existed in the repository storing those files. In this case, using --compress would cause the code to first check for the existence and the size of a non-compressed segment, followed by the opening of a new compressed, partial, segment. The code was accidentally working correctly on most platforms as the buildfarm has proved, except bowerbird where gzflush() could fail in this code path. It is wrong anyway to take the code path used pre-padding when creating a new partial, non-compressed, segment, so let's fix it. Note that this issue exists when users mix successive runs of pg_receivewal with or without compression, as discovered with the tests introduced by ffc9dda. While on it, this refactors the code so as code paths that need to know about the ".gz" suffix are down from four to one in walmethods.c, easing a bit the introduction of new compression methods. This addresses a second issue where log messages generated for an unexpected failure would not show the compressed segment name involved, which was confusing, printing instead the name of the non-compressed equivalent. Reported-by: Georgios Kokolatos Discussion: https://postgr.es/m/YPDLz2x3o1aX2wRh@paquier.xyz Backpatch-through: 10
2021-07-19Make new replication slot test code even less racyAlvaro Herrera
Further fix the test code in ead9e51e8236, this time by waiting until the checkpoint has completed before moving on; this ensures that the WAL segment removal has already happened when we create the next slot. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20210719.111318.2042379313472032754.horikyota.ntt@gmail.com
2021-07-19Don't allow to set replication slot_name as ''.Amit Kapila
We don't allow to create replication slot_name as an empty string ('') via SQL API pg_create_logical_replication_slot() but it is allowed to be set via Alter Subscription command. This will lead to apply worker repeatedly keep trying to stream data via slot_name '' and the user is not allowed to create the slot with that name. Author: Japin Li Reviewed-By: Ranier Vilela, Amit Kapila Backpatch-through: 10, where it was introduced Discussion: https://postgr.es/m/MEYP282MB1669CBD98E721C77CA696499B61A9@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2021-07-17Make new replication slot test code less racyAlvaro Herrera
The new test code added in ead9e51e8236 is racy -- it hinges on shared-memory state, which changes before the WARNING message is logged. Put it the other way around. Backpatch to 13. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/202107161809.zclasccpfcg3@alvherre.pgsql
2021-07-16Fix pg_dump for disabled triggers on partitioned tablesAlvaro Herrera
pg_dump failed to preserve the 'enabled' flag (which can be not only disabled, but also REPLICA or ALWAYS) for partitions which had it changed from their respective parents. Attempt to handle that by including a definition for such triggers in the dump, but replace the standard CREATE TRIGGER line with an ALTER TRIGGER line. Backpatch to 11, where these triggers can exist. In branches 11 and 12, pick up a few test lines from commit b9b408c48724 to verify that pg_upgrade is okay with these arrangements. Co-authored-by: Justin Pryzby <pryzby@telsasoft.com> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20200930223450.GA14848@telsasoft.com
2021-07-16Preserve firing-on state when cloning row triggers to partitionsAlvaro Herrera
When triggers are cloned from partitioned tables to their partitions, the 'tgenabled' flag (origin/replica/always/disable) was not propagated. Make it so that the flag on the trigger on partition is initially set to the same value as on the partitioned table. Add a test case to verify the behavior. Backpatch to 11, where this appeared in commit 86f575948c77. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reported-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20200930223450.GA14848@telsasoft.com
2021-07-16Advance old-segment horizon properly after slot invalidationAlvaro Herrera
When some slots are invalidated due to the max_slot_wal_keep_size limit, the old segment horizon should move forward to stay within the limit. However, in commit c6550776394e we forgot to call KeepLogSeg again to recompute the horizon after invalidating replication slots. In cases where other slots remained, the limits would be recomputed eventually for other reasons, but if all slots were invalidated, the limits would not move at all afterwards. Repair. Backpatch to 13 where the feature was introduced. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reported-by: Marcin Krupowicz <mk@071.ovh> Discussion: https://postgr.es/m/17103-004130e8f27782c9@postgresql.org
2021-07-15Ensure HAVE_DECL_XXX macros in MSVC builds match those in Unix.Tom Lane
Autoconf's AC_CHECK_DECLS() always defines HAVE_DECL_whatever as 1 or 0, but some of the entries in msvc/Solution.pm showed such symbols as "undef" instead of 0. Fix that for consistency. There's no live bug in current usages AFAICS, but it's not hard to imagine one creeping in if more-complex #if tests get added. Back-patch to v13, which is as far back as Solution.pm contains this data. The inconsistency still exists in the manually-filled pg_config_ext.h.win32 files of older branches; but as long as the problem is only latent, it doesn't seem worth the trouble to clean things up there. Discussion: https://postgr.es/m/3185430.1626133592@sss.pgh.pa.us
2021-07-14Fix unexpected error messages for various flavors of ALTER TABLEMichael Paquier
Some commands of ALTER TABLE could fail with the following error: ERROR: "tab" is of the wrong type This error is unexpected, as all the code paths leading to ATWrongRelkindError() should use a supported set of relkinds to generate correct error messages. This commit closes the gap with such mistakes, by adding all the missing relkind combinations. Tests are added to check all the problems found. Note that some combinations are not used, but these are left around as it could have an impact on applications relying on this code. 2ed532e has done a much larger refactoring on HEAD to make such error messages easier to manage in the long-term, so nothing is needed there. Author: Kyotaro Horiguchi Reviewed-by: Peter Eisentraut, Ahsan Hadi, Michael Paquier Discussion: https://postgr.es/m/20210216.181415.368926598204753659.horikyota.ntt@gmail.com Backpatch-through: 11
2021-07-13Robustify tuplesort's free_sort_tuple functionDavid Rowley
41469253e went to the trouble of removing a theoretical bug from free_sort_tuple by checking if the tuple was NULL before freeing it. Let's make this a little more robust by also setting the tuple to NULL so that should we be called again we won't end up doing a pfree on the already pfree'd tuple. Per advice from Tom Lane. Discussion: https://postgr.es/m/3188192.1626136953@sss.pgh.pa.us Backpatch-through: 9.6, same as 41469253e
2021-07-13Fix theoretical bug in tuplesortDavid Rowley
This fixes a theoretical bug in tuplesort.c which, if a bounded sort was used in combination with a byval Datum sort (tuplesort_begin_datum), when switching the sort to a bounded heap in make_bounded_heap(), we'd call free_sort_tuple(). The problem was that when sorting Datums of a byval type, the tuple is NULL and free_sort_tuple() would free the memory for it regardless of that. This would result in a crash. Here we fix that simply by adding a check to see if the tuple is NULL before trying to disassociate and free any memory belonging to it. The reason this bug is only theoretical is that nowhere in the current code base do we do tuplesort_set_bound() when performing a Datum sort. However, let's backpatch a fix for this as if any extension uses the code in this way then it's likely to cause problems. Author: Ronan Dunklau Discussion: https://postgr.es/m/CAApHDvpdoqNC5FjDb3KUTSMs5dg6f+XxH4Bg_dVcLi8UYAG3EQ@mail.gmail.com Backpatch-through: 9.6, oldest supported version
2021-07-12Remove dead assignment to local variable.Heikki Linnakangas
This should have been removed in commit 7e30c186da, which split the loop into two. Only the first loop uses the 'from' variable; updating it in the second loop is bogus. It was never read after the first loop, so this was harmless and surely optimized away by the compiler, but let's be tidy. Backpatch to all supported versions. Author: Ranier Vilela Discussion: https://www.postgresql.org/message-id/CAEudQAoWq%2BAL3BnELHu7gms2GN07k-np6yLbukGaxJ1vY-zeiQ%40mail.gmail.com
2021-07-11Lock the extension during ALTER EXTENSION ADD/DROP.Tom Lane
Although we were careful to lock the object being added or dropped, we failed to get any sort of lock on the extension itself. This allowed the ALTER to proceed in parallel with a DROP EXTENSION, which is problematic for a couple of reasons. If both commands succeeded we'd be left with a dangling link in pg_depend, which would cause problems later. Also, if the ALTER failed for some reason, it might try to print the extension's name, and that could result in a crash or (in older branches) a silly error message complaining about extension "(null)". Per bug #17098 from Alexander Lakhin. Back-patch to all supported branches. Discussion: https://postgr.es/m/17098-b960f3616c861f83@postgresql.org
2021-07-10Fix assign_record_type_typmod().Jeff Davis
If an error occurred in the wrong place, it was possible to leave an unintialized entry in the hash table, leading to a crash. Fixed. Also, be more careful about the order of operations so that an allocation error doesn't leak memory in CacheMemoryContext or unnecessarily advance NextRecordTypmod. Backpatch through version 11. Earlier versions (prior to 35ea75632a5) do not exhibit the problem, because an uninitialized hash entry contains a valid empty list. Author: Sait Talha Nisanci <Sait.Nisanci@microsoft.com> Reviewed-by: Andres Freund Discussion: https://postgr.es/m/HE1PR8303MB009069D476225B9A9E194B8891779@HE1PR8303MB0090.EURPRD83.prod.outlook.com Backpatch-through: 11
2021-07-10Fix numeric_mul() overflow due to too many digits after decimal point.Dean Rasheed
This fixes an overflow error when using the numeric * operator if the result has more than 16383 digits after the decimal point by rounding the result. Overflow errors should only occur if the result has too many digits *before* the decimal point. Discussion: https://postgr.es/m/CAEZATCUmeFWCrq2dNzZpRj5+6LfN85jYiDoqm+ucSXhb9U2TbA@mail.gmail.com
2021-07-09Avoid creating a RESULT RTE that's marked LATERAL.Tom Lane
Commit 7266d0997 added code to pull up simple constant function results, converting the RTE_FUNCTION RTE to a dummy RTE_RESULT RTE since it no longer need be scanned. But I forgot to clear the LATERAL flag if the RTE has it set. If the function reduced to a constant, it surely contains no lateral references so this simplification is logically OK. It's needed because various other places will Assert that RESULT RTEs aren't LATERAL. Per bug #17097 from Yaoguang Chen. Back-patch to v13 where the faulty code came in. Discussion: https://postgr.es/m/17097-3372ef9f798fc94f@postgresql.org
2021-07-09Update configure's probe for libldap to work with OpenLDAP 2.5.Tom Lane
The separate libldap_r is gone and libldap itself is now always thread-safe. Unfortunately there seems no easy way to tell by inspection whether libldap is thread-safe, so we have to take it on faith that libldap is thread-safe if there's no libldap_r. That should be okay, as it appears that libldap_r was a standard part of the installation going back at least 20 years. Report and patch by Adrian Ho. Back-patch to all supported branches, since people might try to build any of them with a newer OpenLDAP. Discussion: https://postgr.es/m/17083-a19190d9591946a7@postgresql.org
2021-07-09Reject cases where a query in WITH rewrites to just NOTIFY.Tom Lane
Since the executor can't cope with a utility statement appearing as a node of a plan tree, we can't support cases where a rewrite rule inserts a NOTIFY into an INSERT/UPDATE/DELETE command appearing in a WITH clause of a larger query. (One can imagine ways around that, but it'd be a new feature not a bug fix, and so far there's been no demand for it.) RewriteQuery checked for this, but it missed the case where the DML command rewrites to *only* a NOTIFY. That'd lead to crashes later on in planning. Add the missed check, and improve the level of testing of this area. Per bug #17094 from Yaoguang Chen. It's been busted since WITH was introduced, so back-patch to all supported branches. Discussion: https://postgr.es/m/17094-bf15dff55eaf2e28@postgresql.org
2021-07-09Remove more obsolete comments about semaphores.Thomas Munro
Commit 6753333f stopped using semaphores as the sleep/wake mechanism for heavyweight locks, but some obsolete references to that scheme remained in comments. As with similar commit 25b93a29, back-patch all the way. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CA%2BhUKGLafjB1uzXcy%3D%3D2L3cy7rjHkqOVn7qRYGBjk%3D%3DtMJE7Yg%40mail.gmail.com
2021-07-09Add missing Int64GetDatum macro in dbsize.cDavid Rowley
I accidentally missed adding this when adjusting 55fe60938 for back patching. This adjustment was made for 9.6 to 13. 14 and master are not affected. Discussion: https://postgr.es/m/CAApHDvp=twCsGAGQG=A=cqOaj4mpknPBW-EZB-sd+5ZS5gCTtA@mail.gmail.com
2021-07-09Fix incorrect return value in pg_size_pretty(bigint)David Rowley
Due to how pg_size_pretty(bigint) was implemented, it's possible that when given a negative number of bytes that the returning value would not match the equivalent positive return value when given the equivalent positive number of bytes. This was due to two separate issues. 1. The function used bit shifting to convert the number of bytes into larger units. The rounding performed by bit shifting is not the same as dividing. For example -3 >> 1 = -2, but -3 / 2 = -1. These two operations are only equivalent with positive numbers. 2. The half_rounded() macro rounded towards positive infinity. This meant that negative numbers rounded towards zero and positive numbers rounded away from zero. Here we fix #1 by dividing the values instead of bit shifting. We fix #2 by adjusting the half_rounded macro always to round away from zero. Additionally, adjust the pg_size_pretty(numeric) function to be more explicit that it's using division rather than bit shifting. A casual observer might have believed bit shifting was used due to a static function being named numeric_shift_right. However, that function was calculating the divisor from the number of bits and performed division. Here we make that more clear. This change is just cosmetic and does not affect the return value of the numeric version of the function. Here we also add a set of regression tests both versions of pg_size_pretty() which test the values directly before and after the function switches to the next unit. This bug was introduced in 8a1fab36a. Prior to that negative values were always displayed in bytes. Author: Dean Rasheed, David Rowley Discussion: https://postgr.es/m/CAEZATCXnNW4HsmZnxhfezR5FuiGgp+mkY4AzcL5eRGO4fuadWg@mail.gmail.com Backpatch-through: 9.6, where the bug was introduced.
2021-07-05Reduce overhead of cache-clobber testing in LookupOpclassInfo().Tom Lane
Commit 03ffc4d6d added logic to bypass all caching behavior in LookupOpclassInfo when CLOBBER_CACHE_ALWAYS is enabled. It doesn't look like I stopped to think much about what that would cost, but recent investigation shows that the cost is enormous: it roughly doubles the time needed for cache-clobber test runs. There does seem to be value in this behavior when trying to test the opclass-cache loading logic itself, but for other purposes the cost is excessive. Hence, let's back off to doing this only when debug_invalidate_system_caches_always is at least 3; or in older branches, when CLOBBER_CACHE_RECURSIVELY is defined. While here, clean up some other minor issues in LookupOpclassInfo. Re-order the code so we aren't left with broken cache entries (leading to later core dumps) in the unlikely case that we suffer OOM while trying to allocate space for a new entry. (That seems to be my oversight in 03ffc4d6d.) Also, in >= v13, stop allocating one array entry too many. That's evidently left over from sloppy reversion in 851b14b0c. Back-patch to all supported branches, mainly to reduce the runtime of cache-clobbering buildfarm animals. Discussion: https://postgr.es/m/1370856.1625428625@sss.pgh.pa.us
2021-07-02Don't try to print data type names in slot_store_error_callback().Tom Lane
The existing code tried to do syscache lookups in an already-failed transaction, which is problematic to say the least. After some consideration of alternatives, the best fix seems to be to just drop type names from the error message altogether. The table and column names seem like sufficient localization. If the user is unsure what types are involved, she can check the local and remote table definitions. Having done that, we can also discard the LogicalRepTypMap hash table, which had no other use. Arguably, LOGICAL_REP_MSG_TYPE replication messages are now obsolete as well; but we should probably keep them in case some other use emerges. (The complexity of removing something from the replication protocol would likely outweigh any savings anyhow.) Masahiko Sawada and Bharath Rupireddy, per complaint from Andres Freund. Back-patch to v10 where this code originated. Discussion: https://postgr.es/m/20210106020229.ne5xnuu6wlondjpe@alap3.anarazel.de
2021-07-01Fix prove_installcheck to use correct paths when used with PGXSAndrew Dunstan
The prove_installcheck recipe in src/Makefile.global.in was emitting bogus paths for a couple of elements when used with PGXS. Here we create a separate recipe for the PGXS case that does it correctly. We also take the opportunity to make the make the file more readable by breaking up the prove_installcheck and prove_check recipes across several lines, and to remove the setting for REGRESS_SHLIB to src/test/recovery/Makefile, which is the only set of tests that actually need it. Backpatch to all live branches Discussion: https://postgr.es/m/f2401388-936b-f4ef-a07c-a0bcc49b3300@dunslane.net
2021-06-30Fix incorrect PITR message for transaction ROLLBACK PREPAREDMichael Paquier
Reaching PITR on such a transaction would cause the generation of a LOG message mentioning a transaction committed, not aborted. Oversight in 4f1b890. Author: Simon Riggs Discussion: https://postgr.es/m/CANbhV-GJ6KijeCgdOrxqMCQ+C8QiK657EMhCy4csjrPcEUFv_Q@mail.gmail.com Backpatch-through: 9.6
2021-06-28Don't use abort(3) in libpq's fe-print.c.Tom Lane
Causing a core dump on out-of-memory seems pretty unfriendly, and surely is far outside the expected behavior of a general-purpose library. Just print an error message (as we did already) and return. These functions unfortunately don't have an error return convention, but code using them is probably just looking for a quick-n-dirty print method and wouldn't bother to check anyway. Although these functions are semi-deprecated, it still seems appropriate to back-patch this. In passing, also back-patch b90e6cef1, just to reduce cosmetic differences between the branches. Discussion: https://postgr.es/m/3122443.1624735363@sss.pgh.pa.us
2021-06-28Don't depend on -fwrapv semantics in pgbench's random() function.Tom Lane
Instead use the common/int.h functions to check for integer overflow in a more C-standard-compliant fashion. This is motivated by recent failures on buildfarm member moonjelly, where it appears that development-tip gcc is optimizing without regard to the -fwrapv switch. Presumably that's a gcc bug that will be fixed soon, but we might as well install cleaner coding here rather than wait. (This does not address the question of whether we'll ever be able to get rid of using -fwrapv. Testing shows that this spot is the only place where doing so creates visible regression test failures, but unfortunately that proves very little.) Back-patch to v12. The common/int.h functions exist in v11, but that branch doesn't use them in any client-side code. I judge that this case isn't interesting enough in the real world to take even a small risk of issues from being the first such use. Tom Lane and Fabien Coelho Discussion: https://postgr.es/m/73927.1624815543@sss.pgh.pa.us
2021-06-28Fix race condition in TransactionGroupUpdateXidStatus().Amit Kapila
When we cannot immediately acquire XactSLRULock in exclusive mode at commit time, we add ourselves to a list of processes that need their XIDs status update. We do this if the clog page where we need to update the current transaction status is the same as the group leader's clog page, otherwise, we allow the caller to clear it by itself. Now, when we can't add ourselves to any group, we were not clearing the current proc if it has already become a member of some group which was leading to an assertion failure when the same proc was assigned to another backend after the current backend exits. Reported-by: Alexander Lakhin Bug: 17072 Author: Amit Kapila Tested-By: Alexander Lakhin Backpatch-through: 11, where it was introduced Discussion: https://postgr.es/m/17072-2f8764857ef2c92a@postgresql.org
2021-06-28Add test for CREATE INDEX CONCURRENTLY with not-so-immutable predicateMichael Paquier
83158f7 has improved index_set_state_flags() so as it is possible to use transactional updates when updating pg_index state flags, but there was not really a test case which stressed directly the possibility it fixed. This commit adds such a test, using a predicate that looks valid in appearance but calls a stable function. Author: Andrey Lepikhov Discussion: https://postgr.es/m/9b905019-5297-7372-0ad2-e1a4bb66a719@postgrespro.ru Backpatch-through: 9.6
2021-06-28Make index_set_state_flags() transactionalMichael Paquier
3c84046 is the original commit that introduced index_set_state_flags(), where the presence of SnapshotNow made necessary the use of an in-place update. SnapshotNow has been removed in 813fb03, so there is no actual reasons to not make this operation transactional. As reported by Andrey, it is possible to trigger the assertion of this routine expecting no transactional updates when switching the pg_index state flags, using a predicate mark as immutable but calling stable or volatile functions. 83158f7 has been around for a couple of months on HEAD now with no issues found related to it, so it looks safe enough for a backpatch. Reported-by: Andrey Lepikhov Author: Michael Paquier Reviewed-by: Anastasia Lubennikova Discussion: https://postgr.es/m/20200903080440.GA8559@paquier.xyz Discussion: https://postgr.es/m/9b905019-5297-7372-0ad2-e1a4bb66a719@postgrespro.ru Backpatch-through: 9.6
2021-06-27Remove memory leaks in isolationtester.Tom Lane
specscanner.l leaked a kilobyte of memory per token of the spec file. Apparently somebody thought that the introductory code block would be executed once; but it's once per yylex() call. A couple of functions in isolationtester.c leaked small amounts of memory due to not bothering to free one-time allocations. Might as well improve these so that valgrind gives this program a clean bill of health. Also get rid of an ugly static variable. Coverity complained about one of the one-time leaks, which led me to try valgrind'ing isolationtester, which led to discovery of the larger leak.
2021-06-26Remove non-existing variable reference in MSVC's Solution.pmMichael Paquier
The version string is grabbed from PACKAGE_VERSION in pg_config.h in the MSVC build since 8f4fb4c6, but an error message referenced a variable that existed before that. This had no consequences except if one messes up enough with the version number of the build. Author: Anton Voloshin Discussion: https://postgr.es/m/af79ee1b-9962-b299-98e1-f90a289e19e6@postgrespro.ru Backpatch-through: 13
2021-06-26Remove some useless logs from the TAP tests of pgbenchMichael Paquier
002_pgbench_no_server was printing some array pointers instead of the actual contents of those arrays for the expected outputs of stdout and stderr for a tested command. This does not add any new information that can help with debugging as the test names allow to track failure locations, if any. This commit simply removes those logs as the rest of the printed information is redundant with command_checks_all(). Per discussion with Andrew Dunstan and Álvaro Herrera. Discussion: https://postgr.es/m/YNXNFaG7IgkzZanD@paquier.xyz Backpatch-through: 11
2021-06-25Remove unnecessary failure cases in RemoveRoleFromObjectPolicy().Tom Lane
It's not really necessary for this function to open or lock the relation associated with the pg_policy entry it's modifying. The error checks it's making on the rel are if anything counterproductive (e.g., if we don't want to allow installation of policies on system catalogs, here is not the place to prevent that). In particular, it seems just wrong to insist on an ownership check. That has the net effect of forcing people to use superuser for DROP OWNED BY, which surely is not an effect we want. Also there is no point in rebuilding the dependencies of the policy expressions, which aren't being changed. Lastly, locking the table also seems counterproductive; it's not helping to prevent race conditions, since we failed to re-read the pg_policy row after acquiring the lock. That means that concurrent DDL would likely result in "tuple concurrently updated/deleted" errors; which is the same behavior this code will produce, with less overhead. Per discussion of bug #17062. Back-patch to all supported versions, as the failure cases this eliminates seem just as undesirable in 9.6 as in HEAD. Discussion: https://postgr.es/m/1573181.1624220108@sss.pgh.pa.us
2021-06-25Make walsenders show their replication commands in pg_stat_activity.Tom Lane
A walsender process that has executed a SQL command left the text of that command in pg_stat_activity.query indefinitely, which is quite confusing if it's in RUNNING state but not doing that query. An easy and useful fix is to treat replication commands as if they were SQL queries, and show them in pg_stat_activity according to the same rules as for regular queries. While we're at it, it seems also sensible to set debug_query_string, allowing error logging and debugging to see the replication command. While here, clean up assorted silliness in exec_replication_command: * Clean up SQLCmd code path, and fix its only-accidentally-not-buggy memory management. * Remove useless duplicate call of SnapBuildClearExportedSnapshot(). * replication_scanner_finish() was never called. Back-patch of commit f560209c6 into v10-v13. I'd originally felt that this didn't merit back-patching, but subsequent confusion while debugging walsender problems suggests that it'll be useful. Also, the original commit has now aged long enough to provide some comfort that it won't cause problems. Discussion: https://postgr.es/m/2673480.1624557299@sss.pgh.pa.us Discussion: https://postgr.es/m/880181.1600026471@sss.pgh.pa.us
2021-06-25Cleanup some code related to pgbench log checks in TAP testsMichael Paquier
This fixes a couple of problems within the so-said code of this commit subject: - Replace the use of open() with slurp_file(), fixing an issue reported by buildfarm member fairywren whose perl installation keep around CRLF characters, causing the matching patterns for the logs to fail. - Remove the eval block, which is not really necessary. This set of issues has come into light after fixing a different issue with c13585fe, and this is wrong since this code has been introduced. Reported-by: Andrew Dunstan, and buildfarm member fairywren Author: Michael Paquier Reviewed-by: Andrew Dunstan Discussion: https://postgr.es/m/0f49303e-7784-b3ee-200b-cbf67be2eb9e@dunslane.net Backpatch-through: 11
2021-06-25Prepare for forthcoming LLVM 13 API change.Thomas Munro
LLVM 13 (due out in September) has changed the semantics of LLVMOrcAbsoluteSymbols(), so we need to bump some reference counts to avoid a double-free that causes crashes and bad query results. A proactive change seems necessary to avoid having a window of time where our respective latest releases would interact badly. It's possible that the situation could change before then, though. Thanks to Fabien Coelho for monitoring bleeding edge LLVM and Andres Freund for tracking down the change. Back-patch to 11, where the JIT code arrived. Discussion: https://postgr.es/m/CA%2BhUKGLEy8mgtN7BNp0ooFAjUedDTJj5dME7NxLU-m91b85siA%40mail.gmail.com
2021-06-25Fix pattern matching logic for logs in TAP tests of pgbenchMichael Paquier
The logic checking for the format of per-thread logs used grep() with directly "$re", which would cause the test to consider all the logs as a match without caring about their format at all. Using "/$re/" makes grep() perform a regex test, which is what we want here. While on it, improve some of the tests to be more picky with the patterns expected and add more comments to describe the tests. Issue discovered while digging into a separate patch. Author: Fabien Coelho, Michael Paquier Discussion: https://postgr.es/m/YNPsPAUoVDCpPOGk@paquier.xyz Backpatch-through: 11
2021-06-24Fix ABI break introduced by commit 4daa140a2f.Amit Kapila
Move the newly defined enum value REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT at the end to avoid ABI break in the back branches. We need to back-patch this till v11 because before that it is already at the end. Reported-by: Tomas Vondra Backpatch-through: 11 Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
2021-06-24Another fix to relmapper race condition.Heikki Linnakangas
In previous commit, I missed that relmap_redo() was also not acquiring the RelationMappingLock. Thanks to Thomas Munro for pointing that out. Backpatch-through: 9.6, like previous commit. Discussion: https://www.postgresql.org/message-id/CA%2BhUKGLev%3DPpOSaL3WRZgOvgk217et%2BbxeJcRr4eR-NttP1F6Q%40mail.gmail.com
2021-06-24Prevent race condition while reading relmapper file.Heikki Linnakangas
Contrary to the comment here, POSIX does not guarantee atomicity of a read(), if another process calls write() concurrently. Or at least Linux does not. Add locking to load_relmap_file() to avoid the race condition. Fixes bug #17064. Thanks to Alexander Lakhin for the report and test case. Backpatch-through: 9.6, all supported versions. Discussion: https://www.postgresql.org/message-id/17064-bb0d7904ef72add3@postgresql.org
2021-06-23Allow non-quoted identifiers as isolation test session/step names.Tom Lane
For no obvious reason, isolationtester has always insisted that session and step names be written with double quotes. This is fairly tedious and does little for test readability, especially since the names that people actually choose almost always look like normal identifiers. Hence, let's tweak the lexer to allow SQL-like identifiers not only double-quoted strings. (They're SQL-like, not exactly SQL, because I didn't add any case-folding logic. Also there's no provision for U&"..." names, not that anyone's likely to care.) There is one incompatibility introduced by this change: if you write "foo""bar" with no space, that used to be taken as two identifiers, but now it's just one identifier with an embedded quote mark. I converted all the src/test/isolation/ specfiles to remove unnecessary double quotes, but stopped there because my eyes were glazing over already. Like 741d7f104, back-patch to all supported branches, so that this isn't a stumbling block for back-patching isolation test changes. Discussion: https://postgr.es/m/759113.1623861959@sss.pgh.pa.us
2021-06-23Don't assume GSSAPI result strings are null-terminated.Tom Lane
Our uses of gss_display_status() and gss_display_name() assumed that the gss_buffer_desc strings returned by those functions are null-terminated. It appears that they generally are, given the lack of field complaints up to now. However, the available documentation does not promise this, and some man pages for gss_display_status() show examples that rely on the gss_buffer_desc.length field instead of expecting null termination. Also, we now have a report that on some implementations, clang's address sanitizer is of the opinion that the byte after the specified length is undefined. Hence, change the code to rely on the length field instead. This might well be cosmetic rather than fixing any real bug, but it's hard to be sure, so back-patch to all supported branches. While here, also back-patch the v12 changes that made pg_GSS_error deal honestly with multiple messages available from gss_display_status. Per report from Sudheer H R. Discussion: https://postgr.es/m/5372B6D4-8276-42C0-B8FB-BD0918826FC3@tekenlight.com
2021-06-23Improve display of query results in isolation tests.Tom Lane
Previously, isolationtester displayed SQL query results using some ad-hoc code that clearly hadn't had much effort expended on it. Field values longer than 14 characters weren't separated from the next field, and usually caused misalignment of the columns too. Also there was no visual separation of a query's result from subsequent isolationtester output. This made test result files confusing and hard to read. To improve matters, let's use libpq's PQprint() function. Although that's long since unused by psql, it's still plenty good enough for the purpose here. Like 741d7f104, back-patch to all supported branches, so that this isn't a stumbling block for back-patching isolation test changes. Discussion: https://postgr.es/m/582362.1623798221@sss.pgh.pa.us
2021-06-22Use annotations to reduce instability of isolation-test results.Tom Lane
We've long contended with isolation test results that aren't entirely stable. Some test scripts insert long delays to try to force stable results, which is not terribly desirable; but other erratic failure modes remain, causing unrepeatable buildfarm failures. I've spent a fair amount of time trying to solve this by improving the server-side support code, without much success: that way is fundamentally unable to cope with diffs that stem from chance ordering of arrival of messages from different server processes. We can improve matters on the client side, however, by annotating the test scripts themselves to show the desired reporting order of events that might occur in different orders. This patch adds three types of annotations to deal with (a) test steps that might or might not complete their waits before the isolationtester can see them waiting; (b) test steps in different sessions that can legitimately complete in either order; and (c) NOTIFY messages that might arrive before or after the completion of a step in another session. We might need more annotation types later, but this seems to be enough to deal with the instabilities we've seen in the buildfarm. It also lets us get rid of all the long delays that were previously used, cutting more than a minute off the runtime of the isolation tests. Back-patch to all supported branches, because the buildfarm instabilities affect all the branches, and because it seems desirable to keep isolationtester's capabilities the same across all branches to simplify possible future back-patching of tests. Discussion: https://postgr.es/m/327948.1623725828@sss.pgh.pa.us
2021-06-22Restore the portal-level snapshot for simple expressions, too.Tom Lane
Commits 84f5c2908 et al missed the need to cover plpgsql's "simple expression" code path. If the first thing we execute after a COMMIT/ROLLBACK is one of those, rather than a full-fledged SPI command, we must explicitly do EnsurePortalSnapshotExists() to make sure we have an outer snapshot. Note that it wouldn't be good enough to just push a snapshot for the duration of the expression execution: what comes back might be toasted, so we'd better have a snapshot protecting it. The test case demonstrating this fact cheats a bit by marking a SQL function immutable even though it fetches from a table. That's nothing that users haven't been seen to do, though. Per report from Jim Nasby. Back-patch to v11, like the previous fix. Discussion: https://postgr.es/m/378885e4-f85f-fc28-6c91-c4d1c080bf26@amazon.com
2021-06-22Back-patch "Tolerate version lookup failure for old style Windows locale names."Thomas Munro
If users provide old style pre-standardized Windows locale names in a CREATE COLLATION command, the OS is unable to provide version information. Continue without capturing version information, rather than exposing an OS error. This was originally done in commit 9f12a3b9 for 14 only, to support future features that might encounter old style names from initdb's default. It wasn't done in 13 because I didn't consider that users might actually want to use the old format explicitly (something we should consider blocking in a future release with a better error message, but that's not a policy we've decided on yet). Back-patch to 13, based on the field complaint in pgsql-bugs #17058. Reported-by: Yasushi Yamashita <developer@yamashi-ta.jp> Discussion: https://postgr.es/m/17058-b49f5793c912c5aa%40postgresql.org
2021-06-18Fix misbehavior of DROP OWNED BY with duplicate polroles entries.Tom Lane
Ordinarily, a pg_policy.polroles array wouldn't list the same role more than once; but CREATE POLICY does not prevent that. If we perform DROP OWNED BY on a role that is listed more than once, RemoveRoleFromObjectPolicy either suffered an assertion failure or encountered a tuple-updated-by-self error. Rewrite it to cope correctly with duplicate entries, and add a CommandCounterIncrement call to prevent the other problem. Per discussion, there's other cleanup that ought to happen here, but this seems like the minimum essential fix. Per bug #17062 from Alexander Lakhin. It's been broken all along, so back-patch to all supported branches. Discussion: https://postgr.es/m/17062-11f471ae3199ca23@postgresql.org
2021-06-18Avoid scribbling on input node tree in CREATE/ALTER DOMAIN.Tom Lane
This works fine in the "simple Query" code path; but if the statement is in the plan cache then it's corrupted for future re-execution. Apply copyObject() to protect the original tree from modification, as we've done elsewhere. This narrow fix is applied only to the back branches. In HEAD, the problem was fixed more generally by commit 7c337b6b5; but that changed ProcessUtility's API, so it's infeasible to back-patch. Per bug #17053 from Charles Samborski. Discussion: https://postgr.es/m/931771.1623893989@sss.pgh.pa.us Discussion: https://postgr.es/m/17053-3ca3f501bbc212b4@postgresql.org
2021-06-18Don't set a fast default for anything but a plain tableAndrew Dunstan
The fast default code added in Release 11 omitted to check that the table a fast default was being added to was a plain table. Thus one could be added to a foreign table, which predicably blows up. Here we perform that check. In addition, on the back branches, since some of these might have escaped into the wild, if we encounter a missing value for an attribute of something other than a plain table we ignore it. Fixes bug #17056 Backpatch to release 11, Reviewed by: Andres Freund, Álvaro Herrera and Tom Lane