summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2021-08-03C comment: correct heading of extension queryBruce Momjian
Reported-by: Justin Pryzby Discussion: https://postgr.es/m/20210803161345.GZ12533@telsasoft.com Backpatch-through: 9.6
2021-08-03pg_upgrade: warn about extensions that need updatingBruce Momjian
Also create a script that can be run to update them. Reported-by: Dave Cramer Discussion: https://postgr.es/m/CADK3HHKawwbOcGwMGnDuAf3-U8YfvTcS8jqDv3UM=niijs3MMA@mail.gmail.com Backpatch-through: 9.6
2021-07-31Fix corner-case errors and loss of precision in numeric_power().Dean Rasheed
This fixes a couple of related problems that arise when raising numbers to very large powers. Firstly, when raising a negative number to a very large integer power, the result should be well-defined, but the previous code would only cope if the exponent was small enough to go through power_var_int(). Otherwise it would throw an internal error, attempting to take the logarithm of a negative number. Fix this by adding suitable handling to the general case in power_var() to cope with negative bases, checking for integer powers there. Next, when raising a (positive or negative) number whose absolute value is slightly less than 1 to a very large power, the result should approach zero as the power is increased. However, in some cases, for sufficiently large powers, this would lose all precision and return 1 instead of 0. This was due to the way that the local_rscale was being calculated for the final full-precision calculation: local_rscale = rscale + (int) val - ln_dweight + 8 The first two terms on the right hand side are meant to give the number of significant digits required in the result ("val" being the estimated result weight). However, this failed to account for the fact that rscale is clipped to a maximum of NUMERIC_MAX_DISPLAY_SCALE (1000), and the result weight might be less then -1000, causing their sum to be negative, leading to a loss of precision. Fix this by forcing the number of significant digits calculated to be nonnegative. It's OK for it to be zero (when the result weight is less than -1000), since the local_rscale value then includes a few extra digits to ensure an accurate result. Finally, add additional underflow checks to exp_var() and power_var(), so that they consistently return zero for cases like this where the result is indistinguishable from zero. Some paths through this code already returned zero in such cases, but others were throwing overflow errors. Dean Rasheed, reviewed by Yugo Nagata. Discussion: http://postgr.es/m/CAEZATCW6Dvq7+3wN3tt5jLj-FyOcUgT5xNoOqce5=6Su0bCR0w@mail.gmail.com
2021-07-30Fix expect file for MinGW32 ECPG regression testsJohn Naylor
On versions 11 and earlier, MinGW32 has a separate expect file for the regression test changed by master commit 5fcf3945b.
2021-07-30Fix range check in ECPG numeric to int conversionJohn Naylor
The previous coding guarded against -INT_MAX instead of INT_MIN, leading to -2147483648 being rejected as out of range. Per bug #17128 from Kevin Sweet Discussion: https://www.postgresql.org/message-id/flat/17128-55a8a879727a3e3a%40postgresql.org Reviewed-by: Tom Lane Backpatch to all supported branches
2021-07-29Update minimum recovery point on truncation during WAL replay of abort record.Fujii Masao
If a file is truncated, we must update minRecoveryPoint. Once a file is truncated, there's no going back; it would not be safe to stop recovery at a point earlier than that anymore. Commit 7bffc9b7bf changed xact_redo_commit() so that it updates minRecoveryPoint on truncation, but forgot to change xact_redo_abort(). Back-patch to all supported versions. Reported-by: mengjuan.cmj@alibaba-inc.com Author: Fujii Masao Reviewed-by: Heikki Linnakangas Discussion: https://postgr.es/m/b029fce3-4fac-4265-968e-16f36ff4d075.mengjuan.cmj@alibaba-inc.com
2021-07-27Set pg_setting.pending_restart when pertinent config lines are removedAlvaro Herrera
This changes the behavior of examining the pg_file_settings view after changing a config option that requires restart. The user needs to know that any change of such options does not take effect until a restart, and this worked correctly if the line is edited without removing it. However, for the case where the line is removed altogether, the flag doesn't get set, because a flag was only set in set_config_option, but that's not called for lines removed. Repair. (Ref.: commits 62d16c7fc561 and a486e35706ea) Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/202107262302.xsfdfc5sb7sh@alvherre.pgsql
2021-07-28Avoid using ambiguous word "non-negative" in error messages.Fujii Masao
The error messages using the word "non-negative" are confusing because it's ambiguous about whether it accepts zero or not. This commit improves those error messages by replacing it with less ambiguous word like "greater than zero" or "greater than or equal to zero". Also this commit added the note about the word "non-negative" to the error message style guide, to help writing the new error messages. When postgres_fdw option fetch_size was set to zero, previously the error message "fetch_size requires a non-negative integer value" was reported. This error message was outright buggy. Therefore back-patch to all supported versions where such buggy error message could be thrown. Reported-by: Hou Zhijie Author: Bharath Rupireddy Reviewed-by: Kyotaro Horiguchi, Fujii Masao Discussion: https://postgr.es/m/OS0PR01MB5716415335A06B489F1B3A8194569@OS0PR01MB5716.jpnprd01.prod.outlook.com
2021-07-26pg_resetxlog: add option to set oldest xid & use by pg_upgradeBruce Momjian
Add pg_resetxlog -u option to set the oldest xid in pg_control. Previously -x set this value be -2 billion less than the -x value. However, this causes the server to immediately scan all relation's relfrozenxid so it can advance pg_control's oldest xid to be inside the autovacuum_freeze_max_age range, which is inefficient and might disrupt diagnostic recovery. pg_upgrade will use this option to better create the new cluster to match the old cluster. Reported-by: Jason Harvey, Floris Van Nee Discussion: https://postgr.es/m/20190615183759.GB239428@rfd.leadboat.com, 87da83168c644fd9aae38f546cc70295@opammb0562.comp.optiver.com Author: Bertrand Drouvot Backpatch-through: 9.6
2021-07-25Make the standby server promptly handle interrupt signals.Fujii Masao
This commit changes the startup process in the standby server so that it handles the interrupt signals after waiting for wal_retrieve_retry_interval on the latch and resetting it, before entering another wait on the latch. This change causes the standby server to promptly handle interrupt signals. Otherwise, previously, there was the case where the standby needs to wait extra five seconds to shutdown when the shutdown request arrived while the startup process was waiting for wal_retrieve_retry_interval on the latch. Author: Fujii Masao, but implementation idea is from Soumyadeep Chakraborty Reviewed-by: Soumyadeep Chakraborty Discussion: https://postgr.es/m/9d7e6ab0-8a53-ddb9-63cd-289bcb25fe0e@oss.nttdata.com Per discussion of BUG #17073, back-patch to all supported versions. Discussion: https://postgr.es/m/17073-1a5fdaed0fa5d4d0@postgresql.org
2021-07-24Fix check for conflicting session- vs transaction-level locks.Tom Lane
We have an implementation restriction that PREPARE TRANSACTION can't handle cases where both session-lifespan and transaction-lifespan locks are held on the same lockable object. (That's because we'd otherwise need to acquire a new PROCLOCK entry during post-prepare cleanup, which is an operation that might fail. The situation can only arise with odd usages of advisory locks, so removing the restriction is probably not worth the amount of effort it would take.) AtPrepare_Locks attempted to enforce this, but its logic was many bricks shy of a load, because it only detected cases where the session and transaction locks had the same lockmode. Locks of different modes on the same object would lead to the rather unhelpful message "PANIC: we seem to have dropped a bit somewhere". To fix, build a transient hashtable with one entry per locktag, not one per locktag + mode, and use that to detect conflicts. Per bug #17122 from Alexander Pyhalov. This bug is ancient, so back-patch to all supported branches. Discussion: https://postgr.es/m/17122-04f3c32098a62233@postgresql.org
2021-07-24Make printf("%s", NULL) print "(null)" instead of crashing.Tom Lane
We previously took a hard-line attitude that callers should never print a null string pointer, and doing so is worthy of an assertion failure or crash. However, we've long since flushed out any easy-to-find bugs of that nature. What remains is a lot of code that perhaps could fail that way in hard-to-reach corner cases. For example, in something as simple as ereport(ERROR, (errcode(ERRCODE_UNDEFINED_OBJECT), errmsg("constraint \"%s\" for table \"%s\" does not exist", conname, get_rel_name(relid)))); one must wonder whether it's completely guaranteed that get_rel_name cannot return NULL in this context. If such a situation did occur, the existing policy converts what might be a pretty minor bug into a server crash condition. This is not good for robustness. Hence, let's follow the lead of glibc and print "(null)" instead of failing. We should, of course, still consider it a bug if that behavior is reachable in ordinary use; but crashing seems less desirable than not crashing. This fix works across-the-board in v12 and up, where we always use src/port/snprintf.c. Before that, on most platforms we're at the mercy of the local libc, but it appears that Solaris 10 is the only supported platform where we'd still get a crash. Most other platforms such as *BSD, macOS, and Solaris 11 have adopted glibc's behavior at some point. (AIX and HPUX just print "" not "(null)", but that's close enough.) I've not checked what Windows' native printf would do, but it doesn't matter because we've long used snprintf.c on that platform. In v12 and up, also const-ify related code so that we're not casting away const on the constant string. This is just neatnik-ism, since next to no compilers will warn about that. Discussion: https://postgr.es/m/17098-b960f3616c861f83@postgresql.org
2021-07-20Fix corner-case uninitialized-variable issues in plpgsql.Tom Lane
If an error was raised during our initial attempt to check whether a successfully-compiled expression is "simple", subsequent calls of exec_stmt_execsql would suppose that stmt->mod_stmt was already computed when it had not been. This could lead to assertion failures in debug builds; in production builds the effect would typically be to act as if INTO STRICT had been specified even when it had not been. Of course that only matters if the subsequent attempt to execute the expression succeeds, so that the problem can only be reached by fixing a failure in some referenced, inline-able SQL function and then retrying the calling plpgsql function in the same session. (There might be even-more-obscure ways to change the expression's behavior without changing the plpgsql function, but that one seems like the only one people would be likely to hit in practice.) The most foolproof way to fix this would be to arrange for exec_prepare_plan to not set expr->plan until we've finished the subsidiary simple-expression check. But it seems hard to do that without creating reference-count leak issues. So settle for documenting the hazard in a comment and fixing exec_stmt_execsql to test separately for whether it's computed stmt->mod_stmt. (That adds a test-and-branch per execution, but hopefully that's negligible in context.) In v11 and up, also fix exec_stmt_call which had a variant of the same issue. Per bug #17113 from Alexander Lakhin. Back-patch to all supported branches. Discussion: https://postgr.es/m/17113-077605ce00e0e7ec@postgresql.org
2021-07-13Robustify tuplesort's free_sort_tuple functionDavid Rowley
41469253e went to the trouble of removing a theoretical bug from free_sort_tuple by checking if the tuple was NULL before freeing it. Let's make this a little more robust by also setting the tuple to NULL so that should we be called again we won't end up doing a pfree on the already pfree'd tuple. Per advice from Tom Lane. Discussion: https://postgr.es/m/3188192.1626136953@sss.pgh.pa.us Backpatch-through: 9.6, same as 41469253e
2021-07-13Fix theoretical bug in tuplesortDavid Rowley
This fixes a theoretical bug in tuplesort.c which, if a bounded sort was used in combination with a byval Datum sort (tuplesort_begin_datum), when switching the sort to a bounded heap in make_bounded_heap(), we'd call free_sort_tuple(). The problem was that when sorting Datums of a byval type, the tuple is NULL and free_sort_tuple() would free the memory for it regardless of that. This would result in a crash. Here we fix that simply by adding a check to see if the tuple is NULL before trying to disassociate and free any memory belonging to it. The reason this bug is only theoretical is that nowhere in the current code base do we do tuplesort_set_bound() when performing a Datum sort. However, let's backpatch a fix for this as if any extension uses the code in this way then it's likely to cause problems. Author: Ronan Dunklau Discussion: https://postgr.es/m/CAApHDvpdoqNC5FjDb3KUTSMs5dg6f+XxH4Bg_dVcLi8UYAG3EQ@mail.gmail.com Backpatch-through: 9.6, oldest supported version
2021-07-12Remove dead assignment to local variable.Heikki Linnakangas
This should have been removed in commit 7e30c186da, which split the loop into two. Only the first loop uses the 'from' variable; updating it in the second loop is bogus. It was never read after the first loop, so this was harmless and surely optimized away by the compiler, but let's be tidy. Backpatch to all supported versions. Author: Ranier Vilela Discussion: https://www.postgresql.org/message-id/CAEudQAoWq%2BAL3BnELHu7gms2GN07k-np6yLbukGaxJ1vY-zeiQ%40mail.gmail.com
2021-07-11Lock the extension during ALTER EXTENSION ADD/DROP.Tom Lane
Although we were careful to lock the object being added or dropped, we failed to get any sort of lock on the extension itself. This allowed the ALTER to proceed in parallel with a DROP EXTENSION, which is problematic for a couple of reasons. If both commands succeeded we'd be left with a dangling link in pg_depend, which would cause problems later. Also, if the ALTER failed for some reason, it might try to print the extension's name, and that could result in a crash or (in older branches) a silly error message complaining about extension "(null)". Per bug #17098 from Alexander Lakhin. Back-patch to all supported branches. Discussion: https://postgr.es/m/17098-b960f3616c861f83@postgresql.org
2021-07-10Fix numeric_mul() overflow due to too many digits after decimal point.Dean Rasheed
This fixes an overflow error when using the numeric * operator if the result has more than 16383 digits after the decimal point by rounding the result. Overflow errors should only occur if the result has too many digits *before* the decimal point. Discussion: https://postgr.es/m/CAEZATCUmeFWCrq2dNzZpRj5+6LfN85jYiDoqm+ucSXhb9U2TbA@mail.gmail.com
2021-07-09Update configure's probe for libldap to work with OpenLDAP 2.5.Tom Lane
The separate libldap_r is gone and libldap itself is now always thread-safe. Unfortunately there seems no easy way to tell by inspection whether libldap is thread-safe, so we have to take it on faith that libldap is thread-safe if there's no libldap_r. That should be okay, as it appears that libldap_r was a standard part of the installation going back at least 20 years. Report and patch by Adrian Ho. Back-patch to all supported branches, since people might try to build any of them with a newer OpenLDAP. Discussion: https://postgr.es/m/17083-a19190d9591946a7@postgresql.org
2021-07-09Reject cases where a query in WITH rewrites to just NOTIFY.Tom Lane
Since the executor can't cope with a utility statement appearing as a node of a plan tree, we can't support cases where a rewrite rule inserts a NOTIFY into an INSERT/UPDATE/DELETE command appearing in a WITH clause of a larger query. (One can imagine ways around that, but it'd be a new feature not a bug fix, and so far there's been no demand for it.) RewriteQuery checked for this, but it missed the case where the DML command rewrites to *only* a NOTIFY. That'd lead to crashes later on in planning. Add the missed check, and improve the level of testing of this area. Per bug #17094 from Yaoguang Chen. It's been busted since WITH was introduced, so back-patch to all supported branches. Discussion: https://postgr.es/m/17094-bf15dff55eaf2e28@postgresql.org
2021-07-09Remove more obsolete comments about semaphores.Thomas Munro
Commit 6753333f stopped using semaphores as the sleep/wake mechanism for heavyweight locks, but some obsolete references to that scheme remained in comments. As with similar commit 25b93a29, back-patch all the way. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CA%2BhUKGLafjB1uzXcy%3D%3D2L3cy7rjHkqOVn7qRYGBjk%3D%3DtMJE7Yg%40mail.gmail.com
2021-07-09Add missing Int64GetDatum macro in dbsize.cDavid Rowley
I accidentally missed adding this when adjusting 55fe60938 for back patching. This adjustment was made for 9.6 to 13. 14 and master are not affected. Discussion: https://postgr.es/m/CAApHDvp=twCsGAGQG=A=cqOaj4mpknPBW-EZB-sd+5ZS5gCTtA@mail.gmail.com
2021-07-09Fix incorrect return value in pg_size_pretty(bigint)David Rowley
Due to how pg_size_pretty(bigint) was implemented, it's possible that when given a negative number of bytes that the returning value would not match the equivalent positive return value when given the equivalent positive number of bytes. This was due to two separate issues. 1. The function used bit shifting to convert the number of bytes into larger units. The rounding performed by bit shifting is not the same as dividing. For example -3 >> 1 = -2, but -3 / 2 = -1. These two operations are only equivalent with positive numbers. 2. The half_rounded() macro rounded towards positive infinity. This meant that negative numbers rounded towards zero and positive numbers rounded away from zero. Here we fix #1 by dividing the values instead of bit shifting. We fix #2 by adjusting the half_rounded macro always to round away from zero. Additionally, adjust the pg_size_pretty(numeric) function to be more explicit that it's using division rather than bit shifting. A casual observer might have believed bit shifting was used due to a static function being named numeric_shift_right. However, that function was calculating the divisor from the number of bits and performed division. Here we make that more clear. This change is just cosmetic and does not affect the return value of the numeric version of the function. Here we also add a set of regression tests both versions of pg_size_pretty() which test the values directly before and after the function switches to the next unit. This bug was introduced in 8a1fab36a. Prior to that negative values were always displayed in bytes. Author: Dean Rasheed, David Rowley Discussion: https://postgr.es/m/CAEZATCXnNW4HsmZnxhfezR5FuiGgp+mkY4AzcL5eRGO4fuadWg@mail.gmail.com Backpatch-through: 9.6, where the bug was introduced.
2021-07-05Reduce overhead of cache-clobber testing in LookupOpclassInfo().Tom Lane
Commit 03ffc4d6d added logic to bypass all caching behavior in LookupOpclassInfo when CLOBBER_CACHE_ALWAYS is enabled. It doesn't look like I stopped to think much about what that would cost, but recent investigation shows that the cost is enormous: it roughly doubles the time needed for cache-clobber test runs. There does seem to be value in this behavior when trying to test the opclass-cache loading logic itself, but for other purposes the cost is excessive. Hence, let's back off to doing this only when debug_invalidate_system_caches_always is at least 3; or in older branches, when CLOBBER_CACHE_RECURSIVELY is defined. While here, clean up some other minor issues in LookupOpclassInfo. Re-order the code so we aren't left with broken cache entries (leading to later core dumps) in the unlikely case that we suffer OOM while trying to allocate space for a new entry. (That seems to be my oversight in 03ffc4d6d.) Also, in >= v13, stop allocating one array entry too many. That's evidently left over from sloppy reversion in 851b14b0c. Back-patch to all supported branches, mainly to reduce the runtime of cache-clobbering buildfarm animals. Discussion: https://postgr.es/m/1370856.1625428625@sss.pgh.pa.us
2021-07-01Fix prove_installcheck to use correct paths when used with PGXSAndrew Dunstan
The prove_installcheck recipe in src/Makefile.global.in was emitting bogus paths for a couple of elements when used with PGXS. Here we create a separate recipe for the PGXS case that does it correctly. We also take the opportunity to make the make the file more readable by breaking up the prove_installcheck and prove_check recipes across several lines, and to remove the setting for REGRESS_SHLIB to src/test/recovery/Makefile, which is the only set of tests that actually need it. Backpatch to all live branches Discussion: https://postgr.es/m/f2401388-936b-f4ef-a07c-a0bcc49b3300@dunslane.net
2021-06-30Fix incorrect PITR message for transaction ROLLBACK PREPAREDMichael Paquier
Reaching PITR on such a transaction would cause the generation of a LOG message mentioning a transaction committed, not aborted. Oversight in 4f1b890. Author: Simon Riggs Discussion: https://postgr.es/m/CANbhV-GJ6KijeCgdOrxqMCQ+C8QiK657EMhCy4csjrPcEUFv_Q@mail.gmail.com Backpatch-through: 9.6
2021-06-28Don't use abort(3) in libpq's fe-print.c.Tom Lane
Causing a core dump on out-of-memory seems pretty unfriendly, and surely is far outside the expected behavior of a general-purpose library. Just print an error message (as we did already) and return. These functions unfortunately don't have an error return convention, but code using them is probably just looking for a quick-n-dirty print method and wouldn't bother to check anyway. Although these functions are semi-deprecated, it still seems appropriate to back-patch this. In passing, also back-patch b90e6cef1, just to reduce cosmetic differences between the branches. Discussion: https://postgr.es/m/3122443.1624735363@sss.pgh.pa.us
2021-06-28Add test for CREATE INDEX CONCURRENTLY with not-so-immutable predicateMichael Paquier
83158f7 has improved index_set_state_flags() so as it is possible to use transactional updates when updating pg_index state flags, but there was not really a test case which stressed directly the possibility it fixed. This commit adds such a test, using a predicate that looks valid in appearance but calls a stable function. Author: Andrey Lepikhov Discussion: https://postgr.es/m/9b905019-5297-7372-0ad2-e1a4bb66a719@postgrespro.ru Backpatch-through: 9.6
2021-06-28Make index_set_state_flags() transactionalMichael Paquier
3c84046 is the original commit that introduced index_set_state_flags(), where the presence of SnapshotNow made necessary the use of an in-place update. SnapshotNow has been removed in 813fb03, so there is no actual reasons to not make this operation transactional. As reported by Andrey, it is possible to trigger the assertion of this routine expecting no transactional updates when switching the pg_index state flags, using a predicate mark as immutable but calling stable or volatile functions. 83158f7 has been around for a couple of months on HEAD now with no issues found related to it, so it looks safe enough for a backpatch. Reported-by: Andrey Lepikhov Author: Michael Paquier Reviewed-by: Anastasia Lubennikova Discussion: https://postgr.es/m/20200903080440.GA8559@paquier.xyz Discussion: https://postgr.es/m/9b905019-5297-7372-0ad2-e1a4bb66a719@postgrespro.ru Backpatch-through: 9.6
2021-06-27Remove memory leaks in isolationtester.Tom Lane
specscanner.l leaked a kilobyte of memory per token of the spec file. Apparently somebody thought that the introductory code block would be executed once; but it's once per yylex() call. A couple of functions in isolationtester.c leaked small amounts of memory due to not bothering to free one-time allocations. Might as well improve these so that valgrind gives this program a clean bill of health. Also get rid of an ugly static variable. Coverity complained about one of the one-time leaks, which led me to try valgrind'ing isolationtester, which led to discovery of the larger leak.
2021-06-25Remove unnecessary failure cases in RemoveRoleFromObjectPolicy().Tom Lane
It's not really necessary for this function to open or lock the relation associated with the pg_policy entry it's modifying. The error checks it's making on the rel are if anything counterproductive (e.g., if we don't want to allow installation of policies on system catalogs, here is not the place to prevent that). In particular, it seems just wrong to insist on an ownership check. That has the net effect of forcing people to use superuser for DROP OWNED BY, which surely is not an effect we want. Also there is no point in rebuilding the dependencies of the policy expressions, which aren't being changed. Lastly, locking the table also seems counterproductive; it's not helping to prevent race conditions, since we failed to re-read the pg_policy row after acquiring the lock. That means that concurrent DDL would likely result in "tuple concurrently updated/deleted" errors; which is the same behavior this code will produce, with less overhead. Per discussion of bug #17062. Back-patch to all supported versions, as the failure cases this eliminates seem just as undesirable in 9.6 as in HEAD. Discussion: https://postgr.es/m/1573181.1624220108@sss.pgh.pa.us
2021-06-24Stabilize results of insert-conflict-toast.spec.Tom Lane
This back-branch test script was later absorbed into insert-conflict-specconflict.spec, which required some stabilization in commit 741d7f104, so perhaps it's not surprising that it needs a bit of love too. It's odd though that we hadn't seen it fail before now, because I thought that 741d7f104 did not change isolationtester's timing behavior for scripts without any annotation markers. In any case, this script is racy on its face, so add an annotation to force stable reporting order. Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2021-06-24%2009%3A54%3A56 Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2021-06-24%2010%3A10%3A00
2021-06-24Another fix to relmapper race condition.Heikki Linnakangas
In previous commit, I missed that relmap_redo() was also not acquiring the RelationMappingLock. Thanks to Thomas Munro for pointing that out. Backpatch-through: 9.6, like previous commit. Discussion: https://www.postgresql.org/message-id/CA%2BhUKGLev%3DPpOSaL3WRZgOvgk217et%2BbxeJcRr4eR-NttP1F6Q%40mail.gmail.com
2021-06-24Prevent race condition while reading relmapper file.Heikki Linnakangas
Contrary to the comment here, POSIX does not guarantee atomicity of a read(), if another process calls write() concurrently. Or at least Linux does not. Add locking to load_relmap_file() to avoid the race condition. Fixes bug #17064. Thanks to Alexander Lakhin for the report and test case. Backpatch-through: 9.6, all supported versions. Discussion: https://www.postgresql.org/message-id/17064-bb0d7904ef72add3@postgresql.org
2021-06-23Allow non-quoted identifiers as isolation test session/step names.Tom Lane
For no obvious reason, isolationtester has always insisted that session and step names be written with double quotes. This is fairly tedious and does little for test readability, especially since the names that people actually choose almost always look like normal identifiers. Hence, let's tweak the lexer to allow SQL-like identifiers not only double-quoted strings. (They're SQL-like, not exactly SQL, because I didn't add any case-folding logic. Also there's no provision for U&"..." names, not that anyone's likely to care.) There is one incompatibility introduced by this change: if you write "foo""bar" with no space, that used to be taken as two identifiers, but now it's just one identifier with an embedded quote mark. I converted all the src/test/isolation/ specfiles to remove unnecessary double quotes, but stopped there because my eyes were glazing over already. Like 741d7f104, back-patch to all supported branches, so that this isn't a stumbling block for back-patching isolation test changes. Discussion: https://postgr.es/m/759113.1623861959@sss.pgh.pa.us
2021-06-23Don't assume GSSAPI result strings are null-terminated.Tom Lane
Our uses of gss_display_status() and gss_display_name() assumed that the gss_buffer_desc strings returned by those functions are null-terminated. It appears that they generally are, given the lack of field complaints up to now. However, the available documentation does not promise this, and some man pages for gss_display_status() show examples that rely on the gss_buffer_desc.length field instead of expecting null termination. Also, we now have a report that on some implementations, clang's address sanitizer is of the opinion that the byte after the specified length is undefined. Hence, change the code to rely on the length field instead. This might well be cosmetic rather than fixing any real bug, but it's hard to be sure, so back-patch to all supported branches. While here, also back-patch the v12 changes that made pg_GSS_error deal honestly with multiple messages available from gss_display_status. Per report from Sudheer H R. Discussion: https://postgr.es/m/5372B6D4-8276-42C0-B8FB-BD0918826FC3@tekenlight.com
2021-06-23Improve display of query results in isolation tests.Tom Lane
Previously, isolationtester displayed SQL query results using some ad-hoc code that clearly hadn't had much effort expended on it. Field values longer than 14 characters weren't separated from the next field, and usually caused misalignment of the columns too. Also there was no visual separation of a query's result from subsequent isolationtester output. This made test result files confusing and hard to read. To improve matters, let's use libpq's PQprint() function. Although that's long since unused by psql, it's still plenty good enough for the purpose here. Like 741d7f104, back-patch to all supported branches, so that this isn't a stumbling block for back-patching isolation test changes. Discussion: https://postgr.es/m/582362.1623798221@sss.pgh.pa.us
2021-06-22Use annotations to reduce instability of isolation-test results.Tom Lane
We've long contended with isolation test results that aren't entirely stable. Some test scripts insert long delays to try to force stable results, which is not terribly desirable; but other erratic failure modes remain, causing unrepeatable buildfarm failures. I've spent a fair amount of time trying to solve this by improving the server-side support code, without much success: that way is fundamentally unable to cope with diffs that stem from chance ordering of arrival of messages from different server processes. We can improve matters on the client side, however, by annotating the test scripts themselves to show the desired reporting order of events that might occur in different orders. This patch adds three types of annotations to deal with (a) test steps that might or might not complete their waits before the isolationtester can see them waiting; (b) test steps in different sessions that can legitimately complete in either order; and (c) NOTIFY messages that might arrive before or after the completion of a step in another session. We might need more annotation types later, but this seems to be enough to deal with the instabilities we've seen in the buildfarm. It also lets us get rid of all the long delays that were previously used, cutting more than a minute off the runtime of the isolation tests. Back-patch to all supported branches, because the buildfarm instabilities affect all the branches, and because it seems desirable to keep isolationtester's capabilities the same across all branches to simplify possible future back-patching of tests. Discussion: https://postgr.es/m/327948.1623725828@sss.pgh.pa.us
2021-06-18Fix misbehavior of DROP OWNED BY with duplicate polroles entries.Tom Lane
Ordinarily, a pg_policy.polroles array wouldn't list the same role more than once; but CREATE POLICY does not prevent that. If we perform DROP OWNED BY on a role that is listed more than once, RemoveRoleFromObjectPolicy either suffered an assertion failure or encountered a tuple-updated-by-self error. Rewrite it to cope correctly with duplicate entries, and add a CommandCounterIncrement call to prevent the other problem. Per discussion, there's other cleanup that ought to happen here, but this seems like the minimum essential fix. Per bug #17062 from Alexander Lakhin. It's been broken all along, so back-patch to all supported branches. Discussion: https://postgr.es/m/17062-11f471ae3199ca23@postgresql.org
2021-06-18Avoid scribbling on input node tree in CREATE/ALTER DOMAIN.Tom Lane
This works fine in the "simple Query" code path; but if the statement is in the plan cache then it's corrupted for future re-execution. Apply copyObject() to protect the original tree from modification, as we've done elsewhere. This narrow fix is applied only to the back branches. In HEAD, the problem was fixed more generally by commit 7c337b6b5; but that changed ProcessUtility's API, so it's infeasible to back-patch. Per bug #17053 from Charles Samborski. Discussion: https://postgr.es/m/931771.1623893989@sss.pgh.pa.us Discussion: https://postgr.es/m/17053-3ca3f501bbc212b4@postgresql.org
2021-06-18Update plpython_subtransaction alternative expected filesPeter Eisentraut
The original patch only targeted Python 2.6 and newer, since that is what we have supported in PostgreSQL 13 and newer. For older branches, we need to fix it up for older Python versions.
2021-06-17Tidy up GetMultiXactIdMembers()'s behavior on errorHeikki Linnakangas
One of the error paths left *members uninitialized. That's not a live bug, because most callers don't look at *members when the function returns -1, but let's be tidy. One caller, in heap_lock_tuple(), does "if (members != NULL) pfree(members)", but AFAICS it never passes an invalid 'multi' value so it should not reach that error case. The callers are also a bit inconsistent in their expectations. heap_lock_tuple() pfrees the 'members' array if it's not-NULL, others pfree() it if "nmembers >= 0", and others if "nmembers > 0". That's not a live bug either, because the function should never return 0, but add an Assert for that to make it more clear. I left the callers alone for now. I also moved the line where we set *nmembers. It wasn't wrong before, but I like to do that right next to the 'return' statement, to make it clear that it's always set on return. Also remove one unreachable return statement after ereport(ERROR), for brevity and for consistency with the similar if-block right after it. Author: Greg Nancarrow with the additional changes by me Backpatch-through: 9.6, all supported versions
2021-06-17Fix subtransaction test for Python 3.10Peter Eisentraut
Starting with Python 3.10, the stacktrace looks differently: - PL/Python function "subtransaction_exit_subtransaction_in_with", line 3, in <module> - s.__exit__(None, None, None) + PL/Python function "subtransaction_exit_subtransaction_in_with", line 2, in <module> + with plpy.subtransaction() as s: Using try/except specifically makes the error look always the same. (See https://github.com/python/cpython/pull/25719 for the discussion of this change in Python.) Author: Honza Horak <hhorak@redhat.com> Discussion: https://www.postgresql.org/message-id/flat/853083.1620749597%40sss.pgh.pa.us RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1959080
2021-06-17Detect unused steps in isolation specs and do some cleanupMichael Paquier
This is useful for developers to find out if an isolation spec is over-engineered or if it needs more work by warning at the end of a test run if a step is not used, generating a failure with extra diffs. While on it, clean up all the specs which include steps not used in any permutations to simplify them. This is a backpatch of 989d23b and 06fdc4e, as it is becoming useful to make all the branches consistent for an upcoming patch that will improve the output generated by isolationtester. Author: Michael Paquier Reviewed-by: Asim Praveen, Melanie Plageman Discussion: https://postgr.es/m/20190819080820.GG18166@paquier.xyz Discussion: https://postgr.es/m/794820.1623872009@sss.pgh.pa.us Backpatch-through: 9.6
2021-06-17Remove dry-run mode from isolationtesterMichael Paquier
The original purpose of the dry-run mode is to be able to print all the possible permutations from a spec file, but it has become less useful since isolation tests have improved regarding deadlock detection as one step not wanted by the author could block indefinitely now (originally the step blocked would have been detected rather quickly). Per discussion, let's remove it. This is a backpatch of 9903338 for 9.6~12. It is proving to become useful to have on those branches so as the code gets consistent across all supported versions, as a matter of improving the output generated by isolationtester. Author: Michael Paquier Reviewed-by: Asim Praveen, Melanie Plageman Discussion: https://postgr.es/m/20190819080820.GG18166@paquier.xyz Discussion: https://postgr.es/m/794820.1623872009@sss.pgh.pa.us Backpatch-through: 9.6
2021-06-16Fix plancache refcount leak after error in ExecuteQuery.Tom Lane
When stuffing a plan from the plancache into a Portal, one is not supposed to risk throwing an error between GetCachedPlan and PortalDefineQuery; if that happens, the plan refcount incremented by GetCachedPlan will be leaked. I managed to break this rule while refactoring code in 9dbf2b7d7. There is no visible consequence other than some memory leakage, and since nobody is very likely to trigger the relevant error conditions many times in a row, it's not surprising we haven't noticed. Nonetheless, it's a bug, so rearrange the order of operations to remove the hazard. Noted on the way to looking for a better fix for bug #17053. This mistake is pretty old, so back-patch to all supported branches.
2021-06-15Further refinement of stuck_on_old_timeline recovery testAndrew Dunstan
TestLib::perl2host can take a file argument as well as a directory argument, so that code becomes substantially simpler. Also add comments on why we're using forward slashes, and why we're setting PERL_BADLANG=0. Discussion: https://postgr.es/m/e9947bcd-20ee-027c-f0fe-01f736b7e345@dunslane.net
2021-06-15Fix decoding of speculative aborts.Amit Kapila
During decoding for speculative inserts, we were relying for cleaning toast hash on confirmation records or next change records. But that could lead to multiple problems (a) memory leak if there is neither a confirmation record nor any other record after toast insertion for a speculative insert in the transaction, (b) error and assertion failures if the next operation is not an insert/update on the same table. The fix is to start queuing spec abort change and clean up toast hash and change record during its processing. Currently, we are queuing the spec aborts for both toast and main table even though we perform cleanup while processing the main table's spec abort record. Later, if we have a way to distinguish between the spec abort record of toast and the main table, we can avoid queuing the change for spec aborts of toast tables. Reported-by: Ashutosh Bapat Author: Dilip Kumar Reviewed-by: Amit Kapila Backpatch-through: 9.6, where it was introduced Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
2021-06-13Work around portability issue with newer versions of mktime().Tom Lane
Recent glibc versions have made mktime() fail if tm_isdst is inconsistent with the prevailing timezone; in particular it fails for tm_isdst = 1 when the zone is UTC. (This seems wildly inconsistent with the POSIX-mandated treatment of "incorrect" values for the other fields of struct tm, so if you ask me it's a bug, but I bet they'll say it's intentional.) This has been observed to cause cosmetic problems when pg_restore'ing an archive created in a different timezone. To fix, do mktime() using the field values from the archive, and if that fails try again with tm_isdst = -1. This will give a result that's off by the UTC-offset difference from the original zone, but that was true before, too. It's not terribly critical since we don't do anything with the result except possibly print it. (Someday we should flush this entire bit of logic and record a standard-format timestamp in the archive instead. That's not okay for a back-patched bug fix, though.) Also, guard our only other use of mktime() by having initdb's build_time_t() set tm_isdst = -1 not 0. This case could only have an issue in zones that are DST year-round; but I think some do exist, or could in future. Per report from Wells Oliver. Back-patch to all supported versions, since any of them might need to run with a newer glibc. Discussion: https://postgr.es/m/CAOC+FBWDhDHO7G-i1_n_hjRzCnUeFO+H-Czi1y10mFhRWpBrew@mail.gmail.com
2021-06-13Further tweaks to stuck_on_old_timeline recovery testAndrew Dunstan
Translate path slashes on target directory path. This was confusing old branches, but is applied to all branches for the sake of uniformity. Perl is perfectly able to understand paths with forward slashes. Along the way, restore the previous archive_wait query, for the sake of uniformity with other tests, per gripe from Tom Lane.