summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2024-08-29Disallow USING clause when altering type of generated columnPeter Eisentraut
This does not make sense. It would write the output of the USING clause into the converted column, which would violate the generation expression. This adds a check to error out if this is specified. There was a test for this, but that test errored out for a different reason, so it was not effective. Reported-by: Jian He <jian.universality@gmail.com> Reviewed-by: Yugo NAGATA <nagata@sraoss.co.jp> Discussion: https://www.postgresql.org/message-id/flat/c7083982-69f4-4b14-8315-f9ddb20b9834%40eisentraut.org
2024-08-19Avoid failure to open dropped detached partitionAlvaro Herrera
When a partition is detached and immediately dropped, a prepared statement could try to compute a new partition descriptor that includes it. This leads to this kind of error: ERROR: could not open relation with OID 457639 Avoid this by skipping the partition in expand_partitioned_rtentry if it doesn't exist. Noted by me while investigating bug #18559. Kuntal Gosh helped to identify the exact failure. Backpatch to 14, where DETACH CONCURRENTLY was introduced. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/202408122233.bo4adt3vh5bi@alvherre.pgsql
2024-08-19Explain dropdb can't use syscache because of TOASTTomas Vondra
Add a comment explaining dropdb() can't rely on syscache. The issue with flattened rows was fixed by commit 0f92b230f88b, but better to have a clear explanation why the systable scan is necessary. The other places doing in-place updates on pg_database have the same comment. Suggestion and patch by Yugo Nagata. Backpatch to 12, same as the fix. Author: Yugo Nagata Backpatch-through: 12 Discussion: https://postgr.es/m/CAJTYsWWNkCt+-UnMhg=BiCD3Mh8c2JdHLofPxsW3m2dkDFw8RA@mail.gmail.com
2024-08-19Fix regression in TLS session ticket disablingDaniel Gustafsson
Commit 274bbced disabled session tickets for TLSv1.3 on top of the already disabled TLSv1.2 session tickets, but accidentally caused a regression where TLSv1.2 session tickets were incorrectly sent. Fix by unconditionally disabling TLSv1.2 session tickets and only disable TLSv1.3 tickets when the right version of OpenSSL is used. Backpatch to all supported branches. Reported-by: Cameron Vogt <cvogt@automaticcontrols.net> Reported-by: Fire Emerald <fire.github@gmail.com> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Discussion: https://postgr.es/m/DM6PR16MB3145CF62857226F350C710D1AB852@DM6PR16MB3145.namprd16.prod.outlook.com Backpatch-through: v12
2024-08-19Fix harmless LC_COLLATE[_MASK] confusion.Thomas Munro
Commit ca051d8b101 called newlocale(LC_COLLATE, ...) instead of newlocale(LC_COLLATE_MASK, ...), in code reached only on FreeBSD. They have the same value on that OS, explaining why it worked. Fix. Back-patch to 14, where ca051d8b101 landed.
2024-08-19ci: Upgrade MacPorts version to 2.10.1.Thomas Munro
MacPorts version 2.9.3 started failing in our ci_macports_packages.sh script, for reasons not fully determined, but plausibly linked to the release of 2.10.1. 2.10.1 seems to work, so let's switch to it. Back-patch to 15, where CI began. Reported-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/81f104e8-f0a9-43c0-85bd-2bbbf590a5b8%40eisentraut.org
2024-08-19Fix DROP DATABASE for databases with many ACLsTomas Vondra
Commit c66a7d75e652 modified DROP DATABASE so that if interrupted, the database is known to be in an invalid state and can only be dropped. This is done by setting a flag using an in-place update, so that it's not lost in case of rollback. For databases with many ACLs, this may however fail like this: ERROR: wrong tuple length This happens because with many ACLs, the pg_database.datacl attribute gets TOASTed. The dropdb() code reads the tuple from the syscache, which means it's detoasted. But the in-place update expects the tuple length to match the on-disk tuple. Fixed by reading the tuple from the catalog directly, not from syscache. Report and fix by Ayush Tiwari. Backpatch to 12. The DROP DATABASE fix was backpatched to 11, but 11 is EOL at this point. Reported-by: Ayush Tiwari Author: Ayush Tiwari Reviewed-by: Tomas Vondra Backpatch-through: 12 Discussion: https://postgr.es/m/CAJTYsWWNkCt+-UnMhg=BiCD3Mh8c2JdHLofPxsW3m2dkDFw8RA@mail.gmail.com
2024-08-12Fix creation of partition descriptor during concurrent detach+dropAlvaro Herrera
If a partition undergoes DETACH CONCURRENTLY immediately followed by DROP, this could cause a problem for a concurrent transaction recomputing the partition descriptor when running a prepared statement, because it tries to dereference a pointer to a tuple that's not found in a catalog scan. The existing retry logic added in commit dbca3469ebf8 is sufficient to cope with the overall problem, provided we don't try to dereference a non-existant heap tuple. Arguably, the code in RelationBuildPartitionDesc() has been wrong all along, since no check was added in commit 898e5e3290a7 against receiving a NULL tuple from the catalog scan; that bug has only become user-visible with DETACH CONCURRENTLY which was added in branch 14. Therefore, even though there's no known mechanism to cause a crash because of this, backpatch the addition of such a check to all supported branches. In branches prior to 14, this would cause the code to fail with a "missing relpartbound for relation XYZ" error instead of crashing; that's okay, because there are no reports of such behavior anyway. Author: Kuntal Ghosh <kuntalghosh.2007@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/18559-b48286d2eacd9a4e@postgresql.org
2024-08-11Suppress Coverity warnings about Asserts in get_name_for_var_field.Tom Lane
Coverity thinks dpns->plan could be null at these points. That shouldn't really be possible, but it's easy enough to modify the Asserts so they'd not core-dump if it were true. These are new in b919a97a6. Back-patch to v13; the v12 version of the patch didn't have these Asserts.
2024-08-10Allow adjusting session_authorization and role in parallel workers.Tom Lane
The code intends to allow GUCs to be set within parallel workers via function SET clauses, but not otherwise. However, doing so fails for "session_authorization" and "role", because the assign hooks for those attempt to set the subsidiary "is_superuser" GUC, and that call falls foul of the "not otherwise" prohibition. We can't switch to using GUC_ACTION_SAVE for this, so instead add a new GUC variable flag GUC_ALLOW_IN_PARALLEL to mark is_superuser as being safe to set anyway. (This is okay because is_superuser has context PGC_INTERNAL and thus only hard-wired calls can change it. We'd need more thought before applying the flag to other GUCs; but maybe there are other use-cases.) This isn't the prettiest fix perhaps, but other alternatives we thought of would be much more invasive. While here, correct a thinko in commit 059de3ca4: when rejecting a GUC setting within a parallel worker, we should return 0 not -1 if the ereport doesn't longjmp. (This seems to have no consequences right now because no caller cares, but it's inconsistent.) Improve the comments to try to forestall future confusion of the same kind. Despite the lack of field complaints, this seems worth back-patching. Thanks to Nathan Bossart for the idea to invent a new flag, and for review. Discussion: https://postgr.es/m/2833457.1723229039@sss.pgh.pa.us
2024-08-09Fix "failed to find plan for subquery/CTE" errors in EXPLAIN.Tom Lane
To deparse a reference to a field of a RECORD-type output of a subquery, EXPLAIN normally digs down into the subquery's plan to try to discover exactly which anonymous RECORD type is meant. However, this can fail if the subquery has been optimized out of the plan altogether on the grounds that no rows could pass the WHERE quals, which has been possible at least since 3fc6e2d7f. There isn't anything remaining in the plan tree that would help us, so fall back to printing the field name as "fN" for the N'th column of the record. (This will actually be the right thing some of the time, since it matches the column names we assign to RowExprs.) In passing, fix a comment typo in create_projection_plan, which I noticed while experimenting with an alternative fix for this. Per bug #18576 from Vasya B. Back-patch to all supported branches. Richard Guo and Tom Lane Discussion: https://postgr.es/m/18576-9feac34e132fea9e@postgresql.org
2024-08-08Refuse ATTACH of a table referenced by a foreign keyAlvaro Herrera
Trying to attach a table as a partition which is already on the referenced side of a foreign key on the partitioned table that it is being attached to, leads to strange behavior: we try to clone the foreign key from the parent to the partition, but this new FK points to the partition itself, and the mix of pg_constraint rows and triggers doesn't behave well. Rather than trying to untangle the mess (which might be possible given sufficient time), I opted to forbid the ATTACH. This doesn't seem a problematic restriction, given that we already fail to create the foreign key if you do it the other way around, that is, having the partition first and the FK second. Backpatch to all supported branches. Reported-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/18541-628a61bc267cd2d3@postgresql.org
2024-08-08Fix pg_rewind debug output to print the source timeline historyHeikki Linnakangas
getTimelineHistory() is called twice, to read the source and the target timeline history files. However, the loop to print the file with the --debug option used the wrong variable when dealing with the source. As a result, the source's history was always printed as empty. Spotted while debugging bug #18575, but this does not fix that bug, just the debugging output. Backpatch to all supported versions. Discussion: https://www.postgresql.org/message-id/092dd515-b7b4-4fd0-8407-ceca2f02f6ec@iki.fi
2024-08-08Revert ECPG's use of pnstrdup()Peter Eisentraut
Commit 0b9466fce added a dependency on fe_memutils' pnstrdup() inside informix.c. This adds an exit() path in a library, which we don't want. (Unlike libpq, the ecpg libraries don't have an automated check for that, but it makes sense to keep them to a similar standard.) The ecpg code can already handle failure results from the *strdup() call by itself. Author: Jacob Champion <jacob.champion@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/CAOYmi+=pg=W5L1h=3MEP_EB24jaBu2FyATrLXqQHGe7cpuvwyg@mail.gmail.com
2024-08-07Fix edge case in plpgsql's make_callstmt_target().Tom Lane
If the plancache entry for the CALL statement is already stale, it's possible for us to fetch an old procedure OID out of it, and then fail with "cache lookup failed for function NNN". In ordinary usage this never happens because make_callstmt_target is called just once immediately after building the plancache entry. It can be forced however by setting up an erroneous CALL (that causes make_callstmt_target itself to report an error), then dropping/recreating the target procedure, then repeating the erroneous CALL. To fix, use SPI_plan_get_cached_plan() to fetch the plancache's plan, rather than assuming we can use SPI_plan_get_plan_sources(). This shouldn't add any noticeable overhead in the normal case, and in the stale-plan case we'd have had to replan anyway a little further down. The other callers of SPI_plan_get_plan_sources() seem OK, because either they don't need up-to-date plans or they know that the query was just (re) planned. But add some commentary in hopes of not falling into this trap again. Per bug #18574 from Song Hongyu. Back-patch to v14 where this coding was introduced. (Older branches have comparable code, but it's run after any required replanning, so there's no issue.) Discussion: https://postgr.es/m/18574-2ce7ba3249221389@postgresql.org
2024-08-07Make fallback MD5 implementation thread-safe on big-endian systemsHeikki Linnakangas
Replace a static scratch buffer with a local variable, because a static buffer makes the function not thread-safe. This function is used in client-code in libpq, so it needs to be thread-safe. It was until commit b67b57a966, which replaced the implementation with the one from pgcrypto. Backpatch to v14, where we switched to the new implementation. Reviewed-by: Robert Haas, Michael Paquier Discussion: https://www.postgresql.org/message-id/dfa2015d-ad21-4802-a4cc-3850fc5fff3f@iki.fi
2024-08-05Restrict accesses to non-system views and foreign tables during pg_dump.Masahiko Sawada
When pg_dump retrieves the list of database objects and performs the data dump, there was possibility that objects are replaced with others of the same name, such as views, and access them. This vulnerability could result in code execution with superuser privileges during the pg_dump process. This issue can arise when dumping data of sequences, foreign tables (only 13 or later), or tables registered with a WHERE clause in the extension configuration table. To address this, pg_dump now utilizes the newly introduced restrict_nonsystem_relation_kind GUC parameter to restrict the accesses to non-system views and foreign tables during the dump process. This new GUC parameter is added to back branches too, but these changes do not require cluster recreation. Back-patch to all supported branches. Reviewed-by: Noah Misch Security: CVE-2024-7348 Backpatch-through: 12
2024-08-05Translation updatesPeter Eisentraut
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: 09a5036e19350293c332e686d66c636762f7a454
2024-08-01Update comment in portal.h.Etsuro Fujita
We store tuples into the portal's tuple store for a PORTAL_ONE_MOD_WITH query as well. Back-patch to all supported branches. Reviewed by Andy Fan. Discussion: https://postgr.es/m/CAPmGK14HVYBZYZtHabjeCd-e31VT%3Dwx6rQNq8QfehywLcpZ2Hw%40mail.gmail.com
2024-07-31Revert "Allow parallel workers to cope with a newly-created session user ID."Tom Lane
This reverts commit 48536305370acd75c6264fcefec7fae7af8c5440. Some buildfarm animals are failing with "cannot change "client_encoding" during a parallel operation". It looks like assign_client_encoding is unhappy at being asked to roll back a client_encoding setting after a parallel worker encounters a failure. There must be more to it though: why didn't I see this during local testing? In any case, it's clear that moving the RestoreGUCState() call is not as side-effect-free as I thought. Given that the bug f5f30c22e intended to fix has gone unreported for years, it's not something that's urgent to fix; I'm not willing to risk messing with it further with only days to our next release wrap.
2024-07-31Allow parallel workers to cope with a newly-created session user ID.Tom Lane
Parallel workers failed after a sequence like BEGIN; CREATE USER foo; SET SESSION AUTHORIZATION foo; because check_session_authorization could not see the uncommitted pg_authid row for "foo". This is because we ran RestoreGUCState() in a separate transaction using an ordinary just-created snapshot. The same disease afflicts any other GUC that requires catalog lookups and isn't forgiving about the lookups failing. To fix, postpone RestoreGUCState() into the worker's main transaction after we've set up a snapshot duplicating the leader's. This affects check_transaction_isolation and check_transaction_deferrable, which think they should only run during transaction start. Make them act like check_transaction_read_only, which already knows it should silently accept the value when InitializingParallelWorker. Per bug #18545 from Andrey Rachitskiy. Back-patch to all supported branches, because this has been wrong for awhile. Discussion: https://postgr.es/m/18545-feba138862f19aaa@postgresql.org
2024-07-29Use DELETE instead of UPDATE to speed up vacuum testMelanie Plageman
e2e992820f104def introduced a test which generated dead tuples for vacuum with an UPDATE. The test only required enough dead TIDs for two rounds of index vacuuming. This can be accomplished with a DELETE instead of an UPDATE -- which generates about 50% less WAL and makes the test 20% faster in many cases. The test takes several seconds (more on slow buildfarm animals) because we need quite a few tuples to trigger two rounds of index vacuuming; so it is worth a follow-on commit to speed it up. Suggested-by: Masahiko Sawada Discussion: https://postgr.es/m/CAAKRu_bWmMjmqL%2BOZ2duEQ80u7cRvpsExLNZNjzk-pXX5skwMQ%40mail.gmail.com Backpatch-through: 14, the first version containing this test.
2024-07-28Fix incorrect return value for pg_size_pretty(bigint)David Rowley
pg_size_pretty(bigint) would return the value in bytes rather than PB for the smallest-most bigint value. This happened due to an incorrect assumption that the absolute value of -9223372036854775808 could be stored inside a signed 64-bit type. Here we fix that by instead storing that value in an unsigned 64-bit type. This bug does exist in versions prior to 15 but the code there is sufficiently different and the bug seems sufficiently non-critical that it does not seem worth risking backpatching further. Author: Joseph Koshakow <koshy44@gmail.com> Discussion: https://postgr.es/m/CAAvxfHdTsMZPWEHUrZ=h3cky9Ccc3Mtx2whUHygY+ABP-mCmUw@mail.gmail.com Backpatch-through: 15
2024-07-28libpq: Use strerror_r instead of strerrorPeter Eisentraut
Commit 453c4687377 introduced a use of strerror() into libpq, but that is not thread-safe. Fix by using strerror_r() instead. In passing, update some of the code comments added by 453c4687377, as we have learned more about the reason for the change in OpenSSL that started this. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: Discussion: https://postgr.es/m/b6fb018b-f05c-4afd-abd3-318c649faf18@highgo.ca
2024-07-26Fix building with MSVC for TLS session disablingDaniel Gustafsson
Commit 274bbced85 omitted the required changes for the MSVC build system in v16 through v12. Per buildfarm animal hamerkop. Discussion: https://postgr.es/m/7919238F-723C-4113-9742-EBCE7A76A6B4@yesql.se
2024-07-26Fix macro placement in pg_config.h.inDaniel Gustafsson
Commit 274bbced85383e831dde accidentally placed the pg_config.h.in for SSL_CTX_set_num_tickets on the wrong line wrt where autoheader places it. Fix by re-arranging and backpatch to the same level as the original commit. Reported-by: Marina Polyakova <m.polyakova@postgrespro.ru> Discussion: https://postgr.es/m/48cebe8c3eaf308bae253b1dbf4e4a75@postgrespro.ru Backpatch-through: v12
2024-07-26Disable all TLS session ticketsDaniel Gustafsson
OpenSSL supports two types of session tickets for TLSv1.3, stateless and stateful. The option we've used only turns off stateless tickets leaving stateful tickets active. Use the new API introduced in 1.1.1 to disable all types of tickets. Backpatch to all supported versions. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20240617173803.6alnafnxpiqvlh3g@awork3.anarazel.de Backpatch-through: v12
2024-07-25ci: Pin MacPorts version to 2.9.3.Thomas Munro
Commit d01ce180 invented a new way to find the latest MacPorts version. By bad luck, a new beta release has just been published, and it seems to lack some packages we need. Go back to searching for this specific version for now. We still search with a pattern so that we can find the package for the running version of macOS, but for now we always look for 2.9.3. The code to do that had been anticipated already in a commented out line, I just didn't expect to have to use it so soon... Also include the whole MacPorts installation script in the cache key, so that changes to the script cause a fresh installation. This should make it a bit easier to reason about the effect of changes on cached state in github accounts using CI, when we make adjustments. Back-patch to 15, like d01ce180. Discussion: https://postgr.es/m/CA%2BhUKGLqJdv6RcwyZ_0H7khxtLTNJyuK%2BvDFzv3uwYbn8hKH6A%40mail.gmail.com
2024-07-25ci: Upgrade macOS version from 13 to 14.Thomas Munro
1. Previously we were using ghcr.io/cirruslabs/macos-XXX-base:latest images, but Cirrus has started ignoring that and using a particular image, currently ghcr.io/cirruslabs/macos-runner:sonoma, for github accounts using free CI resources (as opposed to dedicated runner machines, as cfbot uses). Let's just ask for that image anyway, to stay in sync. 2. Instead of hard-coding a MacPorts installation URL, deduce it from the running macOS version and the available releases. This removes the need to keep the ci_macports_packages.sh in sync with .cirrus.task.yml, and to advance the MacPorts version from time to time. 3. Change the cache key we use to cache the whole macports installation across builds to include the OS major version, to trigger a fresh installation when appropriate. Back-patch to 15 where CI began. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKGLqJdv6RcwyZ_0H7khxtLTNJyuK%2BvDFzv3uwYbn8hKH6A%40mail.gmail.com
2024-07-24Reset relhassubclass upon attaching table as a partitionAlvaro Herrera
We don't allow inheritance parents as partitions, and have checks to prevent this; but if a table _was_ in the past an inheritance parents and all their children are removed, the pg_class.relhassubclass flag may remain set, which confuses the partition pruning code (most obviously, it results in an assertion failure; in production builds it may be worse.) Fix by resetting relhassubclass on attach. Backpatch to all supported versions. Reported-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/18550-d5e047e9a897a889@postgresql.org
2024-07-23Detect integer overflow in array_set_slice().Nathan Bossart
When provided an empty initial array, array_set_slice() fails to check for overflow when computing the new array's dimensions. While such overflows are ordinarily caught by ArrayGetNItems(), commands with the following form are accepted: INSERT INTO t (i[-2147483648:2147483647]) VALUES ('{}'); To fix, perform the hazardous computations using overflow-detecting arithmetic routines. As with commit 18b585155a, the added test cases generate errors that include a platform-dependent value, so we again use psql's VERBOSITY parameter to suppress printing the message text. Reported-by: Alexander Lakhin Author: Joseph Koshakow Reviewed-by: Jian He Discussion: https://postgr.es/m/31ad2cd1-db94-bdb3-f91a-65ffdb4bef95%40gmail.com Backpatch-through: 12
2024-07-22Doc: improve description of plpgsql's FETCH and MOVE commands.Tom Lane
We were not being clear about which variants of the "direction" clause are permitted in MOVE. Also, the text seemed to be written with only the FETCH/MOVE NEXT case in mind, so it didn't apply very well to other variants. Also, document that "MOVE count IN cursor" only works if count is a constant. This is not the whole truth, because some other cases such as a parenthesized expression will also work, but we want to push people to use "MOVE FORWARD count" instead. The constant case is enough to cover what we allow in plain SQL, and that seems sufficient to claim support for. Update a comment in pl_gram.y claiming that we don't document that point. Per gripe from Philipp Salvisberg. Discussion: https://postgr.es/m/172155553388.702.7932496598218792085@wrigleys.postgresql.org
2024-07-20Correctly check updatability of columns targeted by INSERT...DEFAULT.Tom Lane
If a view has some updatable and some non-updatable columns, we failed to verify updatability of any columns for which an INSERT or UPDATE on the view explicitly specifies a DEFAULT item (unless the view has a declared default for that column, which is rare anyway, and one would almost certainly not write one for a non-updatable column). This would lead to an unexpected "attribute number N not found in view targetlist" error rather than the intended error. Per bug #18546 from Alexander Lakhin. This bug is old, so back-patch to all supported branches. Discussion: https://postgr.es/m/18546-84a292e759a9361d@postgresql.org
2024-07-19Add overflow checks to money type.Nathan Bossart
None of the arithmetic functions for the the money type handle overflow. This commit introduces several helper functions with overflow checking and makes use of them in the money type's arithmetic functions. Fixes bug #18240. Reported-by: Alexander Lakhin Author: Joseph Koshakow Discussion: https://postgr.es/m/18240-c5da758d7dc1ecf0%40postgresql.org Discussion: https://postgr.es/m/CAAvxfHdBPOyEGS7s%2Bxf4iaW0-cgiq25jpYdWBqQqvLtLe_t6tw%40mail.gmail.com Backpatch-through: 12
2024-07-19Test that vacuum removes tuples older than OldestXminMelanie Plageman
If vacuum fails to prune a tuple killed before OldestXmin, it will later find that tuple dead in lazy_scan_prune() and loop infinitely. Add a test reproducing this scenario to the recovery suite which creates a table on a primary, updates the table to generate dead tuples for vacuum, and then, during the vacuum, uses a replica to force GlobalVisState->maybe_needed on the primary to move backwards and precede the value of OldestXmin set at the beginning of vacuuming the table. This commit is separate from the fix in case there are test stability issues. Discussion of the bug: https://postgr.es/m/CAAKRu_Y_NJzF4-8gzTTeaOuUL3CcGoXPjXcAHbTTygT8AyVqag%40mail.gmail.com Discussion of the test: https://postgr.es/m/CAAKRu_apNU2MPBK96V%2BbXjTq0RiZ-%3DA4ZTaysakpx9jxbq1dbQ%40mail.gmail.com Author: Melanie Plageman Reviewed-by: Peter Geoghegan
2024-07-19Ensure vacuum removes all visibly dead tuples older than OldestXminMelanie Plageman
If vacuum fails to remove a tuple with xmax older than VacuumCutoffs->OldestXmin and younger than GlobalVisState->maybe_needed, it will loop infinitely in lazy_scan_prune(), which compares tuples' visibility information to OldestXmin. Starting in version 14, which uses GlobalVisState for visibility testing during pruning, it is possible for GlobalVisState->maybe_needed to precede OldestXmin if maybe_needed is forced to go backward while vacuum is running. This can happen if a disconnected standby with a running transaction older than VacuumCutoffs->OldestXmin reconnects to the primary after vacuum initially calculates GlobalVisState and OldestXmin. Fix this by having vacuum always remove tuples older than OldestXmin during pruning. This is okay because the standby won't replay the tuple removal until the tuple is removable. Thus, the worst that can happen is a recovery conflict. Fixes BUG# 17257 Back-patched in versions 14-17 Author: Melanie Plageman Reviewed-by: Noah Misch, Peter Geoghegan, Robert Haas, Andres Freund, and Heikki Linnakangas Discussion: https://postgr.es/m/CAAKRu_Y_NJzF4-8gzTTeaOuUL3CcGoXPjXcAHbTTygT8AyVqag%40mail.gmail.com
2024-07-17Avoid error in recovery test if history file is not yet presentAndrew Dunstan
Error was detected when testing use of libpq sessions instead of psql for polling queries. Discussion: https://postgr.es/m/e86b6d2d-20d8-4ac9-9a98-165fff7db886@dunslane.net Backpatch to all live branches
2024-07-15Fix bad indentation introduced in 43cd30bcd1cAndres Freund
Oops. Reported-by: Nathan Bossart <nathandbossart@gmail.com> Discussion: https://postgr.es/m/ZpVZB9rH5tHllO75@nathan Backpatch: 12-, like 43cd30bcd1c
2024-07-15Fix type confusion in guc_var_compare()Andres Freund
Before this change guc_var_compare() cast the input arguments to const struct config_generic *. That's not quite right however, as the input on one side is often just a char * on one side. Instead just use char *, the first field in config_generic. This fixes a -Warray-bounds warning with some versions of gcc. While the warning is only known to be triggered for <= 15, the issue the warning points out seems real, so apply the fix everywhere. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reported-by: Erik Rijkers <er@xs4all.nl> Suggested-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/a74a1a0d-0fd2-3649-5224-4f754e8f91aa%40xs4all.nl
2024-07-14Avoid unhelpful internal error for incorrect recursive-WITH queries.Tom Lane
checkWellFormedRecursion would issue "missing recursive reference" if a WITH RECURSIVE query contained a single self-reference but that self-reference was inside a top-level WITH, ORDER BY, LIMIT, etc, rather than inside the second arm of the UNION as expected. We already intended to throw more-on-point errors for such cases, but those error checks must be done before examining the UNION arm in order to have the desired results. So this patch need only move some code (and improve the comments). Per bug #18536 from Alexander Lakhin. Back-patch to all supported branches. Discussion: https://postgr.es/m/18536-0a342ec07901203e@postgresql.org
2024-07-13Don't lose partitioned table reltuples=0 after relhassubclass=f.Noah Misch
ANALYZE sets relhassubclass=f when a partitioned table no longer has partitions. An ANALYZE doing that proceeded to apply the inplace update of pg_class.reltuples to the old pg_class tuple instead of the new tuple, losing that reltuples=0 change if the ANALYZE committed. Non-partitioning inheritance trees were unaffected. Back-patch to v14, where commit 375aed36ad83f0e021e9bdd3a0034c0c992c66dc introduced maintenance of partitioned table pg_class.reltuples. Reported by Alexander Lakhin. Discussion: https://postgr.es/m/a295b499-dcab-6a99-c06e-01cf60593344@gmail.com
2024-07-13Make sure to run pg_isready on correct portAndrew Dunstan
The current code can have pg_isready unexpectedly succeed if there is a server running on the default port. To avoid this we delay running the test until after a node has been created but before it starts, and then use that node's port, so we are fairly sure there is nothing running on the port. Backpatch to all live branches.
2024-07-13Fix lost Windows socket EOF events.Thomas Munro
Winsock only signals an FD_CLOSE event once if the other end of the socket shuts down gracefully. Because each WaitLatchOrSocket() call constructs and destroys a new event handle every time, with unlucky timing we can lose it and hang. We get away with this only if the other end disconnects non-gracefully, because FD_CLOSE is repeatedly signaled in that case. To fix this design flaw in our Windows socket support fundamentally, we'd probably need to rearchitect it so that a single event handle exists for the lifetime of a socket, or switch to completely different multiplexing or async I/O APIs. That's going to be a bigger job and probably wouldn't be back-patchable. This brute force kludge closes the race by explicitly polling with MSG_PEEK before sleeping. Back-patch to all supported releases. This should hopefully clear up some random build farm and CI hang failures reported over the years. It might also allow us to try using graceful shutdown in more places again (reverted in commit 29992a6) to fix instability in the transmission of FATAL error messages, but that isn't done by this commit. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Tested-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/176008.1715492071%40sss.pgh.pa.us
2024-07-12Add ORDER BY to new test queryAlvaro Herrera
Per buildfarm.
2024-07-12Fix ALTER TABLE DETACH for inconsistent indexesAlvaro Herrera
When a partitioned table has an index that doesn't support a constraint, but a partition has an equivalent index that does, then a DETACH operation would misbehave: a crash in assertion-enabled systems (because we fail to find the constraint in the parent that we expect to), or a broken coninhcount value (-1) in production systems (because we blindly believe that we've successfully detached the parent). While we should reject an ATTACH of a partition with such an index, we have failed to do so in existing releases, so adding an error in stable releases might break the (unlikely) existing applications that rely on this behavior. At this point I don't even want to reject them in master, because it'd break pg_upgrade if such databases exist, and there would be no easy way to fix existing databases without expensive index rebuilds. (Later on we could add ALTER TABLE ... ADD CONSTRAINT USING INDEX to partitioned tables, which would allow the user to fix such patterns. At that point we could add more restrictions to prevent the problem from its root.) Also, add a test case that leaves one table in this condition, so that we can verify that pg_upgrade continues to work if we later decide to change the policy on the master branch. Backpatch to all supported branches. Co-authored-by: Tender Wang <tndrwang@gmail.com> Reported-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/18500-62948b6fe5522f56@postgresql.org
2024-07-11Fix possibility of logical decoding partial transaction changes.Masahiko Sawada
When creating and initializing a logical slot, the restart_lsn is set to the latest WAL insertion point (or the latest replay point on standbys). Subsequently, WAL records are decoded from that point to find the start point for extracting changes in the DecodingContextFindStartpoint() function. Since the initial restart_lsn could be in the middle of a transaction, the start point must be a consistent point where we won't see the data for partial transactions. Previously, when not building a full snapshot, serialized snapshots were restored, and the SnapBuild jumps to the consistent state even while finding the start point. Consequently, the slot's restart_lsn and confirmed_flush could be set to the middle of a transaction. This could lead to various unexpected consequences. Specifically, there were reports of logical decoding decoding partial transactions, and assertion failures occurred because only subtransactions were decoded without decoding their top-level transaction until decoding the commit record. To resolve this issue, the changes prevent restoring the serialized snapshot and jumping to the consistent state while finding the start point. On v17 and HEAD, a flag indicating whether snapshot restores should be skipped has been added to the SnapBuild struct, and SNAPBUILD_VERSION has been bumpded. On backbranches, the flag is stored in the LogicalDecodingContext instead, preserving on-disk compatibility. Backpatch to all supported versions. Reported-by: Drew Callahan Reviewed-by: Amit Kapila, Hayato Kuroda Discussion: https://postgr.es/m/2444AA15-D21B-4CCE-8052-52C7C2DAFE5C%40amazon.com Backpatch-through: 12
2024-07-10Make our back branches compatible with libxml2 2.13.x.Tom Lane
This back-patches HEAD commits 066e8ac6e, 6082b3d5d, e7192486d, and 896cd266f into supported branches. Changes: * Use xmlAddChildList not xmlAddChild in XMLSERIALIZE (affects v16 and up only). This was a flat-out coding mistake that we got away with due to lax checking in previous versions of xmlAddChild. * Use xmlParseInNodeContext not xmlParseBalancedChunkMemory. This is to dodge a bug in xmlParseBalancedChunkMemory in libxm2 releases 2.13.0-2.13.2. While that bug is now fixed upstream and will probably never be seen in any production-oriented distro, it is currently a problem on some more-bleeding-edge-friendly platforms. * Suppress "chunk is not well balanced" errors from libxml2, unless it is the only error. This eliminates an error-reporting discrepancy between 2.13 and older releases. This error is almost always redundant with previous errors, if not flat-out inappropriate, which is why 2.13 changed the behavior and why nobody's likely to miss it. Erik Wienhold and Tom Lane, per report from Frank Streitzig. Discussion: https://postgr.es/m/trinity-b0161630-d230-4598-9ebc-7a23acdb37cb-1720186432160@3c-app-gmx-bap25 Discussion: https://postgr.es/m/trinity-361ba18b-541a-4fe7-bc63-655ae3a7d599-1720259822452@3c-app-gmx-bs01
2024-07-08Symlink pg_replslot robustly on Windows in pg_basebackup testAndrew Dunstan
This reverts commit e9f15bc9. Instead of a hacky solution that didn't work on Windows, we avoid trying to move the directory possibly across drives, and instead remove it and recreate it in the new location. Discussion: https://postgr.es/m/20240707070243.sb77kp4ubowauctz@awork3.anarazel.de Backpatch to release 14 like the previous patch.
2024-07-08Choose ports for test servers less likely to result in conflictsAndrew Dunstan
If we choose ports in the range typically used for ephemeral ports there is a danger of encountering a port conflict due to a race condition between the time we choose the port in a range below that typically used to allocate ephemeral ports, but higher than the range typically used by well known services. Author: Jelte Fenema-Nio, with some editing by me. Discussion: https://postgr.es/m/d6ee8761-39d1-0033-1afb-d5a57ee056f2@gmail.com Backpatch to all live branches (12 and up)
2024-07-08Force nodes for SSL tests to start in TCP modeAndrew Dunstan
Currently they are started in unix socket mode in ost cases, and then converted to run in TCP mode. This can result in port collisions, and there is no virtue in startng in unix socket mode, so start as we will be going on. Discussion: https://postgr.es/m/d6ee8761-39d1-0033-1afb-d5a57ee056f2@gmail.com Backpatch to all live branches (12 and up).