user/sven/postgresql.git

Age	Commit message (Collapse)	Author
2024-12-15	pgbench: fix misprocessing of some nested \if constructs.	Tom Lane
	An \if command appearing within a false (not-to-be-executed) \if branch was incorrectly treated the same as \elif. This could allow statements within the inner \if to be executed when they should not be. Also the missing inner \if stack entry would result in an assertion failure (in assert-enabled builds) when the final \endif is reached. Report and patch by Michail Nikolaev. Back-patch to all supported branches. Discussion: https://postgr.es/m/CANtu0oiA1ke=SP6tauhNqkUdv5QFsJtS1p=aOOf_iU+EhyKkjQ@mail.gmail.com
2024-12-13	Fix possible crash in pg_dump with identity sequences.	Tom Lane
	If an owned sequence is considered interesting, force its owning table to be marked interesting too. This ensures, in particular, that we'll fetch the owning table's column names so we have the data needed for ALTER TABLE ... ADD GENERATED. Previously there were edge cases where pg_dump could get SIGSEGV due to not having filled in the column names. (The known case is where the owning table has been made part of an extension while its identity sequence is not a member; but there may be others.) Also, if it's an identity sequence, force its dumped-components mask to exactly match the owning table: dump definition only if we're dumping the table's definition, dump data only if we're dumping the table's data, etc. This generalizes the code introduced in commit b965f2617 that set the sequence's dump mask to NONE if the owning table's mask is NONE. That's insufficient to prevent failures, because for example the table's mask might only request dumping ACLs, which would lead us to still emit ALTER TABLE ADD GENERATED even though we didn't create the table. It seems better to treat an identity sequence as though it were an inseparable aspect of the table, matching the treatment used in the backend's dependency logic. Perhaps this policy needs additional refinement, but let's wait to see some field use-cases before changing it further. While here, add a comment in pg_dump.h warning against writing tests like "if (dobj->dump == DUMP_COMPONENT_NONE)", which was a bug in this case. There is one other example in getPublicationNamespaces, which if it's not a bug is at least remarkably unclear and under-documented. Changing that requires a separate discussion, however. Per report from Artur Zakirov. Back-patch to all supported branches. Discussion: https://postgr.es/m/CAKNkYnwXFBf136=u9UqUxFUVagevLQJ=zGd5BsLhCsatDvQsKQ@mail.gmail.com
2024-12-10	Fix elog(FATAL) before PostmasterMain() or just after fork().	Noah Misch
	Since commit 97550c0711972a9856b5db751539bbaf2f88884c, these failed with "PANIC: proc_exit() called in child process" due to uninitialized or stale MyProcPid. That was reachable if close() failed in ClosePostmasterPorts() or setlocale(category, "C") failed, both unlikely. Back-patch to v13 (all supported versions). Discussion: https://postgr.es/m/20241208034614.45.nmisch@google.com
2024-12-09	Fix unused-but-set-variable compiler warning in reorderbuffer.c.	Nathan Bossart
	On v13, this variable is only used for an assertion, so adding PG_USED_FOR_ASSERTS_ONLY is sufficient to suppress this warning on builds with assertions disabled. Older versions are unsupported, and newer versions use the variable for more than the assertion, so this patch only needs to be applied to REL_13_STABLE. Reviewed-by: Tom Lane Discussion: https://postgr.es/m/Z1dCFnzrP24O8WNR%40nathan
2024-12-09	Simplify executor's determination of whether to use parallelism.	Tom Lane
	Our parallel-mode code only works when we are executing a query in full, so ExecutePlan must disable parallel mode when it is asked to do partial execution. The previous logic for this involved passing down a flag (variously named execute_once or run_once) from callers of ExecutorRun or PortalRun. This is overcomplicated, and unsurprisingly some of the callers didn't get it right, since it requires keeping state that not all of them have handy; not to mention that the requirements for it were undocumented. That led to assertion failures in some corner cases. The only state we really need for this is the existing QueryDesc.already_executed flag, so let's just put all the responsibility in ExecutePlan. (It could have been done in ExecutorRun too, leading to a slightly shorter patch -- but if there's ever more than one caller of ExecutePlan, it seems better to have this logic in the subroutine than the callers.) This makes those ExecutorRun/PortalRun parameters unnecessary. In master it seems okay to just remove them, returning the API for those functions to what it was before parallelism. Such an API break is clearly not okay in stable branches, but for them we can just leave the parameters in place after documenting that they do nothing. Per report from Yugo Nagata, who also reviewed and tested this patch. Back-patch to all supported branches. Discussion: https://postgr.es/m/20241206062549.710dc01cf91224809dd6c0e1@sraoss.co.jp
2024-12-07	Ensure that pg_amop/amproc entries depend on their lefttype/righttype.	Tom Lane
	Usually an entry in pg_amop or pg_amproc does not need a dependency on its amoplefttype/amoprighttype/amproclefttype/amprocrighttype types, because there is an indirect dependency via the argument types of its referenced operator or procedure, or via the opclass it belongs to. However, for some support procedures in some index AMs, the argument types of the support procedure might not mention the column data type at all. Also, the amop/amproc entry might be treated as "loose" in the opfamily, in which case it lacks a dependency on any particular opclass; or it might be a cross-type entry having a reference to a datatype that is not its opclass' opcintype. The upshot of all this is that there are cases where a datatype can be dropped while leaving behind amop/amproc entries that mention it, because there is no path in pg_depend showing that those entries depend on that type. Such entries are harmless in normal activity, because they won't get used, but they cause problems for maintenance actions such as dropping the operator family. They also cause pg_dump to produce bogus output. The previous commit put a band-aid on the DROP OPERATOR FAMILY failure, but a real fix is needed. To fix, add pg_depend entries showing that a pg_amop/pg_amproc entry depends on its lefttype/righttype. To avoid bloating pg_depend too much, skip this if the referenced operator or function has that type as an input type. (I did not bother with considering the possible indirect dependency via the opclass' opcintype; at least in the reported case, that wouldn't help anyway.) Probably, the reason this has escaped notice for so long is that add-on datatypes and relevant opclasses/opfamilies are usually packaged as extensions nowadays, so that there's no way to drop a type without dropping the referencing opclasses/opfamilies too. Still, in the absence of pg_depend entries there's nothing that constrains DROP EXTENSION to drop the opfamily entries before the datatype, so it seems possible for a DROP failure to occur anyway. The specific case that was reported doesn't fail in v13, because v13 prefers to attach the support procedure to the opclass not the opfamily. But it's surely possible to construct other edge cases that do fail in v13, so patch that too. Per report from Yoran Heling. Back-patch to all supported branches. Discussion: https://postgr.es/m/Z1MVCOh1hprjK5Sf@gmai021
2024-12-07	Make getObjectDescription robust against dangling amproc type links.	Tom Lane
	Yoran Heling reported a case where a data type could be dropped while references to its OID remain behind in pg_amproc. This causes getObjectDescription to fail, which blocks dropping the operator family (since our DROP code likes to construct descriptions of everything it's dropping). The proper fix for this requires adding more pg_depend entries. But to allow DROP to go through with already-corrupt catalogs, tweak getObjectDescription to print "???" for the type instead of failing when it processes such an entry. I changed the logic for pg_amop similarly, for consistency, although it is not known that the problem can manifest in pg_amop. Per report from Yoran Heling. Back-patch to all supported branches (although the problem may be unreachable in v13). Discussion: https://postgr.es/m/Z1MVCOh1hprjK5Sf@gmai021
2024-12-07	Fix is_digit labeling of to_timestamp's FFn format codes.	Tom Lane
	These format codes produce or consume strings of digits, so they should be labeled with is_digit = true, but they were not. This has effect in only one place, where is_next_separator() is checked to see if the preceding format code should slurp up all the available digits. Thus, with a format such as '...SSFF3' with remaining input '12345', the 'SS' code would consume all five digits (and then complain about seconds being out of range) when it should eat only two digits. Per report from Nick Davies. This bug goes back to d589f9446 where the FFn codes were introduced, so back-patch to v13. Discussion: https://postgr.es/m/AM8PR08MB6356AC979252CFEA78B56678B6312@AM8PR08MB6356.eurprd08.prod.outlook.com
2024-12-05	Avoid low-probability crash on out-of-memory.	Tom Lane
	check_restrict_nonsystem_relation_kind() correctly uses guc_malloc() in v16 and later. But in older branches it must use malloc() directly, and it forgot to check for failure return. Faulty backpatching of 66e94448a. Karina Litskevich Discussion: https://postgr.es/m/CACiT8iZ=atkguKVbpN4HmJFMb4+T9yEowF5JuPZG8W+kkZ9L6w@mail.gmail.com
2024-12-03	RelationTruncate() must set DELAY_CHKPT_START.	Thomas Munro
	Previously, it set only DELAY_CHKPT_COMPLETE. That was important, because it meant that if the XLOG_SMGR_TRUNCATE record preceded a XLOG_CHECKPOINT_ONLINE record in the WAL, then the truncation would also happen on disk before the XLOG_CHECKPOINT_ONLINE record was written. However, it didn't guarantee that the sync request for the truncation was processed before the XLOG_CHECKPOINT_ONLINE record was written. By setting DELAY_CHKPT_START, we guarantee that if an XLOG_SMGR_TRUNCATE record is written to WAL before the redo pointer of a concurrent checkpoint, the sync request queued by that operation must be processed by that checkpoint, rather than being left for the following one. This is a refinement of commit 412ad7a5563. Back-patch to all supported releases, like that commit. Author: Robert Haas <robertmhaas@gmail.com> Reported-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKG%2B-2rjGZC2kwqr2NMLBcEBp4uf59QT1advbWYF_uc%2B0Aw%40mail.gmail.com
2024-12-01	Fix broken list-munging in ecpg's remove_variables().	Tom Lane
	The loops over cursor argument variables neglected to ever advance "prevvar". The code would accidentally do the right thing anyway when removing the first or second list entry, but if it had to remove the third or later entry then it would also remove all entries between there and the first entry. AFAICS this would only matter for cursors that reference out-of-scope variables, which is a weird Informix compatibility hack; between that and the lack of impact for short lists, it's not so surprising that nobody has complained. Nonetheless it's a pretty obvious bug. It would have been more obvious if these loops used a more standard coding style for chasing the linked lists --- this business with the "prev" pointer sometimes pointing at the current list entry is confusing and overcomplicated. So rather than just add a minimal band-aid, I chose to rewrite the loops in the same style we use elsewhere, where the "prev" pointer is NULL until we are dealing with a non-first entry and we save the "next" pointer at the top of the loop. (Two of the four loops touched here are not actually buggy, but it seems better to make them all look alike.) Coverity discovered this problem, but not until 2b41de4a5 added code to free no-longer-needed arguments structs. With that, the incorrect link updates are possibly touching freed memory, and it complained about that. Nonetheless the list corruption hazard is ancient, so back-patch to all supported branches.
2024-11-29	Fix MinGW %d vs %lu warnings in back branches.	Thomas Munro
	Commit 352f6f2d used %d instead of %lu to format DWORD (unsigned long) with psprintf(). The _WIN32_WINNT value recently changed for MinGW in REL_15_STABLE (commit d700e8d7), so the code was suddenly being compiled, with warnings from gcc. The warnings were already fixed in 16+ by commits 495ed0ef and a9bc04b2 after the _WIN32_WINNT value was increase there. 14 and 13 didn't warn because they still use a lower value for MinGW, and supported versions of Visual Studio should compile the code in all live branches but don't check our format string. The change doesn't affect the result: sizeof(int) == sizeof(long) on this platform, and the values are computed with expressions that cannot exceed INT_MAX so were never interpreted as negative. Back-patch the formatting change from those commits into 13-15. This should turn CI's 15 branch green again and stop fairywren from warning about that on 15. Reported-by: Andres Freund <andres@anarazel.de> Reported-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/t2vjrcb3bloxf5qqvxjst6r7lvrefqyecxgt2koy5ho5b5glr2%40yuupmm6whgob
2024-11-28	Skip SectionMemoryManager.h in cpluspluscheck.	Thomas Munro
	Commit 9044fc1d45a0 skipped SectionMemoryManager.h in headerscheck, and by extension also cpluspluscheck, because it's C++ and would fail both tests. That worked in master and REL_17_STABLE due to 7b8e2ae2fd3b, but older branches have a separate cpluspluscheck script. We need to copy the filtering rule into there too. This problem was being reported by CI's CompilerWarnings task in the 15 and 16 branches, but was a victim of alert fatigue syndrome (unrelated problems in the back-branches). Fix 16, and back-patch to 13, as those are the live branches that have a separate cpluspluscheck script.
2024-11-28	Revert "Handle better implicit transaction state of pipeline mode"	Michael Paquier
	This reverts commit d77f91214fb7 on all stable branches, due to concerns regarding the compatility side effects this could create in a minor release. The change still exists on HEAD. Discussion: https://postgr.es/m/CA+TgmoZqRgeFTg4+Yf_CMRRXiHuNz1u6ZC4FvVk+rxw0RmOPnw@mail.gmail.com Backpatch-through: 13
2024-11-27	pgbench: Ensure previous progress message is fully cleared when updating.	Fujii Masao
	During pgbench's table initialization, progress updates could display leftover characters from the previous message if the new message was shorter. This commit resolves the issue by appending spaces to the current message to fully overwrite any remaining characters from the previous line. Back-patch to all the supported versions. Author: Yushi Ogiwara, Tatsuo Ishii, Fujii Masao Reviewed-by: Tatsuo Ishii, Fujii Masao Discussion: https://postgr.es/m/9a9b8b95b6a709877ae48ad5b0c59bb9@oss.nttdata.com
2024-11-27	Handle better implicit transaction state of pipeline mode	Michael Paquier
	When using a pipeline, a transaction starts from the first command and is committed with a Sync message or when the pipeline ends. Functions like IsInTransactionBlock() or PreventInTransactionBlock() were already able to understand a pipeline as being in a transaction block, but it was not the case of CheckTransactionBlock(). This function is called for example to generate a WARNING for SET LOCAL, complaining that it is used outside of a transaction block. The current state of the code caused multiple problems, like: - SET LOCAL executed at any stage of a pipeline issued a WARNING, even if the command was at least second in line where the pipeline is in a transaction state. - LOCK TABLE failed when invoked at any step of a pipeline, even if it should be able to work within a transaction block. The pipeline protocol assumes that the first command of a pipeline is not part of a transaction block, and that any follow-up commands is considered as within a transaction block. This commit changes the backend so as an implicit transaction block is started each time the first Execute message of a pipeline has finished processing, with this implicit transaction block ended once a sync is processed. The checks based on XACT_FLAGS_PIPELINING in the routines checking if we are in a transaction block are not necessary: it is enough to rely on the existing ones. Some tests are added to pgbench, that can be backpatched down to v17 when \syncpipeline is involved and down to v14 where \startpipeline and \endpipeline are available. This is unfortunately limited regarding the error patterns that can be checked, but it provides coverage for various pipeline combinations to check if these succeed or fail. These tests are able to capture the case of SET LOCAL's WARNING. The author has proposed a different feature to improve the coverage by adding similar meta-commands to psql where error messages could be checked, something more useful for the cases where commands cannot be used in transaction blocks, like REINDEX CONCURRENTLY or VACUUM. This is considered as future work for v18~. Author: Anthonin Bonnefoy Reviewed-by: Jelte Fennema-Nio, Michael Paquier Discussion: https://postgr.es/m/CAO6_XqrWO8uNBQrSu5r6jh+vTGi5Oiyk4y8yXDORdE2jbzw8xw@mail.gmail.com Backpatch-through: 13
2024-11-25	Fix NULLIF()'s handling of read-write expanded objects.	Tom Lane
	If passed a read-write expanded object pointer, the EEOP_NULLIF code would hand that same pointer to the equality function and then (unless equality was reported) also return the same pointer as its value. This is no good, because a function that receives a read-write expanded object pointer is fully entitled to scribble on or even delete the object, thus corrupting the NULLIF output. (This problem is likely unobservable with the equality functions provided in core Postgres, but it's easy to demonstrate with one coded in plpgsql.) To fix, make sure the pointer passed to the equality function is read-only. We can still return the original read-write pointer as the NULLIF result, allowing optimization of later operations. Per bug #18722 from Alexander Lakhin. This has been wrong since we invented expanded objects, so back-patch to all supported branches. Discussion: https://postgr.es/m/18722-fd9e645448cc78b4@postgresql.org
2024-11-25	Avoid "you don't own a lock of type ExclusiveLock" in GRANT TABLESPACE.	Noah Misch
	This WARNING appeared because SearchSysCacheLocked1() read cc_relisshared before catcache initialization, when the field is false unconditionally. On the basis of reading false there, it constructed a locktag as though pg_tablespace weren't relisshared. Only shared catalogs could be affected, and only GRANT TABLESPACE was affected in practice. SearchSysCacheLocked1() callers use one other shared-relation syscache, DATABASEOID. DATABASEOID is initialized by the end of CheckMyDatabase(), making the problem unreachable for pg_database. Back-patch to v13 (all supported versions). This has no known impact before v16, where ExecGrant_common() first appeared. Earlier branches avoid trouble by having a separate ExecGrant_Tablespace() that doesn't use LOCKTAG_TUPLE. However, leaving this unfixed in v15 could ensnare a future back-patch of a SearchSysCacheLocked1() call. Reported by Aya Iwata. Discussion: https://postgr.es/m/OS7PR01MB11964507B5548245A7EE54E70EA212@OS7PR01MB11964.jpnprd01.prod.outlook.com
2024-11-25	Add support for Tcl 9	Peter Eisentraut
	Tcl 9 changed several API functions to take Tcl_Size, which is ptrdiff_t, instead of int, for 64-bit enablement. We have to change a few local variables to be compatible with that. We also provide a fallback typedef of Tcl_Size for older Tcl versions. The affected variables are used for quantities that will not approach values beyond the range of int, so this doesn't change any functionality. Reviewed-by: Tristan Partin <tristan@partin.io> Discussion: https://www.postgresql.org/message-id/flat/bce0fe54-75b4-438e-b42b-8e84bc7c0e9c%40eisentraut.org
2024-11-25	Assume that <stdbool.h> conforms to the C standard.	Thomas Munro
	Previously we checked "for <stdbool.h> that conforms to C99" using autoconf's AC_HEADER_STDBOOL macro. We've required C99 since PostgreSQL 12, so the test was redundant, and under C23 it was broken: autoconf 2.69's implementation doesn't understand C23's new empty header (the macros it's looking for went away, replaced by language keywords). Later autoconf versions fixed that, but let's just remove the anachronistic test. HAVE_STDBOOL_H and HAVE__BOOL will no longer be defined, but they weren't directly tested in core or likely extensions (except in 11, see below). PG_USE_STDBOOL (or USE_STDBOOL in 11 and 12) is still defined when sizeof(bool) is 1, which should be true on all modern systems. Otherwise we define our own bool type and values of size 1, which would fail to compile under C23 as revealed by the broken test. (We'll probably clean that dead code up in master, but here we want a minimal back-patchable change.) This came to our attention when GCC 15 recently started using using C23 by default and failed to compile the replacement code, as reported by Sam James and build farm animal alligator. Back-patch to all supported releases, and then two older versions that also know about <stdbool.h>, per the recently-out-of-support policy[1]. 12 requires C99 so it's much like the supported releases, but 11 only assumes C89 so it now uses AC_CHECK_HEADERS instead of the overly picky AC_HEADER_STDBOOL. (I could find no discussion of which historical systems had <stdbool.h> but failed the conformance test; if they ever existed, they surely aren't relevant to that policy's goals.) [1] https://wiki.postgresql.org/wiki/Committing_checklist#Policies Reported-by: Sam James <sam@gentoo.org> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> (master version) Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (approach) Discussion: https://www.postgresql.org/message-id/flat/87o72eo9iu.fsf%40gentoo.org
2024-11-21	Fix outdated bit in README.tuplock	Álvaro Herrera
	Apparently this information has been outdated since first committed, because we adopted a different implementation during development per reviews and this detail was not updated in the README. This has been wrong since commit 0ac5ad5134f2 introduced the file in 2013. Backpatch to all live branches. Reported-by: Will Mortensen <will@extrahop.com> Discussion: https://postgr.es/m/CAMpnoC6yEQ=c0Rdq-J7uRedrP7Zo9UMp6VZyP23QMT68n06cvA@mail.gmail.com
2024-11-20	Avoid assertion failure if a setop leaf query contains setops.	Tom Lane
	Ordinarily transformSetOperationTree will collect all UNION/ INTERSECT/EXCEPT steps into the setOperations tree of the topmost Query, so that leaf queries do not contain any setOperations. However, it cannot thus flatten a subquery that also contains WITH, ORDER BY, FOR UPDATE, or LIMIT. I (tgl) forgot that in commit 07b4c48b6 and wrote an assertion in rule deparsing that a leaf's setOperations would always be empty. If it were nonempty then we would want to parenthesize the subquery to ensure that the output represents the setop nesting correctly (e.g. UNION below INTERSECT had better get parenthesized). So rather than just removing the faulty Assert, let's change it into an additional case to check to decide whether to add parens. We don't expect that the additional case will ever fire, but it's cheap insurance. Man Zeng and Tom Lane Discussion: https://postgr.es/m/tencent_7ABF9B1F23B0C77606FC5FE3@qq.com
2024-11-19	Compare collations before merging UNION operations.	Tom Lane
	In the dim past we figured it was okay to ignore collations when combining UNION set-operation nodes into a single N-way UNION operation. I believe that was fine at the time, but it stopped being fine when we added nondeterministic collations: the semantics of distinct-ness are affected by those. v17 made it even less fine by allowing per-child sorting operations to be merged via MergeAppend, although I think we accidentally avoided any live bug from that. Add a check that collations match before deciding that two UNION nodes are equivalent. I also failed to resist the temptation to comment plan_union_children() a little better. Back-patch to all supported branches (v13 now), since they all have nondeterministic collations. Discussion: https://postgr.es/m/3605568.1731970579@sss.pgh.pa.us
2024-11-17	Fix recently-exposed portability issue in regex optimization.	Tom Lane
	fixempties() counts the number of in-arcs in the regex NFA and then allocates an array of that many arc pointers. If the NFA contains no arcs, this amounts to malloc(0) for which some platforms return NULL. The code mistakenly treats that as indicating out-of-memory. Thus, we can get a bogus "out of memory" failure for some unsatisfiable regexes. This happens only in v15 and earlier, since bea3d7e38 switched to using palloc() rather than bare malloc(). And at least of the platforms in the buildfarm, only AIX seems to return NULL. So the impact is pretty narrow. But I don't especially want to ship code that is failing its own regression tests, so let's fix this for this week's releases. A quick code survey says that there is only the one place in src/backend/regex/ that is at risk of doing malloc(0), so we'll just band-aid that place. A more future-proof fix could be to install a malloc() wrapper similar to pg_malloc(). But this code seems unlikely to change much more in the affected branches, so that's probably overkill. The only known test case for this involves a complemented character class in a bracket expression, for example [^\s\S], and we didn't support that in v13. So it may be that the problem is unreachable in v13. But I'm not 100% sure of that, so patch v13 too. Discussion: https://postgr.es/m/661fd81b-f069-8f30-1a41-e195c57449b4@gmail.com
2024-11-15	Fix per-session activation of ALTER {ROLE\|DATABASE} SET role.	Noah Misch
	After commit 5a2fed911a85ed6d8a015a6bafe3a0d9a69334ae, the catalog state resulting from these commands ceased to affect sessions. Restore the longstanding behavior, which is like beginning the session with a SET ROLE command. If cherry-picking the CVE-2024-10978 fixes, default to including this, too. (This fixes an unintended side effect of fixing CVE-2024-10978.) Back-patch to v12, like that commit. The release team decided to include v12, despite the original intent to halt v12 commits earlier this week. Tom Lane and Noah Misch. Reported by Etienne LAFARGE. Discussion: https://postgr.es/m/CADOZwSb0UsEr4_UTFXC5k7=fyyK8uKXekucd+-uuGjJsGBfxgw@mail.gmail.com
2024-11-15	Fix a possibility of logical replication slot's restart_lsn going backwards.	Masahiko Sawada
	Previously LogicalIncreaseRestartDecodingForSlot() accidentally accepted any LSN as the candidate_lsn and candidate_valid after the restart_lsn of the replication slot was updated, so it potentially caused the restart_lsn to move backwards. A scenario where this could happen in logical replication is: after a logical replication restart, based on previous candidate_lsn and candidate_valid values in memory, the restart_lsn advances upon receiving a subscriber acknowledgment. Then, logical decoding restarts from an older point, setting candidate_lsn and candidate_valid based on an old RUNNING_XACTS record. Subsequent subscriber acknowledgments then update the restart_lsn to an LSN older than the current value. In the reported case, after WAL files were removed by a checkpoint, the retreated restart_lsn prevented logical replication from restarting due to missing WAL segments. This change essentially modifies the 'if' condition to 'else if' condition within the function. The previous code had an asymmetry in this regard compared to LogicalIncreaseXminForSlot(), which does almost the same thing for different fields. The WAL removal issue was reported by Hubert Depesz Lubaczewski. Backpatch to all supported versions, since the bug exists since 9.4 where logical decoding was introduced. Reviewed-by: Tomas Vondra, Ashutosh Bapat, Amit Kapila Discussion: https://postgr.es/m/Yz2hivgyjS1RfMKs%40depesz.com Discussion: https://postgr.es/m/85fff40e-148b-4e86-b921-b4b846289132%40vondra.me Backpatch-through: 13
2024-11-12	Fix arrays comparison in CompareOpclassOptions()	Alexander Korotkov
	The current code calls array_eq() and does not provide FmgrInfo. This commit provides initialization of FmgrInfo and uses C collation as the safe option for text comparison because we don't know anything about the semantics of opclass options. Backpatch to 13, where opclass options were introduced. Reported-by: Nicolas Maus Discussion: https://postgr.es/m/18692-72ea398df3ec6712%40postgresql.org Backpatch-through: 13
2024-11-11	Parallel workers use AuthenticatedUserId for connection privilege checks.	Tom Lane
	Commit 5a2fed911 had an unexpected side-effect: the parallel worker launched for the new test case would fail if it couldn't use a superuser-reserved connection slot. The reason that test failed while all our pre-existing ones worked is that the connection privilege tests in InitPostgres had been based on the superuserness of the leader's AuthenticatedUserId, but after the rearrangements of 5a2fed911 we were testing the superuserness of CurrentUserId, which the new test case deliberately made to be a non-superuser. This all seems very accidental and probably not the behavior we really want, but a security patch is no time to be redesigning things. Pending some discussion about desirable semantics, hack it so that InitPostgres continues to pay attention to the superuserness of AuthenticatedUserId when starting a parallel worker. Nathan Bossart and Tom Lane, per buildfarm member sawshark. Security: CVE-2024-10978
2024-11-11	Fix cross-version upgrade tests.	Tom Lane
	TestUpgradeXversion knows how to make the main regression database's references to pg_regress.so be version-independent. But it doesn't do that for plperl's database, so that the C function added by commit b7e3a52a8 is causing cross-version upgrade test failures. Path of least resistance is to just drop the function at the end of the new test. In <= v14, also take the opportunity to clean up the generated test files. Security: CVE-2024-10979
2024-11-11	src/tools/msvc: Respect REGRESS_OPTS in plcheck.	Noah Misch
	v16 commit 8fe3e697a1a83a722b107c7cb9c31084e1f4d077 used REGRESS_OPTS in a way needing this. That broke "vcregress plcheck". Back-patch v16..v12; newer versions don't have this build system.
2024-11-11	Add needed .gitignore files in back branches.	Tom Lane
	v14 and earlier use generated test files, which require being .gitignore'd to avoid git complaints when testing in-tree. Security: CVE-2024-10979
2024-11-11	Fix improper interactions between session_authorization and role.	Tom Lane
	The SQL spec mandates that SET SESSION AUTHORIZATION implies SET ROLE NONE. We tried to implement that within the lowest-level functions that manipulate these settings, but that was a bad idea. In particular, guc.c assumes that it doesn't matter in what order it applies GUC variable updates, but that was not the case for these two variables. This problem, compounded by some hackish attempts to work around it, led to some security-grade issues: * Rolling back a transaction that had done SET SESSION AUTHORIZATION would revert to SET ROLE NONE, even if that had not been the previous state, so that the effective user ID might now be different from what it had been. * The same for SET SESSION AUTHORIZATION in a function SET clause. * If a parallel worker inspected current_setting('role'), it saw "none" even when it should see something else. Also, although the parallel worker startup code intended to cope with the current role's pg_authid row having disappeared, its implementation of that was incomplete so it would still fail. Fix by fully separating the miscinit.c functions that assign session_authorization from those that assign role. To implement the spec's requirement, teach set_config_option itself to perform "SET ROLE NONE" when it sets session_authorization. (This is undoubtedly ugly, but the alternatives seem worse. In particular, there's no way to do it within assign_session_authorization without incompatible changes in the API for GUC assign hooks.) Also, improve ParallelWorkerMain to directly set all the relevant user-ID variables instead of relying on some of them to get set indirectly. That allows us to survive not finding the pg_authid row during worker startup. In v16 and earlier, this includes back-patching 9987a7bf3 which fixed a violation of GUC coding rules: SetSessionAuthorization is not an appropriate place to be throwing errors from. Security: CVE-2024-10978
2024-11-11	Ensure cached plans are correctly marked as dependent on role.	Nathan Bossart
	If a CTE, subquery, sublink, security invoker view, or coercion projection references a table with row-level security policies, we neglected to mark the plan as potentially dependent on which role is executing it. This could lead to later executions in the same session returning or hiding rows that should have been hidden or returned instead. Reported-by: Wolfgang Walther Reviewed-by: Noah Misch Security: CVE-2024-10976 Backpatch-through: 12
2024-11-11	Block environment variable mutations from trusted PL/Perl.	Noah Misch
	Many process environment variables (e.g. PATH), bypass the containment expected of a trusted PL. Hence, trusted PLs must not offer features that achieve setenv(). Otherwise, an attacker having USAGE privilege on the language often can achieve arbitrary code execution, even if the attacker lacks a database server operating system user. To fix PL/Perl, replace trusted PL/Perl %ENV with a tied hash that just replaces each modification attempt with a warning. Sites that reach these warnings should evaluate the application-specific implications of proceeding without the environment modification: Can the application reasonably proceed without the modification? If no, switch to plperlu or another approach. If yes, the application should change the code to stop attempting environment modifications. If that's too difficult, add "untie %main::ENV" in any code executed before the warning. For example, one might add it to the start of the affected function or even to the plperl.on_plperl_init setting. In passing, link to Perl's guidance about the Perl features behind the security posture of PL/Perl. Back-patch to v12 (all supported versions). Andrew Dunstan and Noah Misch Security: CVE-2024-10979
2024-11-11	Translation updates	Peter Eisentraut
	Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: be7f3c3a26b382c9d7c9d32c7a972e452b56f529
2024-11-11	libpq: Bail out during SSL/GSS negotiation errors	Michael Paquier
	This commit changes libpq so that errors reported by the backend during the protocol negotiation for SSL and GSS are discarded by the client, as these may include bytes that could be consumed by the client and write arbitrary bytes to a client's terminal. A failure with the SSL negotiation now leads to an error immediately reported, without a retry on any other methods allowed, like a fallback to a plaintext connection. A failure with GSS discards the error message received, and we allow a fallback as it may be possible that the error is caused by a connection attempt with a pre-11 server, GSS encryption having been introduced in v12. This was a problem only with v17 and newer versions; older versions discard the error message already in this case, assuming a failure caused by a lack of support for GSS encryption. Author: Jacob Champion Reviewed-by: Peter Eisentraut, Heikki Linnakangas, Michael Paquier Security: CVE-2024-10977 Backpatch-through: 12
2024-11-08	Improve fix for not entering parallel mode when holding interrupts.	Tom Lane
	Commit ac04aa84a put the shutoff for this into the planner, which is not ideal because it doesn't prevent us from re-using a previously made parallel plan. Revert the planner change and instead put the shutoff into InitializeParallelDSM, modeling it on the existing code there for recovering from failure to allocate a DSM segment. However, that code path is mostly untested, and testing a bit harder showed there's at least one bug: ExecHashJoinReInitializeDSM is not prepared for us to have skipped doing parallel DSM setup. I also thought the Assert in ReinitializeParallelWorkers is pretty ill-advised, and replaced it with a silent Min() operation. The existing test case added by ac04aa84a serves fine to test this version of the fix, so no change needed there. Patch by me, but thanks to Noah Misch for the core idea that we could shut off worker creation when !INTERRUPTS_CAN_BE_PROCESSED. Back-patch to v12, as ac04aa84a was. Discussion: https://postgr.es/m/CAC-SaSzHUKT=vZJ8MPxYdC_URPfax+yoA1hKTcF4ROz_Q6z0_Q@mail.gmail.com
2024-11-08	Disallow partitionwise join when collations don't match	Amit Langote
	If the collation of any join key column doesn’t match the collation of the corresponding partition key, partitionwise joins can yield incorrect results. For example, rows that would match under the join key collation might be located in different partitions due to the partitioning collation. In such cases, a partitionwise join would yield different results from a non-partitionwise join, so disallow it in such cases. Reported-by: Tender Wang <tndrwang@gmail.com> Author: Jian He <jian.universality@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com Backpatch-through: 12
2024-11-08	Disallow partitionwise grouping when collations don't match	Amit Langote
	If the collation of any grouping column doesn’t match the collation of the corresponding partition key, partitionwise grouping can yield incorrect results. For example, rows that would be grouped under the grouping collation may end up in different partitions under the partitioning collation. In such cases, full partitionwise grouping would produce results that differ from those without partitionwise grouping, so disallowed that. Partial partitionwise aggregation is still allowed, as the Finalize step reconciles partition-level aggregates with grouping requirements across all partitions, ensuring that the final output remains consistent. This commit also fixes group_by_has_partkey() by ensuring the RelabelType node is stripped from grouping expressions when matching them to partition key expressions to avoid false mismatches. Bug: #18568 Reported-by: Webbo Han <1105066510@qq.com> Author: Webbo Han <1105066510@qq.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/18568-2a9afb6b9f7e6ed3@postgresql.org Discussion: https://postgr.es/m/tencent_9D9103CDA420C07768349CC1DFF88465F90A@qq.com Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com Backpatch-through: 12
2024-11-08	Message style improvement	Peter Eisentraut
	Backpatch the part of edee0c621de that applies to a90bdd7a44d, which was also backpatched. That way, the message is consistent in all branches.
2024-11-08	Fix lstat() for broken junction points on Windows.	Thomas Munro
	When using junction points to emulate symlinks on Windows, one edge case was not handled correctly by commit c5cb8f3b: if a junction point is broken (pointing to a non-existent path), we'd report ENOENT. This doesn't break any known use case, but was noticed while developing a test suite for these functions and is fixed here for completeness. Also add translation ERROR_CANT_RESOLVE_FILENAME -> ENOENT, as that is one of the errors Windows can report for some kinds of broken paths. Discussion: https://postgr.es/m/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com (cherry picked from commit 387803d81d6256fcb60b9192bb5b00042442b4e3) Author: Thomas Munro <tmunro@postgresql.org> Author: Alexandra Wang <alexandra.wang.oss@gmail.com>
2024-11-08	Provide lstat() for Windows.	Thomas Munro
	Junction points will be reported with S_ISLNK(x.st_mode), simulating POSIX lstat(). stat() will follow pseudo-symlinks, like in POSIX (but only one level before giving up, unlike in POSIX). This completes a TODO left by commit bed90759fcb. Tested-by: Andrew Dunstan <andrew@dunslane.net> (earlier version) Discussion: https://postgr.es/m/CA%2BhUKGLfOOeyZpm5ByVcAt7x5Pn-%3DxGRNCvgiUPVVzjFLtnY0w%40mail.gmail.com (cherry picked from commit c5cb8f3b770c043509b61528664bcd805e1777e6) Author: Thomas Munro <tmunro@postgresql.org> Author: Alexandra Wang <alexandra.wang.oss@gmail.com>
2024-11-08	Make unlink() work for junction points on Windows.	Thomas Munro
	To support harmonization of Windows and Unix code, teach our unlink() wrapper that junction points need to be unlinked with rmdir() on Windows. Tested-by: Andrew Dunstan <andrew@dunslane.net> Discussion: https://postgr.es/m/CA%2BhUKGLfOOeyZpm5ByVcAt7x5Pn-%3DxGRNCvgiUPVVzjFLtnY0w%40mail.gmail.com (cherry picked from commit f357233c9db8be2a015163da8e1ab0630f444340) Author: Thomas Munro <tmunro@postgresql.org> Author: Alexandra Wang <alexandra.wang.oss@gmail.com>
2024-11-08	Add missing include guard to win32ntdll.h.	Thomas Munro
	Oversight in commit e2f0f8ed. Also add this file to the exclusion lists in headerscheck and cpluscpluscheck, because Unix systems don't have a header it includes. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/2760528.1641929756%40sss.pgh.pa.us (cherry picked from commit af9e6331aeba149c93052c3549140082a85a3cf9) Author: Thomas Munro <tmunro@postgresql.org> Author: Alexandra Wang <alexandra.wang.oss@gmail.com>
2024-11-08	Check for STATUS_DELETE_PENDING on Windows.	Thomas Munro
	1. Update our open() wrapper to check for NT's STATUS_DELETE_PENDING and translate it to Unix-like errors. This is done with RtlGetLastNtStatus(), which is dynamically loaded from ntdll. A new file win32ntdll.c centralizes lookup of NT functions, in case we decide to add more in the future. 2. Remove non-working code that was trying to do something similar for stat(), and just reuse the open() wrapper code. As a side effect, stat() also gains resilience against "sharing violation" errors. 3. Since stat() is used very early in process startup, remove the requirement that the Win32 signal event has been created before pgwin32_open_handle() is reached. Instead, teach pg_usleep() to fall back to a non-interruptible sleep if reached before the signal event is available. This could be back-patched, but for now it's in master only. The problem has apparently been with us for a long time and generated only a few complaints. Proposed patches trigger it more often, which led to this investigation and fix. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGJz_pZTF9mckn6XgSv69%2BjGwdgLkxZ6b3NWGLBCVjqUZA%40mail.gmail.com (cherry picked from commit e2f0f8ed251d02c1eda79e1ca3cb3db2681e7a86) Author: Thomas Munro <tmunro@postgresql.org> Author: Alexandra Wang <alexandra.wang.oss@gmail.com>
2024-11-08	Fix -Wcast-function-type warnings	Peter Eisentraut
	Three groups of issues needed to be addressed: load_external_function() and related functions returned PGFunction, even though not necessarily all callers are looking for a function of type PGFunction. Since these functions are really just wrappers around dlsym(), change to return void * just like dlsym(). In dynahash.c, we are using strlcpy() where a function with a signature like memcpy() is expected. This should be safe, as the new comment there explains, but the cast needs to be augmented to avoid the warning. In PL/Python, methods all need to be cast to PyCFunction, per Python API, but this now runs afoul of these warnings. (This issue also exists in core CPython.) To fix the second and third case, we add a new type pg_funcptr_t that is defined specifically so that gcc accepts it as a special function pointer that can be cast to any other function pointer without the warning. Also add -Wcast-function-type to the standard warning flags, subject to configure check. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/1e97628e-6447-b4fd-e230-d109cec2d584%402ndquadrant.com (cherry picked from commit de8feb1f3a23465b5737e8a8c160e8ca62f61339) Author: Peter Eisentraut <peter@eisentraut.org> Author: Alexandra Wang <alexandra.wang.oss@gmail.com>
2024-11-08	Fix issues with Windows' stat() for files pending on deletion	Michael Paquier
	The code introduced by bed9075 to enhance the stat() implementation on Windows for file sizes larger than 4GB fails to properly detect files pending for deletion with its method based on NtQueryInformationFile() or GetFileInformationByHandleEx(), as proved by Alexander Lakhin in a custom TAP test of his own. The method used in the implementation of open() to sleep and loop when when failing on ERROR_ACCESS_DENIED (EACCES) is showing much more stability, so switch to this method. This could still lead to issues if the permission problem stays around for much longer than the timeout of 1 second used, but that should (hopefully) never happen in performance-critical paths. Still, there could be a point in increasing the timeouts for the sake of machines that handle heavy loads. Note that WIN32's open() now uses microsoft_native_stat() as it should be similar to stat() when working around issues with concurrent file deletions. I have spent some time testing this patch with pgbench in combination of the SQL functions from genfile.c, as well as running the TAP test provided on the thread with MSVC builds, and this looks much more stable than the previous method. Author: Alexander Lakhin Reviewed-by: Tom Lane, Michael Paquier, Justin Pryzby Discussion: https://postgr.es/m/c3427edf-d7c0-ff57-90f6-b5de3bb62709@gmail.com Backpatch-through: 14 (cherry picked from commit 54fb8c7ddf152629021cab3ac3596354217b7d81) Author: Alexandra Wang <alexandra.wang.oss@gmail.com>
2024-11-08	Fix our Windows stat() emulation to handle file sizes > 4GB.	Tom Lane
	Hack things so that our idea of "struct stat" is equivalent to Windows' struct __stat64, allowing it to have a wide enough st_size field. Instead of relying on native stat(), use GetFileInformationByHandle(). This avoids a number of issues with Microsoft's multiple and rather slipshod emulations of stat(). We still need to jump through hoops to deal with ERROR_DELETE_PENDING, though :-( Pull the relevant support code out of dirmod.c and put it into its own file, win32stat.c. Still TODO: do we need to do something different with lstat(), rather than treating it identically to stat()? Juan José Santamaría Flecha, reviewed by Emil Iggland; based on prior work by Michael Paquier, Sergey Zubkovsky, and others Discussion: https://postgr.es/m/1803D792815FC24D871C00D17AE95905CF5099@g01jpexmbkw24 Discussion: https://postgr.es/m/15858-9572469fd3b73263@postgresql.org (cherry picked from commit bed90759fcbcd72d4d06969eebab81e47326f9a2) Author: Alexandra Wang <alexandra.wang.oss@gmail.com>
2024-11-06	Monkey-patch LLVM code to fix ARM relocation bug.	Thomas Munro
	Supply a new memory manager for RuntimeDyld, to avoid crashes in generated code caused by memory placement that can overflow a 32 bit data type. This is a drop-in replacement for the llvm::SectionMemoryManager class in the LLVM library, with Michael Smith's proposed fix from https://www.github.com/llvm/llvm-project/pull/71968. We hereby slurp it into our own source tree, after moving into a new namespace llvm::backport and making some minor adjustments so that it can be compiled with older LLVM versions as far back as 12. It's harder to make it work on even older LLVM versions, but it doesn't seem likely that people are really using them so that is not investigated for now. The problem could also be addressed by switching to JITLink instead of RuntimeDyld, and that is the LLVM project's recommended solution as the latter is about to be deprecated. We'll have to do that soon enough anyway, and then when the LLVM version support window advances far enough in a few years we'll be able to delete this code. Unfortunately that wouldn't be enough for PostgreSQL today: in most relevant versions of LLVM, JITLink is missing or incomplete. Several other projects have already back-ported this fix into their fork of LLVM, which is a vote of confidence despite the lack of commit into LLVM as of today. We don't have our own copy of LLVM so we can't do exactly what they've done; instead we have a copy of the whole patched class so we can pass an instance of it to RuntimeDyld. The LLVM project hasn't chosen to commit the fix yet, and even if it did, it wouldn't be back-ported into the releases of LLVM that most of our users care about, so there is not much point in waiting any longer for that. If they make further changes and commit it to LLVM 19 or 20, we'll still need this for older versions, but we may want to resynchronize our copy and update some comments. The changes that we've had to make to our copy can be seen by diffing our SectionMemoryManager.{h,cpp} files against the ones in the tree of the pull request. Per the LLVM project's license requirements, a copy is in SectionMemoryManager.LICENSE. This should fix the spate of crash reports we've been receiving lately from users on large memory ARM systems. Back-patch to all supported releases. Co-authored-by: Thomas Munro <thomas.munro@gmail.com> Co-authored-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> Reviewed-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> (license aspects) Reported-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> Discussion: https://postgr.es/m/CAO6_Xqr63qj%3DSx7HY6ZiiQ6R_JbX%2B-p6sTPwDYwTWZjUmjsYBg%40mail.gmail.com
2024-11-02	Suppress new "may be used uninitialized" warning.	Noah Misch
	Buildfarm member mamba fails to deduce that the function never uses this variable without initializing it. Back-patch to v12, like commit b412f402d1e020c5dac94f3bf4a005db69519b99.