summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2023-11-02Be more wary about NULL values for GUC string variables.Tom Lane
get_explain_guc_options() crashed if a string GUC marked GUC_EXPLAIN has a NULL boot_val. Nosing around found a couple of other places that seemed insufficiently cautious about NULL string values, although those are likely unreachable in practice. Add some commentary defining the expectations for NULL values of string variables, in hopes of forestalling future additions of more such bugs. Xing Guo, Aleksander Alekseev, Tom Lane Discussion: https://postgr.es/m/CACpMh+AyDx5YUpPaAgzVwC1d8zfOL4JoD-uyFDnNSa1z0EsDQQ@mail.gmail.com
2023-11-02Fix 003_check_guc.pl when loading modules with custom GUCsMichael Paquier
The test missed that custom GUCs need to be ignored from the list of parameters that can exist in postgresql.conf.sample. This caused the test to fail on a server where such a module is loaded, when using EXTRA_INSTALL and TEMP_CONFIG, for instance. Author: Anton A. Melnikov Discussion: https://postgr.es/m/fc5509ce-5144-4dac-8d13-21793da44fc5@postgrespro.ru Backpatch-through: 15
2023-11-01Fix function name in commentDaniel Gustafsson
The name of the function resulting from the macro expansion was incorrectly stated. Backpatch to 16 where it was introduced. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20231101.172308.1740861597185391383.horikyota.ntt@gmail.com Backpatch-through: v16
2023-10-31doc: 1-byte varlena headers can be used for user PLAIN storageBruce Momjian
This also updates some C comments. Reported-by: suchithjn22@gmail.com Discussion: https://postgr.es/m/167336599095.2667301.15497893107226841625@wrigleys.postgresql.org Author: Laurenz Albe (doc patch) Backpatch-through: 11
2023-10-30Diagnose !indisvalid in more SQL functions.Noah Misch
pgstatindex failed with ERRCODE_DATA_CORRUPTED, of the "can't-happen" class XX. The other functions succeeded on an empty index; they might have malfunctioned if the failed index build left torn I/O or other complex state. Report an ERROR in statistics functions pgstatindex, pgstatginindex, pgstathashindex, and pgstattuple. Report DEBUG1 and skip all index I/O in maintenance functions brin_desummarize_range, brin_summarize_new_values, brin_summarize_range, and gin_clean_pending_list. Back-patch to v11 (all supported versions). Discussion: https://postgr.es/m/20231001195309.a3@google.com
2023-10-29Teach pg_dump about the new pg_subscription.subrunasowner option.Tom Lane
Among numerous other oversights, commit 482675987 neglected to fix pg_dump to dump this new subscription option. Since the new default is "false" while the previous behavior corresponds to "true", this would cause legacy subscriptions to silently change behavior during dump/reload or pg_upgrade. That seems like a bad idea. Even if it was intended, failing to preserve the option once set in a new installation is certainly not OK. While here, reorder associated stanzas in pg_dump to match the field order in pg_subscription, in hopes of reducing the impression that all this code was written with the aid of a dartboard. Back-patch to v16 where this new field was added. Philip Warner (cosmetic tweaks by me) Discussion: https://postgr.es/m/20231027042539.01A3A220F0A@thebes.rime.com.au
2023-10-28Fix intra-query memory leak when a SRF returns zero rows.Tom Lane
When looping around after finding that the set-returning function returned zero rows for the current input tuple, ExecProjectSet neglected to reset either of the two memory contexts it's responsible for cleaning out. Typically this wouldn't cause much problem, because once the SRF does return at least one row, the contexts would get reset on the next call. However, if the SRF returns no rows for many input tuples in succession, quite a lot of memory could be transiently consumed. To fix, make sure we reset both contexts while looping around. Per bug #18172 from Sergei Kornilov. Back-patch to all supported branches. Discussion: https://postgr.es/m/18172-9b8c5fc1d676ded3@postgresql.org
2023-10-28Remove PHOT from our default timezone abbreviations list.Tom Lane
Debian recently decided to split out a bunch of "obsolete" timezone names into a new tzdata-legacy package, which isn't installed by default. One of these zone names is Pacific/Enderbury, and that breaks our regression tests (on --with-system-tzdata builds) because our default timezone abbreviations list defines PHOT as Pacific/Enderbury. Pacific/Enderbury got renamed to Pacific/Kanton in tzdata 2021b, so that in distros that still have this entry it's just a symlink to Pacific/Kanton anyway. So one answer would be to redefine PHOT as Pacific/Kanton. However, then things would fail if the installed tzdata predates 2021b, which is recent enough that that seems like a real problem. Instead, let's just remove PHOT from the default list. That seems likely to affect nobody in the real world, because (a) it was an abbreviation that the tzdb crew made up in the first place, with no evidence of real-world usage, and (b) the total human population of the Phoenix Islands is less than two dozen persons, per Wikipedia. If anyone does use this zone abbreviation they can easily put it back via a custom abbreviations file. We'll keep PHOT in the Pacific.txt reference file, but change it to Pacific/Kanton there, as that definition seems more likely to be useful to future readers of that file. Per report from Victor Wagner. Back-patch to all supported branches. Discussion: https://postgr.es/m/20231027152049.4b5c8044@wagner.wagner.home
2023-10-27Fix minmax-multi distance for extreme interval valuesTomas Vondra
When calculating distance for interval values, the code mostly mimicked interval_mi, i.e. it built a new interval value for the difference. That however does not work for sufficiently distant interval values, when the difference overflows the interval range. Instead, we can calculate the distance directly, without constructing the intermediate (and unnecessary) interval value. Backpatch to 14, where minmax-multi indexes were introduced. Reported-by: Dean Rasheed Reviewed-by: Ashutosh Bapat, Dean Rasheed Backpatch-through: 14 Discussion: https://postgr.es/m/eef0ea8c-4aaa-8d0d-027f-58b1f35dd170@enterprisedb.com
2023-10-27Fix minmax-multi on infinite date/timestamp valuesTomas Vondra
Make sure that infinite values in date/timestamp columns are treated as if in infinite distance. Infinite values should not be merged with other values, leaving them as outliers. The code however returned distance 0 in this case, so that infinite values were merged first. While this does not break the index (i.e. it still produces correct query results), it may make it much less efficient. We don't need explicit handling of infinite date/timestamp values when calculating distances, because those values are represented as extreme but regular values (e.g. INT64_MIN/MAX for the timestamp type). We don't need an exact distance, just a value that is much larger than distanced between regular values. With the added cast to double values, we can simply subtract the values. The regression test queries a value in the "gap" and checks the range was properly eliminated by the BRIN index. This only affects minmax-multi indexes on timestamp/date columns with infinite values, which is not very common in practice. The affected indexes may need to be rebuilt. Backpatch to 14, where minmax-multi indexes were introduced. Reported-by: Ashutosh Bapat Reviewed-by: Ashutosh Bapat, Dean Rasheed Backpatch-through: 14 Discussion: https://postgr.es/m/eef0ea8c-4aaa-8d0d-027f-58b1f35dd170@enterprisedb.com
2023-10-27Fix calculation in brin_minmax_multi_distance_dateTomas Vondra
When calculating the distance between date values, make sure to subtract them in the right order, i.e. (larger - smaller). The distance is used to determine which values to merge, and is expected to be a positive value. The code unfortunately did the subtraction in the opposite order, i.e. (smaller - larger), thus producing negative values and merging values the most distant values first. The resulting index is correct (i.e. produces correct results), but may be significantly less efficient. This affects all minmax-multi indexes on date columns. Backpatch to 14, where minmax-multi indexes were introduced. Reported-by: Ashutosh Bapat Reviewed-by: Ashutosh Bapat, Dean Rasheed Backpatch-through: 14 Discussion: https://postgr.es/m/eef0ea8c-4aaa-8d0d-027f-58b1f35dd170@enterprisedb.com
2023-10-27Fix overflow when calculating timestamp distance in BRINTomas Vondra
When calculating distances for timestamp values for BRIN minmax-multi indexes, we need to be careful about overflows for extreme values. If the value overflows into a negative value, the index may be inefficient. The new regression test checks this for the timestamp type by adding a table with enough values to force range compaction/merging. The values are close to min/max, which means a risk of overflow. Fixed by converting the int64 values to double first, before calculating the distance. This prevents the overflow. We may lose some precision, of course, but that's good enough. In the worst case we build a slightly less efficient index, but for large distances this won't matter. This only affects minmax-multi indexes on timestamp columns, with ranges containing values sufficiently distant to cause an overflow. That seems like a fairly rare case in practice. Backpatch to 14, where minmax-multi indexes were introduced. Reported-by: Ashutosh Bapat Reviewed-by: Ashutosh Bapat, Dean Rasheed Backpatch-through: 14 Discussion: https://postgr.es/m/eef0ea8c-4aaa-8d0d-027f-58b1f35dd170@enterprisedb.com
2023-10-26Avoid compiler warning in non-assert buildsAmit Langote
After 01575ad788e3, expand_single_inheritance_child()'s parentOID variable is read only in an Assert, provoking a compiler warning in non-assert builds. Fix that by marking the variable with PG_USED_FOR_ASSERTS_ONLY. Per report and suggestion from David Rowley Discussion: https://postgr.es/m/CAApHDvpjA_8Wxu4DCTRVAvPxC9atwMe6N%2ByvrcGsgb7mrfdpJA%40mail.gmail.com
2023-10-26Prevent duplicate RTEPermissionInfo for plain-inheritance parentsAmit Langote
Currently, expand_single_inheritance_child() doesn't reset perminfoindex in a plain-inheritance parent's child RTE, because prior to 387f9ed0a0, the executor would use the first child RTE to locate the parent's RTEPermissionInfo. That in turn causes add_rte_to_flat_rtable() to create an extra RTEPermissionInfo belonging to the parent's child RTE with the same content as the one belonging to the parent's original ("root") RTE. In 387f9ed0a0, we changed things so that the executor can now use the parent's "root" RTE for locating its RTEPermissionInfo instead of the child RTE, so the latter's perminfoindex need not be set anymore, so make it so. Reported-by: Tom Lane Discussion: https://postgr.es/m/839708.1698174464@sss.pgh.pa.us Backpatch-through: 16
2023-10-24Fix problems when a plain-inheritance parent table is excluded.Tom Lane
When an UPDATE/DELETE/MERGE's target table is an old-style inheritance tree, it's possible for the parent to get excluded from the plan while some children are not. (I believe this is only possible if we can prove that a CHECK ... NO INHERIT constraint on the parent contradicts the query WHERE clause, so it's a very unusual case.) In such a case, ExecInitModifyTable mistakenly concluded that the first surviving child is the target table, leading to at least two bugs: 1. The wrong table's statement-level triggers would get fired. 2. In v16 and up, it was possible to fail with "invalid perminfoindex 0 in RTE with relid nnnn" due to the child RTE not having permissions data included in the query plan. This was hard to reproduce reliably because it did not occur unless the update triggered some non-HOT index updates. In v14 and up, this is easy to fix by defining ModifyTable.rootRelation to be the parent RTE in plain inheritance as well as partitioned cases. While the wrong-triggers bug also appears in older branches, the relevant code in both the planner and executor is quite a bit different, so it would take a good deal of effort to develop and test a suitable patch. Given the lack of field complaints about the trigger issue, I'll desist for now. (Patching v11 for this seems unwise anyway, given that it will have no more releases after next month.) Per bug #18147 from Hans Buschmann. Amit Langote and Tom Lane Discussion: https://postgr.es/m/18147-6fc796538913ee88@postgresql.org
2023-10-22Fix min_dynamic_shared_memory on Windows.Thomas Munro
When min_dynamic_shared_memory is set above 0, we try to find space in a pre-allocated region of the main shared memory area instead of calling dsm_impl_XXX() routines to allocate more. The dsm_pin_segment() and dsm_unpin_segment() routines had a bug: they called dsm_impl_XXX() routines even for main region segments. Nobody noticed before now because those routines do nothing on Unix, but on Windows they'd fail while attempting to duplicate an invalid Windows HANDLE. Add the missing gating. Back-patch to 14, where commit 84b1c63a added this feature. Fixes pgsql-bugs bug #18165. Reported-by: Maxime Boyer <maxime.boyer@cra-arc.gc.ca> Tested-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/18165-bf4f525cea6e51de%40postgresql.org
2023-10-20Make some error strings more genericAlvaro Herrera
It's undesirable to have SQL commands or configuration options in a translatable error string, so take some of these out.
2023-10-20Dodge a compiler bug affecting timetz_zone/timetz_izone.Tom Lane
This avoids a compiler bug occurring in AIX's xlc, even in pretty late-model revisions. Buildfarm testing has now confirmed that only 64-bit xlc is affected. Although we are contemplating dropping support for xlc in v17, it's still supported in the back branches, so we need this fix. Back-patch of code changes from HEAD commit 19fa97731. (The test cases were already back-patched, in 4a427b82c et al.) Discussion: https://postgr.es/m/CA+hUKGK=DOC+hE-62FKfZy=Ybt5uLkrg3zCZD-jFykM-iPn8yw@mail.gmail.com
2023-10-18Improve pglz_decompress's defenses against corrupt compressed data.Tom Lane
When processing a match tag, check to see if the claimed "off" is more than the distance back to the output buffer start. If it is, then the data is corrupt, and what's more we would fetch from outside the buffer boundaries and potentially incur a SIGSEGV. (Although the odds of that seem relatively low, given that "off" can't be more than 4K.) Back-patch to v13; before that, this function wasn't really trying to protect against bad data. Report and fix by Flavien Guedez. Discussion: https://postgr.es/m/01fc0593-e31e-463d-902c-dd43174acee2@oopacity.net
2023-10-19jit: Changes for LLVM 17.Thomas Munro
Changes required by https://llvm.org/docs/NewPassManager.html. Back-patch to 12, leaving the final release of 11 unchanged, consistent with earlier decision not to back-patch LLVM 16 support either. Author: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKG%2BWXznXCyTgCADd%3DHWkP9Qksa6chd7L%3DGCnZo-MBgg9Lg%40mail.gmail.com
2023-10-19jit: Supply LLVMGlobalGetValueType() for LLVM < 8.Thomas Munro
Commit 37d5babb used this C API function while adding support for LLVM 16 and opaque pointers, but it's not available in LLVM 7 and older. Provide it in our own llvmjit_wrap.cpp. It just calls a C++ function that pre-dates LLVM 3.9, our minimum target. Back-patch to 12, like 37d5babb. Discussion: https://postgr.es/m/CA%2BhUKGKnLnJnWrkr%3D4mSGhE5FuTK55FY15uULR7%3Dzzc%3DwX4Nqw%40mail.gmail.com
2023-10-18jit: Support opaque pointers in LLVM 16.Thomas Munro
Remove use of LLVMGetElementType() and provide the type of all pointers to LLVMBuildXXX() functions when emitting IR, as required by modern LLVM versions[1]. * For LLVM <= 14, we'll still use the old LLVMBuildXXX() functions. * For LLVM == 15, we'll continue to do the same, explicitly opting out of opaque pointer mode. * For LLVM >= 16, we'll use the new LLVMBuildXXX2() functions that take the extra type argument. The difference is hidden behind some new IR emitting wrapper functions l_load(), l_gep(), l_call() etc. The change is mostly mechanical, except that at each site the correct type had to be provided. In some places we needed to do some extra work to get functions types, including some new wrappers for C++ APIs that are not yet exposed by in LLVM's C API, and some new "example" functions in llvmjit_types.c because it's no longer possible to start from the function pointer type and ask for the function type. Back-patch to 12, because it's a little tricker in 11 and we agreed not to put the latest LLVM support into the upcoming final release of 11. [1] https://llvm.org/docs/OpaquePointers.html Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKGKNX_%3Df%2B1C4r06WETKTq0G4Z_7q4L4Fxn5WWpMycDj9Fw%40mail.gmail.com
2023-10-18pg_upgrade: Fix test name in 002_pg_upgrade.plMichael Paquier
Author: Hou Zhijie Discussion: https://postgr.es/m/TYAPR01MB5724A40D47E71F4717357EC694D5A@TYAPR01MB5724.jpnprd01.prod.outlook.com Backpatch-through: 15
2023-10-18Count write times when extending relation files for shared buffersMichael Paquier
Relation files extended multiple blocks at a time have been counting the number of blocks written, but forgot that to increment the write time in this case, as write and relation extension are treated as two different I/O operations in the shared stats: IOOP_EXTEND vs IOOP_WRITE. In this case IOOP_EXTEND was forgotten for normal (non-temporary) relations. Write times are tracked when track_io_timing is enabled, which is not the case by default. Author: Nazir Bilal Yavuz Reviewed-by: Robert Haas, Melanie Plageman Discussion: https://postgr.es/m/CAN55FZ19Ss279mZuqGbuUNxka0iPbLgYuOQXqAKewrjNrp27VA@mail.gmail.com Backpatch-through: 16
2023-10-17Back-patch test cases for timetz_zone/timetz_izone.Tom Lane
Per code coverage reports, we had zero regression test coverage of these functions. That came back to bite us, as apparently that's allowed us to miss discovering misbehavior of this code with AIX's xlc compiler. Install relevant portions of the test cases added in 97957fdba, 2f0472030, 19fa97731. (Assuming the expected outcome that the xlc problem does appear in back branches, a code fix will follow.) Discussion: https://postgr.es/m/CA+hUKGK=DOC+hE-62FKfZy=Ybt5uLkrg3zCZD-jFykM-iPn8yw@mail.gmail.com
2023-10-17Avoid calling proc_exit() in processes forked by system().Nathan Bossart
The SIGTERM handler for the startup process immediately calls proc_exit() for the duration of the restore_command, i.e., a call to system(). This system() call forks a new process to execute the shell command, and this child process inherits the parent's signal handlers. If both the parent and child processes receive SIGTERM, both will attempt to call proc_exit(). This can end badly. For example, both processes will try to remove themselves from the PGPROC shared array. To fix this problem, this commit adds a check in StartupProcShutdownHandler() to see whether MyProcPid == getpid(). If they match, this is the parent process, and we can proc_exit() like before. If they do not match, this is a child process, and we just emit a message to STDERR (in a signal safe manner) and _exit(), thereby skipping any problematic exit callbacks. This commit also adds checks in proc_exit(), ProcKill(), and AuxiliaryProcKill() that verify they are not being called within such child processes. Suggested-by: Andres Freund Reviewed-by: Thomas Munro, Andres Freund Discussion: https://postgr.es/m/Y9nGDSgIm83FHcad%40paquier.xyz Discussion: https://postgr.es/m/20230223231503.GA743455%40nathanxps13 Backpatch-through: 11
2023-10-16Ensure we have a snapshot while dropping ON COMMIT DROP temp tables.Tom Lane
Dropping a temp table could entail TOAST table access to clean out toasted catalog entries, such as large pg_constraint.conbin strings for complex CHECK constraints. If we did that via ON COMMIT DROP, we triggered the assertion in init_toast_snapshot(), because there was no provision for setting up a snapshot for the drop actions. Fix that. (I assume here that the adjacent truncation actions for ON COMMIT DELETE ROWS don't have a similar problem: it doesn't seem like nontransactional truncations would need to touch any toasted fields. If that proves wrong, we could refactor a bit to have the same snapshot acquisition cover that too.) The test case added here does not fail before v15, because that assertion was added in 277692220 which was not back-patched. However, the race condition the assertion warns of surely exists further back, so back-patch to all supported branches. Per report from Richard Guo. Discussion: https://postgr.es/m/CAMbWs4-x26=_QxxgdJyNbiCDzvtr2WV5ZDso_v-CukKEe6cBZw@mail.gmail.com
2023-10-16Move extra code out of the Pre/PostRestoreCommand() section.Nathan Bossart
If SIGTERM is received within this section, the startup process will immediately proc_exit() in the signal handler, so it is inadvisable to include any more code than is required there (as such code is unlikely to be compatible with doing proc_exit() in a signal handler). This commit moves the code recently added to this section (see 1b06d7bac9 and 7fed801135) to outside of the section. This ensures that the startup process only calls proc_exit() in its SIGTERM handler for the duration of the system() call, which is how this code worked from v8.4 to v14. Reported-by: Michael Paquier, Thomas Munro Analyzed-by: Andres Freund Suggested-by: Tom Lane Reviewed-by: Michael Paquier, Robert Haas, Thomas Munro, Andres Freund Discussion: https://postgr.es/m/Y9nGDSgIm83FHcad%40paquier.xyz Discussion: https://postgr.es/m/20230223231503.GA743455%40nathanxps13 Backpatch-through: 15
2023-10-16Try to handle torn reads of pg_control in frontend.Thomas Munro
Some of our src/bin tools read the control file without any kind of interlocking against concurrent writes from the server. At least ext4 and ntfs can expose partially modified contents when you do that. For now, we'll try to tolerate this by retrying up to 10 times if the checksum doesn't match, until we get two reads in a row with the same bad checksum. This is not guaranteed to reach the right conclusion, but it seems very likely to. Thanks to Tom Lane for this suggestion. Various ideas for interlocking or atomicity were considered too complicated, unportable or expensive given the lack of field reports, but remain open for future reconsideration. Back-patch as far as 12. It doesn't seem like a good idea to put a heuristic change for a very rare problem into the final release of 11. Reviewed-by: Anton A. Melnikov <aamelnikov@inbox.ru> Reviewed-by: David Steele <david@pgmasters.net> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
2023-10-16Fix comment from commit 22655aa231.Thomas Munro
Per automated complaint from BF animal koel this needed to be re-indented, but there was also a typo. Back-patch to 16.
2023-10-16Acquire ControlFileLock in relevant SQL functions.Thomas Munro
Commit dc7d70ea added functions that read the control file, but didn't acquire ControlFileLock. With unlucky timing, file systems that have weak interlocking like ext4 and ntfs could expose partially overwritten contents, and the checksum would fail. Back-patch to all supported releases. Reviewed-by: David Steele <david@pgmasters.net> Reviewed-by: Anton A. Melnikov <aamelnikov@inbox.ru> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
2023-10-14Dissociate btequalimage() from interval_ops, ending its deduplication.Noah Misch
Under interval_ops, some equal values are distinguishable. One such pair is '24:00:00' and '1 day'. With that being so, btequalimage() breaches the documented contract for the "equalimage" btree support function. This can cause incorrect results from index-only scans. Users should REINDEX any btree indexes having interval-type columns. After updating, pg_amcheck will report an error for almost all such indexes. This fix makes interval_ops simply omit the support function, like numeric_ops does. Back-pack to v13, where btequalimage() first appeared. In back branches, for the benefit of old catalog content, btequalimage() code will return false for type "interval". Going forward, back-branch initdb will include the catalog change. Reviewed by Peter Geoghegan. Discussion: https://postgr.es/m/20231011013317.22.nmisch@google.com
2023-10-14Don't spuriously report FD_SETSIZE exhaustion on Windows.Noah Misch
Starting on 2023-08-03, this intermittently terminated a "pgbench -C" test in CI. It could affect a high-client-count "pgbench" without "-C". While parallel reindexdb and vacuumdb reach the same problematic check, sufficient client count and/or connection turnover is less plausible for them. Given the lack of examples from the buildfarm or from manual builds, reproducing this must entail rare operating system configurations. Also correct the associated error message, which was wrong for non-Windows. Back-patch to v12, where the pgbench check first appeared. While v11 vacuumdb has the problematic check, reaching it with typical vacuumdb usage is implausible. Reviewed by Thomas Munro. Discussion: https://postgr.es/m/CA+hUKG+JwvTNdcyJTriy9BbtzF1veSRQ=9M_ZKFn9_LqE7Kp7Q@mail.gmail.com
2023-10-13Fix bulk table extension when copying into multiple partitionsAndres Freund
When COPYing into a partitioned table that does now permit the use of table_multi_insert(), we could error out with ERROR: could not read block NN in file "base/...": read only 0 of 8192 bytes because BulkInsertState->next_free was not reset between partitions. This problem occurred only when not able to use table_multi_insert(), as a dedicated BulkInsertState for each partition is used in that case. The bug was introduced in 00d1e02be24, but it was hard to hit at that point, as commonly bulk relation extension is not used when not using table_multi_insert(). It became more likely after 82a4edabd27, which expanded the use of bulk extension. To fix the bug, reset the bulk relation extension state in BulkInsertState in ReleaseBulkInsertStatePin(). That was added (in b1ecb9b3fcf) to tackle a very similar issue. Obviously the name is not quite correct, but there might be external callers, and bulk insert state needs to be reset in precisely in the situations that ReleaseBulkInsertStatePin() already needed to be called. Medium term the better fix likely is to disallow reusing BulkInsertState across relations. Add a test that, without the fix, reproduces #18130 in most configurations. The test also catches the problem fixed in b1ecb9b3fcf when run with small shared_buffers. Reported-by: Ivan Kolombet <enderstd@gmail.com> Analyzed-by: Tom Lane <tgl@sss.pgh.pa.us> Analyzed-by: Andres Freund <andres@anarazel.de> Bug: #18130 Discussion: https://postgr.es/m/18130-7a86a7356a75209d%40postgresql.org Discussion: https://postgr.es/m/257696.1695670946%40sss.pgh.pa.us Backpatch: 16-
2023-10-13Fix runtime partition pruning for HASH partitioned tablesDavid Rowley
This could only affect HASH partitioned tables with at least 2 partition key columns. If partition pruning was delayed until execution and the query contained an IS NULL qual on one of the partitioned keys, and some subsequent partitioned key was being compared to a non-Const, then this could result in a crash due to the incorrect keyno being used to calculate the stateidx for the expression evaluation code. Here we fix this by properly skipping partitioned keys which have a nullkey set. Effectively, this must be the same as what's going on inside perform_pruning_base_step(). Sergei Glukhov also provided a patch, but that's not what's being used here. Reported-by: Sergei Glukhov Reviewed-by: tender wang, Sergei Glukhov Discussion: https://postgr.es/m/d05b26fa-af54-27e1-f693-6c31590802fa@postgrespro.ru Backpatch-through: 11, where runtime partition pruning was added.
2023-10-12Fix incorrect step generation in HASH partition pruningDavid Rowley
get_steps_using_prefix_recurse() incorrectly assumed that it could stop recursive processing of the 'prefix' list when cur_keyno was one before the step_lastkeyno. Since hash partition pruning can prune using IS NULL quals, and these IS NULL quals are not present in the 'prefix' list, then that logic could cause more levels of recursion than what is needed and lead to there being no more items in the 'prefix' list to process. This would manifest itself as a crash in some code that expected the 'start' ListCell not to be NULL. Here we adjust the logic so that instead of stopping recursion at 1 key before the step_lastkeyno, we just look at the llast(prefix) item and ensure we only recursively process up until just before whichever the last key is. This effectively allows keys to be missing in the 'prefix' list. This change does mean that step_lastkeyno is no longer needed, so we remove that from the static functions. I also spent quite some time reading this code and testing it to try to convince myself that there are no other issues. That resulted in the irresistible temptation of rewriting some comments, many of which were just not true or inconcise. Reported-by: Sergei Glukhov Reviewed-by: Sergei Glukhov, tender wang Discussion: https://postgr.es/m/2f09ce72-315e-2a33-589a-8519ada8df61@postgrespro.ru Backpatch-through: 11, where partition pruning was introduced.
2023-10-10Fix bug in GenericXLogFinish().Jeff Davis
Mark the buffers dirty before writing WAL. Discussion: https://postgr.es/m/25104133-7df8-cae3-b9a2-1c0aaa1c094a@iki.fi Reviewed-by: Heikki Linnakangas Backpatch-through: 11
2023-10-09Strip off ORDER BY/DISTINCT aggregate pathkeys in create_agg_pathDavid Rowley
1349d2790 added code to adjust the PlannerInfo.group_pathkeys so that ORDER BY / DISTINCT aggregate functions could obtain pre-sorted inputs to allow faster execution. That commit forgot to adjust the pathkeys in create_agg_path(). Some code in there assumed that it was always fine to make the AggPath's pathkeys the same as its subpath's. That seems to have been ok up until 1349d2790, but since that commit adds pathkeys for columns which are within the aggregate function, those columns won't be available above the aggregate node. This can result in "could not find pathkey item to sort" during create_plan(). The fix here is to strip off the additional pathkeys added by adjust_group_pathkeys_for_groupagg(). It seems that the pathkeys here will only ever be group_pathkeys, so all we need to do is check if the length of the pathkey list is longer than the num_groupby_pathkeys and get rid of the additional ones only if we see extras. Reported-by: Justin Pryzby Reviewed-by: Richard Guo Discussion: https://postgr.es/m/ZQhYYRhUxpW3PSf9%40telsasoft.com Backpatch-through: 16, where 1349d2790 was introduced
2023-10-06Remove extra parenthesis from comment.Etsuro Fujita
2023-10-05Fix memory leak in Memoize codeDavid Rowley
Ensure we switch to the per-tuple memory context to prevent any memory leaks of detoasted Datums in MemoizeHash_hash() and MemoizeHash_equal(). Reported-by: Orlov Aleksej Author: Orlov Aleksej, David Rowley Discussion: https://postgr.es/m/83281eed63c74e4f940317186372abfd%40cft.ru Backpatch-through: 14, where Memoize was added
2023-10-03Avoid memory size overflow when allocating backend activity bufferMichael Paquier
The code in charge of copying the contents of PgBackendStatus to local memory could fail on memory allocation because of an overflow on the amount of memory to use. The overflow can happen when combining a high value track_activity_query_size (max at 1MB) with a large max_connections, when both multiplied get higher than INT32_MAX as both parameters treated as signed integers. This could for example trigger with the following functions, all calling pgstat_read_current_status(): - pg_stat_get_backend_subxact() - pg_stat_get_backend_idset() - pg_stat_get_progress_info() - pg_stat_get_activity() - pg_stat_get_db_numbackends() The change to use MemoryContextAllocHuge() has been introduced in 8d0ddccec636, so backpatch down to 12. Author: Jakub Wartak Discussion: https://postgr.es/m/CAKZiRmw8QSNVw2qNK-dznsatQqz+9DkCquxP0GHbbv1jMkGHMA@mail.gmail.com Backpatch-through: 12
2023-10-03Fail hard on out-of-memory failures in xlogreader.cMichael Paquier
This commit changes the WAL reader routines so as a FATAL for the backend or exit(FAILURE) for the frontend is triggered if an allocation for a WAL record decode fails in walreader.c, rather than treating this case as bogus data, which would be equivalent to the end of WAL. The key is to avoid palloc_extended(MCXT_ALLOC_NO_OOM) in walreader.c, relying on plain palloc() calls. The previous behavior could make WAL replay finish too early than it should. For example, crash recovery finishing earlier may corrupt clusters because not all the WAL available locally was replayed to ensure a consistent state. Out-of-memory failures would show up randomly depending on the memory pressure on the host, but one simple case would be to generate a large record, then replay this record after downsizing a host, as Ethan Mertz originally reported. This relies on bae868caf222, as the WAL reader routines now do the memory allocation required for a record only once its header has been fully read and validated, making xl_tot_len trustable. Making the WAL reader react differently on out-of-memory or bogus record data would require ABI changes, so this is the safest choice for stable branches. Also, it is worth noting that 3f1ce973467a has been using a plain palloc() in this code for some time now. Thanks to Noah Misch and Thomas Munro for the discussion. Like the other commit, backpatch down to 12, leaving out v11 that will be EOL'd soon. The behavior of considering a failed allocation as bogus data comes originally from 0ffe11abd3a0, where the record length retrieved from its header was not entirely trustable. Reported-by: Ethan Mertz Discussion: https://postgr.es/m/ZRKKdI5-RRlta3aF@paquier.xyz Backpatch-through: 12
2023-10-02Fix omission of column-level privileges in selective pg_restore.Tom Lane
In a selective restore, ACLs for a table should be dumped if the table is selected to be dumped. However, if the table has both table-level and column-level ACLs, only the table-level ACL was restored. This happened because _tocEntryRequired assumed that an ACL could have only one dependency (the one on its table), and punted if there was more than one. But since commit ea9125304, column-level ACLs also depend on the table-level ACL if any, to ensure correct ordering in parallel restores. To fix, adjust the logic in _tocEntryRequired to ignore dependencies on ACLs. I extended a test case in 002_pg_dump.pl so that it purports to test for this; but in fact the test passes even without the fix. That's because this bug only manifests during a selective restore, while the scenarios 002_pg_dump.pl tests include only selective dumps. Perhaps somebody would like to extend the script so that it can test scenarios including selective restore, but I'm not touching that. Euler Taveira and Tom Lane, per report from Kong Man. Back-patch to all supported branches. Discussion: https://postgr.es/m/DM4PR11MB73976902DBBA10B1D652F9498B06A@DM4PR11MB7397.namprd11.prod.outlook.com
2023-10-02Flush WAL stats in bgwriterHeikki Linnakangas
bgwriter can write out WAL, but did not flush the WAL pgstat counters, so the writes were not seen in pg_stat_wal. Back-patch to v14, where pg_stat_wal was introduced. Author: Nazir Bilal Yavuz Reviewed-by: Matthias van de Meent, Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/CAN55FZ2FPYngovZstr%3D3w1KSEHe6toiZwrurbhspfkXe5UDocg%40mail.gmail.com
2023-10-01Fix datalen calculation in tsvectorrecv().Tom Lane
After receiving position data for a lexeme, tsvectorrecv() advanced its "datalen" value by (npos+1)*sizeof(WordEntry) where the correct calculation is (npos+1)*sizeof(WordEntryPos). This accidentally failed to render the constructed tsvector invalid, but it did result in leaving some wasted space approximately equal to the space consumed by the position data. That could have several bad effects: * Disk space is wasted if the received tsvector is stored into a table as-is. * A legal tsvector could get rejected with "maximum total lexeme length exceeded" if the extra space pushes it over the MAXSTRPOS limit. * In edge cases, the finished tsvector could be assigned a length larger than the allocated size of its palloc chunk, conceivably leading to SIGSEGV when the tsvector gets copied somewhere else. The odds of a field failure of this sort seem low, though valgrind testing could probably have found this. While we're here, let's express the calculation as "sizeof(uint16) + npos * sizeof(WordEntryPos)" to avoid the type pun implicit in the "npos + 1" formulation. It's not wrong given that WordEntryPos had better be 2 bytes to avoid padding problems, but it seems clearer this way. Report and patch by Denis Erokhin. Back-patch to all supported versions. Discussion: https://postgr.es/m/009801d9f2d9$f29730c0$d7c59240$@datagile.ru
2023-10-01In COPY FROM, fail cleanly when unsupported encoding conversion is needed.Tom Lane
In recent releases, such cases fail with "cache lookup failed for function 0" rather than complaining that the conversion function doesn't exist as prior versions did. Seems to be a consequence of sloppy refactoring in commit f82de5c46. Add the missing error check. Per report from Pierre Fortin. Back-patch to v14 where the oversight crept in. Discussion: https://postgr.es/m/20230929163739.3bea46e5.pfortin@pfortin.com
2023-10-01Only evaluate default values as required when doing COPY FROMAndrew Dunstan
Commit 9f8377f7a2 was a little too eager in fetching default values. Normally this would not matter, but if the default value is not valid for the type (e.g. a varchar that's too long) it caused an unnecessary error. Complaint and fix from Laurenz Albe Backpatch to release 16. Discussion: https://postgr.es/m/75a7b7483aeb331aa017328d606d568fc715b90d.camel@cybertec.at
2023-09-30Fix briefly showing old progress stats for ANALYZE on inherited tables.Heikki Linnakangas
ANALYZE on a table with inheritance children analyzes all the child tables in a loop. When stepping to next child table, it updated the child rel ID value in the command progress stats, but did not reset the 'sample_blks_total' and 'sample_blks_scanned' counters. acquire_sample_rows() updates 'sample_blks_total' as soon as the scan starts and 'sample_blks_scanned' after processing the first block, but until then, pg_stat_progress_analyze would display a bogus combination of the new child table relid with old counter values from the previously processed child table. Fix by resetting 'sample_blks_total' and 'sample_blks_scanned' to zero at the same time that 'current_child_table_relid' is updated. Backpatch to v13, where pg_stat_progress_analyze view was introduced. Reported-by: Justin Pryzby Discussion: https://www.postgresql.org/message-id/20230122162345.GP13860%40telsasoft.com
2023-09-30Fix EvalPlanQual rechecking during MERGE.Dean Rasheed
Under some circumstances, concurrent MERGE operations could lead to inconsistent results, that varied according the plan chosen. This was caused by a lack of rowmarks on the source relation, which meant that EvalPlanQual rechecking was not guaranteed to return the same source tuples when re-running the join query. Fix by ensuring that preprocess_rowmarks() sets up PlanRowMarks for all non-target relations used in MERGE, in the same way that it does for UPDATE and DELETE. Per bug #18103. Back-patch to v15, where MERGE was introduced. Dean Rasheed, reviewed by Richard Guo. Discussion: https://postgr.es/m/18103-c4386baab8e355e3%40postgresql.org
2023-09-29Remove environment sensitivity in pl/tcl regression test.Tom Lane
Add "-gmt 1" to our test invocations of the Tcl "clock" command, so that they do not consult the timezone environment. While it doesn't really matter which timezone is used here, it does matter that the command not fall over entirely. We've now discovered that at least on FreeBSD, "clock scan" will fail if /etc/localtime is missing. It seems worth making the test insensitive to that. Per Tomas Vondras' buildfarm animal dikkop. Thanks to Thomas Munro for the diagnosis. Discussion: https://postgr.es/m/316d304a-1dcd-cea1-3d6c-27f794727a06@enterprisedb.com