summaryrefslogtreecommitdiff
path: root/src/include
AgeCommit message (Collapse)Author
2023-12-25Enhance checkpointer restartpoint statisticsAlexander Korotkov
Bhis commit introduces enhancements to the pg_stat_checkpointer view by adding three new columns: restartpoints_timed, restartpoints_req, and restartpoints_done. These additions aim to improve the visibility and monitoring of restartpoint processes on replicas. Previously, it was challenging to differentiate between successful and failed restartpoint requests. This limitation arises because restartpoints on replicas are dependent on checkpoint records from the primary, and cannot occur more frequently than these checkpoints. The new columns allow for clear distinction and tracking of restartpoint requests, their triggers, and successful completions. This enhancement aids database administrators and developers in better understanding and diagnosing issues related to restartpoint behavior, particularly in scenarios where restartpoint requests may fail. System catalog is changed. Catversion is bumped. Discussion: https://postgr.es/m/99b2ccd1-a77a-962a-0837-191cdf56c2b9%40inbox.ru Author: Anton A. Melnikov Reviewed-by: Kyotaro Horiguchi, Alexander Korotkov
2023-12-21Fix numerous typos in incremental backup commits.Robert Haas
Apparently, spell check would have been a really good idea. Alexander Lakhin, with a few additions as per an off-list report from Andres Freund. Discussion: http://postgr.es/m/f08f7c60-1ad3-0b57-d580-54b11f07cddf@gmail.com
2023-12-20Add support for incremental backup.Robert Haas
To take an incremental backup, you use the new replication command UPLOAD_MANIFEST to upload the manifest for the prior backup. This prior backup could either be a full backup or another incremental backup. You then use BASE_BACKUP with the INCREMENTAL option to take the backup. pg_basebackup now has an --incremental=PATH_TO_MANIFEST option to trigger this behavior. An incremental backup is like a regular full backup except that some relation files are replaced with files with names like INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains additional lines identifying it as an incremental backup. The new pg_combinebackup tool can be used to reconstruct a data directory from a full backup and a series of incremental backups. Patch by me. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter Eisentraut, and Álvaro Herrera. Thanks especially to Jakub for incredibly helpful and extensive testing. Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
2023-12-20Add a new WAL summarizer process.Robert Haas
When active, this process writes WAL summary files to $PGDATA/pg_wal/summaries. Each summary file contains information for a certain range of LSNs on a certain TLI. For each relation, it stores a "limit block" which is 0 if a relation is created or destroyed within a certain range of WAL records, or otherwise the shortest length to which the relation was truncated during that range of WAL records, or otherwise InvalidBlockNumber. In addition, it stores a list of blocks which have been modified during that range of WAL records, but excluding blocks which were removed by truncation after they were modified and never subsequently modified again. In other words, it tells us which blocks need to copied in case of an incremental backup covering that range of WAL records. But this doesn't yet add the capability to actually perform an incremental backup; the next patch will do that. A new parameter summarize_wal enables or disables this new background process. The background process also automatically deletes summary files that are older than wal_summarize_keep_time, if that parameter has a non-zero value and the summarizer is configured to run. Patch by me, with some design help from Dilip Kumar and Andres Freund. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter Eisentraut, and Álvaro Herrera. Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
2023-12-19Move src/bin/pg_verifybackup/parse_manifest.c into src/common.Robert Haas
This makes it possible for the code to be easily reused by other client-side tools, and/or by the server. Patch by me. Review of this patch in particular by at least Peter Eisentraut; reviewers for the patch series in general include Dilip Kumar, Andres Fruend, David Steele, Álvaro Herrera, and Jakub Wartak. Discussion: http://postgr.es/m/CA+TgmoZ6UGZVnSy5iak6s6+AXu_DewXovDjhLs3-su6nmU_x_g@mail.gmail.com
2023-12-19Prevent integer overflow when forming tuple width estimates.Tom Lane
It's at least theoretically possible to overflow int32 when adding up column width estimates to make a row width estimate. (The bug example isn't terribly convincing as a real use-case, but perhaps wide joins would provide a more plausible route to trouble.) This'd lead to assertion failures or silly planner behavior. To forestall it, make the relevant functions compute their running sums in int64 arithmetic and then clamp to int32 range at the end. We can reasonably assume that MaxAllocSize is a hard limit on actual tuple width, so clamping to that is simply a correction for dubious input values, and there's no need to go as far as widening width variables to int64 everywhere. Per bug #18247 from RekGRpth. There've been no reports of this issue arising in practical cases, so I feel no need to back-patch. Richard Guo and Tom Lane Discussion: https://postgr.es/m/18247-11ac477f02954422@postgresql.org
2023-12-19Update comment for Cardinality typedefPeter Eisentraut
Author: Richard Guo <guofenglinux@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4-Zd7Yy80RL1NdskLLo-wz6QoqsbC5TKs%3D3yZxG3BT_aA%40mail.gmail.com
2023-12-19Simplify newNode() by removing special casesHeikki Linnakangas
- Remove MemoryContextAllocZeroAligned(). It was supposed to be a faster version of MemoryContextAllocZero(), but modern compilers turn the MemSetLoop() into a call to memset() anyway, making it more or less identical to MemoryContextAllocZero(). That was the only user of MemSetTest, MemSetLoop, so remove those too, as well as palloc0fast(). - Convert newNode() to a static inline function. When this was originally originally written, it was written as a macro because testing showed that gcc didn't inline the size check as we intended. Modern compiler versions do, and now that it just calls palloc0() there is no size-check to inline anyway. One nice effect is that the palloc0() takes one less argument than MemoryContextAllocZeroAligned(), which saves a few instructions in the callers of newNode(). Reviewed-by: Peter Eisentraut, Tom Lane, John Naylor, Thomas Munro Discussion: https://www.postgresql.org/message-id/b51f1fa7-7e6a-4ecc-936d-90a8a1659e7c@iki.fi
2023-12-18compute_bitmap_pages' loop_count parameter should be double not int.Tom Lane
The value was double in the original implementation of this logic. Commit da08a6598 pulled it out into a subroutine, but carelessly declared the parameter as int when it should have been double. On most platforms, the only ill effect would be to clamp the value to be not more than INT_MAX, which would seldom be exceeded and probably wouldn't change the estimates too much anyway. Nonetheless, it's wrong and can cause complaints from ubsan. While here, improve the comments and parameter names. This is an ABI change in a globally exposed subroutine, so back-patching would create some risk of breaking extensions. The value of the fix doesn't seem high enough to warrant taking that risk, so fix in HEAD only. Per report from Alexander Lakhin. Discussion: https://postgr.es/m/f5e15fe1-202d-1936-f47c-f0c69a936b72@gmail.com
2023-12-18Optimize pg_atomic_exchange_u32 and pg_atomic_exchange_u64.Nathan Bossart
Presently, all platforms implement atomic exchanges by performing an atomic compare-and-swap in a loop until it succeeds. This can be especially expensive when there is contention on the atomic variable. This commit optimizes atomic exchanges on many platforms by using compiler intrinsics, which should compile into something much less expensive than a compare-and-swap loop. Since these intrinsics have been available for some time, the inline assembly implementations are omitted. Suggested-by: Andres Freund Reviewed-by: Andres Freund Discussion: https://postgr.es/m/20231129212905.GA1258737%40nathanxps13
2023-12-18Provide vectored variants of smgrread() and smgrwrite().Thomas Munro
smgrreadv() and smgrwritev() and their md.c implementations call FileReadV() and FileWriteV(). A range of disk blocks beginning at 'blocknum' and extending for 'nblocks' can be scattered to or gathered from multiple buffers with a single system call. The traditional smgrread() and smgrwrite() functions are implemented in terms of the new functions. Later commits will introduce calls with nblocks > 1, but the following behavioral changes can be seen already: * After a short transfer we'll now retry until we eventually read 0 bytes (= EOF) or get ENOSPC, EDQUOT, EFBIG etc, where previously we would infer the reason. Retrying is consistent with xlog.c's treatment of large WAL writes, and arguably also xlog.c and fd.c's treatment of EINTR. Arbitrary short returns for larger transfers have been observed on several OSes, and might in theory also happen for transient reasons with our own pg_p*v() fallback code. * After unexpected EOF or -1, the error thrown now talks about a range even for the single block case, eg "blocks 42..42". Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
2023-12-16Refactor pgstat_prepare_io_time() with an input argument instead of a GUCMichael Paquier
Originally, this routine relied on track_io_timing to check if a time interval for an I/O operation stored in pg_stat_io should be initialized or not. However, the addition of WAL statistics to pg_stat_io requires that the initialization happens when track_wal_io_timing is enabled, which is dependent on the code path where the I/O operation happens. Author: Nazir Bilal Yavuz Discussion: https://postgr.es/m/CAN55FZ3AiQ+ZMxUuXnBpd0Rrh1YhwJ5FudkHg=JU0P+-W8T4Vg@mail.gmail.com
2023-12-16Remove useless LIMIT_OPTION_DEFAULT value from LimitOptionAlvaro Herrera
During the development that led to commit 357889eb17bb, for a time we had the value LIMIT_OPTION_DEFAULT, which was mostly but not completely removed later on, before commit. Complete the removal now. Author: Zhang Mingli <avamingli@gmail.com> Discussion: https://postgr.es/m/59d61a1a-3858-475a-964f-24468c97cc67@Spark
2023-12-16Provide multi-block smgrprefetch().Thomas Munro
Previously smgrprefetch() could issue POSIX_FADV_WILLNEED advice for a single block at a time. Add an nblocks argument so that we can do the same for a range of blocks. This usually produces a single system call, but might need to loop if it crosses a segment boundary. Initially it is only called with nblocks == 1, but proposed patches will make wider calls. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> (earlier version) Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
2023-12-15Fix bugs in manipulation of large objects.Tom Lane
In v16 and up (since commit afbfc0298), large object ownership checking has been broken because object_ownercheck() didn't take care of the discrepancy between our object-address representation of large objects (classId == LargeObjectRelationId) and the catalog where their ownership info is actually stored (LargeObjectMetadataRelationId). This resulted in failures such as "unrecognized class ID: 2613" when trying to update blob properties as a non-superuser. Poking around for related bugs, I found that AlterObjectOwner_internal would pass the wrong classId to the PostAlterHook in the no-op code path where the large object already has the desired owner. Also, recordExtObjInitPriv checked for the wrong classId; that bug is only latent because the stanza is dead code anyway, but as long as we're carrying it around it should be less wrong. These bugs are quite old. In HEAD, we can reduce the scope for future bugs of this ilk by changing AlterObjectOwner_internal's API to let the translation happen inside that function, rather than requiring callers to know about it. A more bulletproof fix, perhaps, would be to start using LargeObjectMetadataRelationId as the dependency and object-address classId for blobs. However that has substantial risk of breaking third-party code; even within our own code, it'd create hassles for pg_dump which would have to cope with a version-dependent representation. For now, keep the status quo. Discussion: https://postgr.es/m/2650449.1702497209@sss.pgh.pa.us
2023-12-12Provide vectored variants of FileRead() and FileWrite().Thomas Munro
FileReadV() and FileWriteV() adapt pg_preadv() and pg_pwritev() for fd.c's virtual file descriptors. The simple FileRead() and FileWrite() functions are now implemented in terms of the vectored functions, to avoid code duplication, and they are converted back to the corresponding simple system calls further down (commit 15c9ac36). Later work will make more interesting multi-iovec calls. The traditional behavior of reporting a "fake" ENOSPC error is simplified. It's now always set for non-failing writes, for the benefit of callers that expect to log a meaningful "%m" if they determine that the write was short. (Perhaps we should consider getting rid of that expectation one day.) Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
2023-12-12Provide helper for retrying partial vectored I/O.Thomas Munro
compute_remaining_iovec() is a re-usable routine for retrying after pg_readv() or pg_writev() reports a short transfer. This will gain new users in a later commit, but can already replace the open-coded equivalent code in the existing pg_pwritev_with_retry() function. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
2023-12-12Define unconstify() and unvolatize() for C++.Thomas Munro
These two macros wouldn't work if used in an inline function definition in a header seen by g++, because __builtin_types_compatible_p is only available in C. Redirect to standard C++ const_cast (which also adds/removes volatile despite its name). Per cpluspluscheck failure in a development branch. Suggested-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CA%2BhUKGK3OXFjkOyZiw-DgL2bUqk9by1uGuCnViJX786W%2BfyDSw%40mail.gmail.com
2023-12-11Simplify productions for FORMAT JSON [ ENCODING name ]Alvaro Herrera
This removes the production json_encoding_clause_opt, instead merging it into json_format_clause. Also remove the auxiliary makeJsonEncoding() function. Reviewed-by: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/202312071841.u2gueb5dsrbk%40alvherre.pgsql
2023-12-11Remove trace_recovery_messagesMichael Paquier
This GUC was intended as a debugging help in the 9.0 area when hot standby and streaming replication were being developped, able to offer more information at LOG level rather than DEBUGn. There are more tools available these days that are able to offer rather equivalent information, like pg_waldump introduced in 9.3. It is not obvious how this facility is useful these days, so let's remove it. Author: Bharath Rupireddy Discussion: https://postgr.es/m/ZXEXEAUVFrvpquSd@paquier.xyz
2023-12-10Remove some unnecessary includes of "access/xlog_internal.h"Peter Eisentraut
There were a few places where access/xlog_internal.h was apparently included unnecessarily. In some of those places, a more specific header file (that somehow came in via access/xlog_internal.h) can be used instead. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/flat/a56a6eec-eb14-471b-9570-3cac23603964%40eisentraut.org
2023-12-08Cache opaque handle for GUC option to avoid repeasted lookups.Jeff Davis
When setting GUCs from proconfig, performance is important, and hash lookups in the GUC table are significant. Per suggestion from Robert Haas. Discussion: https://postgr.es/m/CA+TgmoYpKxhR3HOD9syK2XwcAUVPa0+ba0XPnwWBcYxtKLkyxA@mail.gmail.com Reviewed-by: John Naylor
2023-12-08Optimize nbtree backward scan boundary cases.Peter Geoghegan
Teach _bt_binsrch (and related helper routines like _bt_search and _bt_compare) about the initial positioning requirements of backward scans. Routines like _bt_binsrch already know all about "nextkey" searches, so it seems natural to teach them about "goback"/backward searches, too. These concepts are closely related, and are much easier to understand when discussed together. Now that certain implementation details are hidden from _bt_first, it's straightforward to add a new optimization: backward scans using the < strategy now avoid extra leaf page accesses in certain "boundary cases". Consider the following example, which uses the tenk1 table (and its tenk1_hundred index) from the standard regression tests: SELECT * FROM tenk1 WHERE hundred < 12 ORDER BY hundred DESC LIMIT 1; Before this commit, nbtree would scan two leaf pages, even though it was only really necessary to scan one leaf page. We'll now descend straight to the leaf page containing a (12, -inf) high key instead. The scan will locate matching non-pivot tuples with "hundred" values starting from the value 11. The scan won't waste a page access on the right sibling leaf page, which cannot possibly contain any matching tuples. You can think of the optimization added by this commit as disabling an optimization (the _bt_compare "!pivotsearch" behavior that was added to Postgres 12 in commit dd299df8) for a small subset of cases where it was always counterproductive. Equivalently, you can think of the new optimization as extending the "pivotsearch" behavior that page deletion by VACUUM has long required (since the aforementioned Postgres 12 commit went in) to other, similar cases. Obviously, this isn't strictly necessary for these new cases (unlike VACUUM, _bt_first is prepared to move the scan to the left once on the leaf level), but the underlying principle is the same. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAH2-Wz=XPzM8HzaLPq278Vms420mVSHfgs9wi5tjFKHcapZCEw@mail.gmail.com
2023-12-08Allow parallel CREATE INDEX for BRIN indexesTomas Vondra
Allow using multiple worker processes to build BRIN index, which until now was supported only for BTREE indexes. For large tables this often results in significant speedup when the build is CPU-bound. The work is split in a simple way - each worker builds BRIN summaries on a subset of the table, determined by the regular parallel scan used to read the data, and feeds them into a shared tuplesort which sorts them by blkno (start of the range). The leader then reads this sorted stream of ranges, merges duplicates (which may happen if the parallel scan does not align with BRIN pages_per_range), and adds the resulting ranges into the index. The number of duplicate results produced by workers (requiring merging in the leader process) should be fairly small, thanks to how parallel scans assign chunks to workers. The likelihood of duplicate results may increase for higher pages_per_range values, but then there are fewer page ranges in total. In any case, we expect the merging to be much cheaper than summarization, so this should be a win. Most of the parallelism infrastructure is a simplified copy of the code used by BTREE indexes, omitting the parts irrelevant for BRIN indexes (e.g. uniqueness checks). This also introduces a new index AM flag amcanbuildparallel, determining whether to attempt to start parallel workers for the index build. Original patch by me, with reviews and substantial reworks by Matthias van de Meent, certainly enough to make him a co-author. Author: Tomas Vondra, Matthias van de Meent Reviewed-by: Matthias van de Meent Discussion: https://postgr.es/m/c2ee7d69-ce17-43f2-d1a0-9811edbda6e6%40enterprisedb.com
2023-12-08Rename ShmemVariableCache to TransamVariablesHeikki Linnakangas
The old name was misleading: It's not a cache, the values kept in the struct are the authoritative source. Reviewed-by: Tristan Partin, Richard Guo Discussion: https://www.postgresql.org/message-id/6537d63d-4bb5-46f8-9b5d-73a8ba4720ab@iki.fi
2023-12-08Initialize ShmemVariableCache like other shmem areasHeikki Linnakangas
For sake of consistency. Reviewed-by: Tristan Partin, Richard Guo Discussion: https://www.postgresql.org/message-id/6537d63d-4bb5-46f8-9b5d-73a8ba4720ab@iki.fi
2023-12-07Shrink Unicode category table.Jeff Davis
Missing entries can implicitly be considered "unassigned". Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel@j-davis.com
2023-12-07Verify that attribute counts match in ExecCopySlotDavid Rowley
tts_virtual_copyslot() contained an Assert that checked that the srcslot contained <= attributes than the dstslot. This seems to be backwards as if the srcslot contained fewer attributes then the dstslot could be left with stale Datum values from the previously stored tuple. It might make more sense to allow the source to contain additional attributes and only copy the leading ones that also exist in the destination, however, that's not what we're doing here. Here we just remove the Assert from tts_virtual_copyslot() and add an Assert to ExecCopySlot() to verify the attribute counts match. There does not seem to be any places where the destination contains fewer attributes, so instead of going to the trouble of making the code properly handle this, let's just ensure the attribute counts match. If this Assert fails then that will indicate that we do have cases that require us to handle the srcslot with more attributes than the dstslot. It seems better to only write this code if there's a genuine requirement for it rather than write it now only to leave it untested. Thanks to Andres Freund for helping with the analysis of this. Discussion: https://postgr.es/m/CAApHDvpMAvBL0T+TRORquyx1iqFQKMVTXtqNocOw0Pa2uh1heg@mail.gmail.com
2023-12-07Fix issues in binary_upgrade_logical_slot_has_caught_up().Amit Kapila
The commit 29d0a77fa6 labelled binary_upgrade_logical_slot_has_caught_up() as a non-strict function to allow providing a better error message to callers in case the passed slot_name is NULL. On further discussion, it seems that it is not helpful to have a different error message for NULL input in this function, so this patch marks the function as strict. This patch also removes the explicit permission check to use replication slots as this function is invoked only by superusers and instead adds an Assert. Reported-by: Masahiko Sawada Author: Hayato Kuroda Reviewed-by: Vignesh C Discussion: https://postgr.es/m/CAD21AoDSyiBKkMXBxN_gUayZZUCOgyHnG8Ge8rcPXNP3Tf6B4g@mail.gmail.com
2023-12-04Remove now-unnecessary Autovacuum[Launcher|Worker]IAm functionsHeikki Linnakangas
After commit fd5e8b440d, InitProcess() is called later in the EXEC_BACKEND startup sequence, so it's enough to set the am_autovacuum_[launcher|worker] variables at the same place as in the !EXEC_BACKEND case.
2023-12-04Remove unnecessary include of <math.h>Peter Eisentraut
This was probably never necessary. (The header used to use random(), but that shouldn't require <math.h> either. In any case, that's gone, too.) Reviewed-by: Shubham Khanna <Shubham.Khanna@fujitsu.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/cff5475d-e0a9-4561-b094-794aa36bd031%40eisentraut.org
2023-12-04Remove unnecessary include of <sys/socket.h>Peter Eisentraut
This was put here as part of a mechanical replacement of the old "getaddrinfo.h" with <netdb.h> plus <sys/socket.h> (commit 5579388d2d). But here, we only need netdb.h (for NI_MAXHOST), not sys/socket.h. Reviewed-by: Shubham Khanna <Shubham.Khanna@fujitsu.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/cff5475d-e0a9-4561-b094-794aa36bd031%40eisentraut.org
2023-12-04Remove unnecessary includes of <signal.h>Peter Eisentraut
These were once needed for sig_atomic_t, but that no longer appears in these headers, so the include is not needed. Reviewed-by: Shubham Khanna <Shubham.Khanna@fujitsu.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/cff5475d-e0a9-4561-b094-794aa36bd031%40eisentraut.org
2023-12-04Add support for REINDEX in event triggersMichael Paquier
This commit adds support for REINDEX in event triggers, making this command react for the events ddl_command_start and ddl_command_end. The indexes rebuilt are collected with the ReindexStmt emitted by the caller, for the concurrent and non-concurrent paths. Thanks to that, it is possible to know a full list of the indexes that a single REINDEX command has worked on. Author: Garrett Thornburg, Jian He Reviewed-by: Jim Jones, Michael Paquier Discussion: https://postgr.es/m/CAEEqfk5bm32G7sbhzHbES9WejD8O8DCEOaLkxoBP7HNWxjPpvg@mail.gmail.com
2023-12-03Pass BackgroundWorker entry in the parameter file in EXEC_BACKEND modeHeikki Linnakangas
The BackgroundWorker struct is now passed the same way as the Port struct. Seems more consistent. This makes it possible to move InitProcess later in SubPostmasterMain (in next commit), as we no longer need to access shared memory to read background worker entry. Reviewed-by: Tristan Partin, Andres Freund, Alexander Lakhin Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
2023-12-03Refactor CreateSharedMemoryAndSemaphoresHeikki Linnakangas
For clarity, have separate functions for *creating* the shared memory and semaphores at postmaster or single-user backend startup, and for *attaching* to existing shared memory structures in EXEC_BACKEND case. CreateSharedMemoryAndSemaphores() is now called only at postmaster startup, and a new AttachSharedMemoryStructs() function is called at backend startup in EXEC_BACKEND mode. Reviewed-by: Tristan Partin, Andres Freund Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
2023-11-29Optimize pg_readv/pg_pwritev single vector case.Thomas Munro
For the trivial case of iovcnt == 1, kernels are measurably slower at dealing with the more complex arguments of preadv/pwritev than the equivalent plain old pread/pwrite. The overheads are worth it for iovcnt > 1, but for 1 let's just redirect to the cheaper calls. While we could leave it to callers to worry about that, we already have to have our own pg_ wrappers for portability reasons so it seems reasonable to centralize this knowledge there (thanks to Heikki for this suggestion). Try to avoid function call overheads by making them inlinable, which might also allow the compiler to avoid the branch in some cases. For systems that don't have preadv and pwritev (currently: Windows and [closed] Solaris), we might as well pull the replacement functions up into the static inline functions too. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
2023-11-29Use larger segment file names for pg_notifyAlexander Korotkov
This avoids the wraparound in async.c and removes the corresponding code complexity. The maximum amount of allocated SLRU pages for NOTIFY / LISTEN queue is now determined by the max_notify_queue_pages GUC. The default value is 1048576. It allows to consume up to 8 GB of disk space which is exactly the limit we had previously. Author: Maxim Orlov, Aleksander Alekseev, Alexander Korotkov, Teodor Sigaev Author: Nikita Glukhov, Pavel Borisov, Yura Sokolov Reviewed-by: Jacob Champion, Heikki Linnakangas, Alexander Korotkov Reviewed-by: Japin Li, Pavel Borisov, Tom Lane, Peter Eisentraut, Andres Freund Reviewed-by: Andrey Borodin, Dilip Kumar, Aleksander Alekseev Discussion: https://postgr.es/m/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com Discussion: https://postgr.es/m/CAJ7c6TPDOYBYrnCAeyndkBktO0WG2xSdYduTF0nxq%2BvfkmTF5Q%40mail.gmail.com
2023-11-29Index SLRUs by 64-bit integers rather than by 32-bit integersAlexander Korotkov
We've had repeated bugs in the area of handling SLRU wraparound in the past, some of which have caused data loss. Switching to an indexing system for SLRUs that does not wrap around should allow us to get rid of a whole bunch of problems and improve the overall reliability of the system. This particular patch however only changes the indexing and doesn't address the wraparound per se. This is going to be done in the following patches. Author: Maxim Orlov, Aleksander Alekseev, Alexander Korotkov, Teodor Sigaev Author: Nikita Glukhov, Pavel Borisov, Yura Sokolov Reviewed-by: Jacob Champion, Heikki Linnakangas, Alexander Korotkov Reviewed-by: Japin Li, Pavel Borisov, Tom Lane, Peter Eisentraut, Andres Freund Reviewed-by: Andrey Borodin, Dilip Kumar, Aleksander Alekseev Discussion: https://postgr.es/m/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com Discussion: https://postgr.es/m/CAJ7c6TPDOYBYrnCAeyndkBktO0WG2xSdYduTF0nxq%2BvfkmTF5Q%40mail.gmail.com
2023-11-28Use BIO_{get,set}_app_data instead of BIO_{get,set}_data.Tom Lane
We should have done it this way all along, but we accidentally got away with using the wrong BIO field up until OpenSSL 3.2. There, the library's BIO routines that we rely on use the "data" field for their own purposes, and our conflicting use causes assorted weird behaviors up to and including core dumps when SSL connections are attempted. Switch to using the approved field for the purpose, i.e. app_data. While at it, remove our configure probes for BIO_get_data as well as the fallback implementation. BIO_{get,set}_app_data have been there since long before any OpenSSL version that we still support, even in the back branches. Also, update src/test/ssl/t/001_ssltests.pl to allow for a minor change in an error message spelling that evidently came in with 3.2. Tristan Partin and Bo Andreson. Back-patch to all supported branches. Discussion: https://postgr.es/m/CAN55FZ1eDDYsYaL7mv+oSLUij2h_u6hvD4Qmv-7PK7jkji0uyQ@mail.gmail.com
2023-11-28Fix assertions with RI triggers in heap_update and heap_delete.Heikki Linnakangas
If the tuple being updated is not visible to the crosscheck snapshot, we return TM_Updated but the assertions would not hold in that case. Move them to before the cross-check. Fixes bug #17893. Backpatch to all supported versions. Author: Alexander Lakhin Backpatch-through: 12 Discussion: https://www.postgresql.org/message-id/17893-35847009eec517b5%40postgresql.org
2023-11-28Fix comment in tableam.h about GetHeapamTableAmRoutine()Michael Paquier
This routine is located in heapam_handler.c, not tableamapi.c. Issue noted while hacking the area for a different patch. Reviewed-by: Richard Guo Discussion: https://postgr.es/m/ZWQuHltp2KS_0Cct@paquier.xyz
2023-11-27Retire a few backwards compatibility macros.Nathan Bossart
As of commits dd04e958c8 and 1833f1a1c3, tuplestore_donestoring(), SPI_push(), SPI_pop(), SPI_push_conditional(), SPI_pop_conditional(), and SPI_restore_connection() are no-op macros provided for backwards compatibility. This commit removes these macros, so any uses in third-party code will need to be removed, too. Since these macros have been no-ops for a while, such adjustments won't produce any behavior changes for all currently-supported versions of PostgreSQL. Author: Bharath Rupireddy Discussion: https://postgr.es/m/CALj2ACVeO58JM5tK2Qa8QC-%3DkC8sdkJOTd4BFU%3DK8zs4gGYpjQ%40mail.gmail.com
2023-11-27Display length and bounds histograms in pg_statsAlexander Korotkov
Values corresponding to STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM and STATISTIC_KIND_BOUNDS_HISTOGRAM were not exposed to pg_stats when these slot kinds were introduced in 918eee0c49. This commit adds the missing fields to pg_stats. Catversion is bumped. Discussion: https://postgr.es/m/flat/b67d8b57-9357-7e82-a2e7-f6ce6eaeec67@postgrespro.ru Author: Egor Rogov, Soumyadeep Chakraborty Reviewed-by: Tomas Vondra, Justin Pryzby, Jian He
2023-11-27Update comments for pg_statistic catalog tableAlexander Korotkov
Make a reminder that pg_stats view needs to be modified whenever a new slot kind is added. To prevent situations like 918eee0c49 when pg_stats was forgotten to be updated. Also, revise the comment that only non-null, non-empty rows are considered for the range length histogram. Discussion: https://postgr.es/m/flat/b67d8b57-9357-7e82-a2e7-f6ce6eaeec67@postgrespro.ru Author: Egor Rogov, Soumyadeep Chakraborty Reviewed-by: Tomas Vondra, Justin Pryzby, Jian He
2023-11-25Reuse BrinDesc and BrinRevmap in brininsertTomas Vondra
The brininsert code used to initialize (and destroy) BrinDesc and BrinRevmap for each tuple, which is not free. This patch initializes these structures only once, and reuses them for all inserts in the same command. The data is passed through indexInfo->ii_AmCache. This also introduces an optional AM callback "aminsertcleanup" that allows performing custom cleanup in case simply pfree-ing ii_AmCache is not sufficient (which is the case when the cache contains TupleDesc, Buffers, and so on). Author: Soumyadeep Chakraborty Reviewed-by: Alvaro Herrera, Matthias van de Meent, Tomas Vondra Discussion: https://postgr.es/m/CAE-ML%2B9r2%3DaO1wwji1sBN9gvPz2xRAtFUGfnffpd0ZqyuzjamA%40mail.gmail.com
2023-11-23Use ResourceOwner to track WaitEventSets.Heikki Linnakangas
A WaitEventSet holds file descriptors or event handles (on Windows). If FreeWaitEventSet is not called, those fds or handles are leaked. Use ResourceOwners to track WaitEventSets, to clean those up automatically on error. This was a live bug in async Append nodes, if a FDW's ForeignAsyncRequest function failed. (In back branches, I will apply a more localized fix for that based on PG_TRY-PG_FINALLY.) The added test doesn't check for leaking resources, so it passed even before this commit. But at least it covers the code path. In the passing, fix misleading comment on what the 'nevents' argument to WaitEventSetWait means. Report by Alexander Lakhin, analysis and suggestion for the fix by Tom Lane. Fixes bug #17828. Reviewed-by: Alexander Lakhin, Thomas Munro Discussion: https://www.postgresql.org/message-id/472235.1678387869@sss.pgh.pa.us
2023-11-21Avoid overflow in fe_utils' printTable()Alvaro Herrera
The original code would miscalculate the total number of cells when the table to print has more than ~4 billion cells, leading to an unnecessary error. Repair by changing some computations to be 64-bits wide. Add some necessary overflow checks. Author: Hongxu Ma <interma@outlook.com> Discussion: https://postgr.es/m/TYBP286MB0351B057B101C90D7C1239E6B4E2A@TYBP286MB0351.JPNP286.PROD.OUTLOOK.COM
2023-11-17simplehash: preserve consistency in case of OOM.Jeff Davis
Compute size first, then allocate, then update the structure. Previously, an out-of-memory when growing could leave the hashtable in an inconsistent state. Discussion: https://postgr.es/m/20231117201334.eyb542qr5yk4gilq@awork3.anarazel.de Reviewed-by: Andres Freund Reviewed-by: Gurjeet Singh
2023-11-17Change logtape/tuplestore code to use int64 for block numbersMichael Paquier
The code previously relied on "long" as type to track block numbers, which would be 4 bytes in all Windows builds or any 32-bit builds. This limited the code to be able to handle up to 16TB of data with the default block size of 8kB, like during a CLUSTER. This code now relies on a more portable int64, which should be more than enough for at least the next 20 years to come. This issue has been reported back in 2017, but nothing was done about it back then, so here we go now. Reported-by: Peter Geoghegan Reviewed-by: Heikki Linnakangas Discussion: https://postgr.es/m/CAH2-WznCscXnWmnj=STC0aSa7QG+BRedDnZsP=Jo_R9GUZvUrg@mail.gmail.com