Age | Commit message (Collapse) | Author |
|
Instead of talking about setting latches, which is a pretty low-level
mechanism, emphasize that they wake up other processes.
This is in preparation for replacing Latches with a new abstraction.
That's still work in progress, but this seems a little tidier anyway,
so let's get this refactoring out of the way already.
Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c%40iki.fi
|
|
This is in preparation for replacing Latches with a new abstraction.
That's still work in progress, but this seems a little tidier anyway,
so let's get this refactoring out of the way already.
Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c%40iki.fi
|
|
A few comments contained duplicate "the" in sentences, fix by removing
one occurrence.
Author: Vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CALDaNm2aEEiPwGJmPdzBxROVvs8n75yCjKz4K1f1B2TdWpzxTA@mail.gmail.com
|
|
as determined by IWYU
These are mostly issues that are new since commit dbbca2cf299.
Discussion: https://www.postgresql.org/message-id/flat/0df1d5b1-8ca8-4f84-93be-121081bde049%40eisentraut.org
|
|
The 'latch' argument is ignored if WL_LATCH_SET is not given. Clarify
these calls by not pointlessly passing MyLatch.
Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c@iki.fi
|
|
Refactoring in the interest of code consistency, a follow-up to 2e068db56e31.
The argument against inserting a special enum value at the end of the enum
definition is that a switch statement might generate a compiler warning unless
it has a default clause.
Aleksander Alekseev, reviewed by Michael Paquier, Dean Rasheed, Peter Eisentraut
Discussion: https://postgr.es/m/CAJ7c6TMsiaV5urU_Pq6zJ2tXPDwk69-NKVh4AMN5XrRiM7N%2BGA%40mail.gmail.com
|
|
Checkpoints can be skipped when the server is idle. The existing num_timed and
num_requested counters in pg_stat_checkpointer track both completed and
skipped checkpoints, but there was no way to count only the completed ones.
This commit introduces the num_done counter, which tracks only completed
checkpoints, making it easier to see how many were actually performed.
Bump catalog version.
Author: Anton A. Melnikov
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/9ea77f40-818d-4841-9dee-158ac8f6e690@oss.nttdata.com
|
|
Replace the fixed-size array of fast-path locks with arrays, sized on
startup based on max_locks_per_transaction. This allows using fast-path
locking for workloads that need more locks.
The fast-path locking introduced in 9.2 allowed each backend to acquire
a small number (16) of weak relation locks cheaply. If a backend needs
to hold more locks, it has to insert them into the shared lock table.
This is considerably more expensive, and may be subject to contention
(especially on many-core systems).
The limit of 16 fast-path locks was always rather low, because we have
to lock all relations - not just tables, but also indexes, views, etc.
For planning we need to lock all relations that might be used in the
plan, not just those that actually get used in the final plan. So even
with rather simple queries and schemas, we often need significantly more
than 16 locks.
As partitioning gets used more widely, and the number of partitions
increases, this limit is trivial to hit. Complex queries may easily use
hundreds or even thousands of locks. For workloads doing a lot of I/O
this is not noticeable, but for workloads accessing only data in RAM,
the access to the shared lock table may be a serious issue.
This commit removes the hard-coded limit of the number of fast-path
locks. Instead, the size of the fast-path arrays is calculated at
startup, and can be set much higher than the original 16-lock limit.
The overall fast-path locking protocol remains unchanged.
The variable-sized fast-path arrays can no longer be part of PGPROC, but
are allocated as a separate chunk of shared memory and then references
from the PGPROC entries.
The fast-path slots are organized as a 16-way set associative cache. You
can imagine it as a hash table of 16-slot "groups". Each relation is
mapped to exactly one group using hash(relid), and the group is then
processed using linear search, just like the original fast-path cache.
With only 16 entries this is cheap, with good locality.
Treating this as a simple hash table with open addressing would not be
efficient, especially once the hash table gets almost full. The usual
remedy is to grow the table, but we can't do that here easily. The
access would also be more random, with worse locality.
The fast-path arrays are sized using the max_locks_per_transaction GUC.
We try to have enough capacity for the number of locks specified in the
GUC, using the traditional 2^n formula, with an upper limit of 1024 lock
groups (i.e. 16k locks). The default value of max_locks_per_transaction
is 64, which means those instances will have 64 fast-path slots.
The main purpose of the max_locks_per_transaction GUC is to size the
shared lock table. It is often set to the "average" number of locks
needed by backends, with some backends using significantly more locks.
This should not be a major issue, however. Some backens may have to
insert locks into the shared lock table, but there can't be too many of
them, limiting the contention.
The only solution is to increase the GUC, even if the shared lock table
already has sufficient capacity. That is not free, especially in terms
of memory usage (the shared lock table entries are fairly large). It
should only happen on machines with plenty of memory, though.
In the future we may consider a separate GUC for the number of fast-path
slots, but let's try without one first.
Reviewed-by: Robert Haas, Jakub Wartak
Discussion: https://postgr.es/m/510b887e-c0ce-4a0c-a17a-2c6abb8d9a5c@enterprisedb.com
|
|
This is a continuation of 17974ec25946. More quotes are applied to
GUC names in error messages and hints, taking care of what seems to be
all the remaining holes currently in the tree for the GUCs.
Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w@mail.gmail.com
|
|
Author: Alexander Lakhin
Discussion: https://postgr.es/m/f7e514cf-2446-21f1-a5d2-8c089a6e2168@gmail.com
|
|
|
|
The log message on backend crash used wrong variable, which could be
uninitialized. Introduced in commit 28a520c0b7.
Reported-by: Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/451b0797-83b8-cdbc-727f-8d7a7b0e3bca@gmail.com
|
|
Some newer code was applying this inconsistently.
|
|
Much of the code in process_pm_child_exit() to launch replacement
processes when one exits or when progressing to next postmaster state
was unnecessary, because the ServerLoop will launch any missing
background processes anyway. Remove the redundant code and let
ServerLoop handle it.
In ServerLoop, move the code to launch all the processes to a new
subroutine, to group it all together.
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/8f2118b9-79e3-4af7-b2c9-bd5818193ca4@iki.fi
|
|
Since the introduction of TID store, vacuum uses far less memory in
the common case than in versions 16 and earlier. Invoking multiple
rounds of index vacuuming in turn requires a much larger table. It'd
be a good idea anyway to cover this case in regression testing, and a
lower limit is less painful for slow buildfarm animals. The reason to
do it now is to re-enable coverage of the bugfix in commit 83c39a1f7f.
For consistency, give autovacuum_work_mem the same treatment.
Suggested by Andres Freund
Tested by Melanie Plageman
Backpatch to v17, where TID store was introduced
Discussion: https://postgr.es/m/20240516205458.ohvlzis5b5tvejru@awork3.anarazel.de
Discussion: https://postgr.es/m/20240722164745.fvaoh6g6zprisqgp%40awork3.anarazel.de
|
|
All child processes except the syslogger are killed on a restart. The
archiver might be already running though, if it was started during
recovery.
The split in the comments between "other special children" and the
first group of "background tasks" seemed really arbitrary, so I just
merged them all into one group.
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/8f2118b9-79e3-4af7-b2c9-bd5818193ca4@iki.fi
|
|
Currently, when a child process exits, the postmaster first scans
through BackgroundWorkerList, to see if it the child process was a
background worker. If not found, then it scans through BackendList to
see if it was a regular backend. That leads to some duplication
between the bgworker and regular backend cleanup code, as both have an
entry in the BackendList that needs to be cleaned up in the same way.
Refactor that so that we scan just the BackendList to find the child
process, and if it was a background worker, do the additional
bgworker-specific cleanup in addition to the normal Backend cleanup.
Change HandleChildCrash so that it doesn't try to handle the cleanup
of the process that already exited, only the signaling of all the
other processes. When called for any of the aux processes, the caller
had already cleared the *PID global variable, so the code in
HandleChildCrash() to do that was unused.
On Windows, if a child process exits with ERROR_WAIT_NO_CHILDREN, it's
now logged with that exit code, instead of 0. Also, if a bgworker
exits with ERROR_WAIT_NO_CHILDREN, it's now treated as crashed and is
restarted. Previously it was treated as a normal exit.
If a child process is not found in the BackendList, the log message
now calls it "untracked child process" rather than "server process".
Arguably that should be a PANIC, because we do track all the child
processes in the list, so failing to find a child process is highly
unexpected. But if we want to change that, let's discuss and do that
as a separate commit.
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi
|
|
This allows ForgetBackgroundWorker() and ReportBackgroundWorkerExit()
to take a RegisteredBgWorker pointer as argument, rather than a list
iterator. That feels a little more natural. But more importantly, this
paves the way for more refactoring in the next commit.
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi
|
|
Make assign_backendlist_entry() responsible just for allocating the
Backend struct. Linking it to the RegisteredBgWorker is the caller's
responsibility now. Seems more clear that way.
Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi
|
|
Oversight in e2562667, which stopped using SpinlockSemaArray but forgot
to remove it from the array.
Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/310f4005-91d7-42b2-ac70-92624260dd28%40iki.fi
|
|
A later change will require atomic support, so it wouldn't make sense
for a hypothetical new system not to be able to implement spinlocks.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (concept, not the patch)
Reviewed-by: Andres Freund <andres@anarazel.de> (concept, not the patch)
Discussion: https://postgr.es/m/3351991.1697728588%40sss.pgh.pa.us
|
|
Commit aafc05de1b removed the calls to detach from shared memory from
syslogger startup. That was not intentional, so put them back.
Author: Rui Zhao
Reviewed-by: Aleksander Alekseev
Backpatch-through: 17
Discussion: https://www.postgresql.org/message-id/11505016-8cf3-4691-b996-7faed99b7877.xiyuan.zr@alibaba-inc.com
|
|
Move responsibility of generating the cancel key to the backend
process. The cancel key is now generated after forking, and the
backend advertises it in the ProcSignal array. When a cancel request
arrives, the backend handling it scans the ProcSignal array to find
the target pid and cancel key. This is similar to how this previously
worked in the EXEC_BACKEND case with the ShmemBackendArray, just
reusing the ProcSignal array.
One notable change is that we no longer generate cancellation keys for
non-backend processes. We generated them before just to prevent a
malicious user from canceling them; the keys for non-backend processes
were never actually given to anyone. There is now an explicit flag
indicating whether a process has a valid key or not.
I wrote this originally in preparation for supporting longer cancel
keys, but it's a nice cleanup on its own.
Reviewed-by: Jelte Fennema-Nio
Discussion: https://www.postgresql.org/message-id/508d0505-8b7a-4864-a681-e7e5edfe32aa@iki.fi
|
|
When a standby is promoted, CleanupAfterArchiveRecovery() may decide
to rename the final WAL file from the old timeline by adding ".partial"
to the name. If WAL summarization is enabled and this file is renamed
before its partial contents are summarized, WAL summarization breaks:
the summarizer gets stuck at that point in the WAL stream and just
errors out.
To fix that, first make the startup process wait for WAL summarization
to catch up before renaming the file. Generally, this should be quick,
and if it's not, the user can shut off summarize_wal and try again.
To make this fix work, also teach the WAL summarizer that after a
promotion has occurred, no more WAL can appear on the previous
timeline: previously, the WAL summarizer wouldn't switch to the new
timeline until we actually started writing WAL there, but that meant
that when the startup process was waiting for the WAL summarizer, it
was waiting for an action that the summarizer wasn't yet prepared to
take.
In the process of fixing these bugs, I realized that the logic to wait
for WAL summarization to catch up was spread out in a way that made
it difficult to reuse properly, so this code refactors things to make
it easier.
Finally, add a test case that would have caught this bug and the
previously-fixed bug that WAL summarization sometimes needs to back up
when the timeline changes.
Discussion: https://postgr.es/m/CA+TgmoZGEsZodXC4f=XZNkAeyuDmWTSkpkjCEOcF19Am0mt_OA@mail.gmail.com
|
|
|
|
The old code believed that it was not possible to switch timelines
without first replaying all of the WAL from the old timeline, but
that turns out to be false, as demonstrated by an example from Fujii
Masao. As a result, it assumed that summarization would always
continue from the LSN where summarization previously ended. But in
fact, when a timeline switch occurs without replaying all the WAL
from the previous timeline, we can need to back up to an earlier
LSN. Adjust accordingly.
Discussion: https://postgr.es/m/CA+TgmoZGEsZodXC4f=XZNkAeyuDmWTSkpkjCEOcF19Am0mt_OA@mail.gmail.com
|
|
Commit 86db52a506 changed the locking of injection points to use only
atomic ops and spinlocks, to make it possible to define injection
points in processes that don't have a PGPROC entry (yet). However, it
didn't work in EXEC_BACKEND mode, because the pointer to shared memory
area was not initialized until the process "attaches" to all the
shared memory structs. To fix, pass the pointer to the child process
along with other global variables that need to be set up early.
Backpatch-through: 17
|
|
This fixes warnings from -Wmissing-variable-declarations (not yet part
of the standard warning options) under EXEC_BACKEND. The
NON_EXEC_STATIC variables need a suitable declaration in a header file
under EXEC_BACKEND.
Also fix the inconsistent application of the volatile qualifier for
PMSignalState, which was revealed by this change.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org
|
|
After calling ConditionVariableSleep() or ConditionVariableTimedSleep()
one or more times, code is supposed to call ConditionVariableCancelSleep()
to remove itself from the waitlist. This code neglected to do so.
As far as I know, that had no observable consequences, but let's make
the code correct.
Discussion: http://postgr.es/m/CA+TgmoYW8eR+KN6zhVH0sin7QH6AvENqw_bkN-bB4yLYKAnsew@mail.gmail.com
|
|
To do this, we must include the wal_level in the first WAL record
covered by each summary file; so add wal_level to struct Checkpoint
and the payload of XLOG_CHECKPOINT_REDO and XLOG_END_OF_RECOVERY.
This, in turn, requires bumping XLOG_PAGE_MAGIC and, since the
Checkpoint is also stored in the control file, also
PG_CONTROL_VERSION. It's not great to do that so late in the release
cycle, but the alternative seems to ship v17 without robust
protections against this scenario, which could result in corrupted
incremental backups.
A side effect of this patch is that, when a server with
wal_level=replica is started with summarize_wal=on for the first time,
summarization will no longer begin with the oldest WAL that still
exists in pg_wal, but rather from the first checkpoint after that.
This change should be harmless, because a WAL summary for a partial
checkpoint cycle can never make an incremental backup possible when
it would otherwise not have been.
Report by Fujii Masao. Patch by me. Review and/or testing by Jakub
Wartak and Fujii Masao.
Discussion: http://postgr.es/m/6e30082e-041b-4e31-9633-95a66de76f5d@oss.nttdata.com
|
|
This commit provides testig coverage for ccd38024bc3c, checking that a
role granted pg_signal_autovacuum_worker is able to stop a vacuum
worker.
An injection point with a wait is placed at the beginning of autovacuum
worker startup to make sure that a worker is still alive when sending
and processing the signal sent.
Author: Anthony Leung, Michael Paquier, Kirill Reshke
Reviewed-by: Andrey Borodin, Nathan Bossart
Discussion: https://postgr.es/m/CALdSSPiQPuuQpOkF7x0g2QkA5eE-3xXt7hiJFvShV1bHKDvf8w@mail.gmail.com
|
|
memcpy(NULL, src, 0) is forbidden by POSIX, even though every
production version of libc allows it. Let's be tidy.
Per report from Thomas Munro, running UBSan with EXEC_BACKEND.
Backpatch to v17, where this code was added.
Discussion: https://www.postgresql.org/message-id/CA%2BhUKG%2Be-dV7YWBzfBZXsgovgRuX5VmvmOT%2Bv0aXiZJ-EKbXcw@mail.gmail.com
|
|
After several refactoring iterations, auxiliary processes are no
longer initialized from the bootstrapper. Using the InitProcessing
mode for initializing auxiliary processes is more appropriate. Since
the global variable Mode is initialized to InitProcessing, we can just
remove the redundant calls of SetProcessingMode(InitProcessing).
Author: Xing Guo <higuoxing@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACpMh%2BDBHVT4xPGimzvex%3DwMdMLQEu9PYhT%2BkwwD2x2nu9dU_Q%40mail.gmail.com
|
|
For clarity, we've been slowly moving functions that are not called
from the postmaster process out of postmaster.c.
Author: Xing Guo <higuoxing@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACpMh%2BDBHVT4xPGimzvex%3DwMdMLQEu9PYhT%2BkwwD2x2nu9dU_Q%40mail.gmail.com
|
|
We have in launch_backend.c:
/*
* The following need to be available to the save/restore_backend_variables
* functions. They are marked NON_EXEC_STATIC in their home modules.
*/
extern slock_t *ShmemLock;
extern slock_t *ProcStructLock;
extern PGPROC *AuxiliaryProcs;
extern PMSignalData *PMSignalState;
extern pg_time_t first_syslogger_file_time;
extern struct bkend *ShmemBackendArray;
extern bool redirection_done;
That comment is not completely true: ShmemLock, ShmemBackendArray, and
redirection_done are not in fact NON_EXEC_STATIC. ShmemLock once was,
but was then needed elsewhere. ShmemBackendArray was static inside
postmaster.c before launch_backend.c was created. redirection_done
was never static.
This patch moves the declaration of ShmemLock and redirection_done to
a header file.
ShmemBackendArray gets a NON_EXEC_STATIC. This doesn't make a
difference, since it only exists if EXEC_BACKEND anyway, but it makes
it consistent.
After that, the comment is now correct.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org
|
|
These probably should have been static all along, it was only
forgotten out of sloppiness.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org
|
|
Commit 065583cf460f980a182498941ac52810f709a897 should have
included these changes.
|
|
Before this commit, when the WAL summarizer started up or recovered
from an error, it would resume summarization from wherever it left
off. That was OK normally, but wrong if summarize_wal=off had been
turned off temporary, allowing some WAL to be removed, and then turned
back on again. In such cases, the WAL summarizer would simply hang
forever. This commit changes the reinitialization sequence for WAL
summarizer to rederive the starting position in the way we were
already doing at initial startup, fixing the problem.
Per report from Israel Barth Rubio. Reviewed by Tom Lane.
Discussion: http://postgr.es/m/CA+TgmoYN6x=YS+FoFOS6=nr6=qkXZFWhdiL7k0oatGwug2hcuA@mail.gmail.com
|
|
1. TruncateMultiXact() performs the SLRU truncations in a critical
section. Deleting the SLRU segments calls ForwardSyncRequest(), which
will try to compact the request queue if it's full
(CompactCheckpointerRequestQueue()). That in turn allocates memory,
which is not allowed in a critical section. Backtrace:
TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "../src/backend/utils/mmgr/mcxt.c", Line: 1353, PID: 920981
postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]
postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]
postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]
postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]
postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]
postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]
postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]
postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]
postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]
postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]
postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]
postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]
postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]
postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]
/lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]
postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]
To fix, bail out in CompactCheckpointerRequestQueue() without doing
anything, if it's called in a critical section. That covers the above
call path, as well as any other similar cases where
RegisterSyncRequest might be called in a critical section.
2. After fixing that, another problem became apparent: Autovacuum
process doing that truncation can deadlock with the checkpointer
process. TruncateMultiXact() sets "MyProc->delayChkptFlags |=
DELAY_CHKPT_START". If the sync request queue is full and cannot be
compacted, the process will repeatedly sleep and retry, until there is
room in the queue. However, if the checkpointer is trying to start a
checkpoint at the same time, and is waiting for the DELAY_CHKPT_START
processes to finish, the queue will never shrink.
More concretely, the autovacuum process is stuck here:
#0 0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=<optimized out>) at ../src/backend/storage/ipc/latch.c:1570
#2 WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=<optimized out>, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1,
wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516
#3 0x000056220b243224 in WaitLatch (latch=<optimized out>, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949)
at ../src/backend/storage/ipc/latch.c:538
#4 0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614
#5 0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495
#6 0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566
#7 0x000056220af98e1b in PerformMembersTruncation (oldestOffset=<optimized out>, newOldestOffset=<optimized out>) at ../src/backend/access/transam/multixact.c:3006
#8 TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201
#9 0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=<optimized out>, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at ../src/backend/commands/vacuum.c:1917
#10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760
#11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550
#12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/autovacuum.c:1569
and the checkpointer is stuck here:
#0 0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50
#3 0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098
#4 0x000056220b1c6e86 in CheckpointerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/checkpointer.c:464
To fix, add AbsorbSyncRequests() to the loops where the checkpointer
waits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to
finish.
Backpatch to v14. Before that, SLRU deletion didn't call
RegisterSyncRequest, which avoided this failure. I'm not sure if there
are other similar scenarios on older versions, but we haven't had
any such reports.
Discussion: https://www.postgresql.org/message-id/ccc66933-31c1-4f6a-bf4b-45fef0d4f22e@iki.fi
|
|
|
|
Fixups for 17974ec259: Some messages were missed (and some were new
since the patch was originally proposed), and there was a typo
introduced.
|
|
After further review, we want to move in the direction of always
quoting GUC names in error messages, rather than the previous (PG16)
wildly mixed practice or the intermittent (mid-PG17) idea of doing
this depending on how possibly confusing the GUC name is.
This commit applies appropriate quotes to (almost?) all mentions of
GUC names in error messages. It partially supersedes a243569bf65 and
8d9978a7176, which had moved things a bit in the opposite direction
but which then were abandoned in a partial state.
Author: Peter Smith <smithpb2250@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w%40mail.gmail.com
|
|
A few patches that were written <= 2023 and committed in 2024 still
contained 2023 copyright year. Fix that.
Discussion: https://postgr.es/m/CAApHDvr5egyW3FmHbAg-Uq2p_Aizwco1Zjs6Vbq18KqN64-hRA@mail.gmail.com
|
|
Repeating loads of inplace-updated fields tends to cause bugs like the
one from the previous commit. While there's no bug to fix in these code
sites, adopt the load-once style. This improves the chance of future
copy/paste finding the safe style.
Discussion: https://postgr.es/m/20240423003956.e7.nmisch@google.com
|
|
This fixes various typos, duplicated words, and tiny bits of whitespace
mainly in code comments but also in docs.
Author: Daniel Gustafsson <daniel@yesql.se>
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Author: Alexander Lakhin <exclusion@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
|
|
This commit reverts 9bd99f4c26 and 422041542f per review by Andres Freund.
Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
|
|
This reverts commit b840508644 and bcb14f4abc. These commits were made
for commit 5bec1d6bc5 (Improve eviction algorithm in ReorderBuffer
using max-heap for many subtransactions). However, per discussion,
commit efb8acc0d0 replaced binary heap + index with pairing heap, and
made these commits unnecessary.
Reported-by: Jeff Davis
Discussion: https://postgr.es/m/12747c15811d94efcc5cda72d6b35c80d7bf3443.camel%40j-davis.com
|
|
Reported-by: Thomas Munro
Reported-by: Pavel Borisov
Discussion: https://postgr.es/m/CA%2BhUKGLZzLR50RBvuqOO3MZ%3DF54ETz-rTp1PDX9uDGP_GqyYqA%40mail.gmail.com
|
|
Let table AM define custom reloptions for its tables. This allows specifying
AM-specific parameters by the WITH clause when creating a table.
The reloptions, which could be used outside of table AM, are now extracted
into the CommonRdOptions data structure. These options could be by decision
of table AM directly specified by a user or calculated in some way.
The new test module test_tam_options evaluates the ability to set up custom
reloptions and calculate fields of CommonRdOptions on their base.
The code may use some parts from prior work by Hao Wu.
Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Discussion: https://postgr.es/m/AMUA1wBBBxfc3tKRLLdU64rb.1.1683276279979.Hmail.wuhao%40hashdata.cn
Reviewed-by: Reviewed-by: Pavel Borisov, Matthias van de Meent, Jess Davis
|
|
Presently, the archiver process restarts when an archive callback
ERRORs. To avoid this, archive module authors can use sigsetjmp(),
manage a memory context, etc., but that requires a lot of extra
code that will likely look roughly the same between modules. This
commit adds basic archive callback ERROR handling to pgarch.c so
that module authors won't ordinarily need to worry about this.
While this built-in handler attempts to clean up anything that an
archive module could conceivably have left behind, it is possible
that some modules are doing unexpected things that require
additional cleanup. Module authors should be sure to do any extra
required cleanup in a PG_CATCH block within the archiving callback.
The archiving callback is now called in a short-lived memory
context that the archiver process resets between invocations. If a
module requires longer-lived storage, it must maintain its own
memory context.
Thanks to these changes, the basic_archive module can be greatly
simplified.
Suggested-by: Andres Freund
Reviewed-by: Andres Freund, Yong Li
Discussion: https://postgr.es/m/20230217215624.GA3131134%40nathanxps13
|