Age | Commit message (Collapse) | Author |
|
The function has been reading global variable forceSyncCommit, mirroring
the intent of the caller that passed forceSync=forceSyncCommit. The
other caller, RecordTransactionCommitPrepared(), passed false. Since
COMMIT PREPARED can't share a transaction with any command, it certainly
doesn't share a transaction with a command that sets forceSyncCommit.
Reviewed by Michael Paquier.
Discussion: https://postgr.es/m/20200617032615.GC2916904@rfd.leadboat.com
|
|
Commit 72d422a5227ef6f76f412486a395aba9f53bf3f0 made xlog.c call
sendTablespace() with the 'sizeonly' argument set to true, which
required basebackup.c to export sendTablespace(). However, that's
kind of ugly, so instead defer the call to sendTablespace() until
basebackup.c regains control. That way, it can still be a static
function.
Patch by me, reviewed by Amit Kapila and Kyotaro Horiguchi.
Discussion: http://postgr.es/m/CA+TgmoYq+59SJ2zBbP891ngWPA9fymOqntqYcweSDYXS2a620A@mail.gmail.com
|
|
Add a separate enum for use in the locking APIs, which were the only
user.
Discussion: https://www.postgresql.org/message-id/flat/a6f91ead-0ce4-2a34-062b-7ab9813ea308%402ndquadrant.com
|
|
The redo routines for XLOG_CHECKPOINT_{ONLINE,SHUTDOWN} must acquire
ControlFileLock before modifying ControlFile->checkPointCopy, or the
checkpointer could write out a control file with a bad checksum.
Likewise, XLogReportParameters() must acquire ControlFileLock before
modifying ControlFile and calling UpdateControlFile().
Back-patch to all supported releases.
Author: Nathan Bossart <bossartn@amazon.com>
Author: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/70BF24D6-DC51-443F-B55A-95735803842A%40amazon.com
|
|
Since database connections can be used with WAL senders in 9.4, it is
possible to use physical replication. This commit fixes a crash when
starting physical replication with a WAL sender using a database
connection, caused by the refactoring done in 850196b.
There have been discussions about forbidding the use of physical
replication in a database connection, but this is left for later,
taking care only of the crash new to 13.
While on it, add a test to check for a failure when attempting logical
replication if the WAL sender does not have a database connection. This
part is extracted from a larger patch by Kyotaro Horiguchi.
Reported-by: Vladimir Sitnikov
Author: Michael Paquier, Kyotaro Horiguchi
Reviewed-by: Kyotaro Horiguchi, Álvaro Herrera
Discussion: https://postgr.es/m/CAB=Je-GOWMj1PTPkeUhjqQp-4W3=nW-pXe2Hjax6rJFffB5_Aw@mail.gmail.com
Backpatch-through: 13
|
|
Synchronize the event names for parallel hash join waits with other
event names, by getting rid of the slashes and dropping "-ing"
suffixes. Rename ClogGroupUpdate to XactGroupUpdate, to match the
new SLRU name. Move the ProcSignalBarrier event to the IPC category;
it doesn't belong under IO.
Also a bit more wordsmithing in the wait event documentation tables.
Discussion: https://postgr.es/m/4505.1589640417@sss.pgh.pa.us
|
|
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
|
|
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.
Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
|
|
There is little point in using the LWLockRegisterTranche mechanism for
built-in tranche names. It wastes cycles, it creates opportunities for
bugs (since failing to register a tranche name is a very hard-to-detect
problem), and the lack of any centralized list of names encourages
sloppy nonconformity in name choices. Moreover, since we have a
centralized list of the tranches anyway in enum BuiltinTrancheIds, we're
certainly not buying any flexibility in return for these disadvantages.
Hence, nuke all the backend-internal LWLockRegisterTranche calls,
and instead provide a const array of the builtin tranche names.
(I have in mind to change a bunch of these names shortly, but this
patch is just about getting them into one place.)
Discussion: https://postgr.es/m/9056.1589419765@sss.pgh.pa.us
|
|
Instead of re-identifying which statistics bucket to use for a given
SLRU on every counter increment, do it once during shmem initialization.
This saves a fair number of cycles, and there's no real cost because
we could not have a bucket assignment that varies over time or across
backends anyway.
Also, get rid of the ill-considered decision to let pgstat.c pry
directly into SLRU's shared state; it's cleaner just to have slru.c
pass the stats bucket number.
In consequence of these changes, there's no longer any need to store
an SLRU's LWLock tranche info in shared memory, so get rid of that,
making this a net reduction in shmem consumption. (That partly
reverts fe702a7b3.)
This is basically code review for 28cac71bd, so I also cleaned up
some comments, removed a dangling extern declaration, fixed some
things that should be static and/or const, etc.
Discussion: https://postgr.es/m/3618.1589313035@sss.pgh.pa.us
|
|
* Have both physical and logical walsender share a 'xlogreader' state
struct for tracking state. This replaces the existing globals sendSeg
and sendCxt.
* Change WALRead not to receive XLogReaderState->seg and ->segcxt as
separate arguments anymore; just use the ones from 'state'. This is
made possible by the above change.
* have the XLogReader segment_open contract require the callbacks to
install the file descriptor in the state struct themselves instead of
returning it. xlogreader was already ignoring any possible failed
return from the callbacks, relying solely on them never returning.
(This point is not altogether excellent, as it means the callbacks
have to know more of XLogReaderState; but to really improve on that
we would have to pass back error info from the callbacks to
xlogreader. And the complexity would not be saved but instead just
transferred to the callers of WALRead, which would have to learn how
to throw errors from the open_segment callback in addition of, as
currently, from pg_pread.)
* segment_open no longer receives the 'segcxt' as a separate argument,
since it's part of the XLogReaderState argument.
Per comments from Kyotaro Horiguchi.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200511203336.GA9913@alvherre.pgsql
|
|
The existing callers of XLogReadDetermineTimeline() performing recovery
need to check a replay LSN position when determining on which timeline
to read a WAL page. A portion of the comment describing this function
said exactly that, while referring to a routine for fetching a write
LSN, something not available in recovery.
Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20200511.101619.2043820539323292957.horikyota.ntt@gmail.com
|
|
Code review for 0dc8ead46363, prompted by a bug closed by 91c40548d5f7.
XLogReader's system for opening and closing segments had gotten too
complicated, with callbacks being passed at both the XLogReaderAllocate
level (read_page) as well as at the WALRead level (segment_open). This
was confusing and hard to follow, so restructure things so that these
callbacks are passed together at XLogReaderAllocate time, and add
another callback to the set (segment_close) to make it a coherent whole.
Also, ensure XLogReaderState is an argument to all the callbacks, so
that they can grab at the ->private data if necessary.
Document the whole arrangement more clearly.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20200422175754.GA19858@alvherre.pgsql
|
|
TimelineHistoryRead and TimelineHistoryWrite wait events are reported
during waiting for a read and write of a timeline history file, respectively.
However, previously, TimelineHistoryRead wait event was not reported
while readTimeLineHistory() was reading a timeline history file. Also
TimelineHistoryWrite was not reported while writeTimeLineHistory() was
writing one line with the details of the timeline split, at the end.
This commit fixes these issues.
Back-patch to v10 where wait events for a timeline history file was added.
Author: Masahiro Ikeda
Reviewed-by: Michael Paquier, Fujii Masao
Discussion: https://postgr.es/m/d11b0c910b63684424e06772eb844ab5@oss.nttdata.com
|
|
In commit 33e05f89c5, we have added the option to display WAL usage
statistics in Explain and auto_explain. The display format used two spaces
between each field which is inconsistent with Buffer usage statistics which
is using one space between each field. Change the format to make WAL usage
statistics consistent with Buffer usage statistics.
This commit also changed the usage of "full page writes" to
"full page images" for WAL usage statistics to make it consistent with
other parts of code and docs.
Author: Julien Rouhaud, Amit Kapila
Reviewed-by: Justin Pryzby, Kyotaro Horiguchi and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
|
|
Author: Justin Pryzby
Discussion: https://postgr.es/m/20200408165653.GF2228@telsasoft.com
|
|
from 303640199d0
|
|
78ea8b5 has fixed an issue related to the recycling of WAL segments on
standbys depending on archive_mode. However, it has introduced a
regression with the handling of WAL segments ready to be archived during
crash recovery, causing those files to be recycled without getting
archived.
This commit fixes the regression by tracking in shared memory if a live
cluster is either in crash recovery or archive recovery as the handling
of WAL segments ready to be archived is different in both cases (those
WAL segments should not be removed during crash recovery), and by using
this new shared memory state to decide if a segment can be recycled or
not. Previously, it was not possible to know if a cluster was in crash
recovery or archive recovery as the shared state was able to track only
if recovery was happening or not, leading to the problem.
A set of TAP tests is added to close the gap here, making sure that WAL
segments ready to be archived are correctly handled when a cluster is in
archive or crash recovery with archive_mode set to "on" or "always", for
both standby and primary.
Reported-by: Benoît Lobréau
Author: Jehan-Guillaume de Rorthais
Reviewed-by: Kyotaro Horiguchi, Fujii Masao, Michael Paquier
Discussion: https://postgr.es/m/20200331172229.40ee00dc@firost
Backpatch-through: 9.5
|
|
index.c supposed that it could just use a PG_TRY block to clean up the
state associated with an active REINDEX operation. However, that code
doesn't run if we do a FATAL exit --- for example, due to a SIGTERM
shutdown signal --- while the REINDEX is happening. And that state does
get consulted during catalog accesses, which makes it problematic if we
do any catalog accesses during shutdown --- for example, to clean up any
temp tables created in the session.
If this combination of circumstances occurred, we could find ourselves
trying to access already-freed memory. In debug builds that'd fairly
reliably cause an assertion failure. In production we might often
get away with it, but with some bad luck it could cause a core dump.
Another possible bad outcome is an erroneous conclusion that an
index-to-be-accessed is being reindexed; but it looks like that would
be unlikely to have any consequences worse than failing to drop temp
tables right away. (They'd still get dropped by the next session that
uses that temp schema.)
To fix, get rid of the use of PG_TRY here, and instead hook into
the transaction abort mechanisms to clean up reindex state.
Per bug #16378 from Alexander Lakhin. This has been wrong for a
very long time, so back-patch to all supported branches.
Discussion: https://postgr.es/m/16378-7a70ca41b3ec2009@postgresql.org
|
|
Reported-by: Justin Pryzby
Author: Justin Pryzby, with few changes by me
Reviewed-by: Amit Kapila and Sawada Masahiko
Discussion: https://postgr.es/m/20200322021801.GB2563@telsasoft.com
|
|
Reported-by: Justin Pryzby and Euler Taveira
Author: Justin Pryzby and Julien Rouhaud
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
|
|
GetWalRcvWriteRecPtr() previously reported the latest *flushed*
location. Adopt the conventional terminology used elsewhere in the tree
by renaming it to GetWalRcvFlushRecPtr(), and likewise for some related
variables that used the term "received".
Add a new definition of GetWalRcvWriteRecPtr(), which returns the latest
*written* value. This will allow later patches to use the value for
non-data-integrity purposes, without having to wait for the flush
pointer to advance.
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
|
|
0f5ca02f53 introduces 3 new keywords. It appears to be too much for relatively
small feature. Given now we past feature freeze, it's already late for
discussion of the new syntax. So, revert.
Discussion: https://postgr.es/m/28209.1586294824%40sss.pgh.pa.us
|
|
The goal of separating hotly accessed per-backend data from PGPROC
into PGXACT is to make accesses fast (GetSnapshotData() in
particular). But delayChkpt is not actually accessed frequently; only
when starting a checkpoint. As it is frequently modified (multiple
times in the course of a single transaction), storing it in the same
cacheline as hotly accessed data unnecessarily dirties a contended
cacheline.
Therefore move delayChkpt to PGPROC.
This is part of a larger series of patches intending to improve
GetSnapshotData() scalability. It is committed and pushed separately,
as it is independently beneficial (small but measurable win, limited
by the other frequent modifications of PGXACT).
Author: Andres Freund
Reviewed-By: Robert Haas, Thomas Munro, David Rowley
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
|
|
SLRU page hits were tracked only in SimpleLruReadPage, but that's not
enough because we may hit the page in SimpleLruReadPage_ReadOnly in
which case we don't call SimpleLruReadPage at all.
Reported-by: Kuntal Ghosh
Discussion: https://postgr.es/m/20200119143707.gyinppnigokesjok@development
|
|
Before the fix every 2PC commit/abort leaked a file descriptor. As the
files are opened using BasicOpenFile(), that quickly leads to the
backend running out of file descriptors.
Once enough 2PC abort/commit have caused enough FDs to leak, any IO
in the backend will fail with "Too many open files", as
BasicOpenFilePerm() will have triggered all open files known to fd.c
to be closed.
The leak causing the problem at hand is a consequence of 0dc8ead46,
but is only exascerbated by it. Previously most XLogPageReadCB
callbacks used static variables to cache one open file, but after the
commit the cache is private to each XLogReader instance. There never
was infrastructure to close FDs at the time of XLogReaderFree, but the
way XLogReader was used limited the leak to one FD.
This commit just closes the during XLogReaderFree() if the FD is
stored in XLogReaderState.seg.ws_segno. This may not be the way to
solve this medium/long term, but at least unbreaks 2PC.
Discussion: https://postgr.es/m/20200406025651.fpzdb5yyb7qyhqko@alap3.anarazel.de
|
|
Replication slots are useful to retain data that may be needed by a
replication system. But experience has shown that allowing them to
retain excessive data can lead to the primary failing because of running
out of space. This new feature allows the user to configure a maximum
amount of space to be reserved using the new option
max_slot_wal_keep_size. Slots that overrun that space are invalidated
at checkpoint time, enabling the storage to be released.
Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20170228.122736.123383594.horiguchi.kyotaro@lab.ntt.co.jp
|
|
This commit adds following optional clause to BEGIN and START TRANSACTION
commands.
WAIT FOR LSN lsn [ TIMEOUT timeout ]
New clause pospones transaction start till given lsn is applied on standby.
This clause allows user be sure, that changes previously made on primary would
be visible on standby.
New shared memory struct is used to track awaited lsn per backend. Recovery
process wakes up backend once required lsn is applied.
Author: Ivan Kartyshov, Anna Akenteva
Reviewed-by: Craig Ringer, Thomas Munro, Robert Haas, Kyotaro Horiguchi
Reviewed-by: Masahiko Sawada, Ants Aasma, Dmitry Ivanov, Simon Riggs
Reviewed-by: Amit Kapila, Alexander Korotkov
Discussion: https://postgr.es/m/0240c26c-9f84-30ea-fca9-93ab2df5f305%40postgrespro.ru
|
|
Previously when there were multiple timelines listed in the history file
of the recovery target timeline, archive recovery searched all of them,
starting from the newest timeline to the oldest one, to find the segment
to read. That is, archive recovery had to continuously fail scanning
the segment until it reached the timeline that the segment belonged to.
These scans for non-existent segment could be harmful on the recovery
performance especially when archival area was located on the remote
storage and each scan could take a long time.
To address the issue, this commit changes archive recovery so that
it skips scanning the timeline that the segment to read doesn't belong to.
Author: Kyotaro Horiguchi, tweaked a bit by Fujii Masao
Reviewed-by: David Steele, Pavel Suderevsky, Grigory Smolkin
Discussion: https://postgr.es/m/16159-f5a34a3a04dc67e0@postgresql.org
Discussion: https://postgr.es/m/20200129.120222.1476610231001551715.horikyota.ntt@gmail.com
|
|
Until now, only selected bulk operations (e.g. COPY) did this. If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY. See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules. Maintainers of table access
methods should examine that section.
To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL. A new GUC,
wal_skip_threshold, guides that choice. If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold. Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.
Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node. Make relcache.c retain entries for certain
dropped relations until end of transaction.
Bump XLOG_PAGE_MAGIC, since this introduces XLOG_GIST_ASSIGN_LSN.
Future servers accept older WAL, so this bump is discretionary.
Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas. Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem. Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout.
Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
|
|
This reverts commit 246f136e76ecd26844840f2b2057e2c87ec9868d.
That patch wasn't quite complete enough.
Discussion: https://www.postgresql.org/message-id/flat/E1jIpJu-0007Ql-CL%40gemulon.postgresql.org
|
|
This allows gathering the WAL generation statistics for each statement
execution. The three statistics that we collect are the number of WAL
records, the number of full page writes and the amount of WAL bytes
generated.
This helps the users who have write-intensive workload to see the impact
of I/O due to WAL. This further enables us to see approximately what
percentage of overall WAL is due to full page writes.
In the future, we can extend this functionality to allow us to compute the
the exact amount of WAL data due to full page writes.
This patch in itself is just an infrastructure to compute WAL usage data.
The upcoming patches will expose this data via explain, auto_explain,
pg_stat_statements and verbose (auto)vacuum output.
Author: Kirill Bychik, Julien Rouhaud
Reviewed-by: Dilip Kumar, Fujii Masao and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
|
|
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
|
|
There's a number of SLRU caches used to access important data like clog,
commit timestamps, multixact, asynchronous notifications, etc. Until now
we had no easy way to monitor these shared caches, compute hit ratios,
number of reads/writes etc.
This commit extends the statistics collector to track this information
for a predefined list of SLRUs, and also introduces a new system view
pg_stat_slru displaying the data.
The list of built-in SLRUs is fixed, but additional SLRUs may be defined
in extensions. Unfortunately, there's no suitable registry of SLRUs, so
this patch simply defines a fixed list of SLRUs with entries for the
built-in ones and one entry for all additional SLRUs. Extensions adding
their own SLRU are fairly rare, so this seems acceptable.
This patch only allows monitoring of SLRUs, not tuning. The SLRU sizes
are still fixed (hard-coded in the code) and it's not entirely clear
which of the SLRUs might need a GUC to tune size. In a way, allowing us
to determine that is one of the goals of this patch.
Bump catversion as the patch introduces new functions and system view.
Author: Tomas Vondra
Reviewed-by: Alvaro Herrera
Discussion: https://www.postgresql.org/message-id/flat/20200119143707.gyinppnigokesjok@development
|
|
When recovery target is reached and recovery is paused because of
recovery_target_action=pause, executing pg_wal_replay_resume() causes
the standby to promote, i.e., the recovery to end. So, in this case,
the previous message "Execute pg_wal_replay_resume() to continue"
logged was confusing because pg_wal_replay_resume() doesn't cause
the recovery to continue.
This commit improves the message logged when recovery is paused,
and the proper message is output based on what (pg_wal_replay_pause
or recovery_target_action) causes recovery to be paused.
Author: Sergei Kornilov, revised by Fujii Masao
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/19168211580382043@myt5-b646bde4b8f3.qloud-c.yandex.net
|
|
The definitions of the routines defined in xlogarchive.c have been part
of xlog_internal.h which is included by several frontend tools, but all
those routines are only called by the backend. More cleanup could be
done within xlog_internal.h, but that's already a nice cut.
This will help a follow-up patch for pg_rewind where handling of
restore_command is added for frontends.
Author: Alexey Kondratov, Michael Paquier
Reviewed-by: Álvaro Herrera, Alexander Korotkov
Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru
|
|
When certain parameters are changed on a physical replication primary,
this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
record. The standby then checks whether its own settings are at least
as big as the ones on the primary. If not, the standby shuts down
with a fatal error.
The correspondence of settings between primary and standby is required
because those settings influence certain shared memory sizings that
are required for processing WAL records that the primary might send.
For example, if the primary sends a prepared transaction, the standby
must have had max_prepared_transaction set appropriately or it won't
be able to process those WAL records.
However, fatally shutting down the standby immediately upon receipt of
the parameter change record might be a bit of an overreaction. The
resources related to those settings are not required immediately at
that point, and might never be required if the activity on the primary
does not exhaust all those resources. If we just let the standby roll
on with recovery, it will eventually produce an appropriate error when
those resources are used.
So this patch relaxes this a bit. Upon receipt of
XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
warning and set a global flag if there is a problem. Then when we
actually hit the resource issue and the flag was set, we issue another
warning message with relevant information. At that point we pause
recovery, so a hot standby remains usable. We also repeat the last
warning message once a minute so it is harder to miss or ignore.
Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
|
|
The parameters primary_conninfo, primary_slot_name and
wal_receiver_create_temp_slot can now be changed with a simple "reload"
signal, no longer requiring a server restart. This is achieved by
signalling the walreceiver process to terminate and having it start
again with the new values.
Thanks to Andres Freund, Kyotaro Horiguchi, Fujii Masao for discussion.
Author: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/19513901543181143@sas1-19a94364928d.qloud-c.yandex.net
|
|
Commit 329730827848 gave walreceiver the ability to create and use a
temporary replication slot, and made it controllable by a GUC (enabled
by default) that can be changed with SIGHUP. That's useful but has two
problems: one, it's possible to cause the origin server to fill its disk
if the slot doesn't advance in time; and also there's a disconnect
between state passed down via the startup process and GUCs that
walreceiver reads directly.
We handle the first problem by setting the option to disabled by
default. If the user enables it, its on their head to make sure that
disk doesn't fill up.
We handle the second problem by passing the flag via startup rather than
having walreceiver acquire it directly, and making it PGC_POSTMASTER
(which ensures a walreceiver always has the fresh value). A future
commit can relax this (to PGC_SIGHUP again) by having the startup
process signal walreceiver to shutdown whenever the value changes.
Author: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200122055510.GH174860@paquier.xyz
|
|
Previously if a promotion was triggered while recovery was paused,
the paused state continued. Also recovery could be paused by executing
pg_wal_replay_pause() even while a promotion was ongoing. That is,
recovery pause had higher priority over a standby promotion.
But this behavior was not desirable because most users basically wanted
the recovery to complete as soon as possible and the server to become
the master when they requested a promotion.
This commit changes recovery so that it prefers a promotion over
recovery pause. That is, if a promotion is triggered while recovery
is paused, the paused state ends and a promotion continues. Also
this commit makes recovery pause functions like pg_wal_replay_pause()
throw an error if they are executed while a promotion is ongoing.
Internally, this commit adds new internal function PromoteIsTriggered()
that returns true if a promotion is triggered. Since the name of
this function and the existing function IsPromoteTriggered() are
confusingly similar, the commit changes the name of IsPromoteTriggered()
to IsPromoteSignaled, as more appropriate name.
Author: Fujii Masao
Reviewed-by: Atsushi Torikoshi, Sergei Kornilov
Discussion: https://postgr.es/m/00c194b2-dbbb-2e8a-5b39-13f14048ef0a@oss.nttdata.com
|
|
restore_command has only been used until now by the backend, but there
is a pending patch for pg_rewind to make use of that in the frontend.
Author: Alexey Kondratov
Reviewed-by: Andrey Borodin, Andres Freund, Alvaro Herrera, Alexander
Korotkov, Michael Paquier
Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru
|
|
This commit introduces new wait events BackupWaitWalArchive and
RecoveryPause. The former is reported while waiting for the WAL files
required for the backup to be successfully archived. The latter is
reported while waiting for recovery in pause state to be resumed.
Author: Fujii Masao
Reviewed-by: Michael Paquier, Atsushi Torikoshi, Robert Haas
Discussion: https://postgr.es/m/f0651f8c-9c96-9f29-0ff9-80414a15308a@oss.nttdata.com
|
|
This reverts commit cb2fd7eac285b1b0a24eeb2b8ed4456b66c5a09f. Per
numerous buildfarm members, it was incompatible with parallel query, and
a test case assumed LP64. Back-patch to 9.5 (all supported versions).
Discussion: https://postgr.es/m/20200321224920.GB1763544@rfd.leadboat.com
|
|
Until now, only selected bulk operations (e.g. COPY) did this. If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY. See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules. Maintainers of table access
methods should examine that section.
To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL. A new GUC,
wal_skip_threshold, guides that choice. If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold. Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.
Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node. Make relcache.c retain entries for certain
dropped relations until end of transaction.
Back-patch to 9.5 (all supported versions). This introduces a new WAL
record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC. As
always, update standby systems before master systems. This changes
sizeof(RelationData) and sizeof(IndexStmt), breaking binary
compatibility for affected extensions. (The most recent commit to
affect the same class of extensions was
089e4d405d0f3b94c74a2c6a54357a84a681754b.)
Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas. Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem. Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout.
Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
|
|
The function assumed forkNum=MAIN_FORKNUM and page_std=true, ignoring
the actual arguments. Existing callers passed exactly those values, so
there's no live bug. Back-patch to v12, where the function first
appeared, because another fix needs this.
Discussion: https://postgr.es/m/20191118045434.GA1173436@rfd.leadboat.com
|
|
This commit renames RecoveryWalAll and RecoveryWalStream wait events to
RecoveryWalStream and RecoveryRetrieveRetryInterval, respectively,
in order to make the names and what they are more consistent. For example,
previously RecoveryWalAll was reported as a wait event while the recovery
was waiting for WAL from a stream, and which was confusing because the name
was very different from the situation where the wait actually could happen.
The names of macro variables for those wait events also are renamed
accordingly.
This commit also changes the category of RecoveryRetrieveRetryInterval to
Timeout from Activity because the wait event is reported while waiting based
on wal_retrieve_retry_interval.
Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Atsushi Torikoshi
Discussion: https://postgr.es/m/124997ee-096a-5d09-d8da-2c7a57d0816e@oss.nttdata.com
|
|
This commit fixes the comment about SharedHotStandbyActive variable.
The comment was apparently copy-and-pasted.
Author: Atsushi Torikoshi
Discussion: https://postgr.es/m/CACZ0uYEjpqZB9wN2Rwc_RMvDybyYqdbkPuDr1NyxJg4f9yGfMw@mail.gmail.com
|
|
We don't need to manually clean up allocations in a SRF's
multi_call_memory_ctx, because the SRF_RETURN_DONE infrastructure
takes care of that (and also ensures that it will happen even if the
function never gets a final call, which simple manual cleanup cannot
do).
Hence, the code removed by this patch is a waste of code and cycles.
Worse, it gives the impression that cleaning up manually is a thing,
which can lead to more serious errors such as those fixed in
commits 085b6b667 and b4570d33a. So we should get rid of it.
These are not quite actual bugs though, so I couldn't muster the
enthusiasm to back-patch. Fix in HEAD only.
Justin Pryzby
Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
|
|
The init_ps_display() arguments were mostly lies by now, so to match
typical usage, just use one argument and let the caller assemble it
from multiple sources if necessary. The only user of the additional
arguments is BackendInitialize(), which was already doing string
assembly on the caller side anyway.
Remove the second argument of set_ps_display() ("force") and just
handle that in init_ps_display() internally.
BackendInitialize() also used to set the initial status as
"authentication", but that was very far from where authentication
actually happened. So now it's set to "initializing" and then
"authentication" just before the actual call to
ClientAuthentication().
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/c65e5196-4f04-4ead-9353-6088c19615a3@2ndquadrant.com
|
|
Previously, hard links were not used on Windows and Cygwin, but they
support them just fine in currently supported OS versions, so we can
use them there as well.
Since all supported platforms now support hard links, we can remove
the alternative code paths.
Rename durable_link_or_rename() to durable_rename_excl() to make the
purpose more clear without referencing the implementation details.
Discussion: https://www.postgresql.org/message-id/flat/72fff73f-dc9c-4ef4-83e8-d2e60c98df48%402ndquadrant.com
|