Age | Commit message (Collapse) | Author |
|
This view shows the statistics about WAL activity. Currently it has only
two columns: wal_buffers_full and stats_reset. wal_buffers_full column
indicates the number of times WAL data was written to the disk because
WAL buffers got full. This information is useful when tuning wal_buffers.
stats_reset column indicates the time at which these statistics were
last reset.
pg_stat_wal view is also the basic infrastructure to expose other
various statistics about WAL activity later.
Bump PGSTAT_FILE_FORMAT_ID due to the change in pgstat format.
Bump catalog version.
Author: Masahiro Ikeda
Reviewed-by: Takayuki Tsunakawa, Kyotaro Horiguchi, Amit Kapila, Fujii Masao
Discussion: https://postgr.es/m/188bd3f2d2233cf97753b5ced02bb050@oss.nttdata.com
|
|
Previously, a conversion such as
to_date('-44-02-01','YYYY-MM-DD')
would result in '0045-02-01 BC', as the code attempted to interpret
the negative year as BC, but failed to apply the correction needed
for our internal handling of BC years. Fix the off-by-one problem.
Also, arrange for the combination of a negative year and an
explicit "BC" marker to cancel out and produce AD. This is how
the negative-century case works, so it seems sane to do likewise.
Continue to read "year 0000" as 1 BC. Oracle would throw an error,
but we've accepted that case for a long time so I'm hesitant to
change it in a back-patch.
Per bug #16419 from Saeed Hubaishan. Back-patch to all supported
branches.
Dar Alathar-Yemen and Tom Lane
Discussion: https://postgr.es/m/16419-d8d9db0a7553f01b@postgresql.org
|
|
Previously we threw an error. But make_date already allowed the case,
so it is inconsistent as well as unhelpful for make_timestamp not to.
Both functions continue to reject year zero.
Code and test fixes by Peter Eisentraut, doc changes by me
Discussion: https://postgr.es/m/13c0992c-f15a-a0ca-d839-91d3efd965d9@2ndquadrant.com
|
|
The SQL standard doesn't require jsonpath .datetime() method to support the
ISO 8601 format. But our to_json[b]() functions convert timestamps to text in
the ISO 8601 format in the sake of compatibility with javascript. So, we add
support of the ISO 8601 to the jsonpath .datetime() in the sake compatibility
with to_json[b]().
The standard mode of datetime parsing currently supports just template patterns
and separators in the format string. In order to implement ISO 8601, we have to
add support of the format string double quotes to the standard parsing mode.
Discussion: https://postgr.es/m/94321be0-cc96-1a81-b6df-796f437f7c66%40postgrespro.ru
Author: Nikita Glukhov, revised by me
Backpatch-through: 13
|
|
bffe1bd684 has introduced jsonpath .datetime() method, but default formats
for time and timestamp contain excess space between time and timezone. This
commit removes this excess space making behavior of .datetime() method
standard-compliant.
Discussion: https://postgr.es/m/94321be0-cc96-1a81-b6df-796f437f7c66%40postgrespro.ru
Author: Nikita Glukhov
Backpatch-through: 13
|
|
We have a dozen or so places that need to iterate over all but the
first cell of a List. Prior to v13 this was typically written as
for_each_cell(lc, lnext(list_head(list)))
Commit 1cff1b95a changed these to
for_each_cell(lc, list, list_second_cell(list))
This patch introduces a new macro for_each_from() which expresses
the start point as a list index, allowing these to be written as
for_each_from(lc, list, 1)
This is marginally more efficient, since ForEachState.i can be
initialized directly instead of backing into it from a ListCell
address. It also seems clearer and less typo-prone.
Some of the remaining uses of for_each_cell() look like they could
profitably be changed to for_each_from(), but here I confined myself
to changing uses of list_second_cell().
Also, fix for_each_cell_setup() and for_both_cell_setup() to
const-ify their arguments; that's a simple oversight in 1cff1b95a.
Back-patch into v13, on the grounds that (1) the const-ification
is a minor bug fix, and (2) it's better for back-patching purposes
if we only have two ways to write these loops rather than three.
In HEAD, also remove list_third_cell() and list_fourth_cell(),
which were also introduced in 1cff1b95a, and are unused as of
cc99baa43. It seems unlikely that any third-party code would
have started to use them already; anyone who has can be directed
to list_nth_cell instead.
Discussion: https://postgr.es/m/CAApHDvpo1zj9KhEpU2cCRZfSM3Q6XGdhzuAS2v79PH7WJBkYVA@mail.gmail.com
|
|
SQL operations such as CURRENT_DATE, CURRENT_TIME, LOCALTIME, and
conversion of "now" in a datetime input string have to obtain the
transaction start timestamp ("now()") as a broken-down struct pg_tm.
This is a remarkably expensive conversion, and since now() does not
change intra-transaction, it doesn't really need to be done more than
once per transaction. Introducing a simple cache provides visible
speedups in queries that compute these values many times, for example
insertion of many rows that use a default value of CURRENT_DATE.
Peter Smith, with a bit of kibitzing by me
Discussion: https://postgr.es/m/CAHut+Pu89TWjq530V2gY5O6SWi=OEJMQ_VHMt8bdZB_9JFna5A@mail.gmail.com
|
|
When commit bd3daddaf introduced AlternativeSubPlans, I had some
ambitions towards allowing the choice of subplan to change during
execution. That has not happened, or even been thought about, in the
ensuing twelve years; so it seems like a failed experiment. So let's
rip that out and resolve the choice of subplan at the end of planning
(in setrefs.c) rather than during executor startup. This has a number
of positive benefits:
* Removal of a few hundred lines of executor code, since
AlternativeSubPlans need no longer be supported there.
* Removal of executor-startup overhead (particularly, initialization
of subplans that won't be used).
* Removal of incidental costs of having a larger plan tree, such as
tree-scanning and copying costs in the plancache; not to mention
setrefs.c's own costs of processing the discarded subplans.
* EXPLAIN no longer has to print a weird (and undocumented)
representation of an AlternativeSubPlan choice; it sees only the
subplan actually used. This should mean less confusion for users.
* Since setrefs.c knows which subexpression of a plan node it's
working on at any instant, it's possible to adjust the estimated
number of executions of the subplan based on that. For example,
we should usually estimate more executions of a qual expression
than a targetlist expression. The implementation used here is
pretty simplistic, because we don't want to expend a lot of cycles
on the issue; but it's better than ignoring the point entirely,
as the executor had to.
That last point might possibly result in shifting the choice
between hashed and non-hashed EXISTS subplans in a few cases,
but in general this patch isn't meant to change planner choices.
Since we're doing the resolution so late, it's really impossible
to change any plan choices outside the AlternativeSubPlan itself.
Patch by me; thanks to David Rowley for review.
Discussion: https://postgr.es/m/1992952.1592785225@sss.pgh.pa.us
|
|
The previous coding was confused about whether head_timestamp was
intended to represent the timestamp for the newest bucket in the
mapping or the oldest timestamp for the oldest bucket in the mapping.
Decide that it's intended to be the oldest one, and repair
accordingly.
To do that, we need to do two things. First, when advancing to a
new bucket, don't categorically set head_timestamp to the new
timestamp. Do this only if we're blowing out the map completely
because a lot of time has passed since we last maintained it. If
we're replacing entries one by one, advance head_timestamp by
1 minute for each; if we're filling in unused entries, don't
advance head_timestamp at all.
Second, fix the computation of how many buckets we need to advance.
The previous formula would be correct if head_timestamp were the
timestamp for the new bucket, but we're now making all the code
agree that it's the timestamp for the oldest bucket, so adjust the
formula accordingly.
This is certainly a bug fix, but I don't feel good about
back-patching it without the introspection tools added by commit
aecf5ee2bb36c597d3c6142e367e38d67816c777, and perhaps also some
actual tests. Since back-patching the introspection tools might
not attract sufficient support and since there are no automated
tests of these fixes yet, I'm just committing this to master for
now.
Patch by me, reviewed by Thomas Munro, Dilip Kumar, Hamid Akhtar.
Discussion: http://postgr.es/m/CA+TgmoY=aqf0zjTD+3dUWYkgMiNDegDLFjo+6ze=Wtpik+3XqA@mail.gmail.com
|
|
This makes it possible for code outside snapmgr.c to examine the
contents of this data structure. This commit does not add any code
which actually does so; a subsequent commit will make that change.
Patch by me, reviewed by Thomas Munro, Dilip Kumar, Hamid Akhtar.
Discussion: http://postgr.es/m/CA+TgmoY=aqf0zjTD+3dUWYkgMiNDegDLFjo+6ze=Wtpik+3XqA@mail.gmail.com
|
|
This completes the project of making all our derived files be
pgindent-clean (or else explicitly excluded from indentation),
so that no surprises result when running pgindent in a built-out
development tree.
Discussion: https://postgr.es/m/79ed5348-be7a-b647-dd40-742207186a22@2ndquadrant.com
|
|
99% of this is docs, but also a couple of comments. No code changes.
Justin Pryzby
Discussion: https://postgr.es/m/20200919175804.GE30557@telsasoft.com
|
|
The standard order in PostgreSQL and other code is use strict first,
but some code was uselessly inconsistent about this.
|
|
Commit be0a6666 left behind a comment about the order of some tests that
didn't make sense without the expensive division, and in fact we might
as well change the order to one that fails more cheaply most of the time
as a micro-optimization. Also, remove the "+ 1" applied to max_bucket,
to drop an instruction and match the original behavior. Per review
from Tom Lane.
Discussion: https://postgr.es/m/VI1PR0701MB696044FC35013A96FECC7AC8F62D0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
|
|
Since ancient times we have had support for a fill factor (maximum load
factor) to be set for a dynahash hash table, but:
1. It was an integer, whereas for in-memory hash tables interesting
load factor targets are probably somewhere near the 0.75-1.0 range.
2. It was implemented in a way that performed an expensive division
operation that regularly showed up in profiles.
3. We are not aware of anyone ever having used a non-default value.
Therefore, remove support, effectively fixing it at 1.
Author: Jakub Wartak <Jakub.Wartak@tomtom.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/VI1PR0701MB696044FC35013A96FECC7AC8F62D0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
|
|
Up to now, if you tried to omit "AS" before a column label in a SELECT
list, it would only work if the column label was an IDENT, that is not
any known keyword. This is rather unfriendly considering that we have
so many keywords and are constantly growing more. In the wake of commit
1ed6b8956 it's possible to improve matters quite a bit.
We'd originally tried to make this work by having some of the existing
keyword categories be allowed without AS, but that didn't work too well,
because each category contains a few special cases that don't work
without AS. Instead, invent an entirely orthogonal keyword property
"can be bare column label", and mark all keywords that way for which
we don't get shift/reduce errors by doing so.
It turns out that of our 450 current keywords, all but 39 can be made
bare column labels, improving the situation by over 90%. This number
might move around a little depending on future grammar work, but it's
a pretty nice improvement.
Mark Dilger, based on work by myself and Robert Haas;
review by John Naylor
Discussion: https://postgr.es/m/38ca86db-42ab-9b48-2902-337a0d6b8311@2ndquadrant.com
|
|
This feature has been a thorn in our sides for a long time, causing
many grammatical ambiguity problems. It doesn't seem worth the
pain to continue to support it, so remove it.
There are some follow-on improvements we can make in the grammar,
but this commit only removes the bare minimum number of productions,
plus assorted backend support code.
Note that pg_dump and psql continue to have full support, since
they may be used against older servers. However, pg_dump warns
about postfix operators. There is also a check in pg_upgrade.
Documentation-wise, I (tgl) largely removed the "left unary"
terminology in favor of saying "prefix operator", which is
a more standard and IMO less confusing term.
I included a catversion bump, although no initial catalog data
changes here, to mark the boundary at which oprkind = 'r'
stopped being valid in pg_operator.
Mark Dilger, based on work by myself and Robert Haas;
review by John Naylor
Discussion: https://postgr.es/m/38ca86db-42ab-9b48-2902-337a0d6b8311@2ndquadrant.com
|
|
In the particular case of GRANTED BY, this is specified in the SQL
standard. Since in PostgreSQL, CURRENT_ROLE is equivalent to
CURRENT_USER, and CURRENT_USER is already supported here, adding
CURRENT_ROLE is trivial. The other cases are PostgreSQL extensions,
but for the same reason it also makes sense there.
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Asif Rehman <asifr.rehman@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/f2feac44-b4c5-f38f-3699-2851d6a76dc9%402ndquadrant.com
|
|
This adds a new optional support function to the GiST access method:
sortsupport. If it is defined, the GiST index is built by sorting all data
to the order defined by the sortsupport's comparator function, and packing
the tuples in that order to GiST pages. This is similar to how B-tree
index build works, and is much faster than inserting the tuples one by
one. The resulting index is smaller too, because the pages are packed more
tightly, upto 'fillfactor'. The normal build method works by splitting
pages, which tends to lead to more wasted space.
The quality of the resulting index depends on how good the opclass-defined
sort order is. A good order preserves locality of the input data.
As the first user of this facility, add 'sortsupport' function to the
point_ops opclass. It sorts the points in Z-order (aka Morton Code), by
interleaving the bits of the X and Y coordinates.
Author: Andrey Borodin
Reviewed-by: Pavel Borisov, Thomas Munro
Discussion: https://www.postgresql.org/message-id/1A36620E-CAD8-4267-9067-FB31385E7C0D%40yandex-team.ru
|
|
We decided that the policy established in commit 7634bd4f6 for
the bgwriter, checkpointer, walwriter, and walreceiver processes,
namely that they should accept SIGQUIT at all times, really ought
to apply uniformly to all postmaster children. Therefore, get
rid of the duplicative and inconsistent per-process code for
establishing that signal handler and removing SIGQUIT from BlockSig.
Instead, make InitPostmasterChild do it.
The handler set up by InitPostmasterChild is SignalHandlerForCrashExit,
which just summarily does _exit(2). In interactive backends, we
almost immediately replace that with quickdie, since we would prefer
to try to tell the client that we're dying. However, this patch is
changing the behavior of autovacuum (both launcher and workers), as
well as walsenders. Those processes formerly also used quickdie,
but AFAICS that was just mindless copy-and-paste: they don't have
any interactive client that's likely to benefit from being told this.
The stats collector continues to be an outlier, in that it thinks
SIGQUIT means normal exit. That should probably be changed for
consistency, but there's another patch set where that's being
dealt with, so I didn't do so here.
Discussion: https://postgr.es/m/644875.1599933441@sss.pgh.pa.us
|
|
Previously, it was based on nBlocksAllocated to account for tapes with
open write buffers that may not have made it to the BufFile yet.
That was unnecessary, because callers do not need to get the number of
blocks while a tape has an open write buffer; and it also conflicted
with the preallocation logic added for HashAgg.
Reviewed-by: Peter Geoghegan
Discussion: https://postgr.es/m/ce5af05900fdbd0e9185747825a7423c48501964.camel@j-davis.com
Backpatch-through: 13
|
|
Introduced in 0aa8f7640.
MSVC warned about performing 32-bit bit shifting when it appeared like we
might like a 64-bit result. We did, but it just so happened that none of
the calls to this function could have caused the 32-bit shift to overflow.
Here we just cast the constant to int64 to make the compiler happy.
Discussion: https://postgr.es/m/CAApHDvofA_vsrpC13mq_hZyuye5B-ssKEaer04OouXYCO5-uXQ@mail.gmail.com
|
|
Author: Naoki Nakamichi
Discussion: https://postgr.es/m/b6919d145af00295a8e86ce4d034b7cd@oss.nttdata.com
|
|
|
|
The preallocation logic is only useful for HashAgg, so disable it when
sorting.
Also, adjust an out-of-date comment.
Reviewed-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wzn_o7tE2+hRVvwSFghRb75AJ5g-nqGzDUqLYMexjOAe=g@mail.gmail.com
Backpatch-through: 13
|
|
Move applicable code out of RelationBuildDesc(), which nailed relations
bypass. Non-assert builds experienced no known problems. Back-patch to
v13, where commit c6b92041d38512a4176ed76ad06f713d2e6c01a8 introduced
rd_firstRelfilenodeSubid.
Kyotaro Horiguchi. Reported by Justin Pryzby.
Discussion: https://postgr.es/m/20200907023737.GA7158@telsasoft.com
|
|
Existing callers had to take complicated detours via
DirectFunctionCall1(). This simplifies a lot of code.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu
|
|
Alexander Lakhin
Discussion: https://postgr.es/m/ce7debdd-c943-d7a7-9b41-687107b27831@gmail.com
|
|
This essentially reverts a micro-optimization I made years ago,
as part of the much larger commit d72f6c750. It's doubtful
that there was any hard evidence for it being helpful even then,
and the case is even more dubious now that modern compilers
are so much smarter about inlining memset().
The proximate reason for undoing it is to get rid of the type punning
inherent in MemSet, for fear that that may cause problems now that
we're applying additional optimization switches to numeric.c.
At the very least this'll silence some warnings from a few old
buildfarm animals.
(It's probably past time for another look at whether MemSet is still
worth anything at all, but I do not propose to tackle that question
right now.)
Discussion: https://postgr.es/m/CAJ3gD9evtA_vBo+WMYMyT-u=keHX7-r8p2w7OSRfXf42LTwCZQ@mail.gmail.com
|
|
Otherwise just printing an empty string makes the memory context debug
output slightly confusing.
Discussion: https://www.postgresql.org/message-id/flat/ccb353ef-89ff-09b3-8046-1d2514624b9c%402ndquadrant.com
|
|
Experimentation shows that clang will auto-vectorize the critical
multiplication loop if the termination condition is written "i2 < limit"
rather than "i2 <= limit". This seems unbelievably stupid, but I've
reproduced it on both clang 9.0.1 (RHEL8) and 11.0.3 (macOS Catalina).
gcc doesn't care, so tweak the code to do it that way.
Discussion: https://postgr.es/m/CAJ3gD9evtA_vBo+WMYMyT-u=keHX7-r8p2w7OSRfXf42LTwCZQ@mail.gmail.com
|
|
Compile numeric.c with -ftree-vectorize where available, and adjust
the innermost loop of mul_var() so that it is amenable to being
auto-vectorized. (Mainly, that involves making it process the arrays
left-to-right not right-to-left.)
Applying -ftree-vectorize actually makes numeric.o smaller, at least
with my compiler (gcc 8.3.1 on x86_64), and it's a little faster too.
Independently of that, fixing the inner loop to be vectorizable also
makes things a bit faster. But doing both is a huge win for
multiplications with lots of digits. For me, the numeric regression
test is the same speed to within measurement noise, but numeric_big
is a full 45% faster.
We also looked into applying -funroll-loops, but that makes numeric.o
bloat quite a bit, and the additional speed improvement is very
marginal.
Amit Khandekar, reviewed and edited a little by me
Discussion: https://postgr.es/m/CAJ3gD9evtA_vBo+WMYMyT-u=keHX7-r8p2w7OSRfXf42LTwCZQ@mail.gmail.com
|
|
I'm not sure what tool Ranier was using, but the ones I contributed
were found by using a newer version of scan-build than I tried before.
Ranier Vilela and Tom Lane
Discussion: https://postgr.es/m/CAEudQAo1+AcGppxDSg8k+zF4+Kv+eJyqzEDdbpDg58-=MQcerQ@mail.gmail.com
|
|
Fix some more things scan-build pointed to as dead stores. In some of
these cases, rearranging the code a little leads to more readable
code IMO. It's all cosmetic, though.
Discussion: https://postgr.es/m/CAEudQAo1+AcGppxDSg8k+zF4+Kv+eJyqzEDdbpDg58-=MQcerQ@mail.gmail.com
|
|
Reported-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wz=NZPZc3-fkdmvu=w2itx0PiB-G6QpxHXZOjuvFAzPdZw@mail.gmail.com
Backpatch-through: 13
|
|
Reported-by: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAo1+AcGppxDSg8k+zF4+Kv+eJyqzEDdbpDg58-=MQcerQ@mail.gmail.com
Author: Ranier Vilela
Backpatch-through: master
|
|
This splits a string at occurrences of a delimiter. It is exactly like
string_to_array() except for producing a set of values instead of an
array of values. Thus, the relationship of these two functions is
the same as between regexp_split_to_table() and regexp_split_to_array().
Although the same results could be had from unnest(string_to_array()),
this is somewhat faster than that, and anyway it seems reasonable to
have it for symmetry with the regexp functions.
Pavel Stehule, reviewed by Peter Smith
Discussion: https://postgr.es/m/CAFj8pRD8HOpjq2TqeTBhSo_QkzjLOhXzGCpKJ4nCs7Y9SQkuPw@mail.gmail.com
|
|
Historically, we've considered the state with relpages and reltuples
both zero as indicating that we do not know the table's tuple density.
This is problematic because it's impossible to distinguish "never yet
vacuumed" from "vacuumed and seen to be empty". In particular, a user
cannot use VACUUM or ANALYZE to override the planner's normal heuristic
that an empty table should not be believed to be empty because it is
probably about to get populated. That heuristic is a good safety
measure, so I don't care to abandon it, but there should be a way to
override it if the table is indeed intended to stay empty.
Hence, represent the initial state of ignorance by setting reltuples
to -1 (relpages is still set to zero), and apply the minimum-ten-pages
heuristic only when reltuples is still -1. If the table is empty,
VACUUM or ANALYZE (but not CREATE INDEX) will override that to
reltuples = relpages = 0, and then we'll plan on that basis.
This requires a bunch of fiddly little changes, but we can get rid of
some ugly kluges that were formerly needed to maintain the old definition.
One notable point is that FDWs' GetForeignRelSize methods will see
baserel->tuples = -1 when no ANALYZE has been done on the foreign table.
That seems like a net improvement, since those methods were formerly
also in the dark about what baserel->tuples = 0 really meant. Still,
it is an API change.
I bumped catversion because code predating this change would get confused
by seeing reltuples = -1.
Discussion: https://postgr.es/m/F02298E0-6EF4-49A1-BCB6-C484794D9ACC@thebuild.com
|
|
Allow BufFile to support temporary files that can be used by the single
backend when the corresponding files need to be survived across the
transaction and need to be opened and closed multiple times. Such files
need to be created as a member of a SharedFileSet.
Additionally, this commit implements the interface for BufFileTruncate to
allow files to be truncated up to a particular offset and extends the
BufFileSeek API to support the SEEK_END case. This also adds an option to
provide a mode while opening the shared BufFiles instead of always opening
in read-only mode.
These enhancements in BufFile interface are required for the upcoming
patch to allow the replication apply worker, to handle streamed
in-progress transactions.
Author: Dilip Kumar, Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Neha Sharma
Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
|
|
Previously the codes for pg_backend_memory_contexts were in
src/backend/utils/mmgr/mcxt.c. This commit moves them to
src/backend/utils/adt/mcxtfuncs.c so that mcxt.c basically includes
only the low-level interface for memory contexts.
Author: Atsushi Torikoshi
Reviewed-by: Michael Paquier, Fujii Masao
Discussion: https://postgr.es/m/20200819135545.GC19121@paquier.xyz
|
|
This view displays the usages of all the memory contexts of the server
process attached to the current session. This information is useful to
investigate the cause of backend-local memory bloat.
This information can be also collected by calling
MemoryContextStats(TopMemoryContext) via a debugger. But this technique
cannot be uesd in some environments because no debugger is available there.
And it outputs lots of text messages and it's not easy to analyze them.
So, pg_backend_memory_contexts view allows us to access to backend-local
memory contexts information more easily.
Bump catalog version.
Author: Atsushi Torikoshi, Fujii Masao
Reviewed-by: Tatsuhito Kasahara, Andres Freund, Daniel Gustafsson, Robert Haas, Michael Paquier
Discussion: https://postgr.es/m/72a656e0f71d0860161e0b3f67e4d771@oss.nttdata.com
|
|
Previous commits made it faster/more scalable to compute snapshots. But not
building a snapshot is still faster. Now that GetSnapshotData() does not
maintain RecentGlobal* anymore, that is actually not too hard:
This commit introduces xactCompletionCount, which tracks the number of
top-level transactions with xids (i.e. which may have modified the database)
that completed in some form since the start of the server.
We can avoid rebuilding the snapshot's contents whenever the current
xactCompletionCount is the same as it was when the snapshot was
originally built. Currently this check happens while holding
ProcArrayLock. While it's likely possible to perform the check without
acquiring ProcArrayLock, it seems better to do that separately /
later, some careful analysis is required. Even with the lock this is a
significant win on its own.
On a smaller two socket machine this gains another ~1.03x, on a larger
machine the effect is roughly double (earlier patch version tested
though). If we were able to safely avoid the lock there'd be another
significant gain on top of that.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
|
|
xact.h included utils/datetime.h, which cannot be used in the frontend
(it includes fmgr.h, which needs Datum). But xact.h only needs the
definition of TimestampTz from it, which is available directly in
datatypes/timestamp.h. Change xact.h to include that instead of
utils/datetime.h, so that it can be used in client programs.
|
|
The SimpleLruTruncate() header comment states the new coding rule. To
achieve this, add locktype "frozenid" and two LWLocks. This closes a
rare opportunity for data loss, which manifested as "apparent
wraparound" or "could not access status of transaction" errors. Data
loss is more likely in pg_multixact, due to released branches' thin
margin between multiStopLimit and multiWrapLimit. If a user's physical
replication primary logged ": apparent wraparound" messages, the user
should rebuild standbys of that primary regardless of symptoms. At less
risk is a cluster having emitted "not accepting commands" errors or
"must be vacuumed" warnings at some point. One can test a cluster for
this data loss by running VACUUM FREEZE in every database. Back-patch
to 9.5 (all supported versions).
Discussion: https://postgr.es/m/20190218073103.GA1434723@rfd.leadboat.com
|
|
Fixed more than 10 years ago.
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://www.postgresql.org/message-id/flat/aa266ede-baaa-f4e6-06cf-5b1737610e9a%402ndquadrant.com
|
|
Up to now, upon receipt of a SIGTERM ("smart shutdown" command), the
postmaster has immediately killed all "optional" background processes,
and subsequently refused to launch new ones while it's waiting for
foreground client processes to exit. No doubt this seemed like an OK
policy at some point; but it's a pretty bad one now, because it makes
for a seriously degraded environment for the remaining clients:
* Parallel queries are killed, and new ones fail to launch. (And our
parallel-query infrastructure utterly fails to deal with the case
in a reasonable way --- it just hangs waiting for workers that are
not going to arrive. There is more work needed in that area IMO.)
* Autovacuum ceases to function. We can tolerate that for awhile,
but if bulk-update queries continue to run in the surviving client
sessions, there's eventually going to be a mess. In the worst case
the system could reach a forced shutdown to prevent XID wraparound.
* The bgwriter and walwriter are also stopped immediately, likely
resulting in performance degradation.
Hence, let's rearrange things so that the only immediate change in
behavior is refusing to let in new normal connections. Once the last
normal connection is gone, shut everything down as though we'd received
a "fast" shutdown. To implement this, remove the PM_WAIT_BACKUP and
PM_WAIT_READONLY states, instead staying in PM_RUN or PM_HOT_STANDBY
while normal connections remain. A subsidiary state variable tracks
whether or not we're letting in new connections in those states.
This also allows having just one copy of the logic for killing child
processes in smart and fast shutdown modes. I moved that logic into
PostmasterStateMachine() by inventing a new state PM_STOP_BACKENDS.
Back-patch to 9.6 where parallel query was added. In principle
this'd be a good idea in 9.5 as well, but the risk/reward ratio
is not as good there, since lack of autovacuum is not a problem
during typical uses of smart shutdown.
Per report from Bharath Rupireddy.
Patch by me, reviewed by Thomas Munro
Discussion: https://postgr.es/m/CALj2ACXAZ5vKxT9P7P89D87i3MDO9bfS+_bjMHgnWJs8uwUOOw@mail.gmail.com
|
|
Now that xmin isn't needed for GetSnapshotData() anymore, it leads to
unnecessary cacheline ping-pong to have it in PGXACT, as it is updated
considerably more frequently than the other PGXACT members.
After the changes in dc7420c2c92, this is a very straight-forward change.
For highly concurrent, snapshot acquisition heavy, workloads this change alone
can significantly increase scalability. E.g. plain pgbench on a smaller 2
socket machine gains 1.07x for read-only pgbench, 1.22x for read-only pgbench
when submitting queries in batches of 100, and 2.85x for batches of 100
'SELECT';. The latter numbers are obviously not to be expected in the
real-world, but micro-benchmark the snapshot computation
scalability (previously spending ~80% of the time in GetSnapshotData()).
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
|
|
To make GetSnapshotData() more scalable, it cannot not look at at each proc's
xmin: While snapshot contents do not need to change whenever a read-only
transaction commits or a snapshot is released, a proc's xmin is modified in
those cases. The frequency of xmin modifications leads to, particularly on
higher core count systems, many cache misses inside GetSnapshotData(), despite
the data underlying a snapshot not changing. That is the most
significant source of GetSnapshotData() scaling poorly on larger systems.
Without accessing xmins, GetSnapshotData() cannot calculate accurate horizons /
thresholds as it has so far. But we don't really have to: The horizons don't
actually change that much between GetSnapshotData() calls. Nor are the horizons
actually used every time a snapshot is built.
The trick this commit introduces is to delay computation of accurate horizons
until there use and using horizon boundaries to determine whether accurate
horizons need to be computed.
The use of RecentGlobal[Data]Xmin to decide whether a row version could be
removed has been replaces with new GlobalVisTest* functions. These use two
thresholds to determine whether a row can be pruned:
1) definitely_needed, indicating that rows deleted by XIDs >= definitely_needed
are definitely still visible.
2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can
definitely be removed
GetSnapshotData() updates definitely_needed to be the xmin of the computed
snapshot.
When testing whether a row can be removed (with GlobalVisTestIsRemovableXid())
and the tested XID falls in between the two (i.e. XID >= maybe_needed && XID <
definitely_needed) the boundaries can be recomputed to be more accurate. As it
is not cheap to compute accurate boundaries, we limit the number of times that
happens in short succession. As the boundaries used by
GlobalVisTestIsRemovableXid() are never reset (with maybe_needed updated by
GetSnapshotData()), it is likely that further test can benefit from an earlier
computation of accurate horizons.
To avoid regressing performance when old_snapshot_threshold is set (as that
requires an accurate horizon to be computed), heap_page_prune_opt() doesn't
unconditionally call TransactionIdLimitedForOldSnapshots() anymore. Both the
computation of the limited horizon, and the triggering of errors (with
SetOldSnapshotThresholdTimestamp()) is now only done when necessary to remove
tuples.
This commit just removes the accesses to PGXACT->xmin from
GetSnapshotData(), but other members of PGXACT residing in the same
cache line are accessed. Therefore this in itself does not result in a
significant improvement. Subsequent commits will take advantage of the
fact that GetSnapshotData() now does not need to access xmins anymore.
Note: This contains a workaround in heap_page_prune_opt() to keep the
snapshot_too_old tests working. While that workaround is ugly, the tests
currently are not meaningful, and it seems best to address them separately.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
|
|
Including Full in variable names duplicates the type information and
leads to overly long names. As FullTransactionId cannot accidentally
be casted to TransactionId that does not seem necessary.
Author: Andres Freund
Discussion: https://postgr.es/m/20200724011143.jccsyvsvymuiqfxu@alap3.anarazel.de
|
|
They are equivalent, except that StrNCpy() zero-fills the entire
destination buffer instead of providing just one trailing zero. For
all but a tiny number of callers, that's just overhead rather than
being desirable.
Remove StrNCpy() as it is now unused.
In some cases, namestrcpy() is the more appropriate function to use.
While we're here, simplify the API of namestrcpy(): Remove the return
value, don't check for NULL input. Nothing was using that anyway.
Also, remove a few unused name-related functions.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/44f5e198-36f6-6cdb-7fa9-60e34784daae%402ndquadrant.com
|