path: root/src/backend/utils
Age  Commit message  Author
2020-10-02  Add pg_stat_wal statistics view.  (Fujii Masao)

This view shows statistics about WAL activity. Currently it has only two columns: wal_buffers_full and stats_reset. The wal_buffers_full column indicates the number of times WAL data was written to disk because the WAL buffers got full. This information is useful when tuning wal_buffers. The stats_reset column indicates the time at which these statistics were last reset.

The pg_stat_wal view is also the basic infrastructure for exposing various other statistics about WAL activity later.

Bump PGSTAT_FILE_FORMAT_ID due to the change in pgstat format.

Bump catalog version.

Author: Masahiro Ikeda
Reviewed-by: Takayuki Tsunakawa, Kyotaro Horiguchi, Amit Kapila, Fujii Masao
Discussion: https://postgr.es/m/188bd3f2d2233cf97753b5ced02bb050@oss.nttdata.com
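For illustration, a query using only the two columns this commit adds:

    -- A steadily growing wal_buffers_full suggests wal_buffers may be too small.
    SELECT wal_buffers_full, stats_reset FROM pg_stat_wal;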
2020-09-30  Fix handling of BC years in to_date/to_timestamp.  (Tom Lane)

Previously, a conversion such as to_date('-44-02-01','YYYY-MM-DD') would result in '0045-02-01 BC', as the code attempted to interpret the negative year as BC, but failed to apply the correction needed for our internal handling of BC years. Fix the off-by-one problem.

Also, arrange for the combination of a negative year and an explicit "BC" marker to cancel out and produce AD. This is how the negative-century case works, so it seems sane to do likewise.

Continue to read "year 0000" as 1 BC. Oracle would throw an error, but we've accepted that case for a long time so I'm hesitant to change it in a back-patch.

Per bug #16419 from Saeed Hubaishan. Back-patch to all supported branches.

Dar Alathar-Yemen and Tom Lane

Discussion: https://postgr.es/m/16419-d8d9db0a7553f01b@postgresql.org
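For illustration of the cases described above (expected results per this fix):

    -- Off-by-one fixed: now 0044-02-01 BC (previously gave 0045-02-01 BC).
    SELECT to_date('-44-02-01', 'YYYY-MM-DD');
    -- A negative year combined with an explicit BC marker cancels out to AD,
    -- expected to yield 0044-02-01.
    SELECT to_date('-44-02-01 BC', 'YYYY-MM-DD BC');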
2020-09-29  Fix make_timestamp[tz] to accept negative years as meaning BC.  (Tom Lane)

Previously we threw an error. But make_date already allowed the case, so it is inconsistent as well as unhelpful for make_timestamp not to.

Both functions continue to reject year zero.

Code and test fixes by Peter Eisentraut, doc changes by me

Discussion: https://postgr.es/m/13c0992c-f15a-a0ca-d839-91d3efd965d9@2ndquadrant.com
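For illustration (a sketch; make_date already behaved this way):

    -- Negative years are now read as BC, matching make_date.
    SELECT make_timestamp(-44, 3, 15, 12, 0, 0);    -- 0044-03-15 12:00:00 BC
    SELECT make_timestamptz(-44, 3, 15, 12, 0, 0);  -- same, with time zone
    -- Year zero is still rejected.
    SELECT make_timestamp(0, 1, 1, 0, 0, 0);        -- ERROR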
2020-09-29  Support for ISO 8601 in the jsonpath .datetime() method  (Alexander Korotkov)

The SQL standard doesn't require the jsonpath .datetime() method to support the ISO 8601 format. But our to_json[b]() functions convert timestamps to text in the ISO 8601 format for the sake of compatibility with JavaScript. So, we add support for ISO 8601 to the jsonpath .datetime() method for the sake of compatibility with to_json[b]().

The standard mode of datetime parsing currently supports just template patterns and separators in the format string. In order to implement ISO 8601, we have to add support for double quotes in the format string to the standard parsing mode.

Discussion: https://postgr.es/m/94321be0-cc96-1a81-b6df-796f437f7c66%40postgrespro.ru
Author: Nikita Glukhov, revised by me
Backpatch-through: 13
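For illustration (a sketch; exact output formatting aside):

    -- to_jsonb() renders timestamps in ISO 8601...
    SELECT to_jsonb('2020-09-29 10:15:00'::timestamp);
    -- ...and the .datetime() method now accepts that format back.
    SELECT jsonb_path_query('"2020-09-29T10:15:00"', '$.datetime()');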
2020-09-29  Remove excess space from jsonpath .datetime() default format string  (Alexander Korotkov)

Commit bffe1bd684 introduced the jsonpath .datetime() method, but the default formats for time and timestamp contain an excess space between the time and the timezone. This commit removes that excess space, making the behavior of the .datetime() method standard-compliant.

Discussion: https://postgr.es/m/94321be0-cc96-1a81-b6df-796f437f7c66%40postgrespro.ru
Author: Nikita Glukhov
Backpatch-through: 13
2020-09-28  Add for_each_from, to simplify loops starting from non-first list cells.  (Tom Lane)

We have a dozen or so places that need to iterate over all but the first cell of a List. Prior to v13 this was typically written as

    for_each_cell(lc, lnext(list_head(list)))

Commit 1cff1b95a changed these to

    for_each_cell(lc, list, list_second_cell(list))

This patch introduces a new macro for_each_from() which expresses the start point as a list index, allowing these to be written as

    for_each_from(lc, list, 1)

This is marginally more efficient, since ForEachState.i can be initialized directly instead of backing into it from a ListCell address. It also seems clearer and less typo-prone.

Some of the remaining uses of for_each_cell() look like they could profitably be changed to for_each_from(), but here I confined myself to changing uses of list_second_cell().

Also, fix for_each_cell_setup() and for_both_cell_setup() to const-ify their arguments; that's a simple oversight in 1cff1b95a.

Back-patch into v13, on the grounds that (1) the const-ification is a minor bug fix, and (2) it's better for back-patching purposes if we only have two ways to write these loops rather than three.

In HEAD, also remove list_third_cell() and list_fourth_cell(), which were also introduced in 1cff1b95a, and are unused as of cc99baa43. It seems unlikely that any third-party code would have started to use them already; anyone who has can be directed to list_nth_cell instead.

Discussion: https://postgr.es/m/CAApHDvpo1zj9KhEpU2cCRZfSM3Q6XGdhzuAS2v79PH7WJBkYVA@mail.gmail.com
2020-09-28  Cache the result of converting now() to a struct pg_tm.  (Tom Lane)

SQL operations such as CURRENT_DATE, CURRENT_TIME, LOCALTIME, and conversion of "now" in a datetime input string have to obtain the transaction start timestamp ("now()") as a broken-down struct pg_tm. This is a remarkably expensive conversion, and since now() does not change intra-transaction, it doesn't really need to be done more than once per transaction.

Introducing a simple cache provides visible speedups in queries that compute these values many times, for example insertion of many rows that use a default value of CURRENT_DATE.

Peter Smith, with a bit of kibitzing by me

Discussion: https://postgr.es/m/CAHut+Pu89TWjq530V2gY5O6SWi=OEJMQ_VHMt8bdZB_9JFna5A@mail.gmail.com
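For illustration, the kind of statement that benefits (hypothetical table; CURRENT_DATE is evaluated once per inserted row, now against the cached struct pg_tm):

    -- Hypothetical example: each row's default re-derives the date from now().
    CREATE TABLE events (id int, created date DEFAULT current_date);
    INSERT INTO events (id) SELECT g FROM generate_series(1, 100000) AS g;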
2020-09-27  Move resolution of AlternativeSubPlan choices to the planner.  (Tom Lane)

When commit bd3daddaf introduced AlternativeSubPlans, I had some ambitions towards allowing the choice of subplan to change during execution. That has not happened, or even been thought about, in the ensuing twelve years; so it seems like a failed experiment.

So let's rip that out and resolve the choice of subplan at the end of planning (in setrefs.c) rather than during executor startup. This has a number of positive benefits:

* Removal of a few hundred lines of executor code, since AlternativeSubPlans need no longer be supported there.

* Removal of executor-startup overhead (particularly, initialization of subplans that won't be used).

* Removal of incidental costs of having a larger plan tree, such as tree-scanning and copying costs in the plancache; not to mention setrefs.c's own costs of processing the discarded subplans.

* EXPLAIN no longer has to print a weird (and undocumented) representation of an AlternativeSubPlan choice; it sees only the subplan actually used. This should mean less confusion for users.

* Since setrefs.c knows which subexpression of a plan node it's working on at any instant, it's possible to adjust the estimated number of executions of the subplan based on that. For example, we should usually estimate more executions of a qual expression than a targetlist expression. The implementation used here is pretty simplistic, because we don't want to expend a lot of cycles on the issue; but it's better than ignoring the point entirely, as the executor had to.

That last point might possibly result in shifting the choice between hashed and non-hashed EXISTS subplans in a few cases, but in general this patch isn't meant to change planner choices. Since we're doing the resolution so late, it's really impossible to change any plan choices outside the AlternativeSubPlan itself.

Patch by me; thanks to David Rowley for review.

Discussion: https://postgr.es/m/1992952.1592785225@sss.pgh.pa.us
2020-09-24  Fix two bugs in MaintainOldSnapshotTimeMapping.  (Robert Haas)

The previous coding was confused about whether head_timestamp was intended to represent the timestamp for the newest bucket in the mapping or the oldest timestamp for the oldest bucket in the mapping. Decide that it's intended to be the oldest one, and repair accordingly.

To do that, we need to do two things. First, when advancing to a new bucket, don't categorically set head_timestamp to the new timestamp. Do this only if we're blowing out the map completely because a lot of time has passed since we last maintained it. If we're replacing entries one by one, advance head_timestamp by 1 minute for each; if we're filling in unused entries, don't advance head_timestamp at all.

Second, fix the computation of how many buckets we need to advance. The previous formula would be correct if head_timestamp were the timestamp for the new bucket, but we're now making all the code agree that it's the timestamp for the oldest bucket, so adjust the formula accordingly.

This is certainly a bug fix, but I don't feel good about back-patching it without the introspection tools added by commit aecf5ee2bb36c597d3c6142e367e38d67816c777, and perhaps also some actual tests. Since back-patching the introspection tools might not attract sufficient support and since there are no automated tests of these fixes yet, I'm just committing this to master for now.

Patch by me, reviewed by Thomas Munro, Dilip Kumar, Hamid Akhtar.

Discussion: http://postgr.es/m/CA+TgmoY=aqf0zjTD+3dUWYkgMiNDegDLFjo+6ze=Wtpik+3XqA@mail.gmail.com
2020-09-24  Expose oldSnapshotControl definition via new header.  (Robert Haas)

This makes it possible for code outside snapmgr.c to examine the contents of this data structure. This commit does not add any code which actually does so; a subsequent commit will make that change.

Patch by me, reviewed by Thomas Munro, Dilip Kumar, Hamid Akhtar.

Discussion: http://postgr.es/m/CA+TgmoY=aqf0zjTD+3dUWYkgMiNDegDLFjo+6ze=Wtpik+3XqA@mail.gmail.com
2020-09-21  Fix a few more generator scripts to produce pgindent-clean output.  (Tom Lane)

This completes the project of making all our derived files be pgindent-clean (or else explicitly excluded from indentation), so that no surprises result when running pgindent in a built-out development tree.

Discussion: https://postgr.es/m/79ed5348-be7a-b647-dd40-742207186a22@2ndquadrant.com
2020-09-21  Copy editing: fix a bunch of misspellings and poor wording.  (Tom Lane)

99% of this is docs, but also a couple of comments. No code changes.

Justin Pryzby

Discussion: https://postgr.es/m/20200919175804.GE30557@telsasoft.com
2020-09-21  Standardize order of use strict and use warnings in Perl code  (Peter Eisentraut)
The standard order in PostgreSQL and other code is use strict first, but some code was uselessly inconsistent about this.
2020-09-19  Code review for dynahash change.  (Thomas Munro)

Commit be0a6666 left behind a comment about the order of some tests that didn't make sense without the expensive division, and in fact we might as well change the order to one that fails more cheaply most of the time as a micro-optimization. Also, remove the "+ 1" applied to max_bucket, to drop an instruction and match the original behavior.

Per review from Tom Lane.

Discussion: https://postgr.es/m/VI1PR0701MB696044FC35013A96FECC7AC8F62D0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
2020-09-19  Remove large fill factor support from dynahash.c.  (Thomas Munro)

Since ancient times we have had support for a fill factor (maximum load factor) to be set for a dynahash hash table, but:

1. It was an integer, whereas for in-memory hash tables interesting load factor targets are probably somewhere near the 0.75-1.0 range.

2. It was implemented in a way that performed an expensive division operation that regularly showed up in profiles.

3. We are not aware of anyone ever having used a non-default value.

Therefore, remove support, effectively fixing it at 1.

Author: Jakub Wartak <Jakub.Wartak@tomtom.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/VI1PR0701MB696044FC35013A96FECC7AC8F62D0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
2020-09-18  Allow most keywords to be used as column labels without requiring AS.  (Tom Lane)

Up to now, if you tried to omit "AS" before a column label in a SELECT list, it would only work if the column label was an IDENT, that is not any known keyword. This is rather unfriendly considering that we have so many keywords and are constantly growing more. In the wake of commit 1ed6b8956 it's possible to improve matters quite a bit.

We'd originally tried to make this work by having some of the existing keyword categories be allowed without AS, but that didn't work too well, because each category contains a few special cases that don't work without AS. Instead, invent an entirely orthogonal keyword property "can be bare column label", and mark all keywords that way for which we don't get shift/reduce errors by doing so.

It turns out that of our 450 current keywords, all but 39 can be made bare column labels, improving the situation by over 90%. This number might move around a little depending on future grammar work, but it's a pretty nice improvement.

Mark Dilger, based on work by myself and Robert Haas; review by John Naylor

Discussion: https://postgr.es/m/38ca86db-42ab-9b48-2902-337a0d6b8311@2ndquadrant.com
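For illustration (a sketch; the exact set of keywords allowed without AS is whatever the grammar accepts without shift/reduce conflicts):

    -- Previously this needed AS because "text" is a keyword:
    SELECT 'abc' text;
    -- A few keywords, such as the interval unit names, still require AS:
    SELECT 1 AS day;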
2020-09-17  Remove support for postfix (right-unary) operators.  (Tom Lane)

This feature has been a thorn in our sides for a long time, causing many grammatical ambiguity problems. It doesn't seem worth the pain to continue to support it, so remove it.

There are some follow-on improvements we can make in the grammar, but this commit only removes the bare minimum number of productions, plus assorted backend support code.

Note that pg_dump and psql continue to have full support, since they may be used against older servers. However, pg_dump warns about postfix operators. There is also a check in pg_upgrade.

Documentation-wise, I (tgl) largely removed the "left unary" terminology in favor of saying "prefix operator", which is a more standard and IMO less confusing term.

I included a catversion bump, although no initial catalog data changes here, to mark the boundary at which oprkind = 'r' stopped being valid in pg_operator.

Mark Dilger, based on work by myself and Robert Haas; review by John Naylor

Discussion: https://postgr.es/m/38ca86db-42ab-9b48-2902-337a0d6b8311@2ndquadrant.com
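For illustration, the best-known postfix operator was factorial, so expressions written with postfix syntax need rewriting (a sketch):

    -- Postfix syntax such as  SELECT 5 !  is no longer accepted;
    -- use the function form instead.
    SELECT factorial(5);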
2020-09-17  Allow CURRENT_ROLE where CURRENT_USER is accepted  (Peter Eisentraut)

In the particular case of GRANTED BY, this is specified in the SQL standard. Since in PostgreSQL, CURRENT_ROLE is equivalent to CURRENT_USER, and CURRENT_USER is already supported here, adding CURRENT_ROLE is trivial. The other cases are PostgreSQL extensions, but for the same reason it also makes sense there.

Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Asif Rehman <asifr.rehman@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/f2feac44-b4c5-f38f-3699-2851d6a76dc9%402ndquadrant.com
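For illustration (hypothetical table and role names):

    -- CURRENT_ROLE is now accepted wherever CURRENT_USER already was.
    ALTER TABLE my_table OWNER TO CURRENT_ROLE;
    GRANT my_group TO some_user GRANTED BY CURRENT_ROLE;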
2020-09-17  Add support for building GiST index by sorting.  (Heikki Linnakangas)

This adds a new optional support function to the GiST access method: sortsupport. If it is defined, the GiST index is built by sorting all data to the order defined by the sortsupport's comparator function, and packing the tuples in that order to GiST pages. This is similar to how B-tree index build works, and is much faster than inserting the tuples one by one. The resulting index is smaller too, because the pages are packed more tightly, up to 'fillfactor'. The normal build method works by splitting pages, which tends to lead to more wasted space.

The quality of the resulting index depends on how good the opclass-defined sort order is. A good order preserves locality of the input data.

As the first user of this facility, add a 'sortsupport' function to the point_ops opclass. It sorts the points in Z-order (aka Morton Code), by interleaving the bits of the X and Y coordinates.

Author: Andrey Borodin
Reviewed-by: Pavel Borisov, Thomas Munro
Discussion: https://www.postgresql.org/message-id/1A36620E-CAD8-4267-9067-FB31385E7C0D%40yandex-team.ru
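For illustration, the sorted build is used automatically when the opclass provides a sortsupport function, as point_ops now does (hypothetical table and column names):

    -- Building this index can now sort the points in Z-order and pack pages
    -- tightly, instead of inserting tuples one by one.
    CREATE TABLE places (id int, location point);
    CREATE INDEX places_location_idx ON places USING gist (location);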
2020-09-16  Centralize setup of SIGQUIT handling for postmaster child processes.  (Tom Lane)

We decided that the policy established in commit 7634bd4f6 for the bgwriter, checkpointer, walwriter, and walreceiver processes, namely that they should accept SIGQUIT at all times, really ought to apply uniformly to all postmaster children. Therefore, get rid of the duplicative and inconsistent per-process code for establishing that signal handler and removing SIGQUIT from BlockSig. Instead, make InitPostmasterChild do it.

The handler set up by InitPostmasterChild is SignalHandlerForCrashExit, which just summarily does _exit(2). In interactive backends, we almost immediately replace that with quickdie, since we would prefer to try to tell the client that we're dying. However, this patch is changing the behavior of autovacuum (both launcher and workers), as well as walsenders. Those processes formerly also used quickdie, but AFAICS that was just mindless copy-and-paste: they don't have any interactive client that's likely to benefit from being told this.

The stats collector continues to be an outlier, in that it thinks SIGQUIT means normal exit. That should probably be changed for consistency, but there's another patch set where that's being dealt with, so I didn't do so here.

Discussion: https://postgr.es/m/644875.1599933441@sss.pgh.pa.us
2020-09-15  Change LogicalTapeSetBlocks() to use nBlocksWritten.  (Jeff Davis)

Previously, it was based on nBlocksAllocated to account for tapes with open write buffers that may not have made it to the BufFile yet. That was unnecessary, because callers do not need to get the number of blocks while a tape has an open write buffer; and it also conflicted with the preallocation logic added for HashAgg.

Reviewed-by: Peter Geoghegan
Discussion: https://postgr.es/m/ce5af05900fdbd0e9185747825a7423c48501964.camel@j-davis.com
Backpatch-through: 13
2020-09-15  Fix compiler warning  (David Rowley)

Introduced in 0aa8f7640. MSVC warned about performing 32-bit bit shifting when it appeared we might want a 64-bit result. We did, but it just so happened that none of the calls to this function could have caused the 32-bit shift to overflow. Here we just cast the constant to int64 to make the compiler happy.

Discussion: https://postgr.es/m/CAApHDvofA_vsrpC13mq_hZyuye5B-ssKEaer04OouXYCO5-uXQ@mail.gmail.com
2020-09-14  Fix typos.  (Fujii Masao)

Author: Naoki Nakamichi
Discussion: https://postgr.es/m/b6919d145af00295a8e86ce4d034b7cd@oss.nttdata.com
2020-09-14  Message fixes and style improvements  (Peter Eisentraut)
2020-09-11  logtape.c: do not preallocate for tapes when sorting  (Jeff Davis)

The preallocation logic is only useful for HashAgg, so disable it when sorting. Also, adjust an out-of-date comment.

Reviewed-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wzn_o7tE2+hRVvwSFghRb75AJ5g-nqGzDUqLYMexjOAe=g@mail.gmail.com
Backpatch-through: 13
2020-09-09  Fix rd_firstRelfilenodeSubid for nailed relations, in parallel workers.  (Noah Misch)

Move applicable code out of RelationBuildDesc(), which nailed relations bypass. Non-assert builds experienced no known problems. Back-patch to v13, where commit c6b92041d38512a4176ed76ad06f713d2e6c01a8 introduced rd_firstRelfilenodeSubid.

Kyotaro Horiguchi. Reported by Justin Pryzby.

Discussion: https://postgr.es/m/20200907023737.GA7158@telsasoft.com
2020-09-09  Expose internal function for converting int64 to numeric  (Peter Eisentraut)

Existing callers had to take complicated detours via DirectFunctionCall1(). This simplifies a lot of code.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu
2020-09-09  Minor fixes in docs and error messages.  (Tom Lane)

Alexander Lakhin

Discussion: https://postgr.es/m/ce7debdd-c943-d7a7-9b41-687107b27831@gmail.com
2020-09-08  Use plain memset() in numeric.c, not MemSet and friends.  (Tom Lane)

This essentially reverts a micro-optimization I made years ago, as part of the much larger commit d72f6c750. It's doubtful that there was any hard evidence for it being helpful even then, and the case is even more dubious now that modern compilers are so much smarter about inlining memset().

The proximate reason for undoing it is to get rid of the type punning inherent in MemSet, for fear that that may cause problems now that we're applying additional optimization switches to numeric.c. At the very least this'll silence some warnings from a few old buildfarm animals.

(It's probably past time for another look at whether MemSet is still worth anything at all, but I do not propose to tackle that question right now.)

Discussion: https://postgr.es/m/CAJ3gD9evtA_vBo+WMYMyT-u=keHX7-r8p2w7OSRfXf42LTwCZQ@mail.gmail.com
2020-09-08  Use <unnamed> for name of unnamed portal's memory context  (Peter Eisentraut)

Otherwise just printing an empty string makes the memory context debug output slightly confusing.

Discussion: https://www.postgresql.org/message-id/flat/ccb353ef-89ff-09b3-8046-1d2514624b9c%402ndquadrant.com
2020-09-07  Frob numeric.c loop so that clang will auto-vectorize it too.  (Tom Lane)

Experimentation shows that clang will auto-vectorize the critical multiplication loop if the termination condition is written "i2 < limit" rather than "i2 <= limit". This seems unbelievably stupid, but I've reproduced it on both clang 9.0.1 (RHEL8) and 11.0.3 (macOS Catalina). gcc doesn't care, so tweak the code to do it that way.

Discussion: https://postgr.es/m/CAJ3gD9evtA_vBo+WMYMyT-u=keHX7-r8p2w7OSRfXf42LTwCZQ@mail.gmail.com
2020-09-06  Apply auto-vectorization to the inner loop of numeric multiplication.  (Tom Lane)

Compile numeric.c with -ftree-vectorize where available, and adjust the innermost loop of mul_var() so that it is amenable to being auto-vectorized. (Mainly, that involves making it process the arrays left-to-right not right-to-left.)

Applying -ftree-vectorize actually makes numeric.o smaller, at least with my compiler (gcc 8.3.1 on x86_64), and it's a little faster too. Independently of that, fixing the inner loop to be vectorizable also makes things a bit faster. But doing both is a huge win for multiplications with lots of digits. For me, the numeric regression test is the same speed to within measurement noise, but numeric_big is a full 45% faster.

We also looked into applying -funroll-loops, but that makes numeric.o bloat quite a bit, and the additional speed improvement is very marginal.

Amit Khandekar, reviewed and edited a little by me

Discussion: https://postgr.es/m/CAJ3gD9evtA_vBo+WMYMyT-u=keHX7-r8p2w7OSRfXf42LTwCZQ@mail.gmail.com
2020-09-05  Yet more elimination of dead stores and useless initializations.  (Tom Lane)

I'm not sure what tool Ranier was using, but the ones I contributed were found by using a newer version of scan-build than I tried before.

Ranier Vilela and Tom Lane

Discussion: https://postgr.es/m/CAEudQAo1+AcGppxDSg8k+zF4+Kv+eJyqzEDdbpDg58-=MQcerQ@mail.gmail.com
2020-09-04  Remove still more useless assignments.  (Tom Lane)

Fix some more things scan-build pointed to as dead stores. In some of these cases, rearranging the code a little leads to more readable code IMO. It's all cosmetic, though.

Discussion: https://postgr.es/m/CAEudQAo1+AcGppxDSg8k+zF4+Kv+eJyqzEDdbpDg58-=MQcerQ@mail.gmail.com
2020-09-04  Fix bogus MaxAllocSize check in logtape.c.  (Jeff Davis)

Reported-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wz=NZPZc3-fkdmvu=w2itx0PiB-G6QpxHXZOjuvFAzPdZw@mail.gmail.com
Backpatch-through: 13
2020-09-03  remove redundant initializations  (Bruce Momjian)

Reported-by: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAo1+AcGppxDSg8k+zF4+Kv+eJyqzEDdbpDg58-=MQcerQ@mail.gmail.com
Author: Ranier Vilela
Backpatch-through: master
2020-09-02  Add string_to_table() function.  (Tom Lane)

This splits a string at occurrences of a delimiter. It is exactly like string_to_array() except for producing a set of values instead of an array of values. Thus, the relationship of these two functions is the same as between regexp_split_to_table() and regexp_split_to_array().

Although the same results could be had from unnest(string_to_array()), this is somewhat faster than that, and anyway it seems reasonable to have it for symmetry with the regexp functions.

Pavel Stehule, reviewed by Peter Smith

Discussion: https://postgr.es/m/CAFj8pRD8HOpjq2TqeTBhSo_QkzjLOhXzGCpKJ4nCs7Y9SQkuPw@mail.gmail.com
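For illustration:

    -- Returns a set of values: 'a', 'b', 'c'.
    SELECT string_to_table('a,b,c', ',');
    -- Equivalent, but somewhat slower, spelling using existing functions:
    SELECT unnest(string_to_array('a,b,c', ','));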
2020-08-30  Redefine pg_class.reltuples to be -1 before the first VACUUM or ANALYZE.  (Tom Lane)

Historically, we've considered the state with relpages and reltuples both zero as indicating that we do not know the table's tuple density. This is problematic because it's impossible to distinguish "never yet vacuumed" from "vacuumed and seen to be empty". In particular, a user cannot use VACUUM or ANALYZE to override the planner's normal heuristic that an empty table should not be believed to be empty because it is probably about to get populated. That heuristic is a good safety measure, so I don't care to abandon it, but there should be a way to override it if the table is indeed intended to stay empty.

Hence, represent the initial state of ignorance by setting reltuples to -1 (relpages is still set to zero), and apply the minimum-ten-pages heuristic only when reltuples is still -1. If the table is empty, VACUUM or ANALYZE (but not CREATE INDEX) will override that to reltuples = relpages = 0, and then we'll plan on that basis.

This requires a bunch of fiddly little changes, but we can get rid of some ugly kluges that were formerly needed to maintain the old definition.

One notable point is that FDWs' GetForeignRelSize methods will see baserel->tuples = -1 when no ANALYZE has been done on the foreign table. That seems like a net improvement, since those methods were formerly also in the dark about what baserel->tuples = 0 really meant. Still, it is an API change.

I bumped catversion because code predating this change would get confused by seeing reltuples = -1.

Discussion: https://postgr.es/m/F02298E0-6EF4-49A1-BCB6-C484794D9ACC@thebuild.com
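For illustration (hypothetical table name):

    -- Before the first VACUUM or ANALYZE, the "density unknown" state is
    -- now visible as reltuples = -1 rather than 0.
    CREATE TABLE t (x int);
    SELECT relpages, reltuples FROM pg_class WHERE relname = 't';  -- 0, -1
    VACUUM t;
    SELECT relpages, reltuples FROM pg_class WHERE relname = 't';  -- 0, 0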
2020-08-26  Extend the BufFile interface.  (Amit Kapila)

Allow BufFile to support temporary files that can be used by a single backend when the corresponding files need to survive across transactions and need to be opened and closed multiple times. Such files need to be created as a member of a SharedFileSet.

Additionally, this commit implements the interface for BufFileTruncate to allow files to be truncated up to a particular offset, and extends the BufFileSeek API to support the SEEK_END case. This also adds an option to provide a mode while opening the shared BufFiles instead of always opening in read-only mode.

These enhancements in the BufFile interface are required for the upcoming patch to allow the replication apply worker to handle streamed in-progress transactions.

Author: Dilip Kumar, Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Neha Sharma
Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
2020-08-26  Move codes for pg_backend_memory_contexts from mmgr/mcxt.c to adt/mcxtfuncs.c.  (Fujii Masao)

Previously the code for pg_backend_memory_contexts was in src/backend/utils/mmgr/mcxt.c. This commit moves it to src/backend/utils/adt/mcxtfuncs.c so that mcxt.c basically includes only the low-level interface for memory contexts.

Author: Atsushi Torikoshi
Reviewed-by: Michael Paquier, Fujii Masao
Discussion: https://postgr.es/m/20200819135545.GC19121@paquier.xyz
2020-08-19  Add pg_backend_memory_contexts system view.  (Fujii Masao)

This view displays the usage of all the memory contexts of the server process attached to the current session. This information is useful for investigating the cause of backend-local memory bloat.

This information can also be collected by calling MemoryContextStats(TopMemoryContext) via a debugger. But this technique cannot be used in some environments because no debugger is available there, and it outputs lots of text messages which are not easy to analyze. So, the pg_backend_memory_contexts view allows us to access backend-local memory context information more easily.

Bump catalog version.

Author: Atsushi Torikoshi, Fujii Masao
Reviewed-by: Tatsuhito Kasahara, Andres Freund, Daniel Gustafsson, Robert Haas, Michael Paquier
Discussion: https://postgr.es/m/72a656e0f71d0860161e0b3f67e4d771@oss.nttdata.com
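For illustration, a query one might run when investigating backend-local memory bloat (the column names below are assumed from the view's documentation):

    -- Largest memory contexts of the backend attached to the current session.
    SELECT name, parent, total_bytes, used_bytes
      FROM pg_backend_memory_contexts
     ORDER BY total_bytes DESC
     LIMIT 10;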
2020-08-17  snapshot scalability: cache snapshots using a xact completion counter.  (Andres Freund)

Previous commits made it faster/more scalable to compute snapshots. But not building a snapshot is still faster. Now that GetSnapshotData() does not maintain RecentGlobal* anymore, that is actually not too hard:

This commit introduces xactCompletionCount, which tracks the number of top-level transactions with xids (i.e. which may have modified the database) that completed in some form since the start of the server.

We can avoid rebuilding the snapshot's contents whenever the current xactCompletionCount is the same as it was when the snapshot was originally built.

Currently this check happens while holding ProcArrayLock. While it's likely possible to perform the check without acquiring ProcArrayLock, it seems better to do that separately / later, as some careful analysis is required. Even with the lock this is a significant win on its own.

On a smaller two socket machine this gains another ~1.03x, on a larger machine the effect is roughly double (earlier patch version tested though). If we were able to safely avoid the lock there'd be another significant gain on top of that.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
2020-08-17  Make xact.h usable in frontend.  (Heikki Linnakangas)

xact.h included utils/datetime.h, which cannot be used in the frontend (it includes fmgr.h, which needs Datum). But xact.h only needs the definition of TimestampTz from it, which is available directly in datatype/timestamp.h. Change xact.h to include that instead of utils/datetime.h, so that it can be used in client programs.
2020-08-15  Prevent concurrent SimpleLruTruncate() for any given SLRU.  (Noah Misch)

The SimpleLruTruncate() header comment states the new coding rule. To achieve this, add locktype "frozenid" and two LWLocks. This closes a rare opportunity for data loss, which manifested as "apparent wraparound" or "could not access status of transaction" errors. Data loss is more likely in pg_multixact, due to released branches' thin margin between multiStopLimit and multiWrapLimit.

If a user's physical replication primary logged ": apparent wraparound" messages, the user should rebuild standbys of that primary regardless of symptoms. At less risk is a cluster having emitted "not accepting commands" errors or "must be vacuumed" warnings at some point. One can test a cluster for this data loss by running VACUUM FREEZE in every database.

Back-patch to 9.5 (all supported versions).

Discussion: https://postgr.es/m/20190218073103.GA1434723@rfd.leadboat.com
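For illustration, the suggested check, run once in every database of the cluster:

    -- If this completes in every database without errors such as
    -- "could not access status of transaction", the cluster shows no sign
    -- of the data loss described above.
    VACUUM FREEZE;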
2020-08-15  Remove obsolete HAVE_BUGGY_SOLARIS_STRTOD  (Peter Eisentraut)

Fixed more than 10 years ago.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://www.postgresql.org/message-id/flat/aa266ede-baaa-f4e6-06cf-5b1737610e9a%402ndquadrant.com
2020-08-14  Fix postmaster's behavior during smart shutdown.  (Tom Lane)

Up to now, upon receipt of a SIGTERM ("smart shutdown" command), the postmaster has immediately killed all "optional" background processes, and subsequently refused to launch new ones while it's waiting for foreground client processes to exit. No doubt this seemed like an OK policy at some point; but it's a pretty bad one now, because it makes for a seriously degraded environment for the remaining clients:

* Parallel queries are killed, and new ones fail to launch. (And our parallel-query infrastructure utterly fails to deal with the case in a reasonable way --- it just hangs waiting for workers that are not going to arrive. There is more work needed in that area IMO.)

* Autovacuum ceases to function. We can tolerate that for awhile, but if bulk-update queries continue to run in the surviving client sessions, there's eventually going to be a mess. In the worst case the system could reach a forced shutdown to prevent XID wraparound.

* The bgwriter and walwriter are also stopped immediately, likely resulting in performance degradation.

Hence, let's rearrange things so that the only immediate change in behavior is refusing to let in new normal connections. Once the last normal connection is gone, shut everything down as though we'd received a "fast" shutdown. To implement this, remove the PM_WAIT_BACKUP and PM_WAIT_READONLY states, instead staying in PM_RUN or PM_HOT_STANDBY while normal connections remain. A subsidiary state variable tracks whether or not we're letting in new connections in those states.

This also allows having just one copy of the logic for killing child processes in smart and fast shutdown modes. I moved that logic into PostmasterStateMachine() by inventing a new state PM_STOP_BACKENDS.

Back-patch to 9.6 where parallel query was added. In principle this'd be a good idea in 9.5 as well, but the risk/reward ratio is not as good there, since lack of autovacuum is not a problem during typical uses of smart shutdown.

Per report from Bharath Rupireddy.

Patch by me, reviewed by Thomas Munro

Discussion: https://postgr.es/m/CALj2ACXAZ5vKxT9P7P89D87i3MDO9bfS+_bjMHgnWJs8uwUOOw@mail.gmail.com
2020-08-13  snapshot scalability: Move PGXACT->xmin back to PGPROC.  (Andres Freund)

Now that xmin isn't needed for GetSnapshotData() anymore, it leads to unnecessary cacheline ping-pong to have it in PGXACT, as it is updated considerably more frequently than the other PGXACT members.

After the changes in dc7420c2c92, this is a very straight-forward change.

For highly concurrent, snapshot acquisition heavy, workloads this change alone can significantly increase scalability. E.g. plain pgbench on a smaller 2 socket machine gains 1.07x for read-only pgbench, 1.22x for read-only pgbench when submitting queries in batches of 100, and 2.85x for batches of 100 'SELECT';. The latter numbers are obviously not to be expected in the real world, but they micro-benchmark the snapshot computation scalability (previously ~80% of the time was spent in GetSnapshotData()).

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
2020-08-12  snapshot scalability: Don't compute global horizons while building snapshots.  (Andres Freund)

To make GetSnapshotData() more scalable, it cannot look at each proc's xmin: while snapshot contents do not need to change whenever a read-only transaction commits or a snapshot is released, a proc's xmin is modified in those cases. The frequency of xmin modifications leads to, particularly on higher core count systems, many cache misses inside GetSnapshotData(), despite the data underlying a snapshot not changing. That is the most significant source of GetSnapshotData() scaling poorly on larger systems.

Without accessing xmins, GetSnapshotData() cannot calculate accurate horizons / thresholds as it has so far. But we don't really have to: the horizons don't actually change that much between GetSnapshotData() calls. Nor are the horizons actually used every time a snapshot is built.

The trick this commit introduces is to delay computation of accurate horizons until their use, and to use horizon boundaries to determine whether accurate horizons need to be computed.

The use of RecentGlobal[Data]Xmin to decide whether a row version could be removed has been replaced with new GlobalVisTest* functions. These use two thresholds to determine whether a row can be pruned:

1) definitely_needed, indicating that rows deleted by XIDs >= definitely_needed are definitely still visible.

2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can definitely be removed.

GetSnapshotData() updates definitely_needed to be the xmin of the computed snapshot. When testing whether a row can be removed (with GlobalVisTestIsRemovableXid()) and the tested XID falls in between the two (i.e. XID >= maybe_needed && XID < definitely_needed) the boundaries can be recomputed to be more accurate. As it is not cheap to compute accurate boundaries, we limit the number of times that happens in short succession. As the boundaries used by GlobalVisTestIsRemovableXid() are never reset (with maybe_needed updated by GetSnapshotData()), it is likely that further tests can benefit from an earlier computation of accurate horizons.

To avoid regressing performance when old_snapshot_threshold is set (as that requires an accurate horizon to be computed), heap_page_prune_opt() doesn't unconditionally call TransactionIdLimitedForOldSnapshots() anymore. Both the computation of the limited horizon, and the triggering of errors (with SetOldSnapshotThresholdTimestamp()) is now only done when necessary to remove tuples.

This commit just removes the accesses to PGXACT->xmin from GetSnapshotData(), but other members of PGXACT residing in the same cache line are accessed. Therefore this in itself does not result in a significant improvement. Subsequent commits will take advantage of the fact that GetSnapshotData() now does not need to access xmins anymore.

Note: This contains a workaround in heap_page_prune_opt() to keep the snapshot_too_old tests working. While that workaround is ugly, the tests currently are not meaningful, and it seems best to address them separately.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
2020-08-11  Rename VariableCacheData.nextFullXid to nextXid.  (Andres Freund)

Including Full in variable names duplicates the type information and leads to overly long names. As FullTransactionId cannot accidentally be cast to TransactionId, that does not seem necessary.

Author: Andres Freund
Discussion: https://postgr.es/m/20200724011143.jccsyvsvymuiqfxu@alap3.anarazel.de
2020-08-10  Replace remaining StrNCpy() by strlcpy()  (Peter Eisentraut)

They are equivalent, except that StrNCpy() zero-fills the entire destination buffer instead of providing just one trailing zero. For all but a tiny number of callers, that's just overhead rather than being desirable.

Remove StrNCpy() as it is now unused.

In some cases, namestrcpy() is the more appropriate function to use. While we're here, simplify the API of namestrcpy(): Remove the return value, don't check for NULL input. Nothing was using that anyway. Also, remove a few unused name-related functions.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/44f5e198-36f6-6cdb-7fa9-60e34784daae%402ndquadrant.com