summaryrefslogtreecommitdiff
path: root/src/backend/utils/cache/syscache.c
AgeCommit message (Collapse)Author
2024-11-25Avoid "you don't own a lock of type ExclusiveLock" in GRANT TABLESPACE.Noah Misch
This WARNING appeared because SearchSysCacheLocked1() read cc_relisshared before catcache initialization, when the field is false unconditionally. On the basis of reading false there, it constructed a locktag as though pg_tablespace weren't relisshared. Only shared catalogs could be affected, and only GRANT TABLESPACE was affected in practice. SearchSysCacheLocked1() callers use one other shared-relation syscache, DATABASEOID. DATABASEOID is initialized by the end of CheckMyDatabase(), making the problem unreachable for pg_database. Back-patch to v13 (all supported versions). This has no known impact before v16, where ExecGrant_common() first appeared. Earlier branches avoid trouble by having a separate ExecGrant_Tablespace() that doesn't use LOCKTAG_TUPLE. However, leaving this unfixed in v15 could ensnare a future back-patch of a SearchSysCacheLocked1() call. Reported by Aya Iwata. Discussion: https://postgr.es/m/OS7PR01MB11964507B5548245A7EE54E70EA212@OS7PR01MB11964.jpnprd01.prod.outlook.com
2024-11-02Revert "For inplace update, send nontransactional invalidations."Noah Misch
This reverts commit 95c5acb3fc261067ab65ddc0b2dca8e162f09442 (v17) and counterparts in each other non-master branch. If released, that commit would have caused a worst-in-years minor release regression, via undetected LWLock self-deadlock. This commit and its self-deadlock fix warrant more bake time in the master branch. Reported by Alexander Lakhin. Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com
2024-10-25For inplace update, send nontransactional invalidations.Noah Misch
The inplace update survives ROLLBACK. The inval didn't, so another backend's DDL could then update the row without incorporating the inplace update. In the test this fixes, a mix of CREATE INDEX and ALTER TABLE resulted in a table with an index, yet relhasindex=f. That is a source of index corruption. Back-patch to v12 (all supported versions). The back branch versions don't change WAL, because those branches just added end-of-recovery SIResetAll(). All branches change the ABI of extern function PrepareToInvalidateCacheTuple(). No PGXN extension calls that, and there's no apparent use case in extensions. Reviewed by Nitin Motiani and (in earlier versions) Andres Freund. Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com
2024-09-24For inplace update durability, make heap_update() callers wait.Noah Misch
The previous commit fixed some ways of losing an inplace update. It remained possible to lose one when a backend working toward a heap_update() copied a tuple into memory just before inplace update of that tuple. In catalogs eligible for inplace update, use LOCKTAG_TUPLE to govern admission to the steps of copying an old tuple, modifying it, and issuing heap_update(). This includes MERGE commands. To avoid changing most of the pg_class DDL, don't require LOCKTAG_TUPLE when holding a relation lock sufficient to exclude inplace updaters. Back-patch to v12 (all supported versions). In v13 and v12, "UPDATE pg_class" or "UPDATE pg_database" can still lose an inplace update. The v14+ UPDATE fix needs commit 86dc90056dfdbd9d1b891718d2e5614e3e432f35, and it wasn't worth reimplementing that fix without such infrastructure. Reviewed by Nitin Motiani and (in earlier versions) Heikki Linnakangas. Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com
2022-04-07Revert "Logical decoding of sequences"Tomas Vondra
This reverts a sequence of commits, implementing features related to logical decoding and replication of sequences: - 0da92dc530c9251735fc70b20cd004d9630a1266 - 80901b32913ffa59bf157a4d88284b2b3a7511d9 - b779d7d8fdae088d70da5ed9fcd8205035676df3 - d5ed9da41d96988d905b49bebb273a9b2d6e2915 - a180c2b34de0989269fdb819bff241a249bf5380 - 75b1521dae1ff1fde17fda2e30e591f2e5d64b6a - 2d2232933b02d9396113662e44dca5f120d6830e - 002c9dd97a0c874fd1693a570383e2dd38cd40d5 - 05843b1aa49df2ecc9b97c693b755bd1b6f856a9 The implementation has issues, mostly due to combining transactional and non-transactional behavior of sequences. It's not clear how this could be fixed, but it'll require reworking significant part of the patch. Discussion: https://postgr.es/m/95345a19-d508-63d1-860a-f5c2f41e8d40@enterprisedb.com
2022-04-06Allow granting SET and ALTER SYSTEM privileges on GUC parameters.Tom Lane
This patch allows "PGC_SUSET" parameters to be set by non-superusers if they have been explicitly granted the privilege to do so. The privilege to perform ALTER SYSTEM SET/RESET on a specific parameter can also be granted. Such privileges are cluster-wide, not per database. They are tracked in a new shared catalog, pg_parameter_acl. Granting and revoking these new privileges works as one would expect. One caveat is that PGC_USERSET GUCs are unaffected by the SET privilege --- one could wish that those were handled by a revocable grant to PUBLIC, but they are not, because we couldn't make it robust enough for GUCs defined by extensions. Mark Dilger, reviewed at various times by Andrew Dunstan, Robert Haas, Joshua Brindle, and myself Discussion: https://postgr.es/m/3D691E20-C1D5-4B80-8BA5-6BEB63AF3029@enterprisedb.com
2022-03-24Add decoding of sequences to built-in replicationTomas Vondra
This commit adds support for decoding of sequences to the built-in replication (the infrastructure was added by commit 0da92dc530). The syntax and behavior mostly mimics handling of tables, i.e. a publication may be defined as FOR ALL SEQUENCES (replicating all sequences in a database), FOR ALL SEQUENCES IN SCHEMA (replicating all sequences in a particular schema) or individual sequences. To publish sequence modifications, the publication has to include 'sequence' action. The protocol is extended with a new message, describing sequence increments. A new system view pg_publication_sequences lists all the sequences added to a publication, both directly and indirectly. Various psql commands (\d and \dRp) are improved to also display publications including a given sequence, or sequences included in a publication. Author: Tomas Vondra, Cary Huang Reviewed-by: Peter Eisentraut, Amit Kapila, Hannu Krosing, Andres Freund, Petr Jelinek Discussion: https://postgr.es/m/d045f3c2-6cfb-06d3-5540-e63c320df8bc@enterprisedb.com Discussion: https://postgr.es/m/1710ed7e13b.cd7177461430746.3372264562543607781@highgo.ca
2022-01-16Add stxdinherit flag to pg_statistic_ext_dataTomas Vondra
Add pg_statistic_ext_data.stxdinherit flag, so that for each extended statistics definition we can store two versions of data - one for the relation alone, one for the whole inheritance tree. This is analogous to pg_statistic.stainherit, but we failed to include such flag in catalogs for extended statistics, and we had to work around it (see commits 859b3003de, 36c4bc6e72 and 20b9fa308e). This changes the relationship between the two catalogs storing extended statistics objects (pg_statistic_ext and pg_statistic_ext_data). Until now, there was a simple 1:1 mapping - for each definition there was one pg_statistic_ext_data row, and this row was inserted while creating the statistics (and then updated during ANALYZE). With the stxdinherit flag, we don't know how many rows there will be (child relations may be added after the statistics object is defined), so there may be up to two rows. We could make CREATE STATISTICS to always create both rows, but that seems wasteful - without partitioning we only need stxdinherit=false rows, and declaratively partitioned tables need only stxdinherit=true. So we no longer initialize pg_statistic_ext_data in CREATE STATISTICS, and instead make that a responsibility of ANALYZE. Which is what we do for regular statistics too. Patch by me, with extensive improvements and fixes by Justin Pryzby. Author: Tomas Vondra, Justin Pryzby Reviewed-by: Tomas Vondra, Justin Pryzby Discussion: https://postgr.es/m/20210923212624.GI831%40telsasoft.com
2022-01-07Update copyright for 2022Bruce Momjian
Backpatch-through: 10
2021-10-27Allow publishing the tables of schema.Amit Kapila
A new option "FOR ALL TABLES IN SCHEMA" in Create/Alter Publication allows one or more schemas to be specified, whose tables are selected by the publisher for sending the data to the subscriber. The new syntax allows specifying both the tables and schemas. For example: CREATE PUBLICATION pub1 FOR TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2; OR ALTER PUBLICATION pub1 ADD TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2; A new system table "pg_publication_namespace" has been added, to maintain the schemas that the user wants to publish through the publication. Modified the output plugin (pgoutput) to publish the changes if the relation is part of schema publication. Updates pg_dump to identify and dump schema publications. Updates the \d family of commands to display schema publications and \dRp+ variant will now display associated schemas if any. Author: Vignesh C, Hou Zhijie, Amit Kapila Syntax-Suggested-by: Tom Lane, Alvaro Herrera Reviewed-by: Greg Nancarrow, Masahiko Sawada, Hou Zhijie, Amit Kapila, Haiying Tang, Ajin Cherian, Rahila Syed, Bharath Rupireddy, Mark Dilger Tested-by: Haiying Tang Discussion: https://www.postgresql.org/message-id/CALDaNm0OANxuJ6RXqwZsM1MSY4s19nuH3734j4a72etDwvBETQ@mail.gmail.com
2021-06-29Add index OID macro argument to DECLARE_INDEXPeter Eisentraut
Instead of defining symbols such as AmOidIndexId explicitly, include them as an argument of DECLARE_INDEX() and have genbki.pl generate the way as the table OID symbols from the CATALOG() declaration. Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/ccef1e46-a404-25b1-9b4c-85f2c08e1f28%40enterprisedb.com
2021-01-02Update copyright for 2021Bruce Momjian
Backpatch-through: 9.5
2020-12-20Multirange datatypesAlexander Korotkov
Multiranges are basically sorted arrays of non-overlapping ranges with set-theoretic operations defined over them. Since v14, each range type automatically gets a corresponding multirange datatype. There are both manual and automatic mechanisms for naming multirange types. Once can specify multirange type name using multirange_type_name attribute in CREATE TYPE.  Otherwise, a multirange type name is generated automatically. If the range type name contains "range" then we change that to "multirange". Otherwise, we add "_multirange" to the end. Implementation of multiranges comes with a space-efficient internal representation format, which evades extra paddings and duplicated storage of oids.  Altogether this format allows fetching a particular range by its index in O(n). Statistic gathering and selectivity estimation are implemented for multiranges. For this purpose, stored multirange is approximated as union range without gaps. This field will likely need improvements in the future. Catversion is bumped. Discussion: https://postgr.es/m/CALNJ-vSUpQ_Y%3DjXvTxt1VYFztaBSsWVXeF1y6gTYQ4bOiWDLgQ%40mail.gmail.com Discussion: https://postgr.es/m/a0b8026459d1e6167933be2104a6174e7d40d0ab.camel%40j-davis.com#fe7218c83b08068bfffb0c5293eceda0 Author: Paul Jungwirth, revised by me Reviewed-by: David Fetter, Corey Huinker, Jeff Davis, Pavel Stehule Reviewed-by: Alvaro Herrera, Tom Lane, Isaac Morland, David G. Johnston Reviewed-by: Zhihong Yu, Alexander Korotkov
2020-11-07Move catalog index declarationsPeter Eisentraut
Move the system catalog index declarations from catalog/indexing.h to the respective parent tables' catalog/pg_*.h files. The original reason for having it split was that the old genbki system produced the output in the order of the catalog files it read, so all the indexing stuff needed to come separately. But this is no longer the case, and keeping it together makes more sense. Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/c7cc82d6-f976-75d6-2e3e-b03d2cab26bb@2ndquadrant.com
2020-01-01Update copyrights for 2020Bruce Momjian
Backpatch-through: update all files in master, backpatch legal files through 9.4
2019-11-12Make the order of the header file includes consistent in backend modules.Amit Kapila
Similar to commits 7e735035f2 and dddf4cdc33, this commit makes the order of header file inclusion consistent for backend modules. In the passing, removed a couple of duplicate inclusions. Author: Vignesh C Reviewed-by: Kuntal Ghosh and Amit Kapila Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
2019-11-07Add reusable routine for making arrays unique.Thomas Munro
Introduce qunique() and qunique_arg(), which can be used after qsort() and qsort_arg() respectively to remove duplicate values. Use it where appropriate. Author: Thomas Munro Reviewed-by: Tom Lane (in an earlier version) Discussion: https://postgr.es/m/CAEepm%3D2vmFTNpAmwbGGD2WaryM6T3hSDVKQPfUwjdD_5XY6vAA%40mail.gmail.com
2019-07-01pgindent run prior to branching v12.Tom Lane
pgperltidy and reformat-dat-files too, though the latter didn't find anything to change.
2019-06-16Rework the pg_statistic_ext catalogTomas Vondra
Since extended statistic got introduced in PostgreSQL 10, there was a single catalog pg_statistic_ext storing both the definitions and built statistic. That's however problematic when a user is supposed to have access only to the definitions, but not to user data. Consider for example pg_dump on a database with RLS enabled - if the pg_statistic_ext catalog respects RLS (which it should, if it contains user data), pg_dump would not see any records and the result would not define any extended statistics. That would be a surprising behavior. Until now this was not a pressing issue, because the existing types of extended statistic (functional dependencies and ndistinct coefficients) do not include any user data directly. This changed with introduction of MCV lists, which do include most common combinations of values. The easiest way to fix this is to split the pg_statistic_ext catalog into two - one for definitions, one for the built statistic values. The new catalog is called pg_statistic_ext_data, and we're maintaining a 1:1 relationship with the old catalog - either there are matching records in both catalogs, or neither of them. Bumped CATVERSION due to changing system catalog definitions. Author: Dean Rasheed, with improvements by me Reviewed-by: Dean Rasheed, John Naylor Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-05-22Initial pgindent run for v12.Tom Lane
This is still using the 2.0 version of pg_bsd_indent. I thought it would be good to commit this separately, so as to document the differences between 2.0 and 2.1 behavior. Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
2019-01-02Update copyright for 2019Bruce Momjian
Backpatch-through: certain files through 9.4
2018-11-20Remove WITH OIDS support, change oid catalog column visibility.Andres Freund
Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-01-29Save a few bytes by removing useless last argument to SearchCatCacheList.Tom Lane
There's never any value in giving a fully specified cache key to SearchCatCacheList: you might as well call SearchCatCache instead, since there could be only one match. So the maximum useful number of key arguments is one less than the supported number of key columns. We might as well remove the useless extra argument and save some few bytes per call site, as well as a cycle or so per call. I believe the reason it was coded like this is that originally, callers had to write out all the dummy arguments in each call, and so it seemed less confusing if SearchCatCache and SearchCatCacheList took the same number of key arguments. But since commit e26c539e9, callers only write their live arguments explicitly, making that a non-factor; and there's surely been enough time for third-party modules to adapt to that coding style. So this is only an ABI break not an API break for callers. Per discussion with Oliver Ford, this might also make it less confusing how to use SearchCatCacheList correctly. Discussion: https://postgr.es/m/27788.1517069693@sss.pgh.pa.us
2018-01-02Update copyright for 2018Bruce Momjian
Backpatch-through: certain files through 9.3
2017-10-13Improve sys/catcache performance.Andres Freund
The following are the individual improvements: 1) Avoidance of FunctionCallInfo based function calls, replaced by more efficient functions with a native C argument interface. 2) Don't extract columns from a cache entry's tuple whenever matching entries - instead store them as a Datum array. This also allows to get rid of having to build dummy tuples for negative & list entries, and of a hack for dealing with cstring vs. text weirdness. 3) Reorder members of catcache.h struct, so imortant entries are more likely to be on one cacheline. 4) Allowing the compiler to specialize critical SearchCatCache for a specific number of attributes allows to unroll loops and avoid other nkeys dependant initialization. 5) Only initializing the ScanKey when necessary, i.e. catcache misses, greatly reduces cache unnecessary cpu cache misses. 6) Split of the cache-miss case from the hash lookup, reducing stack allocations etc in the common case. 7) CatCTup and their corresponding heaptuple are allocated in one piece. This results in making cache lookups themselves roughly three times as fast - full-system benchmarks obviously improve less than that. I've also evaluated further techniques: - replace open coded hash with simplehash - the list walk right now shows up in profiles. Unfortunately it's not easy to do so safely as an entry's memory location can change at various times, which doesn't work well with the refcounting and cache invalidation. - Cacheline-aligning CatCTup entries - helps some with performance, but the win isn't big and the code for it is ugly, because the tuples have to be freed as well. - add more proper functions, rather than macros for SearchSysCacheCopyN etc., but right now they don't show up in profiles. The reason the macro wrapper for syscache.c/h have to be changed, rather than just catcache, is that doing otherwise would require exposing the SysCache array to the outside. That might be a good idea anyway, but it's for another day. Author: Andres Freund Reviewed-By: Robert Haas Discussion: https://postgr.es/m/20170914061207.zxotvyopetm7lrrp@alap3.anarazel.de
2017-09-06Allow SET STATISTICS on expression indexesSimon Riggs
Index columns are referenced by ordinal number rather than name, e.g. CREATE INDEX coord_idx ON measured (x, y, (z + t)); ALTER INDEX coord_idx ALTER COLUMN 3 SET STATISTICS 1000; Incompatibility note for release notes: \d+ for indexes now also displays Stats Target Authors: Alexander Korotkov, with contribution by Adrien NAYRAT Review: Adrien NAYRAT, Simon Riggs Wordsmith: Simon Riggs
2017-06-21Phase 2 of pgindent updates.Tom Lane
Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-05-30Sort syscache identifiers into alphabetical order.Tom Lane
Not much point in having a convention about this if we don't enforce it. Mark Dilger Discussion: https://postgr.es/m/7F67FBEF-C3B3-404E-8EC6-E02ACB15D894@gmail.com
2017-05-17Post-PG 10 beta1 pgindent runBruce Momjian
perltidy run not included.
2017-05-12Avoid searching for callback functions in CallSyscacheCallbacks().Tom Lane
We have now grown enough registerable syscache-invalidation callback functions that the original assumption that there would be few of them is causing performance problems. In particular, let's fix things so that CallSyscacheCallbacks doesn't have to search the whole array to find which callback(s) to invoke for a given cache ID. Preserve the original behavior that callbacks are called in order of registration, just in case there's someplace that depends on that (which I doubt). In support of this, export the number of syscaches from syscache.h. People could have found that out anyway from the enum, but adding a #define makes that much safer. This provides a useful additional speedup in Mathieu Fenniak's logical-decoding test case, although we're reaching the point of diminishing returns there. I think any further improvement will have to come from reducing the number of cache invalidations that are triggered in the first place. Still, we can hope that this change gives some incremental benefit for all invalidation scenarios. Back-patch to 9.4 where logical decoding was introduced. Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
2017-05-12Avoid searching for the target catcache in CatalogCacheIdInvalidate.Tom Lane
A test case provided by Mathieu Fenniak shows that the initial search for the target catcache in CatalogCacheIdInvalidate consumes a very significant amount of overhead in cases where cache invalidation is triggered but has little useful work to do. There is no good reason for that search to exist at all, as the index array maintained by syscache.c allows direct lookup of the catcache from its ID. We just need a frontend function in syscache.c, matching the division of labor for most other cache-accessing operations. While there's more that can be done in this area, this patch alone reduces the runtime of Mathieu's example by 2X. We can hope that it offers some useful benefit in other cases too, although usually cache invalidation overhead is not such a striking fraction of the total runtime. Back-patch to 9.4 where logical decoding was introduced. It might be worth going further back, but presently the only case we know of where cache invalidation is really a significant burden is in logical decoding. Also, older branches have fewer catcaches, reducing the possible benefit. (Note: although this nominally changes catcache's API, we have always documented CatalogCacheIdInvalidate as a private function, so I would have little sympathy for an external module calling it directly. So backpatching should be fine.) Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
2017-04-17Rename columns in new pg_statistic_ext catalogAlvaro Herrera
The new catalog reused a column prefix "sta" from pg_statistic, but this is undesirable, so change the catalog to use prefix "stx" instead. Also, rename the column that lists enabled statistic kinds as "stxkind" rather than "enabled". Discussion: https://postgr.es/m/CAKJS1f_2t5jhSN7huYRFH3w3rrHfG2QU7hiUHsu-Vdjd1rYT3w@mail.gmail.com
2017-03-24Implement multivariate n-distinct coefficientsAlvaro Herrera
Add support for explicitly declared statistic objects (CREATE STATISTICS), allowing collection of statistics on more complex combinations that individual table columns. Companion commands DROP STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are added too. All this DDL has been designed so that more statistic types can be added later on, such as multivariate most-common-values and multivariate histograms between columns of a single table, leaving room for permitting columns on multiple tables, too, as well as expressions. This commit only adds support for collection of n-distinct coefficient on user-specified sets of columns in a single table. This is useful to estimate number of distinct groups in GROUP BY and DISTINCT clauses; estimation errors there can cause over-allocation of memory in hashed aggregates, for instance, so it's a worthwhile problem to solve. A new special pseudo-type pg_ndistinct is used. (num-distinct estimation was deemed sufficiently useful by itself that this is worthwhile even if no further statistic types are added immediately; so much so that another version of essentially the same functionality was submitted by Kyotaro Horiguchi: https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp though this commit does not use that code.) Author: Tomas Vondra. Some code rework by Álvaro. Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes, Ideriha Takeshi Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-23Logical replication support for initial data copyPeter Eisentraut
Add functionality for a new subscription to copy the initial data in the tables and then sync with the ongoing apply process. For the copying, add a new internal COPY option to have the COPY source data provided by a callback function. The initial data copy works on the subscriber by receiving COPY data from the publisher and then providing it locally into a COPY that writes to the destination table. A WAL receiver can now execute full SQL commands. This is used here to obtain information about tables and publications. Several new options were added to CREATE and ALTER SUBSCRIPTION to control whether and when initial table syncing happens. Change pg_dump option --no-create-subscription-slots to --no-subscription-connect and use the new CREATE SUBSCRIPTION ... NOCONNECT option for that. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com> Tested-by: Erik Rijkers <er@xs4all.nl>
2017-03-02Update comments overlooked by 2f5c9d9c9cec436e55847ec580606d7e88067df6.Robert Haas
Tomas Vondra
2017-01-20Logical replicationPeter Eisentraut
- Add PUBLICATION catalogs and DDL - Add SUBSCRIPTION catalog and DDL - Define logical replication protocol and output plugin - Add logical replication workers From: Petr Jelinek <petr@2ndquadrant.com> Reviewed-by: Steve Singer <steve@ssinger.info> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Erik Rijkers <er@xs4all.nl> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
2017-01-03Update copyright via script for 2017Bruce Momjian
2016-12-20Add pg_sequence system catalogPeter Eisentraut
Move sequence metadata (start, increment, etc.) into a proper system catalog instead of storing it in the sequence heap object. This separates the metadata from the sequence data. Sequence metadata is now operated on transactionally by DDL commands, whereas previously rollbacks of sequence-related DDL commands would be ignored. Reviewed-by: Andreas Karlsson <andreas@proxel.se>
2016-12-07Implement table partitioning.Robert Haas
Table partitioning is like table inheritance and reuses much of the existing infrastructure, but there are some important differences. The parent is called a partitioned table and is always empty; it may not have indexes or non-inherited constraints, since those make no sense for a relation with no data of its own. The children are called partitions and contain all of the actual data. Each partition has an implicit partitioning constraint. Multiple inheritance is not allowed, and partitioning and inheritance can't be mixed. Partitions can't have extra columns and may not allow nulls unless the parent does. Tuples inserted into the parent are automatically routed to the correct partition, so tuple-routing ON INSERT triggers are not needed. Tuple routing isn't yet supported for partitions which are foreign tables, and it doesn't handle updates that cross partition boundaries. Currently, tables can be range-partitioned or list-partitioned. List partitioning is limited to a single column, but range partitioning can involve multiple columns. A partitioning "column" can be an expression. Because table partitioning is less general than table inheritance, it is hoped that it will be easier to reason about properties of partitions, and therefore that this will serve as a better foundation for a variety of possible optimizations, including query planner optimizations. The tuple routing based which this patch does based on the implicit partitioning constraints is an example of this, but it seems likely that many other useful optimizations are also possible. Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat, Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova, Rushabh Lathia, Erik Rijkers, among others. Minor revisions by me.
2016-01-17Restructure index access method API to hide most of it at the C level.Tom Lane
This patch reduces pg_am to just two columns, a name and a handler function. All the data formerly obtained from pg_am is now provided in a C struct returned by the handler function. This is similar to the designs we've adopted for FDWs and tablesample methods. There are multiple advantages. For one, the index AM's support functions are now simple C functions, making them faster to call and much less error-prone, since the C compiler can now check function signatures. For another, this will make it far more practical to define index access methods in installable extensions. A disadvantage is that SQL-level code can no longer see attributes of index AMs; in particular, some of the crosschecks in the opr_sanity regression test are no longer possible from SQL. We've addressed that by adding a facility for the index AM to perform such checks instead. (Much more could be done in that line, but for now we're content if the amvalidate functions more or less replace what opr_sanity used to do.) We might also want to expose some sort of reporting functionality, but this patch doesn't do that. Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily editorialized on by me.
2016-01-02Update copyright for 2016Bruce Momjian
Backpatch certain files through 9.1
2015-07-25Redesign tablesample method API, and do extensive code review.Tom Lane
The original implementation of TABLESAMPLE modeled the tablesample method API on index access methods, which wasn't a good choice because, without specialized DDL commands, there's no way to build an extension that can implement a TSM. (Raw inserts into system catalogs are not an acceptable thing to do, because we can't undo them during DROP EXTENSION, nor will pg_upgrade behave sanely.) Instead adopt an API more like procedural language handlers or foreign data wrappers, wherein the only SQL-level support object needed is a single handler function identified by having a special return type. This lets us get rid of the supporting catalog altogether, so that no custom DDL support is needed for the feature. Adjust the API so that it can support non-constant tablesample arguments (the original coding assumed we could evaluate the argument expressions at ExecInitSampleScan time, which is undesirable even if it weren't outright unsafe), and discourage sampling methods from looking at invisible tuples. Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable within and across queries, as required by the SQL standard, and deal more honestly with methods that can't support that requirement. Make a full code-review pass over the tablesample additions, and fix assorted bugs, omissions, infelicities, and cosmetic issues (such as failure to put the added code stanzas in a consistent ordering). Improve EXPLAIN's output of tablesample plans, too. Back-patch to 9.5 so that we don't have to support the original API in production.
2015-06-07Use a safer method for determining whether relcache init file is stale.Tom Lane
When we invalidate the relcache entry for a system catalog or index, we must also delete the relcache "init file" if the init file contains a copy of that rel's entry. The old way of doing this relied on a specially maintained list of the OIDs of relations present in the init file: we made the list either when reading the file in, or when writing the file out. The problem is that when writing the file out, we included only rels present in our local relcache, which might have already suffered some deletions due to relcache inval events. In such cases we correctly decided not to overwrite the real init file with incomplete data --- but we still used the incomplete initFileRelationIds list for the rest of the current session. This could result in wrong decisions about whether the session's own actions require deletion of the init file, potentially allowing an init file created by some other concurrent session to be left around even though it's been made stale. Since we don't support changing the schema of a system catalog at runtime, the only likely scenario in which this would cause a problem in the field involves a "vacuum full" on a catalog concurrently with other activity, and even then it's far from easy to provoke. Remarkably, this has been broken since 2002 (in commit 786340441706ac1957a031f11ad1c2e5b6e18314), but we had never seen a reproducible test case until recently. If it did happen in the field, the symptoms would probably involve unexpected "cache lookup failed" errors to begin with, then "could not open file" failures after the next checkpoint, as all accesses to the affected catalog stopped working. Recovery would require manually removing the stale "pg_internal.init" file. To fix, get rid of the initFileRelationIds list, and instead consult syscache.c's list of relations used in catalog caches to decide whether a relation is included in the init file. This should be a tad more efficient anyway, since we're replacing linear search of a list with ~100 entries with a binary search. It's a bit ugly that the init file contents are now so directly tied to the catalog caches, but in practice that won't make much difference. Back-patch to all supported branches.
2015-05-23pgindent run for 9.5Bruce Momjian
2015-05-15TABLESAMPLE, SQL Standard and extensibleSimon Riggs
Add a TABLESAMPLE clause to SELECT statements that allows user to specify random BERNOULLI sampling or block level SYSTEM sampling. Implementation allows for extensible sampling functions to be written, using a standard API. Basic version follows SQLStandard exactly. Usable concrete use cases for the sampling API follow in later commits. Petr Jelinek Reviewed by Michael Paquier and Simon Riggs
2015-04-29Introduce replication progress tracking infrastructure.Andres Freund
When implementing a replication solution ontop of logical decoding, two related problems exist: * How to safely keep track of replication progress * How to change replication behavior, based on the origin of a row; e.g. to avoid loops in bi-directional replication setups The solution to these problems, as implemented here, consist out of three parts: 1) 'replication origins', which identify nodes in a replication setup. 2) 'replication progress tracking', which remembers, for each replication origin, how far replay has progressed in a efficient and crash safe manner. 3) The ability to filter out changes performed on the behest of a replication origin during logical decoding; this allows complex replication topologies. E.g. by filtering all replayed changes out. Most of this could also be implemented in "userspace", e.g. by inserting additional rows contain origin information, but that ends up being much less efficient and more complicated. We don't want to require various replication solutions to reimplement logic for this independently. The infrastructure is intended to be generic enough to be reusable. This infrastructure also replaces the 'nodeid' infrastructure of commit timestamps. It is intended to provide all the former capabilities, except that there's only 2^16 different origins; but now they integrate with logical decoding. Additionally more functionality is accessible via SQL. Since the commit timestamp infrastructure has also been introduced in 9.5 (commit 73c986add) changing the API is not a problem. For now the number of origins for which the replication progress can be tracked simultaneously is determined by the max_replication_slots GUC. That GUC is not a perfect match to configure this, but there doesn't seem to be sufficient reason to introduce a separate new one. Bumps both catversion and wal page magic. Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer Discussion: 20150216002155.GI15326@awork2.anarazel.de, 20140923182422.GA15776@alap3.anarazel.de, 20131114172632.GE7522@alap2.anarazel.de
2015-04-26Add transforms featurePeter Eisentraut
This provides a mechanism for specifying conversions between SQL data types and procedural languages. As examples, there are transforms for hstore and ltree for PL/Perl and PL/Python. reviews by Pavel Stěhule and Andres Freund
2015-01-06Update copyright for 2015Bruce Momjian
Backpatch certain files through 9.0
2014-05-06pgindent run for 9.4Bruce Momjian
This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
2014-01-07Update copyright for 2014Bruce Momjian
Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.