summaryrefslogtreecommitdiff
path: root/src/backend/access/common
AgeCommit message (Collapse)Author
2023-10-31doc: 1-byte varlena headers can be used for user PLAIN storageBruce Momjian
This also updates some C comments. Reported-by: suchithjn22@gmail.com Discussion: https://postgr.es/m/167336599095.2667301.15497893107226841625@wrigleys.postgresql.org Author: Laurenz Albe (doc patch) Backpatch-through: 11
2023-08-22Cache by-reference missing values in a long lived contextAndrew Dunstan
Attribute missing values might be needed past the lifetime of the tuple descriptors from which they are extracted. To avoid possibly using pointers for by-reference values which might thus be left dangling, we cache a datumCopy'd version of the datum in the TopMemoryContext. Since we first search for the value this only needs to be done once per session for any such value. Original complaint from Tom Lane, idea for mitigation by Andrew Dunstan, tweaked by Tom Lane. Backpatch to version 11 where missing values were introduced. Discussion: https://postgr.es/m/1306569.1687978174@sss.pgh.pa.us
2023-04-23Fix custom validators call in build_local_reloptions()Alexander Korotkov
We need to call them only when validate == true. Backpatch to 13, where opclass options were introduced. Reported-by: Tom Lane Discussion: https://postgr.es/m/2656633.1681831542%40sss.pgh.pa.us Reviewed-by: Tom Lane, Pavel Borisov Backpatch-through: 13
2023-01-03Fix typos in comments, code and documentationMichael Paquier
While on it, newlines are removed from the end of two elog() strings. The others are simple grammar mistakes. One comment in pg_upgrade referred incorrectly to sequences since a7e5457. Author: Justin Pryzby Discussion: https://postgr.es/m/20221230231257.GI1153@telsasoft.com Backpatch-through: 11
2021-12-08Fix corruption of toast indexes with REINDEX CONCURRENTLYMichael Paquier
REINDEX CONCURRENTLY run on a toast index or a toast relation could corrupt the target indexes rebuilt, as a backend running in parallel that manipulates toast values would directly release the lock on the toast relation when its local operation is done, rather than releasing the lock once the transaction that manipulated the toast values committed. The fix done here is simple: we now hold a ROW EXCLUSIVE lock on the toast relation when saving or deleting a toast value until the transaction working on them is committed, so as a concurrent reindex happening in parallel would be able to wait for any activity and see any new rows inserted (or deleted). An isolation test is added to check after the case fixed here, which is a bit fancy by design as it relies on allow_system_table_mods to rename the toast table and its index to fixed names. This way, it is possible to reindex them directly without any dependency on the OID of the underlying relation. Note that this could not use a DO block either, as REINDEX CONCURRENTLY cannot be run in a transaction block. The test is backpatched down to 13, where it is possible, thanks to c4a7a39, to use allow_system_table_mods in a test suite. Reported-by: Alexey Ermakov Analyzed-by: Andres Freund, Noah Misch Author: Michael Paquier Reviewed-by: Nathan Bossart Discussion: https://postgr.es/m/17268-d2fb426e0895abd4@postgresql.org Backpatch-through: 12
2021-05-20Avoid detoasting failure after COMMIT inside a plpgsql FOR loop.Tom Lane
exec_for_query() normally tries to prefetch a few rows at a time from the query being iterated over, so as to reduce executor entry/exit overhead. Unfortunately this is unsafe if we have COMMIT or ROLLBACK within the loop, because there might be TOAST references in the data that we prefetched but haven't yet examined. Immediately after the COMMIT/ROLLBACK, we have no snapshots in the session, meaning that VACUUM is at liberty to remove recently-deleted TOAST rows. This was originally reported as a case triggering the "no known snapshots" error in init_toast_snapshot(), but even if you miss hitting that, you can get "missing toast chunk", as illustrated by the added isolation test case. To fix, just disable prefetching in non-atomic contexts. Maybe there will be performance complaints prompting us to work harder later, but it's not clear at the moment that this really costs much, and I doubt we'd want to back-patch any complicated fix. In passing, adjust that error message in init_toast_snapshot() to be a little clearer about the likely cause of the problem. Patch by me, based on earlier investigation by Konstantin Knizhnik. Per bug #15990 from Andreas Wicht. Back-patch to v11 where intra-procedure COMMIT was added. Discussion: https://postgr.es/m/15990-eee2ac466b11293d@postgresql.org
2021-02-08Fix permission checks on constraint violation errors on partitions.Heikki Linnakangas
If a cross-partition UPDATE violates a constraint on the target partition, and the columns in the new partition are in different physical order than in the parent, the error message can reveal columns that the user does not have SELECT permission on. A similar bug was fixed earlier in commit 804b6b6db4. The cause of the bug is that the callers of the ExecBuildSlotValueDescription() function got confused when constructing the list of modified columns. If the tuple was routed from a parent, we converted the tuple to the parent's format, but the list of modified columns was grabbed directly from the child's RTE entry. ExecUpdateLockMode() had a similar issue. That lead to confusion on which columns are key columns, leading to wrong tuple lock being taken on tables referenced by foreign keys, when a row is updated with INSERT ON CONFLICT UPDATE. A new isolation test is added for that corner case. With this patch, the ri_RangeTableIndex field is no longer set for partitions that don't have an entry in the range table. Previously, it was set to the RTE entry of the parent relation, but that was confusing. NOTE: This modifies the ResultRelInfo struct, replacing the ri_PartitionRoot field with ri_RootResultRelInfo. That's a bit risky to backpatch, because it breaks any extensions accessing the field. The change that ri_RangeTableIndex is not set for partitions could potentially break extensions, too. The ResultRelInfos are visible to FDWs at least, and this patch required small changes to postgres_fdw. Nevertheless, this seem like the least bad option. I don't think these fields widely used in extensions; I don't think there are FDWs out there that uses the FDW "direct update" API, other than postgres_fdw. If there is, you will get a compilation error, so hopefully it is caught quickly. Backpatch to 11, where support for both cross-partition UPDATEs, and unique indexes on partitioned tables, were added. Reviewed-by: Amit Langote Security: CVE-2021-3393
2021-01-04Fix integer-overflow corner cases in substring() functions.Tom Lane
If the substring start index and length overflow when added together, substring() misbehaved, either throwing a bogus "negative substring length" error on a case that should succeed, or failing to complain that a negative length is negative (and instead returning the whole string, in most cases). Unsurprisingly, the text, bytea, and bit variants of the function all had this issue. Rearrange the logic to ensure that negative lengths are always rejected, and add an overflow check to handle the other case. Also install similar guards into detoast_attr_slice() (nee heap_tuple_untoast_attr_slice()), since it's far from clear that no other code paths leading to that function could pass it values that would overflow. Patch by myself and Pavel Stehule, per bug #16804 from Rafi Shamim. Back-patch to v11. While these bugs are old, the common/int.h infrastructure for overflow-detecting arithmetic didn't exist before commit 4d6ad3125, and it doesn't seem like these misbehaviors are bad enough to justify developing a standalone fix for the older branches. Discussion: https://postgr.es/m/16804-f4eeeb6c11ba71d4@postgresql.org
2020-05-15Rename assorted LWLock tranches.Tom Lane
Choose names that fit into the conventions for wait event names (particularly, that multi-word names are in the style MultiWordName) and hopefully convey more information to non-hacker users than the previous names did. Also rename SerializablePredicateLockListLock to SerializablePredicateListLock; the old name was long enough to cause table formatting problems, plus the double occurrence of "Lock" seems confusing/error-prone. Also change a couple of particularly opaque LWLock field names. Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-14Initial pgindent and pgperltidy run for v13.Tom Lane
Includes some manual cleanup of places that pgindent messed up, most of which weren't per project style anyway. Notably, it seems some people didn't absorb the style rules of commit c9d297751, because there were a bunch of new occurrences of function calls with a newline just after the left paren, all with faulty expectations about how the rest of the call would get indented.
2020-03-30Implement operator class parametersAlexander Korotkov
PostgreSQL provides set of template index access methods, where opclasses have much freedom in the semantics of indexing. These index AMs are GiST, GIN, SP-GiST and BRIN. There opclasses define representation of keys, operations on them and supported search strategies. So, it's natural that opclasses may be faced some tradeoffs, which require user-side decision. This commit implements opclass parameters allowing users to set some values, which tell opclass how to index the particular dataset. This commit doesn't introduce new storage in system catalog. Instead it uses pg_attribute.attoptions, which is used for table column storage options but unused for index attributes. In order to evade changing signature of each opclass support function, we implement unified way to pass options to opclass support functions. Options are set to fn_expr as the constant bytea expression. It's possible due to the fact that opclass support functions are executed outside of expressions, so fn_expr is unused for them. This commit comes with some examples of opclass options usage. We parametrize signature length in GiST. That applies to multiple opclasses: tsvector_ops, gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and gist_hstore_ops. Also we parametrize maximum number of integer ranges for gist__int_ops. However, the main future usage of this feature is expected to be json, where users would be able to specify which way to index particular json parts. Catversion is bumped. Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru Author: Nikita Glukhov, revised by me Reviwed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera
2020-03-28Trigger autovacuum based on number of INSERTsDavid Rowley
Traditionally autovacuum has only ever invoked a worker based on the estimated number of dead tuples in a table and for anti-wraparound purposes. For the latter, with certain classes of tables such as insert-only tables, anti-wraparound vacuums could be the first vacuum that the table ever receives. This could often lead to autovacuum workers being busy for extended periods of time due to having to potentially freeze every page in the table. This could be particularly bad for very large tables. New clusters, or recently pg_restored clusters could suffer even more as many large tables may have the same relfrozenxid, which could result in large numbers of tables requiring an anti-wraparound vacuum all at once. Here we aim to reduce the work required by anti-wraparound and aggressive vacuums in general, by triggering autovacuum when the table has received enough INSERTs. This is controlled by adding two new GUCs and reloptions; autovacuum_vacuum_insert_threshold and autovacuum_vacuum_insert_scale_factor. These work exactly the same as the existing scale factor and threshold controls, only base themselves off the number of inserts since the last vacuum, rather than the number of dead tuples. New controls were added rather than reusing the existing controls, to allow these new vacuums to be tuned independently and perhaps even completely disabled altogether, which can be done by setting autovacuum_vacuum_insert_threshold to -1. We make no attempt to skip index cleanup operations on these vacuums as they may trigger for an insert-mostly table which continually doesn't have enough dead tuples to trigger an autovacuum for the purpose of removing those dead tuples. If we were to skip cleaning the indexes in this case, then it is possible for the index(es) to become bloated over time. There are additional benefits to triggering autovacuums based on inserts, as tables which never contain enough dead tuples to trigger an autovacuum are now more likely to receive a vacuum, which can mark more of the table as "allvisible" and encourage the query planner to make use of Index Only Scans. Currently, we still obey vacuum_freeze_min_age when triggering these new autovacuums based on INSERTs. For large insert-only tables, it may be beneficial to lower the table's autovacuum_freeze_min_age so that tuples are eligible to be frozen sooner. Here we've opted not to zero that for these types of vacuums, since the table may just be insert-mostly and we may otherwise freeze tuples that are still destined to be updated or removed in the near future. There was some debate to what exactly the new scale factor and threshold should default to. For now, these are set to 0.2 and 1000, respectively. There may be some motivation to adjust these before the release. Author: Laurenz Albe, Darafei Praliaskouski Reviewed-by: Alvaro Herrera, Masahiko Sawada, Chris Travers, Andres Freund, Justin Pryzby Discussion: https://postgr.es/m/CAC8Q8t%2Bj36G_bLF%3D%2B0iMo6jGNWnLnWb1tujXuJr-%2Bx8ZCCTqoQ%40mail.gmail.com
2020-03-21During heap rebuild, lock any TOAST index until end of transaction.Noah Misch
swap_relation_files() calls toast_get_valid_index() to find and lock this index, just before swapping with the rebuilt TOAST index. The latter function releases the lock before returning. Potential for mischief is low; a concurrent session can issue ALTER INDEX ... SET (fillfactor = ...), which is not alarming. Nonetheless, changing pg_class.relfilenode without a lock is unconventional. Back-patch to 9.5 (all supported versions), because another fix needs this. Discussion: https://postgr.es/m/20191226001521.GA1772687@rfd.leadboat.com
2020-03-16Introduce a maintenance_io_concurrency setting.Thomas Munro
Introduce a GUC and a tablespace option to control I/O prefetching, much like effective_io_concurrency, but for work that is done on behalf of many client sessions. Use the new setting in heapam.c instead of the hard-coded formula effective_io_concurrency + 10 introduced by commit 558a9165e08. Go with a default value of 10 for now, because it's a round number pretty close to the value used for that existing case. Discussion: https://postgr.es/m/CA%2BhUKGJUw08dPs_3EUcdO6M90GnjofPYrWp4YSLaBkgYwS-AqA%40mail.gmail.com
2020-03-04Introduce macros for typalign and typstorage constants.Tom Lane
Our usual practice for "poor man's enum" catalog columns is to define macros for the possible values and use those, not literal constants, in C code. But for some reason lost in the mists of time, this was never done for typalign/attalign or typstorage/attstorage. It's never too late to make it better though, so let's do that. The reason I got interested in this right now is the need to duplicate some uses of the TYPSTORAGE constants in an upcoming ALTER TYPE patch. But in general, this sort of change aids greppability and readability, so it's a good idea even without any specific motivation. I may have missed a few places that could be converted, and it's even more likely that pending patches will re-introduce some hard-coded references. But that's not fatal --- there's no expectation that we'd actually change any of these values. We can clean up stragglers over time. Discussion: https://postgr.es/m/16457.1583189537@sss.pgh.pa.us
2020-02-27Move src/backend/utils/hash/hashfn.c to src/commonRobert Haas
This also involves renaming src/include/utils/hashutils.h, which becomes src/include/common/hashfn.h. Perhaps an argument can be made for keeping the hashutils.h name, but it seemed more consistent to make it match the name of the file, and also more descriptive of what is actually going on here. Patch by me, reviewed by Suraj Kharage and Mark Dilger. Off-list advice on how not to break the Windows build from Davinder Singh and Amit Kapila. Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com
2020-02-26Add deduplication to nbtree.Peter Geoghegan
Deduplication reduces the storage overhead of duplicates in indexes that use the standard nbtree index access method. The deduplication process is applied lazily, after the point where opportunistic deletion of LP_DEAD-marked index tuples occurs. Deduplication is only applied at the point where a leaf page split would otherwise be required. New posting list tuples are formed by merging together existing duplicate tuples. The physical representation of the items on an nbtree leaf page is made more space efficient by deduplication, but the logical contents of the page are not changed. Even unique indexes make use of deduplication as a way of controlling bloat from duplicates whose TIDs point to different versions of the same logical table row. The lazy approach taken by nbtree has significant advantages over a GIN style eager approach. Most individual inserts of index tuples have exactly the same overhead as before. The extra overhead of deduplication is amortized across insertions, just like the overhead of page splits. The key space of indexes works in the same way as it has since commit dd299df8 (the commit that made heap TID a tiebreaker column). Testing has shown that nbtree deduplication can generally make indexes with about 10 or 15 tuples for each distinct key value about 2.5X - 4X smaller, even with single column integer indexes (e.g., an index on a referencing column that accompanies a foreign key). The final size of single column nbtree indexes comes close to the final size of a similar contrib/btree_gin index, at least in cases where GIN's posting list compression isn't very effective. This can significantly improve transaction throughput, and significantly reduce the cost of vacuuming indexes. A new index storage parameter (deduplicate_items) controls the use of deduplication. The default setting is 'on', so all new B-Tree indexes automatically use deduplication where possible. This decision will be reviewed at the end of the Postgres 13 beta period. There is a regression of approximately 2% of transaction throughput with synthetic workloads that consist of append-only inserts into a table with several non-unique indexes, where all indexes have few or no repeated values. The underlying issue is that cycles are wasted on unsuccessful attempts at deduplicating items in non-unique indexes. There doesn't seem to be a way around it short of disabling deduplication entirely. Note that deduplication of items in unique indexes is fairly well targeted in general, which avoids the problem there (we can use a special heuristic to trigger deduplication passes in unique indexes, since we're specifically targeting "version bloat"). Bump XLOG_PAGE_MAGIC because xl_btree_vacuum changed. No bump in BTREE_VERSION, since the representation of posting list tuples works in a way that's backwards compatible with version 4 indexes (i.e. indexes built on PostgreSQL 12). However, users must still REINDEX a pg_upgrade'd index to use deduplication, regardless of the Postgres version they've upgraded from. This is the only way to set the new nbtree metapage flag indicating that deduplication is generally safe. Author: Anastasia Lubennikova, Peter Geoghegan Reviewed-By: Peter Geoghegan, Heikki Linnakangas Discussion: https://postgr.es/m/55E4051B.7020209@postgrespro.ru https://postgr.es/m/4ab6e2db-bcee-f4cf-0916-3a06e6ccbb55@postgrespro.ru
2020-02-05Force tuple conversion when the source has missing attributes.Andrew Gierth
Tuple conversion incorrectly concluded that no conversion was needed as long as all the attributes lined up. But if the source tuple has a missing attribute (from addition of a column with default), then the destination tupdesc might not reflect the same default. The typical symptom was that the affected columns would be unexpectedly NULL. Repair by always forcing conversion if the source has missing attributes, which will be filled in by the deform operation. (In theory we could optimize for when the destination has the same default, but that seemed overkill.) Backpatch to 11 where missing attributes were added. Per bug #16242. Vik Fearing (discovery, code, testing) and me (analysis, testcase). Discussion: https://postgr.es/m/16242-d1c9fca28445966b@postgresql.org
2020-02-01Optimizations for integer to decimal output.Andrew Gierth
Using a lookup table of digit pairs reduces the number of divisions needed, and calculating the length upfront saves some work; these ideas are taken from the code previously committed for floats. David Fetter, reviewed by Kyotaro Horiguchi, Tels, and me. Discussion: https://postgr.es/m/20190924052620.GP31596%40fetter.org
2020-01-07tableam: New callback relation_fetch_toast_slice.Robert Haas
Instead of always calling heap_fetch_toast_slice during detoasting, invoke a table AM callback which, when the toast table is a heap table, will be heap_fetch_toast_slice. This makes it possible for a table AM other than heap to be used as a TOAST table. It also completes the series of commits intended to improve the interaction of tableam with TOAST that began with commit 8b94dab06617ef80a0901ab103ebd8754427ef5a; detoast.c is now, hopefully, fully AM-independent. Patch by me, reviewed by Andres Freund and Peter Eisentraut. Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com
2020-01-01Update copyrights for 2020Bruce Momjian
Backpatch-through: update all files in master, backpatch legal files through 9.4
2019-12-27Revert "Rename files and headers related to index AM"Michael Paquier
This follows multiple complains from Peter Geoghegan, Andres Freund and Alvaro Herrera that this issue ought to be dug more before actually happening, if it happens. Discussion: https://postgr.es/m/20191226144606.GA5659@alvherre.pgsql
2019-12-25Rename files and headers related to index AMMichael Paquier
The following renaming is done so as source files related to index access methods are more consistent with table access methods (the original names used for index AMs ware too generic, and could be confused as including features related to table AMs): - amapi.h -> indexam.h. - amapi.c -> indexamapi.c. Here we have an equivalent with backend/access/table/tableamapi.c. - amvalidate.c -> indexamvalidate.c. - amvalidate.h -> indexamvalidate.h. - genam.c -> indexgenam.c. - genam.h -> indexgenam.h. This has been discussed during the development of v12 when table AM was worked on, but the renaming never happened. Author: Michael Paquier Reviewed-by: Fabien Coelho, Julien Rouhaud Discussion: https://postgr.es/m/20191223053434.GF34339@paquier.xyz
2019-12-18Move heap-specific detoasting logic into a separate function.Robert Haas
The new function, heap_fetch_toast_slice, is shared between toast_fetch_datum_slice and toast_fetch_datum, and does all the work of scanning the TOAST table, fetching chunks, and storing them into the space allocated for the result varlena. As an incidental side effect, this allows toast_fetch_datum_slice to perform the scan with only a single scankey if all chunks are being fetched, which might have some tiny performance benefit. Discussion: http://postgr.es/m/CA+TgmobBzxwFojJ0zV0Own3dr09y43hp+OzU2VW+nos4PMXWEg@mail.gmail.com
2019-12-18Fix compiler warning in non-assert buildsMichael Paquier
Oversight in commit e1551f9. Reported-by: Erik Rijkers Discussion: https://postgr.es/m/b7ad911d3eaa29af9fcdb9ccb26c363c@xs4all.nl
2019-12-18Refactor attribute mappings used in logical tuple conversionMichael Paquier
Tuple conversion support in tupconvert.c is able to convert rowtypes between two relations, inner and outer, which are logically equivalent but have a different ordering or even dropped columns (used mainly for inheritance tree and partitions). This makes use of attribute mappings, which are simple arrays made of AttrNumber elements with a length matching the number of attributes of the outer relation. The length of the attribute mapping has been treated as completely independent of the mapping itself until now, making it easy to pass down an incorrect mapping length. This commit refactors the code related to attribute mappings and moves it into an independent facility called attmap.c, extracted from tupconvert.c. This merges the attribute mapping with its length, avoiding to try to guess what is the length of a mapping to use as this is computed once, when the map is built. This will avoid mistakes like what has been fixed in dc816e58, which has used an incorrect mapping length by matching it with the number of attributes of an inner relation (a child partition) instead of an outer relation (a partitioned table). Author: Michael Paquier Reviewed-by: Amit Langote Discussion: https://postgr.es/m/20191121042556.GD153437@paquier.xyz
2019-12-17Fix bad formula in previous commit.Robert Haas
Commit d5406dea25b600408e7acf17d5a06e82d3ce6d0d used a slightly novel, and wrong, approach to compute the length of the last toast chunk. It worked fine unless the last chunk happened to have the largest possible size.
2019-12-17Code cleanup for toast_fetch_datum and toast_fetch_datum_slice.Robert Haas
Rework some of the checks for bad TOAST chunks to be a bit simpler and easier to understand. These checks verify that (1) we get all and only the chunk numbers we expect to see and (2) each chunk has the expected size. However, the existing code was a bit hard to understand, at least for me; try to make it clearer. As part of that, have toast_fetch_datum_slice check the relationship between endchunk and totalchunks only with an Assert() rather than checking every chunk number against both values. There's no need to check that relationship in production builds because it's not a function of whether on-disk corruption is present; it's just a question of whether the code does the right math. Also, have toast_fetch_datum_slice() use ereport(ERROR) rather than elog(ERROR). Commit fd6ec93bf890314ac694dc8a7f3c45702ecc1bbd made the two functions inconsistent with each other. Rename assorted variables for better clarity and consistency, and move assorted variables from function scope to the function's main loop. Remove a few variables that are used only once entirely. Patch by me, reviewed by Peter Eisentraut. Discussion: http://postgr.es/m/CA+TgmobBzxwFojJ0zV0Own3dr09y43hp+OzU2VW+nos4PMXWEg@mail.gmail.com
2019-11-25Refactor reloption handling for index AMs in-coreMichael Paquier
This reworks the reloption parsing and build of a couple of index AMs by creating new structures for each index AM's options. This split was already done for BRIN, GIN and GiST (which actually has a fillfactor parameter), but not for hash, B-tree and SPGiST which relied on StdRdOptions due to an overlap with the default option set. This saves a couple of bytes for rd_options in each relcache entry with indexes making use of relation options, and brings more consistency between all index AMs. While on it, add a couple of AssertMacro() calls to make sure that utility macros to grab values of reloptions are used with the expected index AM. Author: Nikolay Shaplov Reviewed-by: Amit Langote, Michael Paquier, Álvaro Herrera, Dent John Discussion: https://postgr.es/m/4127670.gFlpRb6XCm@x200m
2019-11-16Properly determine length for on-disk TOAST valuesTomas Vondra
In detoast_attr_slice, VARSIZE_ANY was used to compute compressed length of on-disk TOAST values. That's incorrect, because the varlena value may be just a TOAST pointer, producing either bogus value or crashing. This is likely why the code was crashing on big-endian machines before 540f31680913 replaced the VARSIZE with VARSIZE_ANY, which however only masked the issue. Reported-by: Rushabh Lathia Discussion: https://postgr.es/m/CAL-OGkthU9Gs7TZchf5OWaL-Gsi=hXqufTxKv9qpNG73d5na_g@mail.gmail.com
2019-11-14Cleanup code in reloptions.h regarding reloption handlingMichael Paquier
reloptions.h includes since ba748f7 a set of macros to handle reloption types in a way similar to how parseRelOptions() works. They have never been used in the core code, and we have more simple methods now to parse and fill in rd_options for a given relation depending on its relkind, so remove this interface to simplify things. Per discussion between Amit Langote, Álvaro Herrera and me. Discussion: https://postgr.es/m/CA+HiwqE6zbNO92az6pp5GiTw4tr-9rfCE0t84whQSP+YwSKjMQ@mail.gmail.com
2019-11-14Split handling of reloptions for partitioned tablesMichael Paquier
Partitioned tables do not have relation options yet, but, similarly to what's done for views which have their own parsing table, it could make sense to introduce new parameters for some of the existing default ones like fillfactor, autovacuum, etc. Splitting things has the advantage to make the information stored in rd_options include only the necessary information, reducing the amount of memory used for a relcache entry with partitioned tables if new reloptions are introduced at this level. Author: Nikolay Shaplov Reviewed-by: Amit Langote, Michael Paquier Discussion: https://postgr.es/m/1627387.Qykg9O6zpu@x200m
2019-11-05Split all OBJS style lines in makefiles into one-line-per-entry style.Andres Freund
When maintaining or merging patches, one of the most common sources for conflicts are the list of objects in makefiles. Especially when the split across lines has been changed on both sides, which is somewhat common due to attempting to stay below 80 columns, those conflicts are unnecessarily laborious to resolve. By splitting, and alphabetically sorting, OBJS style lines into one object per line, conflicts should be less frequent, and easier to resolve when they still occur. Author: Andres Freund Discussion: https://postgr.es/m/20191029200901.vww4idgcxv74cwes@alap3.anarazel.de
2019-11-05Refactor code building relation optionsMichael Paquier
Historically, the code to build relation options has been shaped the same way in multiple code paths by using a set of datums in input with the options parsed with a static table which is then filled with the option values. This introduces a new common routine in reloptions.c to do most of the legwork for the in-core code paths. Author: Amit Langote Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/CA+HiwqGsoSn_uTPPYT19WrtR7oYpYtv4CdS0xuedTKiHHWuk_g@mail.gmail.com
2019-10-30Fix typos in the codeMichael Paquier
Author: Vignesh C Reviewed-by: Dilip Kumar, Michael Paquier Discussion: https://postgr.es/m/CALDaNm0ni+GAOe4+fbXiOxNrVudajMYmhJFtXGX-zBPoN8ixhw@mail.gmail.com
2019-10-04Rename some toasting functions based on whether they are heap-specific.Robert Haas
The old names for the attribute-detoasting functions names included the word "heap," which seems outdated now that the heap is only one of potentially many table access methods. On the other hand, toast_insert_or_update and toast_delete are heap-specific, so rename them by adding "heap_" as a prefix. Not all of the work of making the TOAST system fully accessible to AMs other than the heap is done yet, but there seems to be little harm in getting this renaming out of the way now. Commit 8b94dab06617ef80a0901ab103ebd8754427ef5a already divided up the functions among various files partially according to whether it was intended that they should be heap-specific or AM-agnostic, so this is just clarifying the division contemplated by that commit. Patch by me, reviewed and tested by Prabhat Sabu, Thomas Munro, Andres Freund, and Álvaro Herrera. Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com
2019-10-01Optimize partial TOAST decompressionTomas Vondra
Commit 4d0e994eed added support for partial TOAST decompression, so the decompression is interrupted after producing the requested prefix. For prefix and slices near the beginning of the entry, this may saves a lot of decompression work. That however only deals with decompression - the whole compressed entry was still fetched and re-assembled, even though the compression used only a small fraction of it. This commit improves that by computing how much compressed data may be needed to decompress the requested prefix, and then fetches only the necessary part. We always need to fetch a bit more compressed data than the requested (uncompressed) prefix, because the prefix may not be compressible at all and pglz itself adds a bit of overhead. That means this optimization is most effective when the requested prefix is much smaller than the whole compressed entry. Author: Binguo Bao Reviewed-by: Andrey Borodin, Tomas Vondra, Paul Ramsey Discussion: https://www.postgresql.org/message-id/flat/CAL-OGkthU9Gs7TZchf5OWaL-Gsi=hXqufTxKv9qpNG73d5na_g@mail.gmail.com
2019-09-27Fix lockmode initialization for custom relation optionsMichael Paquier
The code was enforcing AccessExclusiveLock for all custom relation options, which is incorrect as the APIs allow a custom lock level to be set. While on it, fix a couple of inconsistencies in the tests and the README of dummy_index_am. Oversights in commit 773df88. Discussion: https://postgr.es/m/20190925234152.GA2115@paquier.xyz
2019-09-25Support reloptions of enum typeAlvaro Herrera
All our current in core relation options of type string (not many, admittedly) behave in reality like enums. But after seeing an implementation for enum reloptions, it's clear that strings are messier, so introduce the new reloption type. Switch all string options to be enums instead. Fortunately we have a recently introduced test module for reloptions, so we don't lose coverage of string reloptions, which may still be used by third-party modules. Authors: Nikolay Shaplov, Álvaro Herrera Reviewed-by: Nikita Glukhov, Aleksandr Parfenov Discussion: https://postgr.es/m/43332102.S2V5pIjXRx@x200m
2019-09-25Allow definition of lock mode for custom reloptionsMichael Paquier
Relation options can define a lock mode other than AccessExclusiveMode since 47167b7, but modules defining custom relation options did not really have a way to enforce that. Correct that by extending the current API set so as modules can define a custom lock mode. Author: Michael Paquier Reviewed-by: Kuntal Ghosh Discussion: https://postgr.es/m/20190920013831.GD1844@paquier.xyz
2019-09-25Fix failure with lock mode used for custom relation optionsMichael Paquier
In-core relation options can use a custom lock mode since 47167b7, that has lowered the lock available for some autovacuum parameters. However it forgot to consider custom relation options. This causes failures with ALTER TABLE SET when changing a custom relation option, as its lock is not defined. The existing APIs to define a custom reloption does not allow to define a custom lock mode, so enforce its initialization to AccessExclusiveMode which should be safe enough in all cases. An upcoming patch will extend the existing APIs to allow a custom lock mode to be defined. The problem can be reproduced with bloom indexes, so add a test there. Reported-by: Nikolay Sharplov Analyzed-by: Thomas Munro, Michael Paquier Author: Michael Paquier Reviewed-by: Kuntal Ghosh Discussion: https://postgr.es/m/20190920013831.GD1844@paquier.xyz Backpatch-through: 9.6
2019-09-05Split tuptoaster.c into three separate files.Robert Haas
detoast.c/h contain functions required to detoast a datum, partially or completely, plus a few other utility functions for examining the size of toasted datums. toast_internals.c/h contain functions that are used internally to the TOAST subsystem but which (mostly) do not need to be accessed from outside. heaptoast.c/h contains code that is intrinsically specific to the heap AM, either because it operates on HeapTuples or is based on the layout of a heap page. detoast.c and toast_internals.c are placed in src/backend/access/common rather than src/backend/access/heap. At present, both files still have dependencies on the heap, but that will be improved in a future commit. Patch by me, reviewed and tested by Prabhat Sabu, Thomas Munro, Andres Freund, and Álvaro Herrera. Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com
2019-09-03Remove 'msg' parameter from convert_tuples_by_nameAlvaro Herrera
The message was included as a parameter when this function was added in dcb2bda9b704, but I don't think it has ever served any useful purpose. Let's stop spreading it pointlessly. Reviewed by Amit Langote and Peter Eisentraut. Discussion: https://postgr.es/m/20190806224728.GA17233@alvherre.pgsql
2019-08-16Remove fmgr.h includes from headers that don't really need it.Andres Freund
Most of the fmgr.h includes were obsoleted by 352a24a1f9d6f7d4abb1. A few others can be obsoleted using the underlying struct type in an implementation detail. Author: Andres Freund Discussion: https://postgr.es/m/20190803193733.g3l3x3o42uv4qj7l@alap3.anarazel.de
2019-07-29Fix inconsistencies and typos in the treeMichael Paquier
This is numbered take 8, and addresses again a set of issues with code comments, variable names and unreferenced variables. Author: Alexander Lakhin Discussion: https://postgr.es/m/b137b5eb-9c95-9c2f-586e-38aba7d59788@gmail.com
2019-07-22Fix inconsistencies and typos in the treeMichael Paquier
This is numbered take 7, and addresses a set of issues with code comments, variable names and unreferenced variables. Author: Alexander Lakhin Discussion: https://postgr.es/m/dff75442-2468-f74f-568c-6006e141062f@gmail.com
2019-07-15Represent Lists as expansible arrays, not chains of cons-cells.Tom Lane
Originally, Postgres Lists were a more or less exact reimplementation of Lisp lists, which consist of chains of separately-allocated cons cells, each having a value and a next-cell link. We'd hacked that once before (commit d0b4399d8) to add a separate List header, but the data was still in cons cells. That makes some operations -- notably list_nth() -- O(N), and it's bulky because of the next-cell pointers and per-cell palloc overhead, and it's very cache-unfriendly if the cons cells end up scattered around rather than being adjacent. In this rewrite, we still have List headers, but the data is in a resizable array of values, with no next-cell links. Now we need at most two palloc's per List, and often only one, since we can allocate some values in the same palloc call as the List header. (Of course, extending an existing List may require repalloc's to enlarge the array. But this involves just O(log N) allocations not O(N).) Of course this is not without downsides. The key difficulty is that addition or deletion of a list entry may now cause other entries to move, which it did not before. For example, that breaks foreach() and sister macros, which historically used a pointer to the current cons-cell as loop state. We can repair those macros transparently by making their actual loop state be an integer list index; the exposed "ListCell *" pointer is no longer state carried across loop iterations, but is just a derived value. (In practice, modern compilers can optimize things back to having just one loop state value, at least for simple cases with inline loop bodies.) In principle, this is a semantics change for cases where the loop body inserts or deletes list entries ahead of the current loop index; but I found no such cases in the Postgres code. The change is not at all transparent for code that doesn't use foreach() but chases lists "by hand" using lnext(). The largest share of such code in the backend is in loops that were maintaining "prev" and "next" variables in addition to the current-cell pointer, in order to delete list cells efficiently using list_delete_cell(). However, we no longer need a previous-cell pointer to delete a list cell efficiently. Keeping a next-cell pointer doesn't work, as explained above, but we can improve matters by changing such code to use a regular foreach() loop and then using the new macro foreach_delete_current() to delete the current cell. (This macro knows how to update the associated foreach loop's state so that no cells will be missed in the traversal.) There remains a nontrivial risk of code assuming that a ListCell * pointer will remain good over an operation that could now move the list contents. To help catch such errors, list.c can be compiled with a new define symbol DEBUG_LIST_MEMORY_USAGE that forcibly moves list contents whenever that could possibly happen. This makes list operations significantly more expensive so it's not normally turned on (though it is on by default if USE_VALGRIND is on). There are two notable API differences from the previous code: * lnext() now requires the List's header pointer in addition to the current cell's address. * list_delete_cell() no longer requires a previous-cell argument. These changes are somewhat unfortunate, but on the other hand code using either function needs inspection to see if it is assuming anything it shouldn't, so it's not all bad. Programmers should be aware of these significant performance changes: * list_nth() and related functions are now O(1); so there's no major access-speed difference between a list and an array. * Inserting or deleting a list element now takes time proportional to the distance to the end of the list, due to moving the array elements. (However, it typically *doesn't* require palloc or pfree, so except in long lists it's probably still faster than before.) Notably, lcons() used to be about the same cost as lappend(), but that's no longer true if the list is long. Code that uses lcons() and list_delete_first() to maintain a stack might usefully be rewritten to push and pop at the end of the list rather than the beginning. * There are now list_insert_nth...() and list_delete_nth...() functions that add or remove a list cell identified by index. These have the data-movement penalty explained above, but there's no search penalty. * list_concat() and variants now copy the second list's data into storage belonging to the first list, so there is no longer any sharing of cells between the input lists. The second argument is now declared "const List *" to reflect that it isn't changed. This patch just does the minimum needed to get the new implementation in place and fix bugs exposed by the regression tests. As suggested by the foregoing, there's a fair amount of followup work remaining to do. Also, the ENABLE_LIST_COMPAT macros are finally removed in this commit. Code using those should have been gone a dozen years ago. Patch by me; thanks to David Rowley, Jesper Pedersen, and others for review. Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
2019-06-25Add toast-level reloption for vacuum_index_cleanupMichael Paquier
a96c41f has introduced the option for heap, but it still lacked the variant to control the behavior for toast relations. While on it, refactor the tests so as they stress more scenarios with the various values that vacuum_index_cleanup can use. It would be useful to couple those tests with pageinspect to check that pages are actually cleaned up, but this is left for later. Author: Masahiko Sawada, Michael Paquier Reviewed-by: Peter Geoghegan Discussion: https://postgr.es/m/CAD21AoCqs8iN04RX=i1KtLSaX5RrTEM04b7NHYps4+rqtpWNEg@mail.gmail.com
2019-06-17Fix more typos and inconsistencies in the treeMichael Paquier
Author: Alexander Lakhin Discussion: https://postgr.es/m/0a5419ea-1452-a4e6-72ff-545b1a5a8076@gmail.com
2019-06-08Fix assorted inconsistencies.Amit Kapila
There were a number of issues in the recent commits which include typos, code and comments mismatch, leftover function declarations. Fix them. Reported-by: Alexander Lakhin Author: Alexander Lakhin, Amit Kapila and Amit Langote Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/ef0c0232-0c1d-3a35-63d4-0ebd06e31387@gmail.com