path: root/src/backend/utils/adt
2022-01-27  Change collate and ctype fields to type text  (Peter Eisentraut)

This changes the data type of the catalog fields datcollate, datctype, collcollate, and collctype from name to text. There wasn't ever a really good reason for them to be of type name; presumably this was just carried over from when they were fixed-size fields in pg_control, first into the corresponding pg_database fields, and then to pg_collation. The values are not identifiers or object names, and we don't ever look them up that way.

Changing to type text saves space in the typical case, since locale names are typically only a few bytes long. But it is also possible that an ICU locale name with several customization options appended could be longer than 63 bytes, so this also enables that case, which was previously probably broken.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c161a@2ndquadrant.com
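For illustration, a hedged sketch of an ICU collation whose locale string stacks several customization keywords (the collation name and keyword combination are invented, not taken from the commit):

    CREATE COLLATION german_phonebook_custom (
        provider = icu,
        locale = 'de-u-co-phonebk-ka-shifted-kn-true-kf-upper'
    );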
2022-01-20  Call pg_newlocale_from_collation() also with default collation  (Peter Eisentraut)

Previously, callers of pg_newlocale_from_collation() did not call it if the collation was DEFAULT_COLLATION_OID and instead proceeded with a pg_locale_t of 0. Instead, now we call it anyway and have it return 0 if the default collation was passed. It already did this, so we just have to adjust the callers. This simplifies all the call sites and also makes future enhancements easier.

After discussion and testing, the previous comment in pg_locale.c about avoiding this for performance reasons appears to have been mistaken, since it was based on testing a very different patch version long ago.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/ed3baa81-7fac-7788-cc12-41e3f7917e34@enterprisedb.com
2022-01-17  pg_upgrade: Preserve relfilenodes and tablespace OIDs.  (Robert Haas)

Currently, database OIDs, relfilenodes, and tablespace OIDs can all change when a cluster is upgraded using pg_upgrade. It seems better to preserve them, because (1) it makes troubleshooting pg_upgrade easier, since you don't have to do a lot of work to match up files in the old and new clusters, (2) it allows 'rsync' to save bandwidth when used to re-sync a cluster after an upgrade, and (3) if we ever encrypt or sign blocks, we would likely want to use a nonce that depends on these values.

This patch only arranges to preserve relfilenodes and tablespace OIDs. Database OIDs have a similar issue, but there are some tricky points in that case that do not apply to these cases, so that problem is left for another patch.

Shruthi KC, based on an earlier patch from Antonin Houska, reviewed and with some adjustments by me.

Discussion: http://postgr.es/m/CA+TgmoYgTwYcUmB=e8+hRHOFA0kkS6Kde85+UNdon6q7bt1niQ@mail.gmail.com
2022-01-17  Introduce log_destination=jsonlog  (Michael Paquier)

"jsonlog" is a new value that can be added to log_destination to provide logs in the JSON format, with its output written to a file, making it the third type of destination of this kind, after "stderr" and "csvlog". The format is convenient to feed logs to other applications. There is also a plugin external to core that provided this feature using the hook in elog.c, but this had to overwrite the output of "stderr" to work, so being able to do both at the same time was not possible.

The files generated by this log format are suffixed with ".json", and use the same rotation policies as the other two formats depending on the backend configuration. This takes advantage of the refactoring work done previously in ac7c807, bed6ed3, 8b76f89 and 2d77d83 for the backend parts, and 72b76f7 for the TAP tests, making the addition of any new file-based format rather straight-forward.

The documentation is updated to list all the keys and the values that can exist in this new format. pg_current_logfile() also required a refresh for the new option.

Author: Sehrope Sarkuni, Michael Paquier
Reviewed-by: Nathan Bossart, Justin Pryzby
Discussion: https://postgr.es/m/CAH7T-aqswBM6JWe4pDehi1uOiufqe06DJWaU5=X7dDLyqUExHg@mail.gmail.com
2022-01-16  Add stxdinherit flag to pg_statistic_ext_data  (Tomas Vondra)

Add pg_statistic_ext_data.stxdinherit flag, so that for each extended statistics definition we can store two versions of data - one for the relation alone, one for the whole inheritance tree. This is analogous to pg_statistic.stainherit, but we failed to include such flag in catalogs for extended statistics, and we had to work around it (see commits 859b3003de, 36c4bc6e72 and 20b9fa308e).

This changes the relationship between the two catalogs storing extended statistics objects (pg_statistic_ext and pg_statistic_ext_data). Until now, there was a simple 1:1 mapping - for each definition there was one pg_statistic_ext_data row, and this row was inserted while creating the statistics (and then updated during ANALYZE). With the stxdinherit flag, we don't know how many rows there will be (child relations may be added after the statistics object is defined), so there may be up to two rows.

We could make CREATE STATISTICS always create both rows, but that seems wasteful - without partitioning we only need stxdinherit=false rows, and declaratively partitioned tables need only stxdinherit=true. So we no longer initialize pg_statistic_ext_data in CREATE STATISTICS, and instead make that a responsibility of ANALYZE. Which is what we do for regular statistics too.

Patch by me, with extensive improvements and fixes by Justin Pryzby.

Author: Tomas Vondra, Justin Pryzby
Reviewed-by: Tomas Vondra, Justin Pryzby
Discussion: https://postgr.es/m/20210923212624.GI831%40telsasoft.com
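A sketch of inspecting the flag (catalog columns as named in the commit; the table and statistics names are invented):

    CREATE STATISTICS s_deps (dependencies) ON a, b FROM parent_tbl;
    ANALYZE parent_tbl;
    SELECT s.stxname, d.stxdinherit
    FROM pg_statistic_ext s
    JOIN pg_statistic_ext_data d ON d.stxoid = s.oid
    WHERE s.stxname = 's_deps';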
2022-01-15  Build inherited extended stats on partitioned tables  (Tomas Vondra)

Commit 859b3003de disabled building of extended stats for inheritance trees, to prevent updating the same catalog row twice. While that resolved the issue, it also means there are no extended stats for declaratively partitioned tables, because there are no data in the non-leaf relations.

That also means declaratively partitioned tables were not affected by the issue 859b3003de addressed, so this is a regression affecting queries that calculate estimates for the inheritance tree as a whole (which includes e.g. GROUP BY queries).

But because partitioned tables are empty, we can invert the condition and build statistics only for the case with inheritance, without losing anything. And we can consider them when calculating estimates.

It may be necessary to run ANALYZE on partitioned tables, to collect proper statistics. For declarative partitioning there should be no prior statistics, and it might take time before autoanalyze is triggered. For tables partitioned by inheritance the statistics may include data from child relations (if built before 859b3003de), contradicting the current code.

Report and patch by Justin Pryzby, minor fixes and cleanup by me. Backpatch all the way back to PostgreSQL 10, where extended statistics were introduced (same as 859b3003de).

Author: Justin Pryzby
Reported-by: Justin Pryzby
Backpatch-through: 10
Discussion: https://postgr.es/m/20210923212624.GI831%40telsasoft.com
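A hedged sketch of the workflow this enables (object names invented):

    CREATE TABLE orders (region text, product text) PARTITION BY LIST (region);
    CREATE TABLE orders_eu PARTITION OF orders FOR VALUES IN ('eu');
    CREATE STATISTICS orders_deps (dependencies) ON region, product FROM orders;
    -- Inheritance-tree statistics are only built when the parent is analyzed:
    ANALYZE orders;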
2022-01-15  Ignore extended statistics for inheritance trees  (Tomas Vondra)

Since commit 859b3003de we only build extended statistics for individual relations, ignoring the child relations. This resolved the issue with updating catalog tuple twice, but we still tried to use the statistics when calculating estimates for the whole inheritance tree. When the relations contain very distinct data, it may produce bogus estimates.

This is roughly the same issue 427c6b5b9 addressed ~15 years ago, and we fix it the same way - by ignoring extended statistics when calculating estimates for the inheritance tree as a whole. We still consider extended statistics when calculating estimates for individual child relations, of course.

This may result in plan changes due to different estimates, but if the old statistics were not describing the inheritance tree particularly well it's quite likely the new plans are actually better.

Report and patch by Justin Pryzby, minor fixes and cleanup by me. Backpatch all the way back to PostgreSQL 10, where extended statistics were introduced (same as 859b3003de).

Author: Justin Pryzby
Reported-by: Justin Pryzby
Backpatch-through: 10
Discussion: https://postgr.es/m/20210923212624.GI831%40telsasoft.com
2022-01-14  Rename value node fields  (Peter Eisentraut)

For the formerly-Value node types, rename the "val" field to a name specific to the node type, namely "ival", "fval", "sval", and "bsval". This makes some code clearer and catches mixups better.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/8c1a2e37-c68d-703c-5a83-7a6077f4f997@enterprisedb.com
2022-01-13  Fix ruleutils.c's dumping of whole-row Vars in more contexts.  (Tom Lane)

Commit 7745bc352 intended to ensure that whole-row Vars would be printed with "::type" decoration in all contexts where plain "var.*" notation would result in star-expansion, notably in ROW() and VALUES() constructs. However, it missed the case of INSERT with a single-row VALUES, as reported by Timur Khanjanov.

Nosing around ruleutils.c, I found a second oversight: the code for RowCompareExpr generates ROW() notation without benefit of an actual RowExpr, and naturally it wasn't in sync :-(. (The code for FieldStore also does this, but we don't expect that to generate strictly parsable SQL anyway, so I left it alone.)

Back-patch to all supported branches.

Discussion: https://postgr.es/m/efaba6f9-4190-56be-8ff2-7a1674f9194f@intrans.baku.az
2022-01-11  Enhance pg_log_backend_memory_contexts() for auxiliary processes.  (Fujii Masao)

Previously pg_log_backend_memory_contexts() could request to log the memory contexts of backends, but not of auxiliary processes such as the checkpointer. This commit enhances the function so that it can also send the request to auxiliary processes. It's useful to look at the memory contexts of those processes for debugging purposes and for a better understanding of their memory usage patterns.

Note that pg_log_backend_memory_contexts() cannot send the request to the logger or statistics collector, because this logging request mechanism is based on shared memory and those processes aren't connected to it.

Author: Bharath Rupireddy
Reviewed-by: Vignesh C, Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/CALj2ACU1nBzpacOK2q=a65S_4+Oaz_rLTsU1Ri0gf7YUmnmhfQ@mail.gmail.com
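For example, to request a memory-context dump from the checkpointer (a sketch; the backend_type value follows pg_stat_activity's conventions):

    SELECT pg_log_backend_memory_contexts(pid)
    FROM pg_stat_activity
    WHERE backend_type = 'checkpointer';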
2022-01-11  Improve error handling of cryptohash computations  (Michael Paquier)

The existing cryptohash facility was causing problems in some code paths related to MD5 (frontend and backend) that relied on the fact that the only type of error that could happen would be an OOM, as the MD5 implementation used in PostgreSQL ~13 (the in-core implementation is used when compiling with or without OpenSSL in those older versions) could fail only under this circumstance. The new cryptohash facilities can fail for reasons other than OOMs, like attempting MD5 when FIPS is enabled (upstream OpenSSL allows that up to 1.0.2, Fedora and Photon patch OpenSSL 1.1.1 to allow that), so this would cause incorrect reports to show up.

This commit extends the cryptohash APIs so that callers of those routines can fetch more context when an error happens, by using a new routine called pg_cryptohash_error(). The error states are stored within each implementation's internal context data, so that it is possible to extend the logic depending on what's suited for an implementation. The default implementation requires few error states, but OpenSSL could report various issues depending on its internal state, so more is needed in cryptohash_openssl.c, and the code is shaped so that we are always able to grab the necessary information.

The core code is changed to adapt to the new error routine, painting more "const" across the call stack where the static errors are stored, particularly in authentication code paths on variables that provide log details. This way, any future changes would warn if attempting to free these strings. The MD5 authentication code was also a bit blurry about the handling of "logdetail" (LOG sent to the postmaster), so improve the related comments while at it.

The origin of the problem is 87ae969, which introduced the centralized cryptohash facility. Extra changes are done for pgcrypto in v14 for the non-OpenSSL code path to cope with the improvements done by this commit.

Reported-by: Michael Mühlbeyer
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/89B7F072-5BBE-4C92-903E-D83E865D9367@trivadis.com
Backpatch-through: 14
2022-01-09  Make pg_get_expr() more bulletproof.  (Tom Lane)

Since this function is defined to accept pg_node_tree values, it could get applied to any nodetree that can appear in a cataloged pg_node_tree column. Some such cases can't be supported --- for example, its API doesn't allow providing referents for more than one relation --- but we should try to throw a user-facing error rather than an internal error when encountering such a case.

In support of this, extend expression_tree_walker/mutator to be sure they'll work on any such node tree (which basically means adding support for relpartbound node types). That allows us to run pull_varnos and check for the case of multiple relations before we start processing the tree. The alternative of changing the low-level error thrown for an out-of-range varno isn't appealing, because that could mask actual bugs in other usages of ruleutils.

Per report from Justin Pryzby. This is basically cosmetic, so no back-patch.

Discussion: https://postgr.es/m/20211219205422.GT17618@telsasoft.com
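The supported usage remains passing a cataloged pg_node_tree together with the single relation it refers to, for instance (a sketch over standard catalogs):

    SELECT relname, pg_get_expr(relpartbound, oid)
    FROM pg_class
    WHERE relispartition;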
2022-01-07  Update copyright for 2022  (Bruce Momjian)
Backpatch-through: 10
2022-01-03  Clean up error messages related to bad datetime units.  (Tom Lane)

Adjust the error texts used for unrecognized/unsupported datetime units so that there are just two strings to translate, not two per datatype. Along the way, follow our usual error message style of not double-quoting type names, and instead making sure that we say the name is a type. Fix a couple of places in date.c that were using the wrong one of "unrecognized" and "unsupported".

Nikhil Benesch, with a bit more editing by me

Discussion: https://postgr.es/m/CAPWqQZTURGixmbMH2_Z3ZtWGA0ANjUb9bwtkkxSxSfDeFHuM6Q@mail.gmail.com
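A sketch of the kind of input that now hits the unified message (the error wording shown is an approximation, not quoted from the commit):

    SELECT extract(fortnight FROM timestamp '2022-01-01');
    -- ERROR:  unit "fortnight" not recognized for type timestamp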
2021-12-17  Simplify the general-purpose 64-bit integer parsing APIs  (Peter Eisentraut)

pg_strtouint64() is a wrapper around strtoull/strtoul/_strtoui64, but it seems no longer necessary to have this indirection. msvc/Solution.pm claims HAVE_STRTOULL, so the "MSVC only" part seems unnecessary. Also, we have code in c.h to substitute alternatives for strtoull() if not found, and that would appear to cover all currently supported platforms, so having a further fallback in pg_strtouint64() seems unnecessary.

Therefore, we could remove pg_strtouint64(), and use strtoull() directly in all call sites. However, it seems useful to keep a separate notation for parsing exactly 64-bit integers, matching the type definition int64/uint64. For that, add new macros strtoi64() and strtou64() in c.h as thin wrappers around strtol()/strtoul() or strtoll()/strtoull(). This makes these functions available everywhere instead of just in the server code, and it makes the function naming notably different from the pg_strtointNN() functions in numutils.c, which have a different API.

Discussion: https://www.postgresql.org/message-id/flat/a3df47c9-b1b4-29f2-7e91-427baf8b75a3%40enterprisedb.com
2021-12-15  Always use ReleaseTupleDesc after lookup_rowtype_tupdesc et al.  (Tom Lane)

The API spec for lookup_rowtype_tupdesc previously said you could use either ReleaseTupleDesc or DecrTupleDescRefCount. However, the latter choice means the caller must be certain that the returned tupdesc is refcounted. I don't recall right now whether that was always true when this spec was written, but it's certainly not always true since we introduced shared record typcaches for parallel workers. That means that callers using DecrTupleDescRefCount are dependent on typcache behavior details that they probably shouldn't be. Hence, change the API spec to say that you must call ReleaseTupleDesc, and fix the half-dozen callers that weren't.

AFAICT this is just future-proofing, there's no live bug here. So no back-patch.

Per gripe from Chapman Flack.

Discussion: https://postgr.es/m/61B901A4.1050808@anastigmatix.net
2021-12-13  Remove unimplemented/undocumented geometric functions & operators.  (Tom Lane)

Nobody has filled in these stubs for upwards of twenty years, so it's time to drop the idea that they might get implemented any day now. The associated pg_operator and pg_proc entries are just confusing wastes of space.

Per complaint from Anton Voloshin.

Discussion: https://postgr.es/m/3426566.1638832718@sss.pgh.pa.us
2021-12-13  Implement poly_distance().  (Tom Lane)

geo_ops.c contains half a dozen functions that are just stubs throwing ERRCODE_FEATURE_NOT_SUPPORTED. Since it's been like that for more than twenty years, there's clearly not a lot of interest in filling in the stubs. However, I'm uncomfortable with deleting poly_distance(), since every other geometric type supports a distance-to-another-object-of-the-same-type function. We can easily add this capability by cribbing from poly_overlap() and path_distance().

It's possible that the (existing) test case for this will show some numeric instability, but hopefully the buildfarm will expose it if so.

In passing, improve the documentation to try to explain why polygons are distinct from closed paths in the first place.

Discussion: https://postgr.es/m/3426566.1638832718@sss.pgh.pa.us
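With this filled in, the usual distance operator works between two polygons, e.g. (values invented):

    SELECT polygon '((0,0),(0,1),(1,1),(1,0))'
       <-> polygon '((3,0),(3,1),(4,1),(4,0))';
    -- expected: 2, the gap between the two unit squares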
2021-12-13  Fix alignment in multirange_get_range() function  (Alexander Korotkov)

The multirange_get_range() function fails when two boundaries of the same range have different alignments. Fix that by adding proper pointer alignment.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/17300-dced2d01ddeb1f2f%40postgresql.org
Backpatch-through: 14
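multirange_get_range() sits behind extracting individual ranges from a multirange, reachable for example via unnest() (a sketch):

    SELECT unnest(int8multirange(int8range(1, 5), int8range(10, 20)));
    -- two rows: [1,5) and [10,20)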
2021-12-08  Allow specifying column list for foreign key ON DELETE SET actions  (Peter Eisentraut)

Extend the foreign key ON DELETE actions SET NULL and SET DEFAULT by allowing the specification of a column list, like

    CREATE TABLE posts (
        ...
        FOREIGN KEY (tenant_id, author_id) REFERENCES users
            ON DELETE SET NULL (author_id)
    );

If a column list is specified, only those columns are set to null/default, instead of all the columns in the foreign-key constraint. This is useful for multitenant or sharded schemas, where the tenant or shard ID is included in the primary key of all tables but shouldn't be set to null.

Author: Paul Martinez <paulmtz@google.com>
Discussion: https://www.postgresql.org/message-id/flat/CACqFVBZQyMYJV=njbSMxf+rbDHpx=W=B7AEaMKn8dWn9OZJY7w@mail.gmail.com
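A hedged, self-contained version of the commit's example (the users table definition is filled in here as an assumption):

    CREATE TABLE users (
        tenant_id int,
        user_id   int,
        PRIMARY KEY (tenant_id, user_id)
    );
    CREATE TABLE posts (
        tenant_id int,
        author_id int,
        FOREIGN KEY (tenant_id, author_id) REFERENCES users
            ON DELETE SET NULL (author_id)
    );
    -- Deleting a referenced user nulls only posts.author_id;
    -- posts.tenant_id keeps its value.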
2021-12-06  Fix inappropriate uses of PG_GETARG_UINT32()  (Peter Eisentraut)

The chr() function used PG_GETARG_UINT32() even though the argument is declared as (signed) integer. As a result, you can pass negative arguments to this function and it internally interprets them as positive. This ultimately ends up being harmless, but it seems wrong, so fix it and rearrange the internal error checking a bit to accommodate this.

Another case was in the documentation, where example code used PG_GETARG_UINT32() with an argument declared as signed integer.

Reviewed-by: Nathan Bossart <bossartn@amazon.com>
Discussion: https://www.postgresql.org/message-id/flat/7e43869b-d412-8f81-30a3-809783edc9a3%40enterprisedb.com
2021-12-03  Some RELKIND macro refactoring  (Peter Eisentraut)

Add more macros to group some RELKIND_* macros:

- RELKIND_HAS_PARTITIONS()
- RELKIND_HAS_TABLESPACE()
- RELKIND_HAS_TABLE_AM()

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/a574c8f1-9c84-93ad-a9e5-65233d6fc00f%40enterprisedb.com
2021-12-01  Remove unused includes  (Peter Eisentraut)

These haven't been needed for a long time.

Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
2021-11-30  Add a view to show the stats of subscription workers.  (Amit Kapila)

This commit adds a new system view pg_stat_subscription_workers, that shows information about any errors which occur during the application of logical replication changes as well as during performing initial table synchronization. The subscription statistics entries are removed when the corresponding subscription is removed. It also adds an SQL function pg_stat_reset_subscription_worker() to reset single subscription errors.

The contents of this view can be used by an upcoming patch that skips the particular transaction that conflicts with the existing data on the subscriber. This view can be extended in the future to track other xact related statistics like the number of xacts committed/aborted for subscription workers.

Author: Masahiko Sawada
Reviewed-by: Greg Nancarrow, Hou Zhijie, Tang Haiying, Vignesh C, Dilip Kumar, Takamichi Osumi, Amit Kapila
Discussion: https://postgr.es/m/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com
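A monitoring sketch (the view's exact column set is hedged; subname and last_error_message are assumptions based on the view's stated purpose):

    SELECT subname, last_error_message
    FROM pg_stat_subscription_workers;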
2021-11-30  Fix typos  (Michael Paquier)

Author: Lingjie Qiang
Discussion: https://postgr.es/m/OSAPR01MB71654E773F62AC88DC1FC8CC80669@OSAPR01MB7165.jpnprd01.prod.outlook.com
2021-11-28  Replace random(), pg_erand48(), etc with a better PRNG API and algorithm.  (Tom Lane)

Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating a bunch of platform dependencies as well as fundamentally-obsolete PRNG code. In addition, this API replacement will ease replacing the algorithm again in future, should that become necessary. xoroshiro128** is a few percent slower than the drand48 family, but it can produce full-width 64-bit random values, not only 48-bit, and it should be much more trustworthy.

It's likely to be noticeably faster than the platform's random(), depending on which platform you are thinking about; and we can have non-global state vectors easily, unlike with random(). It is not cryptographically strong, but neither are the functions it replaces.

Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself

Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
2021-11-24  Allow Memoize to operate in binary comparison mode  (David Rowley)

Memoize would always use the hash equality operator for the cache key types to determine if the current set of parameters were the same as some previously cached set. Certain types, such as floating point, where -0.0 and +0.0 differ in their binary representation but are classed as equal by the hash equality operator, may cause problems: unless the join uses that same operator, it's possible that whichever join operator is being used would be able to distinguish the two values, in which case we may accidentally return incorrect rows from the cache.

To fix this, we add a binary mode to Memoize that compares the current set of parameters to previously cached values bit-by-bit rather than logically with the hash equality operator. This binary mode is always used for LATERAL joins, and it's used for normal joins when any of the join operators are not hashable.

Reported-by: Tom Lane
Author: David Rowley
Discussion: https://postgr.es/m/3004308.1632952496@sss.pgh.pa.us
Backpatch-through: 14, where Memoize was added
2021-11-23  Add SQL functions to monitor the directory contents of replication slots  (Michael Paquier)

This commit adds a set of functions able to look at the contents of various paths related to replication slots:

- pg_ls_logicalsnapdir, for pg_logical/snapshots/
- pg_ls_logicalmapdir, for pg_logical/mappings/
- pg_ls_replslotdir, for pg_replslot/<slot_name>/

These are intended to be used by monitoring tools. Unlike pg_ls_dir(), execution permission can be granted to non-superusers. Members of the pg_monitor role have access to these functions.

Bump catalog version.

Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart, Justin Pryzby
Discussion: https://postgr.es/m/CALj2ACWsfizZjMN6bzzdxOk1ADQQeSw8HhEjhmVXn_Pu+7VzLw@mail.gmail.com
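Usage sketch (the slot and role names are invented):

    SELECT * FROM pg_ls_logicalsnapdir();
    SELECT * FROM pg_ls_logicalmapdir();
    SELECT * FROM pg_ls_replslotdir('my_slot');
    -- Unlike pg_ls_dir(), EXECUTE can be granted to non-superusers:
    GRANT EXECUTE ON FUNCTION pg_ls_replslotdir(text) TO monitoring_user;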
2021-11-17  Add a planner support function for starts_with().  (Tom Lane)

This fills in some gaps in planner support for starts_with() and the equivalent ^@ operator:

* A condition such as "textcol ^@ constant" can now use a regular btree index, not only an SP-GiST index, so long as the index's collation is C. (This works just like "textcol LIKE 'foo%'".)

* "starts_with(textcol, constant)" can be optimized the same as "textcol ^@ constant".

* Fixed-prefix LIKE and regex patterns are now more like starts_with() in another way: if you apply one to an SPGiST-indexed column, you'll get an index condition using ^@ rather than two index conditions with >= and <.

Per a complaint from Shay Rojansky. Patch by me; thanks to Nathan Bossart for review.

Discussion: https://postgr.es/m/232599.1633800229@sss.pgh.pa.us
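A sketch of the newly indexable form (table and index names invented):

    CREATE TABLE t (textcol text);
    CREATE INDEX t_textcol_c_idx ON t (textcol COLLATE "C");
    -- Both forms can now consider the btree index:
    EXPLAIN SELECT * FROM t WHERE textcol ^@ 'foo';
    EXPLAIN SELECT * FROM t WHERE starts_with(textcol, 'foo');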
2021-11-17  Fix display of SQL-standard function's arguments in INSERT/SELECT.  (Tom Lane)

If a SQL-standard function body contains an INSERT ... SELECT statement, any function parameters referenced within the SELECT were always printed in $N style, rather than using the parameter name if any. While not strictly incorrect, this wasn't the intention, and it's inconsistent with the way that such parameters would be printed in any other kind of statement.

The cause is that the recursion to get_query_def from get_insert_query_def neglected to pass down the context->namespaces list, passing constant NIL instead. This is a very ancient oversight, but AFAICT it had no visible consequences before commit e717a9a18 added an outermost namespace with function parameters. We don't allow INSERT ... SELECT as a sub-query, except in a top-level WITH clause, where it couldn't contain any outer references that might need to access upper namespaces. So although that's arguably a bug, I don't see any point in changing it before v14.

In passing, harden the code added to get_parameter by e717a9a18 so that it won't crash if a PARAM_EXTERN Param appears in an unexpected place.

Per report from Erki Eessaar. Code fix by me, regression test case by Masahiko Sawada.

Discussion: https://postgr.es/m/AM9PR01MB8268347BED344848555167FAFE949@AM9PR01MB8268.eurprd01.prod.exchangelabs.com
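The affected construct, sketched with invented names (BEGIN ATOMIC is the v14 SQL-standard function body syntax):

    CREATE FUNCTION add_item(item_name text) RETURNS void
    LANGUAGE sql
    BEGIN ATOMIC
        INSERT INTO items (name) SELECT item_name;
    END;
    -- The deparsed body should now show "item_name" rather than "$1":
    SELECT pg_get_functiondef('add_item(text)'::regprocedure);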
2021-11-12  Fix memory overrun when querying pg_stat_slru  (Michael Paquier)

pg_stat_get_slru() in pgstatfuncs.c would point one element past the end of the array PgStat_SLRUStats when finishing its scan of the entries. This had no direct consequences as no data from the extra memory area was read, but static analyzers would rightfully complain here. So let's be clean.

While on it, this adds one regression test in the area reserved for system views.

Reported-by: Alexander Kozhemyakin, via AddressSanitizer
Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/17280-37da556e86032070@postgresql.org
Backpatch-through: 13
2021-11-06  Disallow making an empty lexeme via array_to_tsvector().  (Tom Lane)

The tsvector data type has always forbidden lexemes to be empty. However, array_to_tsvector() didn't get that memo, and would allow an empty-string array element to become an empty lexeme. This could result in dump/restore failures later, not to mention whatever semantic issues might be behind the original prohibition.

However, other functions that take a plain text input directly as a lexeme value do not need a similar restriction, because they only match the string against existing tsvector entries. In particular it'd be a bad idea to make ts_delete() reject empty strings, since that is the most convenient way to clean up any bad data that might have gotten into a tsvector column via this bug.

Reflecting on that, let's also remove the prohibition against NULL array elements in tsvector_delete_arr and tsvector_setweight_by_filter. It seems more consistent to ignore them, as an empty-string element would be ignored.

There's a case for back-patching this, since it's clearly a bug fix. On balance though, it doesn't seem like something to change in a minor release.

Jean-Christophe Arnu

Discussion: https://postgr.es/m/CAHZmTm1YVndPgUVRoag2WL0w900XcoiivDDj-gTTYBsG25c65A@mail.gmail.com
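A sketch of the tightened behavior, plus the cleanup path the commit describes (the docs table and tsv column are invented):

    SELECT array_to_tsvector(ARRAY['fat', 'cat', '']);
    -- now expected to fail with an error about an empty lexeme
    -- ts_delete() still accepts '', so existing bad data can be removed:
    UPDATE docs SET tsv = ts_delete(tsv, '');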
2021-10-26  Allow GRANT on pg_log_backend_memory_contexts().  (Jeff Davis)

Remove superuser check, allowing any user granted permissions on pg_log_backend_memory_contexts() to log the memory contexts of any backend. Note that this could allow a privileged non-superuser to log the memory contexts of a superuser backend, but as discussed, that does not seem to be a problem.

Reviewed-by: Nathan Bossart, Bharath Rupireddy, Michael Paquier, Kyotaro Horiguchi, Andres Freund
Discussion: https://postgr.es/m/e5cf6684d17c8d1ef4904ae248605ccd6da03e72.camel@j-davis.com
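The grant itself is ordinary (the role name is invented):

    GRANT EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer) TO debug_role;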
2021-10-11  Fix EXPLAIN of SEARCH BREADTH FIRST queries some more.  (Tom Lane)

Commit 3f50b8263 had an oversight: formerly, to deparse expressions attached to a plan node, it was only necessary to update the deparse_namespace ancestors list alongside calling set_deparse_plan. Now it's necessary to update the ancestors list *first*, because set_deparse_plan consults it, and one call site got that wrong.

This error was masked in most cases because explain.c uses just one List object for the ancestors list, updating it in-place as the plan is scanned, so that we accidentally had the right List assigned to dpns->ancestors before it was needed. It would fail only if a WorkTableScan node were the first one that we tried to deparse a subexpression of.

Per report from Markus Winand. Like the previous patch, back-patch to v14.

Discussion: https://postgr.es/m/648B0505-AA57-42C2-A2DA-E551DE46FA15@winand.at
2021-10-06  Fix corner-case loss of precision in numeric_power().  (Dean Rasheed)

This fixes a loss of precision that occurs when the first input is very close to 1, so that its logarithm is very small. Formerly, during the initial low-precision calculation to estimate the result weight, the logarithm was computed to a local rscale that was capped to NUMERIC_MAX_DISPLAY_SCALE (1000). However, the base may be as close as 1e-16383 to 1, hence its logarithm may be as small as 1e-16383, and so the local rscale needs to be allowed to exceed 16383, otherwise all precision is lost, leading to a poor choice of rscale for the full-precision calculation.

Fix this by removing the cap on the local rscale during the initial low-precision calculation, as we already do in the full-precision calculation. This doesn't change the fact that the initial calculation is a low-precision approximation, computing the logarithm to around 8 significant digits, which is very fast, especially when the base is very close to 1.

Patch by me, reviewed by Alvaro Herrera.

Discussion: https://postgr.es/m/CAEZATCV-Ceu%2BHpRMf416yUe4KKFv%3DtdgXQAe5-7S9tD%3D5E-T1g%40mail.gmail.com
2021-10-01  Avoid believing incomplete MCV-only stats in get_variable_range().  (Tom Lane)

get_variable_range() would incautiously believe that statistics containing only an MCV list are sufficient to derive a range estimate. That's okay for an enum-like column that contains only MCVs, but otherwise the estimate could be pretty bad. Make it report that the range is indeterminate unless the MCVs plus nullfrac account for the whole table.

I don't think this needs a dedicated test case, since a quick code coverage check verifies that the existing regression tests traverse all the alternatives. There is room to doubt that a future-proof test case could be built anyway, given that the submitted example accidentally doesn't fail before v11.

Per bug #17207 from Simon Perepelitsa. Back-patch to v10. In principle this has been broken all along, but I'm hesitant to make such changes in 9.6, since if anyone is unhappy with 9.6.24's behavior there will be no second chance to fix it.

Discussion: https://postgr.es/m/17207-5265aefa79e333b4@postgresql.org
2021-09-29  Clarify use of "statistics objects" in the code  (Michael Paquier)

The code inconsistently used "statistic object" or "statistics" where the correct term, as discussed, is actually "statistics object". This improves the state of the code to be more consistent.

While on it, fix an incorrect error message introduced in a4d75c8. This error should never happen, as the code states, but it would be misleading.

Author: Justin Pryzby
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/20210924215827.GS831@telsasoft.com
Backpatch-through: 14
2021-09-26  Avoid unnecessary division in interval_cmp_value().  (Tom Lane)

Splitting the time field into days and microseconds is pretty useless when we're just going to recombine those values. It's unclear if anyone will notice the speedup in real-world cases, but a cycle shaved is a cycle earned.

Discussion: https://postgr.es/m/2629129.1632675713@sss.pgh.pa.us
2021-09-22  Make use of PG_INT64_MAX/PG_INT64_MIN  (Peter Eisentraut)
This code was written before those symbols were introduced, but now we can simplify it.
2021-09-16  Fix EXPLAIN to handle SEARCH BREADTH FIRST queries.  (Tom Lane)

The rewriter transformation for SEARCH BREADTH FIRST produces a FieldSelect on a Var of type RECORD, where the Var references the recursive union's worktable output. EXPLAIN VERBOSE failed to handle this case, because it only expected such Vars to appear in CteScans not WorkTableScans. Fix that, and add some test cases exercising EXPLAIN on SEARCH and CYCLE queries.

In principle this oversight is an old bug, but it seems that the case is unreachable without SEARCH BREADTH FIRST, because the parser fails when attempting to create such a reference manually. So for today I'll just patch HEAD/v14. Someday we might find that the code portion of this patch needs to be back-patched further.

Per report from Atsushi Torikoshi.

Discussion: https://postgr.es/m/5bafa66ad529e11860339565c9e7c166@oss.nttdata.com
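One of the newly exercised cases, sketched with invented table names (SEARCH BREADTH FIRST is the standard v14 syntax):

    EXPLAIN (VERBOSE, COSTS OFF)
    WITH RECURSIVE tree(id, parent_id) AS (
        SELECT id, parent_id FROM nodes WHERE parent_id IS NULL
        UNION ALL
        SELECT n.id, n.parent_id
        FROM nodes n JOIN tree t ON n.parent_id = t.id
    ) SEARCH BREADTH FIRST BY id SET ordercol
    SELECT * FROM tree ORDER BY ordercol;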
2021-09-16  Message style improvements  (Peter Eisentraut)
2021-09-15  Remove arbitrary 64K-or-so limit on rangetable size.  (Tom Lane)

Up to now the size of a query's rangetable has been limited by the constants INNER_VAR et al, which mustn't be equal to any real rangetable index. 65000 doubtless seemed like enough for anybody, and it still is orders of magnitude larger than the number of joins we can realistically handle. However, we need a rangetable entry for each child partition that is (or might be) processed by a query. Queries with a few thousand partitions are getting more realistic, so that the day when that limit becomes a problem is in sight, even if it's not here yet. Hence, let's raise the limit.

Rather than just increase the values of INNER_VAR et al, this patch adopts the approach of making them small negative values, so that rangetables could theoretically become as long as INT_MAX.

The bulk of the patch is concerned with changing Var.varno and some related variables from "Index" (unsigned int) to plain "int". This is basically cosmetic, with little actual effect other than to help debuggers print their values nicely. As such, I've only bothered with changing places that could actually see INNER_VAR et al, which the parser and most of the planner don't. We do have to be careful in places that are performing less/greater comparisons on varnos, but there are very few such places, other than the IS_SPECIAL_VARNO macro itself.

A notable side effect of this patch is that while it used to be possible to add INNER_VAR et al to a Bitmapset, that will now draw an error. I don't see any likelihood that it wouldn't be a bug to include these fake varnos in a bitmapset of real varnos, so I think this is all to the good.

Although this touches outfuncs/readfuncs, I don't think a catversion bump is required, since stored rules would never contain Vars with these fake varnos.

Andrey Lepikhov and Tom Lane, after a suggestion by Peter Eisentraut

Discussion: https://postgr.es/m/43c7f2f5-1e27-27aa-8c65-c91859d15190@postgrespro.ru
2021-09-15  Fix hash_array  (Peter Eisentraut)

Commit a3d2b1bbe904b0ca8d9fdde20f25295ff3e21f79 neglected to initialize the type_id field of the synthesized type cache entry, so it would make a new one on every call. Also, better use the per-function memory context for this; otherwise it leaks memory.

Discussion: https://www.postgresql.org/message-id/flat/17158-8a2ba823982537a4%40postgresql.org
2021-09-09  Remove Value node struct  (Peter Eisentraut)

The Value node struct is a weird construct. It is its own node type, but most of the time, it actually has a node type of Integer, Float, String, or BitString. As a consequence, the struct name and the node type don't match most of the time, and so it has to be treated specially a lot. There doesn't seem to be any value in the special construct. There is very little code that wants to accept all Value variants but nothing else (and even if it did, this doesn't provide any convenient way to check it), and most code wants either just one particular node type (usually String), or it accepts a broader set of node types besides just Value.

This change removes the Value struct and node type and replaces them by separate Integer, Float, String, and BitString node types that are proper node types and structs of their own and behave mostly like normal node types.

Also, this removes the T_Null node tag, which was previously also a possible variant of Value but wasn't actually used outside of the Value contained in A_Const. Replace that by an isnull field in A_Const.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5ba6bc5b-3f95-04f2-2419-f8ddb4c046fb@enterprisedb.com
2021-09-08  Fix typo  (Peter Eisentraut)
2021-09-08  Disable anonymous record hash support except in special cases  (Peter Eisentraut)

Commit 01e658fa74 added hash support for row types. This also added support for hashing anonymous record types, using the same approach that the type cache uses for comparison support for record types: It just reports that it works, but it might fail at run time if a component type doesn't actually support the operation. We get away with that for comparison because most types support that. But some types don't support hashing, so the current state can result in failures at run time where the planner chooses hashing over sorting, whereas that previously worked if only sorting was an option.

We do, however, want the record hashing support for path tracking in recursive unions, and the SEARCH and CYCLE clauses built on that. In that case, hashing is the only plan option. To enable that, this commit implements the following approach: The type cache does not report that hashing is available for the record type. This undoes that part of 01e658fa74. Instead, callers that require hashing no matter what can override that result themselves. This patch only touches the callers to make the aforementioned recursive query cases work, namely the parse analysis of unions, as well as the hash_array() function.

Reported-by: Sait Talha Nisanci <sait.nisanci@microsoft.com>
Bug: #17158
Discussion: https://www.postgresql.org/message-id/flat/17158-8a2ba823982537a4%40postgresql.org
2021-09-08  Clean up some code using "(expr) ? true : false"  (Michael Paquier)

All the code paths simplified here were already using a boolean or used an expression that led to zero or one, making the extra bits unnecessary.

Author: Justin Pryzby
Reviewed-by: Tom Lane, Michael Paquier, Peter Smith
Discussion: https://postgr.es/m/20210428182936.GE27406@telsasoft.com
2021-09-06  Make timetz_zone() stable, and correct a bug for DYNTZ abbreviations.  (Tom Lane)

Historically, timetz_zone() has used time(NULL) as the reference point for deciding whether DST is active. That means its result can change intra-statement, requiring it to be marked VOLATILE (cf. 35979e6c3). But that definition is pretty inconsistent with the way we deal with timestamps elsewhere. Let's make it use the transaction start time ("now()") as the reference point instead. That lets it be marked STABLE, and also saves a kernel call per invocation.

While at it, remove the function's use of pg_time_t and pg_localtime. Those are inconsistent with the other code in this area, which indeed created a bug: timetz_zone() delivered completely wrong answers if the zone was specified by a dynamic TZ abbreviation. (We need to do something about that in the back branches, but the fix will look different from this.)

Aleksander Alekseev and Tom Lane

Discussion: https://postgr.es/m/CAJ7c6TOMG8zSNEZtCn5SPe+cCk3Lfxb71ZaQwT2F4T7PJ_t=KA@mail.gmail.com
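timetz_zone() is reached through AT TIME ZONE applied to a timetz value, e.g. (a sketch; the output depends on whether DST applies at the transaction's start time):

    SELECT timetz '12:00:00+00' AT TIME ZONE 'America/New_York';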
2021-09-06  Fix typo in comments.  (Fujii Masao)

Author: Hou Zhijie
Discussion: https://postgr.es/m/OS0PR01MB5716E6A6535FDFDC5A1B004194CE9@OS0PR01MB5716.jpnprd01.prod.outlook.com
2021-09-01  Don't print extra parens around expressions in extended stats  (Tomas Vondra)

The code printing expressions for extended statistics doubled the parens, producing results like ((a+1)), which is unnecessary and not consistent with how we print expressions elsewhere. Fixed by tweaking the code to produce just a single set of parens.

Reported by Mark Dilger, fix by me. Backpatch to 14, where support for extended statistics on expressions was added.

Reported-by: Mark Dilger
Discussion: https://postgr.es/m/20210122040101.GF27167%40telsasoft.com