summaryrefslogtreecommitdiff
path: root/src/backend/utils/adt
AgeCommit message (Collapse)Author
2018-03-21Fix mishandling of quoted-list GUC values in pg_dump and ruleutils.c.Tom Lane
Code that prints out the contents of setconfig or proconfig arrays in SQL format needs to handle GUC_LIST_QUOTE variables differently from other ones, because for those variables, flatten_set_variable_args() already applied a layer of quoting. The value can therefore safely be printed as-is, and indeed must be, or flatten_set_variable_args() will muck it up completely on reload. For all other GUC variables, it's necessary and sufficient to quote the value as a SQL literal. We'd recognized the need for this long ago, but mis-analyzed the need slightly, thinking that all GUC_LIST_INPUT variables needed the special treatment. That's actually wrong, since a valid value of a LIST variable might include characters that need quoting, although no existing variables accept such values. More to the point, we hadn't made any particular effort to keep the various places that deal with this up-to-date with the set of variables that actually need special treatment, meaning that we'd do the wrong thing with, for example, temp_tablespaces values. This affects dumping of SET clauses attached to functions, as well as ALTER DATABASE/ROLE SET commands. In ruleutils.c we can fix it reasonably honestly by exporting a guc.c function that allows discovering the flags for a given GUC variable. But pg_dump doesn't have easy access to that, so continue the old method of having a hard-wired list of affected variable names. At least we can fix it to have just one list not two, and update the list to match current reality. A remaining problem with this is that it only works for built-in GUC variables. pg_dump's list obvious knows nothing of third-party extensions, and even the "ask guc.c" method isn't bulletproof since the relevant extension might not be loaded. There's no obvious solution to that, so for now, we'll just have to discourage extension authors from inventing custom GUCs that need GUC_LIST_QUOTE. This has been busted for a long time, so back-patch to all supported branches. Michael Paquier and Tom Lane, reviewed by Kyotaro Horiguchi and Pavel Stehule Discussion: https://postgr.es/m/20180111064900.GA51030@paquier.xyz
2018-03-19Fix some corner-case issues in REFRESH MATERIALIZED VIEW CONCURRENTLY.Tom Lane
refresh_by_match_merge() has some issues in the way it builds a SQL query to construct the "diff" table: 1. It doesn't require the selected unique index(es) to be indimmediate. 2. It doesn't pay attention to the particular equality semantics enforced by a given index, but just assumes that they must be those of the column datatype's default btree opclass. 3. It doesn't check that the indexes are btrees. 4. It's insufficiently careful to ensure that the parser will pick the intended operator when parsing the query. (This would have been a security bug before CVE-2018-1058.) 5. It's not careful about indexes on system columns. The way to fix #4 is to make use of the existing code in ri_triggers.c for generating an arbitrary binary operator clause. I chose to move that to ruleutils.c, since that seems a more reasonable place to be exporting such functionality from than ri_triggers.c. While #1, #3, and #5 are just latent given existing feature restrictions, and #2 doesn't arise in the core system for lack of alternate opclasses with different equality behaviors, #4 seems like an issue worth back-patching. That's the bulk of the change anyway, so just back-patch the whole thing to 9.4 where this code was introduced. Discussion: https://postgr.es/m/13836.1521413227@sss.pgh.pa.us
2018-03-03Fix assorted issues in convert_to_scalar().Tom Lane
If convert_to_scalar is passed a pair of datatypes it can't cope with, its former behavior was just to elog(ERROR). While this is OK so far as the core code is concerned, there's extension code that would like to use scalarltsel/scalargtsel/etc as selectivity estimators for operators that work on non-core datatypes, and this behavior is a show-stopper for that use-case. If we simply allow convert_to_scalar to return FALSE instead of outright failing, then the main logic of scalarltsel/scalargtsel will work fine for any operator that behaves like a scalar inequality comparison. The lack of conversion capability will mean that we can't estimate to better than histogram-bin-width precision, since the code will effectively assume that the comparison constant falls at the middle of its bin. But that's still a lot better than nothing. (Someday we should provide a way for extension code to supply a custom version of convert_to_scalar, but today is not that day.) While poking at this issue, we noted that the existing code for handling type bytea in convert_to_scalar is several bricks shy of a load. It assumes without checking that if the comparison value is type bytea, the bounds values are too; in the worst case this could lead to a crash. It also fails to detoast the input values, so that the comparison result is complete garbage if any input is toasted out-of-line, compressed, or even just short-header. I'm not sure how often such cases actually occur --- the bounds values, at least, are probably safe since they are elements of an array and hence can't be toasted. But that doesn't make this code OK. Back-patch to all supported branches, partly because author requested that, but mostly because of the bytea bugs. The change in API for the exposed routine convert_network_to_scalar() is theoretically a back-patch hazard, but it seems pretty unlikely that any third-party code is calling that function directly. Tomas Vondra, with some adjustments by me Discussion: https://postgr.es/m/b68441b6-d18f-13ab-b43b-9a72188a4e02@2ndquadrant.com
2018-02-28Rename base64 routines to avoid conflict with Solaris built-in functions.Tom Lane
Solaris 11.4 has built-in functions named b64_encode and b64_decode. Rename ours to something else to avoid the conflict (fortunately, ours are static so the impact is limited). One could wish for less duplication of code in this area, but that would be a larger patch and not very suitable for back-patching. Since this is a portability fix, we want to put it into all supported branches. Report and initial patch by Rainer Orth, reviewed and adjusted a bit by Michael Paquier Discussion: https://postgr.es/m/ydd372wk28h.fsf@CeBiTec.Uni-Bielefeld.DE
2018-02-26Avoid using unsafe search_path settings during dump and restore.Tom Lane
Historically, pg_dump has "set search_path = foo, pg_catalog" when dumping an object in schema "foo", and has also caused that setting to be used while restoring the object. This is problematic because functions and operators in schema "foo" could capture references meant to refer to pg_catalog entries, both in the queries issued by pg_dump and those issued during the subsequent restore run. That could result in dump/restore misbehavior, or in privilege escalation if a nefarious user installs trojan-horse functions or operators. This patch changes pg_dump so that it does not change the search_path dynamically. The emitted restore script sets the search_path to what was used at dump time, and then leaves it alone thereafter. Created objects are placed in the correct schema, regardless of the active search_path, by dint of schema-qualifying their names in the CREATE commands, as well as in subsequent ALTER and ALTER-like commands. Since this change requires a change in the behavior of pg_restore when processing an archive file made according to this new convention, bump the archive file version number; old versions of pg_restore will therefore refuse to process files made with new versions of pg_dump. Security: CVE-2018-1058
2017-11-26Make has_sequence_privilege support WITH GRANT OPTIONJoe Conway
The various has_*_privilege() functions all support an optional WITH GRANT OPTION added to the supported privilege types to test whether the privilege is held with grant option. That is, all except has_sequence_privilege() variations. Fix that. Back-patch to all supported branches. Discussion: https://postgr.es/m/005147f6-8280-42e9-5a03-dd2c1e4397ef@joeconway.com
2017-11-11Ignore XML declaration in xpath_internal(), for UTF8 databases.Noah Misch
When a value contained an XML declaration naming some other encoding, this function interpreted UTF8 bytes as the named encoding, yielding mojibake. xml_parse() already has similar logic. This would be necessary but not sufficient for non-UTF8 databases, so preserve behavior there until the xpath facility can support such databases comprehensively. Back-patch to 9.3 (all supported versions). Pavel Stehule and Noah Misch Discussion: https://postgr.es/m/CAFj8pRC-dM=tT=QkGi+Achkm+gwPmjyOayGuUfXVumCxkDgYWg@mail.gmail.com
2017-11-06Make json{b}_populate_recordset() use the right tuple descriptor.Tom Lane
json{b}_populate_recordset() used the tuple descriptor created from the query-level AS clause without worrying about whether it matched the actual input record type. If it didn't, that would usually result in a crash, though disclosure of server memory contents seems possible as well, for a skilled attacker capable of issuing crafted SQL commands. Instead, use the query-supplied descriptor only when there is no input tuple to look at, and otherwise get a tuple descriptor based on the input tuple's own type marking. The core code will detect any type mismatch in the latter case. Michael Paquier and Tom Lane, per a report from David Rowley. Back-patch to 9.3 where this functionality was introduced. Security: CVE-2017-15098
2017-10-25Process variadic arguments consistently in json functionsAndrew Dunstan
json_build_object and json_build_array and the jsonb equivalents did not correctly process explicit VARIADIC arguments. They are modified to use the new extract_variadic_args() utility function which abstracts away the details of the call method. Michael Paquier, reviewed by Tom Lane and Dmitry Dolgov. Backpatch to 9.5 for the jsonb fixes and 9.4 for the json fixes, as that's where they originated.
2017-10-11Doc: fix missing explanation of default object privileges.Tom Lane
The GRANT reference page, which lists the default privileges for new objects, failed to mention that USAGE is granted by default for data types and domains. As a lesser sin, it also did not specify anything about the initial privileges for sequences, FDWs, foreign servers, or large objects. Fix that, and add a comment to acldefault() in the probably vain hope of getting people to maintain this list in future. Noted by Laurenz Albe, though I editorialized on the wording a bit. Back-patch to all supported branches, since they all have this behavior. Discussion: https://postgr.es/m/1507620895.4152.1.camel@cybertec.at
2017-09-27Fix behavior when converting a float infinity to numeric.Tom Lane
float8_numeric() and float4_numeric() failed to consider the possibility that the input is an IEEE infinity. The results depended on the platform-specific behavior of sprintf(): on most platforms you'd get something like ERROR: invalid input syntax for type numeric: "inf" but at least on Windows it's possible for the conversion to succeed and deliver a finite value (typically 1), due to a nonstandard output format from sprintf and lack of syntax error checking in these functions. Since our numeric type lacks the concept of infinity, a suitable conversion is impossible; the best thing to do is throw an explicit error before letting sprintf do its thing. While at it, let's use snprintf not sprintf. Overrunning the buffer should be impossible if sprintf does what it's supposed to, but this is cheap insurance against a stack smash if it doesn't. Problem reported by Taiki Kondo. Patch by me based on fix suggestion from KaiGai Kohei. Back-patch to all supported branches. Discussion: https://postgr.es/m/12A9442FBAE80D4E8953883E0B84E088C8C7A2@BPXM01GP.gisp.nec.co.jp
2017-07-24Ensure that pg_get_ruledef()'s output matches pg_get_viewdef()'s.Tom Lane
Various cases involving renaming of view columns are handled by having make_viewdef pass down the view's current relation tupledesc to get_query_def, which then takes care to use the column names from the tupledesc for the output column names of the SELECT. For some reason though, we'd missed teaching make_ruledef to do similarly when it is printing an ON SELECT rule, even though this is exactly the same case. The results from pg_get_ruledef would then be different and arguably wrong. In particular, this breaks pre-v10 versions of pg_dump, which in some situations would define views by means of emitting a CREATE RULE ... ON SELECT command. Third-party tools might not be happy either. In passing, clean up some crufty code in make_viewdef; we'd apparently modernized the equivalent code in make_ruledef somewhere along the way, and missed this copy. Per report from Gilles Darold. Back-patch to all supported versions. Discussion: https://postgr.es/m/ec05659a-40ff-4510-fc45-ca9d965d0838@dalibo.com
2017-07-20Fix dumping of outer joins with empty qual lists.Tom Lane
Normally, a JoinExpr would have empty "quals" only if it came from CROSS JOIN syntax. However, it's possible to get to this state by specifying NATURAL JOIN between two tables with no common column names, and there might be other ways too. The code previously printed no ON clause if "quals" was empty; that's right for CROSS JOIN but syntactically invalid if it's some type of outer join. Fix by printing ON TRUE in that case. This got broken by commit 2ffa740be, which stopped using NATURAL JOIN syntax in ruleutils output due to its brittleness in the face of column renamings. Back-patch to 9.3 where that commit appeared. Per report from Tushar Ahuja. Discussion: https://postgr.es/m/98b283cd-6dda-5d3f-f8ac-87db8c76a3da@enterprisedb.com
2017-07-13Fix dumping of FUNCTION RTEs that contain non-function-call expressions.Tom Lane
The grammar will only accept something syntactically similar to a function call in a function-in-FROM expression. However, there are various ways to input something that ruleutils.c won't deparse that way, potentially leading to a view or rule that fails dump/reload. Fix by inserting a dummy CAST around anything that isn't going to deparse as a function (which is one of the ways to get something like that in there in the first place). In HEAD, also make use of the infrastructure added by this to avoid emitting unnecessary parentheses in CREATE INDEX deparsing. I did not change that in back branches, thinking that people might find it to be unexpected/unnecessary behavioral change. In HEAD, also fix incorrect logic for when to add extra parens to partition key expressions. Somebody apparently thought they could get away with simpler logic than pg_get_indexdef_worker has, but they were wrong --- a counterexample is PARTITION BY LIST ((a[1])). Ignoring the prettyprint flag for partition expressions isn't exactly a nice solution anyway. This has been broken all along, so back-patch to all supported branches. Discussion: https://postgr.es/m/10477.1499970459@sss.pgh.pa.us
2017-07-12Fix ruleutils.c for domain-over-array cases, too.Tom Lane
Further investigation shows that ruleutils isn't quite up to speed either for cases where we have a domain-over-array: it needs to be prepared to look past a CoerceToDomain at the top level of field and element assignments, else it decompiles them incorrectly. Potentially this would result in failure to dump/reload a rule, if it looked like the one in the new test case. (I also added a test for EXPLAIN; that output isn't broken, but clearly we need more test coverage here.) Like commit b1cb32fb6, this bug is reachable in cases we already support, so back-patch all the way.
2017-05-24Tighten checks for whitespace in functions that parse identifiers etc.Tom Lane
This patch replaces isspace() calls with scanner_isspace() in functions that are likely to be presented with non-ASCII input. isspace() has the small advantage that it will correctly recognize no-break space in single-byte encodings (such as LATIN1); but it cannot work successfully for any multibyte character, and depending on platform it might return false positive results for some fragments of multibyte characters. That's disastrous for functions that are trying to discard whitespace between valid strings, as noted in bug #14662 from Justin Muise. Even treating no-break space as whitespace is pretty questionable for the usages touched here, because the core scanner would think it is an identifier character. Affected functions are parse_ident(), parseNameAndArgTypes (underlying regprocedurein() and siblings), SplitIdentifierString (used for parsing GUCs and options that are qualified names or lists of names), and SplitDirectoriesString (used for parsing GUCs that are lists of directories). All the functions adjusted here are parsing SQL identifiers and similar constructs, so it's reasonable to insist that their definition of whitespace match the core scanner. So we can hope that this won't cause many backwards-compatibility problems. I've left alone isspace() calls in places that aren't really expecting any non-ASCII input characters, such as float8in(). Back-patch to all supported branches. Discussion: https://postgr.es/m/10129.1495302480@sss.pgh.pa.us
2017-05-21Fix precision and rounding issues in money multiplication and division.Tom Lane
The cash_div_intX functions applied rint() to the result of the division. That's not merely useless (because the result is already an integer) but it causes precision loss for values larger than 2^52 or so, because of the forced conversion to float8. On the other hand, the cash_mul_fltX functions neglected to apply rint() to their multiplication results, thus possibly causing off-by-one outputs. Per C standard, arithmetic between any integral value and a float value is performed in float format. Thus, cash_mul_flt4 and cash_div_flt4 produced answers good to only about six digits, even when the float value is exact. We can improve matters noticeably by widening the float inputs to double. (It's tempting to consider using "long double" arithmetic if available, but that's probably too much of a stretch for a back-patched fix.) Also, document that cash_div_intX operators truncate rather than round. Per bug #14663 from Richard Pistole. Back-patch to all supported branches. Discussion: https://postgr.es/m/22403.1495223615@sss.pgh.pa.us
2017-05-18Fix typo in comment.Heikki Linnakangas
Daniel Gustafsson
2017-05-15Fix new warnings from GCC 7Peter Eisentraut
This addresses the new warning types -Wformat-truncation -Wformat-overflow that are part of -Wall, via -Wformat, in GCC 7.
2017-05-08Further patch rangetypes_selfuncs.c's statistics slot management.Tom Lane
Values in a STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM slot are float8, not of the type of the column the statistics are for. This bug is at least partly the fault of sloppy specification comments for get_attstatsslot()/free_attstatsslot(): the type OID they want is that of the stavalues entries, not of the underlying column. (I double-checked other callers and they seem to get this right.) Adjust the comments to be more correct. Per buildfarm. Security: CVE-2017-7484
2017-05-08Fix possibly-uninitialized variable.Tom Lane
Oversight in e2d4ef8de et al (my fault not Peter's). Per buildfarm. Security: CVE-2017-7484
2017-05-08Add security checks to selectivity estimation functionsPeter Eisentraut
Some selectivity estimation functions run user-supplied operators over data obtained from pg_statistic without security checks, which allows those operators to leak pg_statistic data without having privileges on the underlying tables. Fix by checking that one of the following is satisfied: (1) the user has table or column privileges on the table underlying the pg_statistic data, or (2) the function implementing the user-supplied operator is leak-proof. If neither is satisfied, planning will proceed as if there are no statistics available. At least one of these is satisfied in most cases in practice. The only situations that are negatively impacted are user-defined or not-leak-proof operators on a security-barrier view. Reported-by: Robert Haas <robertmhaas@gmail.com> Author: Peter Eisentraut <peter_e@gmx.net> Author: Tom Lane <tgl@sss.pgh.pa.us> Security: CVE-2017-7484
2017-05-04Fix cursor_to_xml in tableforest false modePeter Eisentraut
It only produced <row> elements but no wrapping <table> element. By contrast, cursor_to_xmlschema produced a schema that is now correct but did not previously match the XML data produced by cursor_to_xml. In passing, also fix a minor misunderstanding about moving cursors in the tests related to this. Reported-by: filip@jirsak.org Based-on-patch-by: Thomas Munro <thomas.munro@enterprisedb.com>
2017-04-05Fix integer-overflow problems in interval comparison.Tom Lane
When using integer timestamps, the interval-comparison functions tried to compute the overall magnitude of an interval as an int64 number of microseconds. As reported by Frazer McLean, this overflows for intervals exceeding about 296000 years, which is bad since we nominally allow intervals many times larger than that. That results in wrong comparison results, and possibly in corrupted btree indexes for columns containing such large interval values. To fix, compute the magnitude as int128 instead. Although some compilers have native support for int128 calculations, many don't, so create our own support functions that can do 128-bit addition and multiplication if the compiler support isn't there. These support functions are designed with an eye to allowing the int128 code paths in numeric.c to be rewritten for use on all platforms, although this patch doesn't do that, or even provide all the int128 primitives that will be needed for it. Back-patch as far as 9.4. Earlier releases did not guard against overflow of interval values at all (commit 146604ec4 fixed that), so it seems not very exciting to worry about overly-large intervals for them. Before 9.6, we did not assume that unreferenced "static inline" functions would not draw compiler warnings, so omit functions not directly referenced by timestamp.c, the only present consumer of int128.h. (We could have omitted these functions in HEAD too, but since they were written and debugged on the way to the present patch, and they look likely to be needed by numeric.c, let's keep them in HEAD.) I did not bother to try to prevent such warnings in a --disable-integer-datetimes build, though. Before 9.5, configure will never define HAVE_INT128, so the part of int128.h that exploits a native int128 implementation is dead code in the 9.4 branch. I didn't bother to remove it, thinking that keeping the file looking similar in different branches is more useful. In HEAD only, add a simple test harness for int128.h in src/tools/. In back branches, this does not change the float-timestamps code path. That's not subject to the same kind of overflow risk, since it computes the interval magnitude as float8. (No doubt, when this code was originally written, overflow was disregarded for exactly that reason.) There is a precision hazard instead :-(, but we'll avert our eyes from that question, since no complaints have been reported and that code's deprecated anyway. Kyotaro Horiguchi and Tom Lane Discussion: https://postgr.es/m/1490104629.422698.918452336.26FA96B7@webmail.messagingengine.com
2017-02-09Blind try to fix portability issue in commit 8f93bd851 et al.Tom Lane
The S/390 members of the buildfarm are showing failures indicating that they're having trouble with the rint() calls I added yesterday. There's no good reason for that, and I wonder if it is a compiler bug similar to the one we worked around in d9476b838. Try to fix it using the same method as before, namely to store the result of rint() back into a "double" variable rather than immediately converting to int64. (This isn't entirely waving a dead chicken, since on machines with wider-than-double float registers, the extra store forces a width conversion. I don't know if S/390 is like that, but it seems worth trying.) In passing, merge duplicate ereport() calls in float8_timestamptz(). Per buildfarm.
2017-02-08Fix roundoff problems in float8_timestamptz() and make_interval().Tom Lane
When converting a float value to integer microseconds, we should be careful to round the value to the nearest integer, typically with rint(); simply assigning to an int64 variable will truncate, causing apparently off-by-one values in cases that should work. Most places in the datetime code got this right, but not these two. float8_timestamptz() is new as of commit e511d878f (9.6). Previous versions effectively depended on interval_mul() to do roundoff correctly, which it does, so this fixes an accuracy regression in 9.6. The problem in make_interval() dates to its introduction in 9.4. Aside from being careful to round not truncate, let's incorporate the hours and minutes inputs into the result with exact integer arithmetic, rather than risk introducing roundoff error where there need not have been any. float8_timestamptz() problem reported by Erik Nordström, though this is not his proposed patch. make_interval() problem found by me. Discussion: https://postgr.es/m/CAHuQZDS76jTYk3LydPbKpNfw9KbACmD=49dC4BrzHcfPv6yA1A@mail.gmail.com
2017-02-06Fix typos in comments.Heikki Linnakangas
Backpatch to all supported versions, where applicable, to make backpatching of future fixes go more smoothly. Josh Soref Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com
2017-01-26Ensure that a tsquery like '!foo' matches empty tsvectors.Tom Lane
!foo means "the tsvector does not contain foo", and therefore it should match an empty tsvector. ts_match_vq() overenthusiastically supposed that an empty tsvector could never match any query, so it forcibly returned FALSE, the wrong answer. Remove the premature optimization. Our behavior on this point was inconsistent, because while seqscans and GIST index searches both failed to match empty tsvectors, GIN index searches would find them, since GIN scans don't rely on ts_match_vq(). That makes this certainly a bug, not a debatable definition disagreement, so back-patch to all supported branches. Report and diagnosis by Tom Dunstan (bug #14515); added test cases by me. Discussion: https://postgr.es/m/20170126025524.1434.97828@wrigleys.postgresql.org
2017-01-05Fix handling of empty arrays in array_fill().Tom Lane
array_fill(..., array[0]) produced an empty array, which is probably what users expect, but it was a one-dimensional zero-length array which is not our standard representation of empty arrays. Also, for no very good reason, it rejected empty input arrays; that case should be allowed and produce an empty output array. In passing, remove the restriction that the input array(s) have lower bound 1. That seems rather pointless, and it would have needed extra complexity to make the check deal with empty input arrays. Per bug #14487 from Andrew Gierth. It's been broken all along, so back-patch to all supported branches. Discussion: https://postgr.es/m/20170105152156.10135.64195@wrigleys.postgresql.org
2016-12-27Fix interval_transform so it doesn't throw away non-no-op casts.Tom Lane
interval_transform() contained two separate bugs that caused it to sometimes mistakenly decide that a cast from interval to restricted interval is a no-op and throw it away. First, it was wrong to rely on dt.h's field type macros to have an ordering consistent with the field's significance; in one case they do not. This led to mistakenly treating YEAR as less significant than MONTH, so that a cast from INTERVAL MONTH to INTERVAL YEAR was incorrectly discarded. Second, fls(1<<k) produces k+1 not k, so comparing its output directly to SECOND was wrong. This led to supposing that a cast to INTERVAL MINUTE was really a cast to INTERVAL SECOND and so could be discarded. To fix, get rid of the use of fls(), and make a function based on intervaltypmodout to produce a field ID code adapted to the need here. Per bug #14479 from Piotr Stefaniak. Back-patch to 9.2 where transform functions were introduced, because this code was born broken. Discussion: https://postgr.es/m/20161227172307.10135.7747@wrigleys.postgresql.org
2016-12-16Fix off-by-one in memory allocation for quote_literal_cstr().Heikki Linnakangas
The calculation didn't take into account the NULL terminator. That lead to overwriting the palloc'd buffer by one byte, if the input consists entirely of backslashes. For example "format('%L', E'\\')". Fixes bug #14468. Backpatch to all supported versions. Report: https://www.postgresql.org/message-id/20161216105001.13334.42819%40wrigleys.postgresql.org
2016-12-11Prevent crash when ts_rewrite() replaces a non-top-level subtree with null.Tom Lane
When ts_rewrite()'s replacement argument is an empty tsquery, it's supposed to simplify any operator nodes whose operand(s) become NULL; but it failed to do that reliably, because dropvoidsubtree() only examined the top level of the result tree. Rather than make a second recursive pass, let's just give the responsibility to dofindsubquery() to simplify while it's doing the main replacement pass. Per report from Andreas Seltenreich. Artur Zakirov, with some cosmetic changes by me. Back-patch to all supported branches. Discussion: https://postgr.es/m/8737i01dew.fsf@credativ.de
2016-11-23Make sure ALTER TABLE preserves index tablespaces.Tom Lane
When rebuilding an existing index, ALTER TABLE correctly kept the physical file in the same tablespace, but it messed up the pg_class entry if the index had been in the database's default tablespace and "default_tablespace" was set to some non-default tablespace. This led to an inaccessible index. Fix by fixing pg_get_indexdef_string() to always include a tablespace clause, whether or not the index is in the default tablespace. The previous behavior was installed in commit 537e92e41, and I think it just wasn't thought through very clearly; certainly the possible effect of default_tablespace wasn't considered. There's some risk in changing the behavior of this function, but there are no other call sites in the core code. Even if it's being used by some third party extension, it's fairly hard to envision a usage that is okay with a tablespace clause being appended some of the time but can't handle it being appended all the time. Back-patch to all supported versions. Code fix by me, investigation and test cases by Michael Paquier. Discussion: <1479294998857-5930602.post@n3.nabble.com>
2016-11-21Fix PGLC_localeconv() to handle errors better.Tom Lane
The code was intentionally not very careful about leaking strdup'd strings in case of an error. That was forgivable probably, but it also failed to notice strdup() failures, which could lead to subsequent null-pointer-dereference crashes, since many callers unsurprisingly didn't check for null pointers in the struct lconv fields. An even worse problem is that it could throw error while we were setlocale'd to a non-C locale, causing unwanted behavior in subsequent libc calls. Rewrite to ensure that we cannot throw elog(ERROR) until after we've restored the previous locale settings, or at least attempted to. (I'm sorely tempted to make restore failure be a FATAL error, but will refrain for the moment.) Having done that, it's not much more work to ensure that we clean up strdup'd storage on the way out, too. This code is substantially the same in all supported branches, so back-patch all the way. Michael Paquier and Tom Lane Discussion: <CAB7nPqRMbGqa_mesopcn4MPyTs34eqtVEK7ELYxvvV=oqS00YA@mail.gmail.com>
2016-10-30Fix nasty performance problem in tsquery_rewrite().Tom Lane
tsquery_rewrite() tries to find matches to subsets of AND/OR conditions; for example, in the query 'a | b | c' the substitution subquery 'a | c' should match and lead to replacement of the first and third items. That's fine, but the matching algorithm apparently takes about O(2^N) for an N-clause query (I say "apparently" because the code is also both unintelligible and uncommented). We could probably do better than that even without any extra assumptions --- but actually, we know that the subclauses are sorted, indeed are depending on that elsewhere in this very same function. So we can just scan the two lists a single time to detect matches, as though we were doing a merge join. Also do a re-flattening call (QTNTernary()) in tsquery_rewrite_query, just to make sure that the tree fits the expectations of the next search cycle. I didn't try to devise a test case for this, but I'm pretty sure that the oversight could have led to failure to match in some cases where a match would be expected. Improve comments, and also stick a CHECK_FOR_INTERRUPTS into dofindsubquery, just in case it's still too slow for somebody. Per report from Andreas Seltenreich. Back-patch to all supported branches. Discussion: <8760oasf2y.fsf@credativ.de>
2016-10-30Fix bogus tree-flattening logic in QTNTernary().Tom Lane
QTNTernary() contains logic to flatten, eg, '(a & b) & c' into 'a & b & c', which is all well and good, but it tries to do that to NOT nodes as well, so that '!!a' gets changed to '!a'. Explicitly restrict the conversion to be done only on AND and OR nodes, and add a test case illustrating the bug. In passing, provide some comments for the sadly naked functions in tsquery_util.c, and simplify some baroque logic in QTNFree(), which I think may have been leaking some items it intended to free. Noted while investigating a complaint from Andreas Seltenreich. Back-patch to all supported versions.
2016-10-23Avoid testing tuple visibility without buffer lock in RI_FKey_check().Tom Lane
Despite the argumentation I wrote in commit 7a2fe85b0, it's unsafe to do this, because in corner cases it's possible for HeapTupleSatisfiesSelf to try to set hint bits on the target tuple; and at least since 8.2 we have required the buffer content lock to be held while setting hint bits. The added regression test exercises one such corner case. Unpatched, it causes an assertion failure in assert-enabled builds, or otherwise would cause a hint bit change in a buffer we don't hold lock on, which given the right race condition could result in checksum failures or other data consistency problems. The odds of a problem in the field are probably pretty small, but nonetheless back-patch to all supported branches. Report: <19391.1477244876@sss.pgh.pa.us>
2016-10-19Suppress "Factory" zone in pg_timezone_names view for tzdata >= 2016g.Tom Lane
IANA got rid of the really silly "abbreviation" and replaced it with one that's only moderately silly. But it's still pointless, so keep on not showing it.
2016-10-18Fix cidin() to handle values above 2^31 platform-independently.Tom Lane
CommandId is declared as uint32, and values up to 4G are indeed legal. cidout() handles them properly by treating the value as unsigned int. But cidin() was just using atoi(), which has platform-dependent behavior for values outside the range of signed int, as reported by Bart Lengkeek in bug #14379. Use strtoul() instead, as xidin() does. In passing, make some purely cosmetic changes to make xidin/xidout look more like cidin/cidout; the former didn't have a monopoly on best practice IMO. Neither xidin nor cidin make any attempt to throw error for invalid input. I didn't change that here, and am not sure it's worth worrying about since neither is really a user-facing type. The point is just to ensure that indubitably-valid inputs work as expected. It's been like this for a long time, so back-patch to all supported branches. Report: <20161018152550.1413.6439@wrigleys.postgresql.org>
2016-10-14Fix assorted integer-overflow hazards in varbit.c.Tom Lane
bitshiftright() and bitshiftleft() would recursively call each other infinitely if the user passed INT_MIN for the shift amount, due to integer overflow in negating the shift amount. To fix, clamp to -VARBITMAXLEN. That doesn't change the results since any shift distance larger than the input bit string's length produces an all-zeroes result. Also fix some places that seemed inadequately paranoid about input typmods exceeding VARBITMAXLEN. While a typmod accepted by anybit_typmodin() will certainly be much less than that, at least some of these spots are reachable with user-chosen integer values. Andreas Seltenreich and Tom Lane Discussion: <87d1j2zqtz.fsf@credativ.de>
2016-09-10Fix miserable coding in pg_stat_get_activity().Tom Lane
Commit dd1a3bccc replaced a test on whether a subroutine returned a null pointer with a test on whether &pointer->backendStatus was null. This accidentally failed to fail, at least on common compilers, because backendStatus is the first field in the struct; but it was surely trouble waiting to happen. Commit f91feba87 then messed things up further, changing the logic to local_beentry = pgstat_fetch_stat_local_beentry(curr_backend); if (!local_beentry) continue; beentry = &local_beentry->backendStatus; if (!beentry) { where the second "if" is now dead code, so that the intended behavior of printing a row with "<backend information not available>" cannot occur. I suspect this is all moot because pgstat_fetch_stat_local_beentry will never actually return null in this function's usage, but it's still very poor coding. Repair back to 9.4 where the original problem was introduced.
2016-09-02Don't require dynamic timezone abbreviations to match underlying time zone.Tom Lane
Previously, we threw an error if a dynamic timezone abbreviation did not match any abbreviation recorded in the referenced IANA time zone entry. That seemed like a good consistency check at the time, but it turns out that a number of the abbreviations in the IANA database are things that Olson and crew made up out of whole cloth. Their current policy is to remove such names in favor of using simple numeric offsets. Perhaps unsurprisingly, a lot of these made-up abbreviations have varied in meaning over time, which meant that our commit b2cbced9e and later changes made them into dynamic abbreviations. So with newer IANA database versions that don't mention these abbreviations at all, we fail, as reported in bug #14307 from Neil Anderson. It's worse than just a few unused-in-the-wild abbreviations not working, because the pg_timezone_abbrevs view stops working altogether (since its underlying function tries to compute the whole view result in one call). We considered deleting these abbreviations from our abbreviations list, but the problem with that is that we can't stay ahead of possible future IANA changes. Instead, let's leave the abbreviations list alone, and treat any "orphaned" dynamic abbreviation as just meaning the referenced time zone. It will behave a bit differently than it used to, in that you can't any longer override the zone's standard vs. daylight rule by using the "wrong" abbreviation of a pair, but that's better than failing entirely. (Also, this solution can be interpreted as adding a small new feature, which is that any abbreviation a user wants can be defined as referencing a time zone name.) Back-patch to all supported branches, since this problem affects all of them when using tzdata 2016f or newer. Report: <20160902031551.15674.67337@wrigleys.postgresql.org> Discussion: <6189.1472820913@sss.pgh.pa.us>
2016-08-14Remove bogus dependencies on NUMERIC_MAX_PRECISION.Tom Lane
NUMERIC_MAX_PRECISION is a purely arbitrary constraint on the precision and scale you can write in a numeric typmod. It might once have had something to do with the allowed range of a typmod-less numeric value, but at least since 9.1 we've allowed, and documented that we allowed, any value that would physically fit in the numeric storage format; which is something over 100000 decimal digits, not 1000. Hence, get rid of numeric_in()'s use of NUMERIC_MAX_PRECISION as a limit on the allowed range of the exponent in scientific-format input. That was especially silly in view of the fact that you can enter larger numbers as long as you don't use 'e' to do it. Just constrain the value enough to avoid localized overflow, and let make_result be the final arbiter of what is too large. Likewise adjust ecpg's equivalent of this code. Also get rid of numeric_recv()'s use of NUMERIC_MAX_PRECISION to limit the number of base-NBASE digits it would accept. That created a dump/restore hazard for binary COPY without doing anything useful; the wire-format limit on number of digits (65535) is about as tight as we would want. In HEAD, also get rid of pg_size_bytes()'s unnecessary intimacy with what the numeric range limit is. That code doesn't exist in the back branches. Per gripe from Aravind Kumar. Back-patch to all supported branches, since they all contain the documentation claim about allowed range of NUMERIC (cf commit cabf5d84b). Discussion: <2895.1471195721@sss.pgh.pa.us>
2016-08-08Fix several one-byte buffer over-reads in to_numberPeter Eisentraut
Several places in NUM_numpart_from_char(), which is called from the SQL function to_number(text, text), could accidentally read one byte past the end of the input buffer (which comes from the input text datum and is not null-terminated). 1. One leading space character would be skipped, but there was no check that the input was at least one byte long. This does not happen in practice, but for defensiveness, add a check anyway. 2. Commit 4a3a1e2cf apparently accidentally doubled that code that skips one space character (so that two spaces might be skipped), but there was no overflow check before skipping the second byte. Fix by removing that duplicate code. 3. A logic error would allow a one-byte over-read when looking for a trailing sign (S) placeholder. In each case, the extra byte cannot be read out directly, but looking at it might cause a crash. The third item was discovered by Piotr Stefaniak, the first two were found and analyzed by Tom Lane and Peter Eisentraut.
2016-08-07Fix misestimation of n_distinct for a nearly-unique column with many nulls.Tom Lane
If ANALYZE found no repeated non-null entries in its sample, it set the column's stadistinct value to -1.0, intending to indicate that the entries are all distinct. But what this value actually means is that the number of distinct values is 100% of the table's rowcount, and thus it was overestimating the number of distinct values by however many nulls there are. This could lead to very poor selectivity estimates, as for example in a recent report from Andreas Joseph Krogh. We should discount the stadistinct value by whatever we've estimated the nulls fraction to be. (That is what will happen if we choose to use a negative stadistinct for a column that does have repeated entries, so this code path was just inconsistent.) In addition to fixing the stadistinct entries stored by several different ANALYZE code paths, adjust the logic where get_variable_numdistinct() forces an "all distinct" estimate on the basis of finding a relevant unique index. Unique indexes don't reject nulls, so there's no reason to assume that the null fraction doesn't apply. Back-patch to all supported branches. Back-patching is a bit of a judgment call, but this problem seems to affect only a few users (else we'd have identified it long ago), and it's bad enough when it does happen that destabilizing plan choices in a worse direction seems unlikely. Patch by me, with documentation wording suggested by Dean Rasheed Report: <VisenaEmail.26.df42f82acae38a58.156463942b8@tc7-visena> Discussion: <16143.1470350371@sss.pgh.pa.us>
2016-07-28Fix assorted fallout from IS [NOT] NULL patch.Tom Lane
Commits 4452000f3 et al established semantics for NullTest.argisrow that are a bit different from its initial conception: rather than being merely a cache of whether we've determined the input to have composite type, the flag now has the further meaning that we should apply field-by-field testing as per the standard's definition of IS [NOT] NULL. If argisrow is false and yet the input has composite type, the construct instead has the semantics of IS [NOT] DISTINCT FROM NULL. Update the comments in primnodes.h to clarify this, and fix ruleutils.c and deparse.c to print such cases correctly. In the case of ruleutils.c, this merely results in cosmetic changes in EXPLAIN output, since the case can't currently arise in stored rules. However, it represents a live bug for deparse.c, which would formerly have sent a remote query that had semantics different from the local behavior. (From the user's standpoint, this means that testing a remote nested-composite column for null-ness could have had unexpected recursive behavior much like that fixed in 4452000f3.) In a related but somewhat independent fix, make plancat.c set argisrow to false in all NullTest expressions constructed to represent "attnotnull" constructs. Since attnotnull is actually enforced as a simple null-value check, this is a more accurate representation of the semantics; we were previously overpromising what it meant for composite columns, which might possibly lead to incorrect planner optimizations. (It seems that what the SQL spec expects a NOT NULL constraint to mean is an IS NOT NULL test, so arguably we are violating the spec and should fix attnotnull to do the other thing. If we ever do, this part should get reverted.) Back-patch, same as the previous commit. Discussion: <10682.1469566308@sss.pgh.pa.us>
2016-07-16Fix crash in close_ps() for NaN input coordinates.Tom Lane
The Assert() here seems unreasonably optimistic. Andreas Seltenreich found that it could fail with NaNs in the input geometries, and it seems likely to me that it might fail in corner cases due to roundoff error, even for ordinary input values. As a band-aid, make the function return SQL NULL instead of crashing. Report: <87d1md1xji.fsf@credativ.de>
2016-07-14Fix GiST index build for NaN values in geometric types.Tom Lane
GiST index build could go into an infinite loop when presented with boxes (or points, circles or polygons) containing NaN component values. This happened essentially because the code assumed that x == x is true for any "double" value x; but it's not true for NaNs. The looping behavior was not the only problem though: we also attempted to sort the items using simple double comparisons. Since NaNs violate the trichotomy law, qsort could (in principle at least) get arbitrarily confused and mess up the sorting of ordinary values as well as NaNs. And we based splitting choices on box size calculations that could produce NaNs, again resulting in undesirable behavior. To fix, replace all comparisons of doubles in this logic with float8_cmp_internal, which is NaN-aware and is careful to sort NaNs consistently, higher than any non-NaN. Also rearrange the box size calculation to not produce NaNs; instead it should produce an infinity for a box with NaN on one side and not-NaN on the other. I don't by any means claim that this solves all problems with NaNs in geometric values, but it should at least make GiST index insertion work reliably with such data. It's likely that the index search side of things still needs some work, and probably regular geometric operations too. But with this patch we're laying down a convention for how such cases ought to behave. Per bug #14238 from Guang-Dih Lei. Back-patch to 9.2; the code used before commit 7f3bd86843e5aad8 is quite different and doesn't lock up on my simple test case, nor on the submitter's dataset. Report: <20160708151747.1426.60150@wrigleys.postgresql.org> Discussion: <28685.1468246504@sss.pgh.pa.us>
2016-07-01Be more paranoid in ruleutils.c's get_variable().Tom Lane
We were merely Assert'ing that the Var matched the RTE it's supposedly from. But if the user passes incorrect information to pg_get_expr(), the RTE might in fact not match; this led either to Assert failures or core dumps, as reported by Chris Hanks in bug #14220. To fix, just convert the Asserts to test-and-elog. Adjust an existing test-and-elog elsewhere in the same function to be consistent in wording. (If we really felt these were user-facing errors, we might promote them to ereport's; but I can't convince myself that they're worth translating.) Back-patch to 9.3; the problematic code doesn't exist before that, and a quick check says that 9.2 doesn't crash on such cases. Michael Paquier and Thomas Munro Report: <20160629224349.1407.32667@wrigleys.postgresql.org>
2016-06-16Fix validation of overly-long IPv6 addresses.Tom Lane
The inet/cidr types sometimes failed to reject IPv6 inputs with too many colon-separated fields, instead translating them to '::/0'. This is the result of a thinko in the original ISC code that seems to be as yet unreported elsewhere. Per bug #14198 from Stefan Kaltenbrunner. Report: <20160616182222.5798.959@wrigleys.postgresql.org>