summaryrefslogtreecommitdiff
path: root/src/backend/utils/adt
AgeCommit message (Collapse)Author
2020-08-15Move new LOCKTAG_DATABASE_FROZEN_IDS to end of enum LockTagType.Noah Misch
Several PGXN modules reference LockTagType values; renumbering would force a recompile of those modules. Oversight in back-patch of today's commit 566372b3d6435639e4cc4476d79b8505a0297c87. Back-patch to released branches, v12 through 9.5. Reported by Tom Lane. Discussion: https://postgr.es/m/921383.1597523945@sss.pgh.pa.us
2020-08-15Prevent concurrent SimpleLruTruncate() for any given SLRU.Noah Misch
The SimpleLruTruncate() header comment states the new coding rule. To achieve this, add locktype "frozenid" and two LWLocks. This closes a rare opportunity for data loss, which manifested as "apparent wraparound" or "could not access status of transaction" errors. Data loss is more likely in pg_multixact, due to released branches' thin margin between multiStopLimit and multiWrapLimit. If a user's physical replication primary logged ": apparent wraparound" messages, the user should rebuild standbys of that primary regardless of symptoms. At less risk is a cluster having emitted "not accepting commands" errors or "must be vacuumed" warnings at some point. One can test a cluster for this data loss by running VACUUM FREEZE in every database. Back-patch to 9.5 (all supported versions). Discussion: https://postgr.es/m/20190218073103.GA1434723@rfd.leadboat.com
2020-07-31Fix recently-introduced performance problem in ts_headline().Tom Lane
The new hlCover() algorithm that I introduced in commit c9b0c678d turns out to potentially take O(N^2) or worse time on long documents, if there are many occurrences of individual query words but few or no substrings that actually satisfy the query. (One way to hit this behavior is with a "common_word & rare_word" type of query.) This seems unavoidable given the original goal of checking every substring of the document, so we have to back off that idea. Fortunately, it seems unlikely that anyone would really want headlines spanning all of a long document, so we can avoid the worse-than-linear behavior by imposing a maximum length of substring that we'll consider. For now, just hard-wire that maximum length as a multiple of max_words times max_fragments. Perhaps at some point somebody will argue for exposing it as a ts_headline parameter, but I'm hesitant to make such a feature addition in a back-patched bug fix. I also noted that the hlFirstIndex() function I'd added in that commit was unnecessarily stupid: it really only needs to check whether a HeadlineWordEntry's item pointer is null or not. This wouldn't make all that much difference in typical cases with queries having just a few terms, but a cycle shaved is a cycle earned. In addition, add a CHECK_FOR_INTERRUPTS call in TS_execute_recurse. This ensures that hlCover's loop is cancellable if it manages to take a long time, and it may protect some other TS_execute callers as well. Back-patch to 9.6 as the previous commit was. I also chose to add the CHECK_FOR_INTERRUPTS call to 9.5. The old hlCover() algorithm seems to avoid the O(N^2) behavior, at least on the test case I tried, but nonetheless it's not very quick on a long document. Per report from Stephen Frost. Discussion: https://postgr.es/m/20200724160535.GW12375@tamriel.snowman.net
2020-07-21neqjoinsel must now pass through collation to eqjoinsel.Tom Lane
Since commit 044c99bc5, eqjoinsel passes the passed-in collation to any operators it invokes. However, neqjoinsel failed to pass on whatever collation it got, so that if we invoked a collation-dependent operator via that code path, we'd get "could not determine which collation to use for string comparison" or the like. Per report from Justin Pryzby. Back-patch to v12, like the previous commit. Discussion: https://postgr.es/m/20200721191606.GL5748@telsasoft.com
2020-07-17Fix whitespacePeter Eisentraut
2020-07-11Forbid numeric NaN in jsonpathAlexander Korotkov
SQL standard doesn't define numeric Inf or NaN values. It appears even more ridiculous to support then in jsonpath assuming JSON doesn't support these values as well. This commit forbids returning NaN from .double(), which was previously allowed. NaN can't be result of inner-jsonpath computation over non-NaNs. So, we can not expect NaN in the jsonpath output. Reported-by: Tom Lane Discussion: https://postgr.es/m/203949.1591879542%40sss.pgh.pa.us Author: Alexander Korotkov Reviewed-by: Tom Lane Backpatch-through: 12
2020-07-11Improve error reporting for jsonpath .double() methodAlexander Korotkov
When jsonpath .double() method detects that numeric or string can't be converted to double precision, it throws an error. This commit makes these errors explicitly express the reason of failure. Discussion: https://postgr.es/m/CAPpHfdtqJtiSXkP7tOXez18NxhLUH_-75bL8%3DOce4Ki%2Bbv7V6Q%40mail.gmail.com Author: Alexander Korotkov Reviewed-by: Tom Lane Backpatch-through: 12
2020-07-09Fix pg_current_logfile() to not emit a carriage return on Windows.Tom Lane
Due to not having our signals straight about CRLF vs. LF line termination, the output of pg_current_logfile() included a trailing \r on Windows. To fix, force the file descriptor it uses into text mode. While here, move a couple of local variable declarations to make the function's logic clearer. In v12 and v13, also back-patch the test added by 1c4e88e2f so that this function has some test coverage. However, the 004_logrotate.pl test script doesn't exist before v12, and it didn't seem worth adding to older branches just for this. Per report from Thomas Kellerer. Back-patch to v10 where this function was added. Discussion: https://postgr.es/m/412ae8da-76bb-640f-039a-f3513499e53d@gmx.net
2020-07-04Fix "ignoring return value" complaints from commit 96d1f423f9Joe Conway
The cfbot and some BF animals are complaining about the previous read_binary_file commit because of ignoring return value of ‘fread’. So let's make everyone happy by testing the return value even though not strictly needed. Reported by Justin Pryzby, and suggested patch by Tom Lane. Backpatched to v11 same as the previous commit. Reported-By: Justin Pryzby Reviewed-By: Tom Lane Discussion: https://postgr.es/m/flat/969b8d82-5bb2-5fa8-4eb1-f0e685c5d736%40joeconway.com Backpatch-through: 11
2020-07-04Read until EOF vice stat-reported size in read_binary_fileJoe Conway
read_binary_file(), used by SQL functions pg_read_file() and friends, uses stat to determine file length to read, when not passed an explicit length as an argument. This is problematic, for example, if the file being read is a virtual file with a stat-reported length of zero. Arrange to read until EOF, or StringInfo data string lenth limit, is reached instead. Original complaint and patch by me, with significant review, corrections, advice, and code optimizations by Tom Lane. Backpatched to v11. Prior to that only paths relative to the data and log dirs were allowed for files, so no "zero length" files were reachable anyway. Reviewed-By: Tom Lane Discussion: https://postgr.es/m/flat/969b8d82-5bb2-5fa8-4eb1-f0e685c5d736%40joeconway.com Backpatch-through: 11
2020-06-13Fix behavior of float aggregates for single Inf or NaN inputs.Tom Lane
When there is just one non-null input value, and it is infinity or NaN, aggregates such as stddev_pop and covar_pop should produce a NaN result, because the calculation is not well-defined. They used to do so, but since we adopted Youngs-Cramer aggregation in commit e954a727f, they produced zero instead. That's an oversight, so fix it. Add tests exercising these edge cases. Affected aggregates are var_pop(double precision) stddev_pop(double precision) var_pop(real) stddev_pop(real) regr_sxx(double precision,double precision) regr_syy(double precision,double precision) regr_sxy(double precision,double precision) regr_r2(double precision,double precision) regr_slope(double precision,double precision) regr_intercept(double precision,double precision) covar_pop(double precision,double precision) corr(double precision,double precision) Back-patch to v12 where the behavior change was accidentally introduced. Report and patch by me; thanks to Dean Rasheed for review. Discussion: https://postgr.es/m/353062.1591898766@sss.pgh.pa.us
2020-06-11Fix mishandling of NaN counts in numeric_[avg_]combine.Tom Lane
When merging two NumericAggStates, the code missed adding the new state's NaNcount unless its N was also nonzero; since those counts are independent, this is wrong. This would only have visible effect if some partial aggregate scans found only NaNs while earlier ones found only non-NaNs; then we could end up falsely deciding that there were no NaNs and fail to return a NaN final result as expected. That's pretty improbable, so it's no surprise this hasn't been reported from the field. Still, it's a bug. I didn't try to produce a regression test that would show the bug, but I did notice that these functions weren't being reached at all in our regression tests, so I improved the tests to at least exercise them. With these additions, I see pretty complete code coverage on the aggregation-related functions in numeric.c. Back-patch to 9.6 where this code was introduced. (I only added the improved test case as far back as v10, though, since the relevant part of aggregates.sql isn't there at all in 9.6.)
2020-06-05Use query collation, not column's collation, while examining statistics.Tom Lane
Commit 5e0928005 changed the planner so that, instead of blindly using DEFAULT_COLLATION_OID when invoking operators for selectivity estimation, it would use the collation of the column whose statistics we're considering. This was recognized as still being not quite the right thing, but it seemed like a good incremental improvement. However, shortly thereafter we introduced nondeterministic collations, and that creates cases where operators can fail if they're passed the wrong collation. We don't want planning to fail in cases where the query itself would work, so this means that we *must* use the query's collation when invoking operators for estimation purposes. The only real problem this creates is in ineq_histogram_selectivity, where the binary search might produce a garbage answer if we perform comparisons using a different collation than the column's histogram is ordered with. However, when the query's collation is significantly different from the column's default collation, the estimate we previously generated would be pretty irrelevant anyway; so it's not clear that this will result in noticeably worse estimates in practice. (A follow-on patch will improve this situation in HEAD, but it seems too invasive for back-patch.) The patch requires changing the signatures of mcv_selectivity and allied functions, which are exported and very possibly are used by extensions. In HEAD, I just did that, but an API/ABI break of this sort isn't acceptable in stable branches. Therefore, in v12 the patch introduces "mcv_selectivity_ext" and so on, with signatures matching HEAD, and makes the old functions into wrappers that assume DEFAULT_COLLATION_OID should be used. That does not match the prior behavior, but it should avoid risk of failure in most cases. (In practice, I think most extension datatypes aren't collation-aware, so the change probably doesn't matter to them.) Per report from James Lucas. Back-patch to v12 where the problem was introduced. Discussion: https://postgr.es/m/CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com
2020-06-04Reject "23:59:60.nnn" in datetime input.Tom Lane
It's intentional that we don't allow values greater than 24 hours, while we do allow "24:00:00" as well as "23:59:60" as inputs. However, the range check was miscoded in such a way that it would accept "23:59:60.nnn" with a nonzero fraction. For time or timetz, the stored result would then be greater than "24:00:00" which would fail dump/reload, not to mention possibly confusing other operations. Fix by explicitly calculating the result and making sure it does not exceed 24 hours. (This calculation is redundant with what will happen later in tm2time or tm2timetz. Maybe someday somebody will find that annoying enough to justify refactoring to avoid the duplication; but that seems too invasive for a back-patched bug fix, and the cost is probably unmeasurable anyway.) Note that this change also rejects such input as the time portion of a timestamp(tz) value. Back-patch to v10. The bug is far older, but to change this pre-v10 we'd need to ensure that the logic behaves sanely with float timestamps, which is possibly nontrivial due to roundoff considerations. Doesn't really seem worth troubling with. Per report from Christoph Berg. Discussion: https://postgr.es/m/20200520125807.GB296739@msg.df7cb.de
2020-06-01Fix use-after-release mistake in currtid() and currtid2() for viewsMichael Paquier
This issue has been present since the introduction of this code as of a3519a2 from 2002, and has been found by buildfarm member prion that uses RELCACHE_FORCE_RELEASE via the tests introduced recently in e786be5. Discussion: https://postgr.es/m/20200601022055.GB4121@paquier.xyz Backpatch-through: 9.5
2020-06-01Fix crashes with currtid() and currtid2()Michael Paquier
A relation that has no storage initializes rd_tableam to NULL, which caused those two functions to crash because of a pointer dereference. Note that in 11 and older versions, this has always failed with a confusing error "could not open file". These two functions are used by the Postgres ODBC driver, which requires them only when connecting to a backend strictly older than 8.1. When connected to 8.2 or a newer version, the driver uses a RETURNING clause instead whose support has been added in 8.2, so it should be possible to just remove both functions in the future. This is left as an issue to address later. While on it, add more regression tests for those functions as we never really had coverage for them, and for aggregates of TIDs. Reported-by: Jaime Casanova, via sqlsmith Author: Michael Paquier Reviewed-by: Álvaro Herrera Discussion: https://postgr.es/m/CAJGNTeO93u-5APMga6WH41eTZ3Uee9f3s8dCpA-GSSqNs1b=Ug@mail.gmail.com Backpatch-through: 12
2020-05-28Add CHECK_FOR_INTERRUPTS() to the repeat() functionJoe Conway
The repeat() function loops for potentially a long time without ever checking for interrupts. This prevents, for example, a query cancel from interrupting until the work is all done. Fix by inserting a CHECK_FOR_INTERRUPTS() into the loop. Backpatch to all supported versions. Discussion: https://www.postgresql.org/message-id/flat/8692553c-7fe8-17d9-cbc1-7cddb758f4c6%40joeconway.com
2020-05-26Add lcov exclusion markers to jsonpath scannerPeter Eisentraut
This was done for all scanners in 421167362242ce1fb46d6d720798787e7cd65aad but not added to the new one.
2020-05-14Fix the MSVC build for versions 2015 and later.Amit Kapila
Visual Studio 2015 and later versions should still be able to do the same as Visual Studio 2012, but the declaration of locale_name is missing in _locale_t, causing the code compilation to fail, hence this falls back instead on to enumerating all system locales by using EnumSystemLocalesEx to find the required locale name.  If the input argument is in Unix-style then we can get ISO Locale name directly by using GetLocaleInfoEx() with LCType as LOCALE_SNAME. In passing, change the documentation references of the now obsolete links. Note that this problem occurs only with NLS enabled builds. Author: Juan José Santamaría Flecha, Davinder Singh and Amit Kapila Reviewed-by: Ranier Vilela and Amit Kapila Backpatch-through: 9.5 Discussion: https://postgr.es/m/CAHzhFSFoJEWezR96um4-rg5W6m2Rj9Ud2CNZvV4NWc9tXV7aXQ@mail.gmail.com
2020-05-07Fix YA text phrase search bug.Tom Lane
checkcondition_str() failed to report multiple matches for a prefix pattern correctly: it would dutifully merge the match positions, but then after exiting that loop, if the last prefix-matching word had had no suitable positions, it would report there were no matches. The upshot would be failing to recognize a match that the query should match. It looks like you need all of these conditions to see the bug: * a phrase search (else we don't ask for match position details) * a prefix search item (else we don't get to this code) * a weight restriction (else checkclass_str won't fail) Noted while investigating a problem report from Pavel Borisov, though this is distinct from the issue he was on about. Back-patch to 9.6 where phrase search was added.
2020-05-01Get rid of trailing semicolons in C macro definitions.Tom Lane
Writing a trailing semicolon in a macro is almost never the right thing, because you almost always want to write a semicolon after each macro call instead. (Even if there was some reason to prefer not to, pgindent would probably make a hash of code formatted that way; so within PG the rule should basically be "don't do it".) Thus, if we have a semi inside the macro, the compiler sees "something;;". Much of the time the extra empty statement is harmless, but it could lead to mysterious syntax errors at call sites. In perhaps an overabundance of neatnik-ism, let's run around and get rid of the excess semicolons whereever possible. The only thing worse than a mysterious syntax error is a mysterious syntax error that only happens in the back branches; therefore, backpatch these changes where relevant, which is most of them because most of these mistakes are old. (The lack of reported problems shows that this is largely a hypothetical issue, but still, it could bite us in some future patch.) John Naylor and Tom Lane Discussion: https://postgr.es/m/CACPNZCs0qWTqJ2QUSGJ07B7uvAvzMb-KbG2q+oo+J3tsWN5cqw@mail.gmail.com
2020-04-27Fix full text search to handle NOT above a phrase search correctly.Tom Lane
Queries such as '!(foo<->bar)' failed to find matching rows when implemented as a GiST or GIN index search. That's because of failing to handle phrase searches as tri-valued when considering a query without any position information for the target tsvector. We can only say that the phrase operator might match, not that it does match; and therefore its NOT also might match. The previous coding incorrectly inverted the approximate phrase result to decide that there was certainly no match. To fix, we need to make TS_phrase_execute return a real ternary result, and then bubble that up accurately in TS_execute. As long as we have to do that anyway, we can simplify the baroque things TS_phrase_execute was doing internally to manage tri-valued searching with only a bool as explicit result. For now, I left the externally-visible result of TS_execute as a plain bool. There do not appear to be any outside callers that need to distinguish a three-way result, given that they passed in a flag saying what to do in the absence of position data. This might need to change someday, but we wouldn't want to back-patch such a change. Although tsginidx.c has its own TS_execute_ternary implementation for use at upper index levels, that sadly managed to get this case wrong as well :-(. Fixing it is a lot easier fortunately. Per bug #16388 from Charles Offenbacher. Back-patch to 9.6 where phrase search was introduced. Discussion: https://postgr.es/m/16388-98cffba38d0b7e6e@postgresql.org
2020-04-20Allow pg_read_all_stats to access all stats views againMagnus Hagander
The views pg_stat_progress_* had not gotten the memo that pg_read_all_stats is supposed to be able to read all statistics. Also make a pass over all text-returning pg_stat_xyz functions that could return "insufficient privilege" and make sure they also respect pg_read_all_status. Reported-by: Andrey M. Borodin Reviewed-by: Andrey M. Borodin, Kyotaro Horiguchi Discussion: https://postgr.es/m/13145F2F-8458-4977-9D2D-7B2E862E5722@yandex-team.ru
2020-04-07Fix circle_in to accept "(x,y),r" as it's advertised to do.Tom Lane
Our documentation describes four allowed input syntaxes for circles, but the regression tests tried only three ... with predictable consequences. Remarkably, this has been wrong since the circle datatype was added in 1997, but nobody noticed till now. David Zhang, with some help from me Discussion: https://postgr.es/m/332c47fa-d951-7574-b5cc-a8f7f7201202@highgo.ca
2020-04-07Adjust bytea get_bit/set_bit to cope with bytea strings > 256MB.Tom Lane
Since the existing bit number argument can't exceed INT32_MAX, it's not possible for these functions to manipulate bits beyond the first 256MB of a bytea value. However, it'd be good if they could do at least that much, and not fall over entirely for longer bytea values. Adjust the comparisons to be done in int64 arithmetic so that works. Also tweak the error reports to show sane values in case of overflow. Also add some test cases to improve the miserable code coverage of these functions. Apply patch to back branches only; HEAD has a better solution as of commit 26a944cf2. Extracted from a much larger patch by Movead Li Discussion: https://postgr.es/m/20200312115135445367128@highgo.ca
2020-03-31Teach pg_ls_dir_files() to ignore ENOENT failures from stat().Tom Lane
Buildfarm experience shows that this function can fail with ENOENT if some other process unlinks a file between when we read the directory entry and when we try to stat() it. The problem is old but we had not noticed it until 085b6b667 added regression test coverage. To fix, just ignore ENOENT failures. There is one other case that this might hide: a symlink that points to nowhere. That seems okay though, at least better than erroring. Back-patch to v10 where this function was added, since the regression test cases were too. Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
2020-03-16Avoid holding a directory FD open across assorted SRF calls.Tom Lane
This extends the fixes made in commit 085b6b667 to other SRFs with the same bug, namely pg_logdir_ls(), pgrowlocks(), pg_timezone_names(), pg_ls_dir(), and pg_tablespace_databases(). Also adjust various comments and documentation to warn against expecting to clean up resources during a ValuePerCall SRF's final call. Back-patch to all supported branches, since these functions were all born broken. Justin Pryzby, with cosmetic tweaks by me Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
2020-03-11Avoid holding a directory FD open across pg_ls_dir_files() calls.Tom Lane
This coding technique is undesirable because (a) it leaks the FD for the rest of the transaction if the SRF is not run to completion, and (b) allocated FDs are a scarce resource, but multiple interleaved uses of the relevant functions could eat many such FDs. In v11 and later, a query such as "SELECT pg_ls_waldir() LIMIT 1" yields a warning about the leaked FD, and the only reason there's no warning in earlier branches is that fd.c didn't whine about such leaks before commit 9cb7db3f0. Even disregarding the warning, it wouldn't be too hard to run a backend out of FDs with careless use of these SQL functions. Hence, rewrite the function so that it reads the directory within a single call, returning the results as a tuplestore rather than via value-per-call mode. There are half a dozen other built-in SRFs with similar problems, but let's fix this one to start with, just to see if the buildfarm finds anything wrong with the code. In passing, fix bogus error report for stat() failure: it was whining about the directory when it should be fingering the individual file. Doubtless a copy-and-paste error. Back-patch to v10 where this function was added. Justin Pryzby, with cosmetic tweaks and test cases by me Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
2020-02-13Avoid a performance regression in float overflow/underflow detection.Tom Lane
Commit 6bf0bc842 replaced float.c's CHECKFLOATVAL() macro with static inline subroutines, but that wasn't too well thought out. In the original coding, the unlikely condition (isinf(result) or result == 0) was checked first, and the inf_is_valid or zero_is_valid condition only afterwards. The inline-subroutine coding caused that to be swapped around, which is pretty horrid for performance because (a) in common cases the is_valid condition is twice as expensive to evaluate (e.g., requiring two isinf() calls not one) and (b) in common cases the is_valid condition is false, requiring us to perform the unlikely-condition check anyway. Net result is that one isinf() call becomes two or three, resulting in visible performance loss as reported by Keisuke Kuroda. The original fix proposal was to revert the replacement of the macro, but on second thought, that macro was just a bad idea from the beginning: if anything it's a net negative for readability of the code. So instead, let's just open-code all the overflow/underflow tests, being careful to test the unlikely condition first (and mark it unlikely() to help the compiler get the point). Also, rather than having N copies of the actual ereport() calls, collapse those into out-of-line error subroutines to save some code space. This does mean that the error file/line numbers won't be very helpful for figuring out where the issue really is --- but we'd already burned that bridge by putting the ereports into static inlines. In HEAD, check_float[48]_val() are gone altogether. In v12, leave them present in float.h but unused in the core code, just in case some extension is depending on them. Emre Hasegeli, with some kibitzing from me and Andres Freund Discussion: https://postgr.es/m/CANDwggLe1Gc1OrRqvPfGE=kM9K0FSfia0hbeFCEmwabhLz95AA@mail.gmail.com
2020-02-07Fix bug in Tid scan.Fujii Masao
Commit 147e3722f7 changed Tid scan so that it calls table_beginscan() and uses the scan option for seq scan. This change caused two issues. (1) The change caused Tid scan to take a predicate lock on the entire relation in serializable transaction even when relation-level lock is not necessary. This could lead to an unexpected serialization error. (2) The change caused Tid scan to increment the number of seq_scan in pg_stat_*_tables views even though it's not seq scan. This could confuse the users. This commit adds the scan option for Tid scan and makes Tid scan use it, to avoid those issues. Back-patch to v12, where the bug was introduced. Author: Tatsuhito Kasahara Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada, Fujii Masao Discussion: https://postgr.es/m/CAP0=ZVKy+gTbFmB6X_UW0pP3WaeJ-fkUWHoD-pExS=at3CY76g@mail.gmail.com
2020-01-23Fix an oversight in commit 4c70098ff.Tom Lane
I had supposed that the from_char_seq_search() call sites were all passing the constant arrays you'd expect them to pass ... but on looking closer, the one for DY format was passing the days[] array not days_short[]. This accidentally worked because the day abbreviations in English are all the same as the first three letters of the full day names. However, once we took out the "maximum comparison length" logic, it stopped working. As penance for that oversight, add regression test cases covering this, as well as every other switch case in DCH_from_char() that was not reached according to the code coverage report. Also, fold the DCH_RM and DCH_rm cases into one --- now that seq_search is case independent, there's no need to pass different comparison arrays for those cases. Back-patch, as the previous commit was.
2020-01-23Clean up formatting.c's logic for matching constant strings.Tom Lane
seq_search(), which is used to match input substrings to constants such as month and day names, had a lot of bizarre and unnecessary behaviors. It was mostly possible to avert our eyes from that before, but we don't want to duplicate those behaviors in the upcoming patch to allow recognition of non-English month and day names. So it's time to clean this up. In particular: * seq_search scribbled on the input string, which is a pretty dangerous thing to do, especially in the badly underdocumented way it was done here. Fortunately the input string is a temporary copy, but that was being made three subroutine levels away, making it something easy to break accidentally. The behavior is externally visible nonetheless, in the form of odd case-folding in error reports about unrecognized month/day names. The scribbling is evidently being done to save a few calls to pg_tolower, but that's such a cheap function (at least for ASCII data) that it's pretty pointless to worry about. In HEAD I switched it to be pg_ascii_tolower to ensure it is cheap in all cases; but there are corner cases in Turkish where this'd change behavior, so leave it as pg_tolower in the back branches. * seq_search insisted on knowing the case form (all-upper, all-lower, or initcap) of the constant strings, so that it didn't have to case-fold them to perform case-insensitive comparisons. This likewise seems like excessive micro-optimization, given that pg_tolower is certainly very cheap for ASCII data. It seems unsafe to assume that we know the case form that will come out of pg_locale.c for localized month/day names, so it's better just to define the comparison rule as "downcase all strings before comparing". (The choice between downcasing and upcasing is arbitrary so far as English is concerned, but it might not be in other locales, so follow citext's lead here.) * seq_search also had a parameter that'd cause it to report a match after a maximum number of characters, even if the constant string were longer than that. This was not actually used because no caller passed a value small enough to cut off a comparison. Replicating that behavior for localized month/day names seems expensive as well as useless, so let's get rid of that too. * from_char_seq_search used the maximum-length parameter to truncate the input string in error reports about not finding a matching name. This leads to rather confusing reports in many cases. Worse, it is outright dangerous if the input string isn't all-ASCII, because we risk truncating the string in the middle of a multibyte character. That'd lead either to delivering an illegible error message to the client, or to encoding-conversion failures that obscure the actual data problem. Get rid of that in favor of truncating at whitespace if any (a suggestion due to Alvaro Herrera). In addition to fixing these things, I const-ified the input string pointers of DCH_from_char and its subroutines, to make sure there aren't any other scribbling-on-input problems. The risk of generating a badly-encoded error message seems like enough of a bug to justify back-patching, so patch all supported branches. Discussion: https://postgr.es/m/29432.1579731087@sss.pgh.pa.us
2020-01-12Fix edge-case crashes and misestimation in range containment selectivity.Tom Lane
When estimating the selectivity of "range_var <@ range_constant" or "range_var @> range_constant", if the upper (or respectively lower) bound of the range_constant was above the last bin of the range_var's histogram, the code would access uninitialized memory and potentially crash (though it seems the probability of a crash is quite low). Handle the endpoint cases explicitly to fix that. While at it, be more paranoid about the possibility of getting NaN or other silly results from the range type's subdiff function. And improve some comments. Ordinarily we'd probably add a regression test case demonstrating the bug in unpatched code. But it's too hard to get it to crash reliably because of the uninitialized-memory dependence, so skip that. Per bug #16122 from Adam Scott. It's been broken from the beginning, apparently, so backpatch to all supported branches. Diagnosis by Michael Paquier, patch by Andrey Borodin and Tom Lane. Discussion: https://postgr.es/m/16122-eb35bc248c806c15@postgresql.org
2020-01-08Reimplement nullification of walsender timestampAlvaro Herrera
Make the value null only at pg_stat_activity-output time, as suggested by Tom Lane, instead of messing with the internal state. This should appease buildfarm members with force_parallel_mode=regress, which are running parallel queries on logical replication walsenders. The fact that walsenders can run parallel queries should perhaps be studied more carefully, but for the moment let's get rid of the red blots in buildfarm. Backpatch to pg10, like the previous commit. Discussion: https://postgr.es/m/30804.1578438763@sss.pgh.pa.us
2019-12-12Fix EXTRACT(ISOYEAR FROM timestamp) for years BC.Tom Lane
The test cases added by commit 26ae3aa80 exposed an old oversight in timestamp[tz]_part: they didn't correct the result of date2isoyear() for BC years, so that we produced an off-by-one answer for such years. Fix that, and back-patch to all supported branches. Discussion: https://postgr.es/m/SG2PR06MB37762CAE45DB0F6CA7001EA9B6550@SG2PR06MB3776.apcprd06.prod.outlook.com
2019-12-12Remove redundant function calls in timestamp[tz]_part().Tom Lane
The DTK_DOW/DTK_ISODOW and DTK_DOY switch cases in timestamp_part() and timestamptz_part() contained calls of timestamp2tm() that were fully redundant with the ones done just above the switch. This evidently crept in during commit 258ee1b63, which relocated that code from another place where the calls were indeed needed. Just delete the redundant calls. I (tgl) noted that our test coverage of these functions left quite a bit to be desired, so extend timestamp.sql and timestamptz.sql to cover all the branches. Back-patch to all supported branches, as the previous commit was. There's no real issue here other than some wasted cycles in some not-too-heavily-used code paths, but the test coverage seems valuable. Report and patch by Li Japin; test case adjustments by me. Discussion: https://postgr.es/m/SG2PR06MB37762CAE45DB0F6CA7001EA9B6550@SG2PR06MB3776.apcprd06.prod.outlook.com
2019-11-26Allow access to child table statistics if user can read parent table.Tom Lane
The fix for CVE-2017-7484 disallowed use of pg_statistic data for planning purposes if the user would not be able to select the associated column and a non-leakproof function is to be applied to the statistics values. That turns out to disable use of pg_statistic data in some common cases involving inheritance/partitioning, where the user does have permission to select from the parent table that was actually named in the query, but not from a child table whose stats are needed. Since, in non-corner cases, the user *can* select the child table's data via the parent, this restriction is not actually useful from a security standpoint. Improve the logic so that we also check the permissions of the originally-named table, and allow access if select permission exists for that. When checking access to stats for a simple child column, we can map the child column number back to the parent, and perform this test exactly (including not allowing access if the child column isn't exposed by the parent). For expression indexes, the current logic just insists on whole-table select access, and this patch allows access if the user can select the whole parent table. In principle, if the child table has extra columns, this might allow access to stats on columns the user can't read. In practice, it's unlikely that the planner is going to do any stats calculations involving expressions that are not visible to the query, so we'll ignore that fine point for now. Perhaps someday we'll improve that logic to detect exactly which columns are used by an expression index ... but today is not that day. Back-patch to v11. The issue was created in 9.2 and up by the CVE-2017-7484 fix, but this patch depends on the append_rel_array[] planner data structure which only exists in v11 and up. In practice the issue is most urgent with partitioned tables, so fixing v11 and later should satisfy much of the practical need. Dilip Kumar and Amit Langote, with some kibitzing by me Discussion: https://postgr.es/m/3876.1531261875@sss.pgh.pa.us
2019-11-21Defend against self-referential views in relation_is_updatable().Tom Lane
While a self-referential view doesn't actually work, it's possible to create one, and it turns out that this breaks some of the information_schema views. Those views call relation_is_updatable(), which neglected to consider the hazards of being recursive. In older PG versions you get a "stack depth limit exceeded" error, but since v10 it'd recurse to the point of stack overrun and crash, because commit a4c35ea1c took out the expression_returns_set() call that was incidentally checking the stack depth. Since this function is only used by information_schema views, it seems like it'd be better to return "not updatable" than suffer an error. Hence, add tracking of what views we're examining, in just the same way that the nearby fireRIRrules() code detects self-referential views. I added a check_stack_depth() call too, just to be defensive. Per private report from Manuel Rigger. Back-patch to all supported versions.
2019-11-21Provide statistics for hypothetical BRIN indexesMichael Paquier
Trying to use hypothetical indexes with BRIN currently fails when trying to access a relation that does not exist when looking for the statistics. With the current API, it is not possible to easily pass a value for pages_per_range down to the hypothetical index, so this makes use of the default value of BRIN_DEFAULT_PAGES_PER_RANGE, which should be fine enough in most cases. Being able to refine or enforce the hypothetical costs in more optimistic ways would require more refactoring by filling in the statistics when building IndexOptInfo in plancat.c. This would involve ABI breakages around the costing routines, something not fit for stable branches. This is broken since 7e534ad, so backpatch down to v10. Author: Julien Rouhaud, Heikki Linnakangas Reviewed-by: Álvaro Herrera, Tom Lane, Michael Paquier Discussion: https://postgr.es/m/CAOBaU_ZH0LKEA8VFCocr6Lpte1ab0b6FpvgS0y4way+RPSXfYg@mail.gmail.com Backpatch-through: 10
2019-11-19Fix corner-case failure in match_pattern_prefix().Tom Lane
The planner's optimization code for LIKE and regex operators could error out with a complaint like "no = operator for opfamily NNN" if someone created a binary-compatible index (for example, a bpchar_ops index on a text column) on the LIKE's left argument. This is a consequence of careless refactoring in commit 74dfe58a5. The old code in match_special_index_operator only accepted specific combinations of the pattern operator and the index opclass, thereby indirectly guaranteeing that the opclass would have a comparison operator with the same LHS input type as the pattern operator. While moving the logic out to a planner support function, I simplified that test in a way that no longer guarantees that. Really though we'd like an altogether weaker dependency on the opclass, so rather than put back exactly the old code, just allow lookup failure. I have in mind now to rewrite this logic completely, but this is the minimum change needed to fix the bug in v12. Per report from Manuel Rigger. Back-patch to v12 where the mistake came in. Discussion: https://postgr.es/m/CA+u7OA7nnGYy8rY0vdTe811NuA+Frr9nbcBO9u2Z+JxqNaud+g@mail.gmail.com
2019-11-16Further fix dumping of views that contain just VALUES(...).Tom Lane
It turns out that commit e9f1c01b7 missed a case: we must print a VALUES clause in long format if get_query_def is given a resultDesc that would require the query's output column name(s) to be different from what the bare VALUES clause would produce. This applies in case an ALTER ... RENAME COLUMN has been done to a view that formerly could be printed in simple format, as shown in the added regression test case. It also explains bug #16119 from Dmitry Telpt, because it turns out that (unlike CREATE VIEW) CREATE MATERIALIZED VIEW fails to apply any column aliases it's given to the stored ON SELECT rule. So to get them to be printed, we have to account for the resultDesc renaming. It might be worth changing the matview code so that it creates the ON SELECT rule with the correct aliases; but we'd still need these messy checks in get_simple_values_rte to handle the case of a subsequent column rename, so any such change would be just neatnik-ism not a bug fix. Like the previous patch, back-patch to all supported branches. Discussion: https://postgr.es/m/16119-e64823f30a45a754@postgresql.org
2019-11-16Skip system attributes when applying mvdistinct statsTomas Vondra
When estimating number of distinct groups, we failed to ignore system attributes when matching the group expressions to mvdistinct stats, causing failures like ERROR: negative bitmapset member not allowed Fix that by simply skipping anything that is not a regular attribute. Backpatch to PostgreSQL 10, where the extended stats were introduced. Bug: #16111 Reported-by: Tuomas Leikola Author: Tomas Vondra Backpatch-through: 10 Discussion: https://postgr.es/m/16111-687799584c3a7e73@postgresql.org
2019-11-13Add missing check_collation_set call to bpcharne().Tom Lane
We should throw an error for indeterminate collation, but bpcharne() was missing that logic, resulting in a much less user-friendly error (either an assertion failure or "cache lookup failed for collation 0"). Per report from Manuel Rigger. Back-patch to v12 where the mistake came in, evidently in commit 5e1963fb7. (Before non-deterministic collations, this function wasn't collation sensitive.) Discussion: https://postgr.es/m/CA+u7OA4HOjtymxAbuGNh4-X_2R0Lw5n01tzvP8E5-i-2gQXYWA@mail.gmail.com
2019-11-07Fix integer-overflow edge case detection in interval_mul and pgbench.Tom Lane
This patch adopts the overflow check logic introduced by commit cbdb8b4c0 into two more places. interval_mul() failed to notice if it computed a new microseconds value that was one more than INT64_MAX, and pgbench's double-to-int64 logic had the same sorts of edge-case problems that cbdb8b4c0 fixed in the core code. To make this easier to get right in future, put the guts of the checks into new macros in c.h, and add commentary about how to use the macros correctly. Back-patch to all supported branches, as we did with the previous fix. Yuya Watari Discussion: https://postgr.es/m/CAJ2pMkbkkFw2hb9Qb1Zj8d06EhWAQXFLy73St4qWv6aX=vqnjw@mail.gmail.com
2019-11-04Catch invalid typlens in a couple of placesPeter Eisentraut
Rearrange the logic in record_image_cmp() and datum_image_eq() to error out on unexpected typlens (either not supported there or completely invalid due to corruption). Barring corruption, this is not possible today but it seems more future-proof and robust to fix this. Reported-by: Peter Geoghegan <pg@bowt.ie>
2019-10-28Handle empty-string edge cases correctly in strpos().Tom Lane
Commit 9556aa01c rearranged the innards of text_position() in a way that would make it not work for empty search strings. Which is fine, because all callers of that code special-case an empty pattern in some way. However, the primary use-case (text_position itself) got special-cased incorrectly: historically it's returned 1 not 0 for an empty search string. Restore the historical behavior. Per complaint from Austin Drenski (via Shay Rojansky). Back-patch to v12 where it got broken. Discussion: https://postgr.es/m/CADT4RqAz7oN4vkPir86Kg1_mQBmBxCp-L_=9vRpgSNPJf0KRkw@mail.gmail.com
2019-10-04Fix bitshiftright()'s zero-padding some more.Tom Lane
Commit 5ac0d9360 failed to entirely fix bitshiftright's habit of leaving one-bits in the pad space that should be all zeroes, because in a moment of sheer brain fade I'd concluded that only the code path used for not-a-multiple-of-8 shift distances needed to be fixed. Of course, a multiple-of-8 shift distance can also cause the problem, so we need to forcibly zero the extra bits in both cases. Per bug #16037 from Alexander Lakhin. As before, back-patch to all supported branches. Discussion: https://postgr.es/m/16037-1d1ebca564db54f4@postgresql.org
2019-09-23Message style fixesPeter Eisentraut
2019-09-22Fix failure to zero-pad the result of bitshiftright().Tom Lane
If the bitstring length is not a multiple of 8, we'd shift the rightmost bits into the pad space, which must be zeroes --- bit_cmp, for one, depends on that. This'd lead to the result failing to compare equal to what it should compare equal to, as reported in bug #16013 from Daryl Waycott. This is, if memory serves, not the first such bug in the bitstring functions. In hopes of making it the last one, do a bit more work than minimally necessary to fix the bug: * Add assertion checks to bit_out() and varbit_out() to complain if they are given incorrectly-padded input. This will improve the odds that manual testing of any new patch finds problems. * Encapsulate the padding-related logic in macros to make it easier to use. Also, remove unnecessary padding logic from bit_or() and bitxor(). Somebody had already noted that we need not re-pad the result of bit_and() since the inputs are required to be the same length, but failed to extrapolate that to the other two. Also, move a comment block that once was near the head of varbit.c (but people kept putting other stuff in front of it), to put it in the header block. Note for the release notes: if anyone has inconsistent data as a result of saving the output of bitshiftright() in a table, it's possible to fix it with something like UPDATE mytab SET bitcol = ~(~bitcol) WHERE bitcol != ~(~bitcol); This has been broken since day one, so back-patch to all supported branches. Discussion: https://postgr.es/m/16013-c2765b6996aacae9@postgresql.org
2019-09-21Straighten out leakproofness markings on text comparison functions.Tom Lane
Since we introduced the idea of leakproof functions, texteq and textne were marked leakproof but their sibling text comparison functions were not. This inconsistency seemed justified because texteq/textne just relied on memcmp() and so could easily be seen to be leakproof, while the other comparison functions are far more complex and indeed can throw input-dependent errors. However, that argument crashed and burned with the addition of nondeterministic collations, because now texteq/textne may invoke the exact same varstr_cmp() infrastructure as the rest. It makes no sense whatever to give them different leakproofness markings. After a certain amount of angst we've concluded that it's all right to consider varstr_cmp() to be leakproof, mostly because the other choice would be disastrous for performance of many queries where leakproofness matters. The input-dependent errors should only be reachable for corrupt input data, or so we hope anyway; certainly, if they are reachable in practice, we've got problems with requirements as basic as maintaining a btree index on a text column. Hence, run around to all the SQL functions that derive from varstr_cmp() and mark them leakproof. This should result in a useful gain in flexibility/performance for queries in which non-leakproofness degrades the efficiency of the query plan. Back-patch to v12 where nondeterministic collations were added. While this isn't an essential bug fix given the determination that varstr_cmp() is leakproof, we might as well apply it now that we've been forced into a post-beta4 catversion bump. Discussion: https://postgr.es/m/31481.1568303470@sss.pgh.pa.us