path: root/src/backend/utils/adt
Age | Commit message | Author
2016-03-28Code and docs review for commit 3187d6de0e5a9e805b27c48437897e8c39071d45.Tom Lane
Fix up check for high-bit-set characters, which provoked "comparison is always true due to limited range of data type" warnings on some compilers, and was unlike the way we do it elsewhere anyway. Fix omission of "$" from the set of valid identifier continuation characters. Get rid of sanitize_text(), which was utterly inconsistent with any other error report anywhere in the system, and wasn't even well designed on its own terms (double-quoting the result string without escaping contained double quotes doesn't seem very well thought out). Fix up error messages, which didn't follow the message style guidelines very well, and were overly specific in situations where the actual mistake might not be what they said. Improve documentation. (I started out just intending to fix the compiler warning, but the more I looked at the patch the less I liked it.)
2016-03-27Guard against zero vardata.rel->tuples in estimate_hash_bucketsize().Tom Lane
If the referenced rel was proven empty, we'd compute 0/0 here, which results in the function returning NaN. That's a bit more serious than the other zero-divide case. Still, it only seems to be possible in HEAD, so no back-patch. Per report from Piotr Stefaniak. I looked through the rest of selfuncs.c and found no other likely trouble spots.
2016-03-27Clamp adjusted ndistinct to positive integer in estimate_hash_bucketsize().Tom Lane
This avoids a possible divide-by-zero in the following calculation, and rounding the number to an integer seems like saner behavior anyway. Assuming IEEE math, the division would yield +Infinity which would get replaced by 1.0 at the bottom of the function, so nothing really interesting would ensue; but avoiding divide-by-zero seems like a good idea on general principles. Per report from Piotr Stefaniak. No back-patch since this seems mostly cosmetic.
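The two estimate_hash_bucketsize() guards above can be sketched roughly as follows. This is a minimal illustration, not the selfuncs.c code: the function name and parameters are hypothetical, and it assumes ndistinct is non-negative and small enough to fit in a long long.

```c
/*
 * Hypothetical stand-in for the estimate described above; the names are
 * illustrative and this is not the selfuncs.c code.
 */
double
bucket_fraction_sketch(double ndistinct, double rel_tuples, double rel_rows)
{
    /* Skip the scaling when the relation is empty: 0/0 would yield NaN. */
    if (rel_tuples > 0.0)
        ndistinct *= rel_rows / rel_tuples;

    /* Round to the nearest integer and clamp to at least 1. */
    ndistinct = (double) (long long) (ndistinct + 0.5);
    if (ndistinct < 1.0)
        ndistinct = 1.0;

    /* The divisor is now a positive integer, so no zero-divide can occur. */
    return 1.0 / ndistinct;
}
```

With an empty relation (rel_tuples = 0) the scaling step is skipped, so the result stays finite instead of becoming NaN.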
2016-03-24Use correct GetDatum function.Robert Haas
Oops.
2016-03-23Support CREATE ACCESS METHODAlvaro Herrera
This enables external code to create access methods. This is useful so that extensions can add their own access methods which can be formally tracked for dependencies, so that DROP operates correctly. Also, having explicit support makes pg_dump work correctly. Currently only index AMs are supported, but we expect different types to be added in the future. Authors: Alexander Korotkov, Petr Jelínek Reviewed-By: Teodor Sigaev, Petr Jelínek, Jim Nasby Commitfest-URL: https://commitfest.postgresql.org/9/353/ Discussion: https://www.postgresql.org/message-id/CAPpHfdsXwZmojm6Dx+TJnpYk27kT4o7Ri6X_4OSWcByu1Rm+VA@mail.gmail.com
2016-03-23Move keywords.c/kwlookup.c into src/common/.Tom Lane
Now that we have src/common/ for code shared between frontend and backend, we can get rid of (most of) the klugy ways that the keyword table and keyword lookup code were formerly shared between different uses. This is a first step towards a more general plan of getting rid of special-purpose kluges for sharing code in src/bin/. I chose to merge kwlookup.c back into keywords.c, as it once was, and always has been so far as keywords.h is concerned. We could have kept them separate, but there is no place that uses ScanKeywordLookup without also wanting access to the backend's keyword list, so there seems little point. ecpg is still a bit weird, but at least now the trickiness is documented. I think that the MSVC build script should require no adjustments beyond what's done here ... but we'll soon find out.
2016-03-23Disable abbreviated keys for string-sorting in non-C locales.Robert Haas
Unfortunately, every version of glibc thus far tested has bugs whereby strcoll() ordering does not match strxfrm() ordering as required by the standard. This can result in, for example, corrupted indexes. Disabling abbreviated keys in these cases slows down non-C-collation string sorting considerably, but there seems to be no practical alternative. Users who are confident that their libc implementations are solid in this regard can re-enable the optimization by compiling with TRUST_STRXFRM. Users who have built indexes using PostgreSQL 9.5 or PostgreSQL 9.5.1 should REINDEX if there is a possibility that they may have been affected by this problem. Report by Marc-Olaf Jaschke. Investigation mostly by Tom Lane, with help from Peter Geoghegan, Noah Misch, Stephen Frost, and me. Patch by me, reviewed by Peter Geoghegan and Tom Lane.
2016-03-23Code review for error reports in jsonb_set().Tom Lane
User-facing (even tested by regression tests) error conditions were thrown with elog(), hence had wrong SQLSTATE and were untranslatable. And the error message texts weren't up to project style, either.
2016-03-23Fix unsafe use of strtol() on a non-null-terminated Text datum.Tom Lane
jsonb_set() could produce wrong answers or incorrect error reports, or in the worst case even crash, when trying to convert a path-array element into an integer for use as an array subscript. Per report from Vitaly Burovoy. Back-patch to 9.5 where the faulty code was introduced (in commit c6947010ceb42143). Michael Paquier
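A minimal sketch of the safe pattern, assuming a length-counted byte string as input: copy it into a terminated buffer before calling strtol(), then check that the whole input was consumed. The helper name and the 32-byte buffer limit are hypothetical and not taken from the jsonb_set() fix.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse a length-counted (not null-terminated) byte
 * string as a decimal int.  Returns true and sets *result on success.
 */
bool
parse_int_from_counted(const char *data, size_t len, int *result)
{
    char    buf[32];
    char   *endptr;
    long    val;

    if (len == 0 || len >= sizeof(buf))
        return false;

    /* strtol() reads until a terminator, so supply one ourselves. */
    memcpy(buf, data, len);
    buf[len] = '\0';

    errno = 0;
    val = strtol(buf, &endptr, 10);

    /* Reject missing digits, trailing garbage, and out-of-range values. */
    if (endptr != buf + len || errno != 0 || val > INT_MAX || val < INT_MIN)
        return false;

    *result = (int) val;
    return true;
}
```

Calling strtol() directly on the counted data would let it read past the end of the datum, which is the failure mode the commit message describes.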
2016-03-18Introduce parse_ident()Teodor Sigaev
SQL-layer function to split a qualified identifier into an array of its parts. Author: Pavel Stehule with minor editorization by me and Jim Nasby
2016-03-18Various minor corrections of and improvements to comments.Robert Haas
Aleksander Alekseev
2016-03-17Fix assorted breakage in to_char()'s OF format option.Tom Lane
In HEAD, fix incorrect field width for hours part of OF when tm_gmtoff is negative. This was introduced by commit 2d87eedc1d4468d3 as a result of falsely applying a pattern that's correct when + signs are omitted, which is not the case for OF. In 9.4, fix missing abs() call that allowed a sign to be attached to the minutes part of OF. This was fixed in 9.5 by 9b43d73b3f9bef27, but for inscrutable reasons not back-patched. In all three versions, ensure that the sign of tm_gmtoff is correctly reported even when the GMT offset is less than 1 hour. Add regression tests, which evidently we desperately need here. Thomas Munro and Tom Lane, per report from David Fetter
2016-03-16Fix j2day() to behave sanely for negative Julian dates.Tom Lane
Somebody had apparently once figured that casting to unsigned int would produce the right output for negative inputs, but that would only be true if 2^32 were a multiple of 7, which of course it ain't. We need to use a signed division and then correct the sign of the remainder. AFAICT, the only case where this would arise currently is when doing ISO-week calculations for dates in 4714BC, where we'd compute a negative Julian date representing 4714-01-04BC and then do some arithmetic with it. Since we don't even really document support for such dates, this is not of much consequence. But we may as well get it right. Per report from Vitaly Burovoy.
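The signed-division approach described above can be sketched as below. The function name is hypothetical, and the convention 0 = Sunday through 6 = Saturday is an assumption of this sketch. The unsigned cast fails because 2^32 leaves remainder 4 when divided by 7, so wrapping a negative value shifts the result.

```c
/*
 * Sketch: day of week for a Julian day number, with 0 = Sunday and
 * 6 = Saturday (a convention assumed for this example).
 */
int
j2day_sketch(int jd)
{
    jd += 1;
    jd %= 7;
    /* In C, % takes the sign of the dividend, so fix negative remainders. */
    if (jd < 0)
        jd += 7;
    return jd;
}
```

For example, j2day_sketch(2451545) returns 6, matching 2000-01-01 being a Saturday, and j2day_sketch(-2) returns 6, one day before j2day_sketch(-1), which returns 0.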
2016-03-16Be more careful about out-of-range dates and timestamps.Tom Lane
Tighten the semantics of boundary-case timestamptz so that we allow timestamps >= '4714-11-24 00:00+00 BC' and < 'ENDYEAR-01-01 00:00+00 AD' exactly, no more and no less, but it is allowed to enter timestamps within that range using non-GMT timezone offsets (which could make the nominal date 4714-11-23 BC or ENDYEAR-01-01 AD). This eliminates dump/reload failure conditions for timestamps near the endpoints. To do this, separate checking of the inputs for date2j() from the final range check, and allow the Julian date code to handle a range slightly wider than the nominal range of the datatypes. Also add a bunch of checks to detect out-of-range dates and timestamps that formerly could be returned by operations such as date-plus-integer. All C-level functions that return date, timestamp, or timestamptz should now be proof against returning a value that doesn't pass IS_VALID_DATE() or IS_VALID_TIMESTAMP(). Vitaly Burovoy, reviewed by Anastasia Lubennikova, and substantially whacked around by me
2016-03-15Fix typos.Robert Haas
Oskari Saarenmaa
2016-03-12Fix Windows portability issue in 23a27b039d94ba35.Tom Lane
_strtoui64() is available in MSVC builds, but apparently not with other Windows toolchains. Thanks to Petr Jelinek for the diagnosis.
2016-03-12Widen query numbers-of-tuples-processed counters to uint64.Tom Lane
This patch widens SPI_processed, EState's es_processed field, PortalData's portalPos field, FuncCallContext's call_cntr and max_calls fields, ExecutorRun's count argument, PortalRunFetch's result, and the max number of rows in a SPITupleTable to uint64, and deals with (I hope) all the ensuing fallout. Some of these values were declared uint32 before, and others "long". I also removed PortalData's posOverflow field, since that logic seems pretty useless given that portalPos is now always 64 bits. The user-visible results are that command tags for SELECT etc will correctly report tuple counts larger than 4G, as will plpgsql's GET DIAGNOSTICS ... ROW_COUNT command. Queries processing more tuples than that are still not exactly the norm, but they're becoming more common. Most values associated with FETCH/MOVE distances, such as PortalRun's count argument and the count argument of most SPI functions that have one, remain declared as "long". It's not clear whether it would be worth promoting those to int64; but it would definitely be a large dollop of additional API churn on top of this, and it would only help 32-bit platforms which seem relatively less likely to see any benefit. Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-03-11Fix Windows build broken in 6943a946c7e5eb72d53c0ce71f08a81a133503bdTeodor Sigaev
It also fixes a dynamic array allocation, which ANSI C disallows. Author: Stas Kelvich
2016-03-11Tsvector editing functionsTeodor Sigaev
Adds several tsvector editing functions: convert tsvector to/from text array, set weight for given lexemes, delete lexeme(s), unnest, filter lexemes with given weights Author: Stas Kelvich with some editorization by me Reviewers: Tomas Vondra, Teodor Sigaev
2016-03-10Give pull_var_clause() reject/recurse/return behavior for WindowFuncs too.Tom Lane
All along, this function should have treated WindowFuncs in a manner similar to Aggrefs, ie with an option whether or not to recurse into them. By not considering the case, it was always recursing, which is OK for most callers (although I suspect that the case in prepare_sort_from_pathkeys might represent a bug). But now we need return-without-recursing behavior as well. There are also more than a few callers that should never see a WindowFunc, and now we'll get some error checking on that.
2016-03-10Refactor pull_var_clause's API to make it less tedious to extend.Tom Lane
In commit 1d97c19a0f748e94 and later c1d9579dd8bf3c92, we extended pull_var_clause's API by adding enum-type arguments. That's sort of a pain to maintain, though, because it means every time we add a new behavior we must touch every last one of the call sites, even if there's a reasonable default behavior that most of them could use. Let's switch over to using a bitmask of flags, instead; that seems more maintainable and might save a nanosecond or two as well. This commit changes no behavior in itself, though I'm going to follow it up with one that does add a new behavior. In passing, remove flatten_tlist(), which has not been used since 9.1 and would otherwise need the same API changes. Removing these enums means that optimizer/tlist.h no longer needs to depend on optimizer/var.h. Changing that caused a number of C files to need addition of #include "optimizer/var.h" (probably we can thank old runs of pgrminclude for that); but on balance it seems like a good change anyway.
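To illustrate why a bitmask is easier to extend than a list of enum arguments, here is a small sketch. All flag and function names are hypothetical stand-ins, not the identifiers used by pull_var_clause(). A caller that passes no flag for a node type gets the default (reject) behavior, so adding a new flag does not force edits at every call site.

```c
/* Hypothetical flag bits; the names are illustrative only. */
#define SK_INCLUDE_AGGREGATES   0x0001
#define SK_RECURSE_AGGREGATES   0x0002
#define SK_INCLUDE_WINDOWFUNCS  0x0004
#define SK_RECURSE_WINDOWFUNCS  0x0008

typedef enum
{
    SK_REJECT,                  /* default: caller never expects this node */
    SK_INCLUDE,                 /* return the node without descending */
    SK_RECURSE                  /* descend into the node's arguments */
} SketchAction;

/* Decode the behavior requested for aggregates from the flag word. */
SketchAction
sketch_aggregate_action(int flags)
{
    if (flags & SK_INCLUDE_AGGREGATES)
        return SK_INCLUDE;
    if (flags & SK_RECURSE_AGGREGATES)
        return SK_RECURSE;
    return SK_REJECT;
}

/* Same decoding for window functions, added without touching old callers. */
SketchAction
sketch_windowfunc_action(int flags)
{
    if (flags & SK_INCLUDE_WINDOWFUNCS)
        return SK_INCLUDE;
    if (flags & SK_RECURSE_WINDOWFUNCS)
        return SK_RECURSE;
    return SK_REJECT;
}
```

A call site that only cares about aggregates passes SK_RECURSE_AGGREGATES and automatically gets the reject default for window functions.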
2016-03-10Provide much better wait information in pg_stat_activity.Robert Haas
When a process is waiting for a heavyweight lock, we will now indicate the type of heavyweight lock for which it is waiting. Also, you can now see when a process is waiting for a lightweight lock - in which case we will indicate the individual lock name or the tranche, as appropriate - or for a buffer pin. Amit Kapila, Ildus Kurbangaliev, reviewed by me. Lots of helpful discussion and suggestions by many others, including Alexander Korotkov and Vladimir Borodin.
2016-03-10Code review for b6fb6471f6afaf649e52f38269fd8c5c60647669.Robert Haas
Reports by Tomas Vondra, Vinayak Pokale, and Aleksander Alekseev. Patch by Amit Langote.
2016-03-09Add a generic command progress reporting facility.Robert Haas
Using this facility, any utility command can report the target relation upon which it is operating, if there is one, and up to 10 64-bit counters; the intent of this is that users should be able to figure out what a utility command is doing without having to resort to ugly hacks like attaching strace to a backend. As a demonstration, this adds very crude reporting to lazy vacuum; we just report the target relation and nothing else. A forthcoming patch will make VACUUM report a bunch of additional data that will make this much more interesting. But this gets the basic framework in place. Vinayak Pokale, Rahila Syed, Amit Langote, Robert Haas, reviewed by Kyotaro Horiguchi, Jim Nasby, Thom Brown, Masahiko Sawada, Fujii Masao, and Masanori Oyama.
2016-03-04Fix typo in comment.Robert Haas
Thomas Munro
2016-03-02Fix json_to_record() bug with nested objects.Tom Lane
A thinko concerning nesting depth caused json_to_record() to produce bogus output if a field of its input object contained a sub-object with a field name matching one of the requested output column names. Per bug #13996 from Johann Visagie. I added a regression test case based on his example, plus parallel tests for json_to_recordset, jsonb_to_record, jsonb_to_recordset. The latter three do not exhibit the same bug (which suggests that we may be missing some opportunities to share code...) but testing seems like a good idea in any case. Back-patch to 9.4 where these functions were introduced.
2016-03-02Create stub functions to support pg_upgrade of old contrib/tsearch2.Tom Lane
Commits 9ff60273e35cad6e and dbe2328959e12701 adjusted the declarations of some core functions referenced by contrib/tsearch2's install script, forgetting that in a pg_upgrade situation, we'll be trying to restore operator class definitions that reference the old signatures. We've hit this problem before; solve it in the same way as before, namely by installing stub functions that have the expected signature and just invoke the correct function. Per report from Jeff Janes. (Someday we ought to stop supporting contrib/tsearch2, but I'm not sure today is that day.)
2016-02-28Avoid multiple free_struct_lconv() calls on same data.Tom Lane
A failure partway through PGLC_localeconv() led to a situation where the next call would call free_struct_lconv() a second time, leading to free() on already-freed strings, typically leading to a core dump. Add a flag to remember whether we need to do that. Per report from Thom Brown. His example case only provokes the failure as far back as 9.4, but nonetheless this code is obviously broken, so back-patch to all supported branches.
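The "flag to remember whether we need to free" idea can be sketched as follows. The struct, the function names, the fail_early parameter, and the free_calls_sketch counter are all hypothetical, added only so the behavior can be observed. This is not the pg_locale.c code.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    char   *decimal_point;
    char   *thousands_sep;
} LconvSketch;

static LconvSketch cached_conv;
static bool cached_conv_allocated = false;
int         free_calls_sketch = 0;  /* instrumentation for the demo only */

static char *
dup_string(const char *s)
{
    char   *copy = malloc(strlen(s) + 1);

    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

static void
free_conv(LconvSketch *c)
{
    free(c->decimal_point);
    free(c->thousands_sep);
    free_calls_sketch++;
}

bool
refresh_conv_sketch(const char *dp, const char *ts, bool fail_early)
{
    /* Free the old strings only if an earlier call actually stored them. */
    if (cached_conv_allocated)
    {
        free_conv(&cached_conv);
        cached_conv_allocated = false;
    }
    memset(&cached_conv, 0, sizeof(cached_conv));

    if (fail_early)
        return false;           /* simulated failure; the flag stays false */

    cached_conv.decimal_point = dup_string(dp);
    cached_conv.thousands_sep = dup_string(ts);
    if (cached_conv.decimal_point == NULL || cached_conv.thousands_sep == NULL)
    {
        /* Out of memory: release whatever was copied and stay "unallocated". */
        free(cached_conv.decimal_point);
        free(cached_conv.thousands_sep);
        memset(&cached_conv, 0, sizeof(cached_conv));
        return false;
    }
    cached_conv_allocated = true;
    return true;
}
```

After one successful call followed by two failing calls, the old strings are freed exactly once, because the second failing call finds the flag already cleared.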
2016-02-22Create a function to reliably identify which sessions block which others.Tom Lane
This patch introduces "pg_blocking_pids(int) returns int[]", which returns the PIDs of any sessions that are blocking the session with the given PID. Historically people have obtained such information using a self-join on the pg_locks view, but it's unreasonably tedious to do it that way with any modicum of correctness, and the addition of parallel queries has pretty much broken that approach altogether. (Given some more columns in the view than there are today, you could imagine handling parallel-query cases with a 4-way join; but ugh.) The new function has the following behaviors that are painful or impossible to get right via pg_locks: 1. Correctly understands which lock modes block which other ones. 2. In soft-block situations (two processes both waiting for conflicting lock modes), only the one that's in front in the wait queue is reported to block the other. 3. In parallel-query cases, reports all sessions blocking any member of the given PID's lock group, and reports a session by naming its leader process's PID, which will be the pg_backend_pid() value visible to clients. The motivation for doing this right now is mostly to fix the isolation tests. Commit 38f8bdcac4982215beb9f65a19debecaf22fd470 lobotomized isolationtester's is-it-waiting query by removing its ability to recognize nonconflicting lock modes, as a crude workaround for the inability to handle soft-block situations properly. But even without the lock mode tests, the old query was excessively slow, particularly in CLOBBER_CACHE_ALWAYS builds; some of our buildfarm animals fail the new deadlock-hard test because the deadlock timeout elapses before they can probe the waiting status of all eight sessions. Replacing the pg_locks self-join with use of pg_blocking_pids() is not only much more correct, but a lot faster: I measure it at about 9X faster in a typical dev build with Asserts, and 3X faster in CLOBBER_CACHE_ALWAYS builds. 
That should provide enough headroom for the slower CLOBBER_CACHE_ALWAYS animals to pass the test, without having to lengthen deadlock_timeout yet more and thus slow down the test for everyone else.
2016-02-21Fix two-argument jsonb_object when called with empty arraysAndrew Dunstan
Some over-eager copy-and-pasting on my part resulted in a nonsense result being returned in this case. I have adopted the same pattern for handling this case as is used in the one argument form of the function, i.e. we just skip over the code that adds values to the object. Diagnosis and patch from Michael Paquier, although not quite his solution. Fixes bug #13936. Backpatch to 9.5 where jsonb_object was introduced.
2016-02-20Further fixing to make pg_size_bytes() portable.Dean Rasheed
Not all compilers support "long long" and the "LL" integer literal suffix, so use a cast to int64 instead.
2016-02-20Fix pg_size_bytes() to be more portable.Dean Rasheed
Commit 53874c5228fe16589a4d01b3e1fab3678e0fd8e3 broke various 32-bit buildfarm machines because it incorrectly used an 'L' suffix for what needed to be a 64-bit literal. Thanks to Michael Paquier for helping to diagnose this.
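The literal-suffix problem in the two portability fixes above comes down to the width of "long", which is only 32 bits on some platforms. A minimal sketch of the cast-based alternative, using the standard int64_t in place of PostgreSQL's int64 and a hypothetical function name:

```c
#include <stdint.h>

/*
 * 1024L * 1024L * 1024L * 1024L would overflow where long is 32 bits, and
 * the LL suffix is not accepted by every compiler.  Casting the first
 * operand makes the whole product a 64-bit computation.
 */
int64_t
tebibyte_sketch(void)
{
    return ((int64_t) 1024) * 1024 * 1024 * 1024;
}
```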
2016-02-20Add pg_size_bytes() to parse human-readable size strings.Dean Rasheed
This will parse strings in the format produced by pg_size_pretty() and return sizes in bytes. This allows queries to be written with clauses like "pg_total_relation_size(oid) > pg_size_bytes('10 GB')". Author: Pavel Stehule with various improvements by Vitaly Burovoy Discussion: http://www.postgresql.org/message-id/CAFj8pRD-tGoDKnxdYgECzA4On01_uRqPrwF-8LdkSE-6bDHp0w@mail.gmail.com Reviewed-by: Vitaly Burovoy, Oleksandr Shulgin, Kyotaro Horiguchi, Michael Paquier and Robert Haas
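As a rough illustration of what such a parser has to do, the sketch below reads a number, skips whitespace, and matches a unit case-insensitively. It is a simplified stand-in with a hypothetical name: it uses double arithmetic and truncation, returns -1 on any error, and assumes the unit names bytes, kB, MB, GB and TB. The real function's rounding and error reporting are not reproduced here.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Case-insensitive match of a unit name, allowing trailing whitespace. */
static int
unit_matches(const char *s, const char *unit)
{
    while (*s != '\0' && *unit != '\0')
    {
        if (tolower((unsigned char) *s) != tolower((unsigned char) *unit))
            return 0;
        s++;
        unit++;
    }
    while (isspace((unsigned char) *s))
        s++;
    return *s == '\0' && *unit == '\0';
}

/* Returns the size in bytes, or -1 if the string cannot be parsed. */
int64_t
size_bytes_sketch(const char *str)
{
    static const struct
    {
        const char *name;
        int         shift;
    }           units[] = {
        {"bytes", 0}, {"kB", 10}, {"MB", 20}, {"GB", 30}, {"TB", 40}
    };
    char       *end;
    double      num = strtod(str, &end);
    size_t      i;

    /* Reject missing numbers, negatives, NaN, and absurdly large values. */
    if (end == str || !(num >= 0.0) || !(num < 9.2e18))
        return -1;
    while (isspace((unsigned char) *end))
        end++;
    if (*end == '\0')
        return (int64_t) num;   /* a bare number is taken as bytes */

    for (i = 0; i < sizeof(units) / sizeof(units[0]); i++)
    {
        if (unit_matches(end, units[i].name))
        {
            /* Cast before shifting so the multiplier is 64 bits wide. */
            double      bytes = num * (double) (((int64_t) 1) << units[i].shift);

            if (!(bytes < 9.2e18))
                return -1;
            return (int64_t) bytes;
        }
    }
    return -1;
}
```

Under these assumptions, size_bytes_sketch("10 GB") yields 10737418240 and an unknown unit such as "1 xb" yields -1.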
2016-02-17Reuse abbreviated keys in ordered [set] aggregates.Robert Haas
When processing ordered aggregates following a sort that could make use of the abbreviated key optimization, only call the equality operator to compare successive pairs of tuples when their abbreviated keys were not equal. Peter Geoghegan, reviewed by Andreas Karlsson and by me.
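One way to picture the underlying idea: if two values have different abbreviated keys they cannot be equal, so the expensive full comparison can be skipped for that pair. The sketch below counts distinct values in a sorted array of strings that way. The abbreviation scheme (the first 8 bytes packed into an integer) and all names are hypothetical and chosen only for illustration.

```c
#include <stdint.h>
#include <string.h>

int         full_compares_sketch = 0;   /* counts expensive comparisons */

/* Pack up to the first 8 bytes of s into an integer, most significant first. */
static uint64_t
abbreviate_sketch(const char *s)
{
    uint64_t    key = 0;
    int         i;

    for (i = 0; i < 8 && s[i] != '\0'; i++)
        key |= (uint64_t) (unsigned char) s[i] << (8 * (7 - i));
    return key;
}

/* Count distinct values in an already-sorted array of strings. */
int
count_distinct_sorted_sketch(const char *const *sorted, int n)
{
    uint64_t    prev;
    int         distinct;
    int         i;

    if (n <= 0)
        return 0;

    distinct = 1;
    prev = abbreviate_sketch(sorted[0]);
    for (i = 1; i < n; i++)
    {
        uint64_t    cur = abbreviate_sketch(sorted[i]);

        if (cur != prev)
            distinct++;         /* keys differ, so the values must differ */
        else
        {
            /* Keys tie: only now pay for the full comparison. */
            full_compares_sketch++;
            if (strcmp(sorted[i - 1], sorted[i]) != 0)
                distinct++;
        }
        prev = cur;
    }
    return distinct;
}
```

For the input apple, apple, banana, cherry this performs a single full comparison (the two apples) and reports three distinct values.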
2016-02-11Improve error reporting in format()Teodor Sigaev
Clarify invalid format conversion type error message and add hint. Author: Jim Nasby
2016-02-08Re-pgindent varlena.c.Tom Lane
Just to make sure previous commit worked ...
2016-02-08Rename typedef "string" to "VarString".Tom Lane
Since pgindent treats typedef names as global, the original coding of b47b4dbf683f13e6 would have had rather nasty effects on the formatting of other files in which "string" is used as a variable or field name. Use a less generic name for this typedef, and rename some other identifiers to match. Peter Geoghegan, per gripe from me
2016-02-07Fix deparsing of ON CONFLICT arbiter WHERE clauses.Tom Lane
The parser doesn't allow qualification of column names appearing in these clauses, but ruleutils.c would sometimes qualify them, leading to dump/reload failures. Per bug #13891 from Onder Kalaci. (In passing, make stanzas in ruleutils.c that save/restore varprefix more consistent.) Peter Geoghegan
2016-02-06Improve speed of timestamp/time/date output functions.Tom Lane
It seems that sprintf(), at least in glibc's version, is unreasonably slow compared to hand-rolled code for printing integers. Replacing most uses of sprintf() in the datetime.c output functions with special-purpose code turns out to give more than a 2X speedup in COPY of a table with a single timestamp column; which is pretty impressive considering all the other logic in that code path. David Rowley and Andres Freund, reviewed by Peter Geoghegan and myself
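A minimal sketch of the kind of special-purpose integer printing meant here: write a fixed number of zero-padded decimal digits straight into the output buffer, with no format-string parsing. Both function names are hypothetical. The sketch assumes the caller supplies a buffer of at least 11 bytes and values that fit the given widths.

```c
/*
 * Write value as exactly `width` decimal digits, zero padded, and return a
 * pointer just past the last digit.  Higher-order digits are silently
 * dropped if the value does not fit.
 */
char *
put_digits_sketch(char *dst, unsigned int value, int width)
{
    int         i;

    for (i = width - 1; i >= 0; i--)
    {
        dst[i] = (char) ('0' + value % 10);
        value /= 10;
    }
    return dst + width;
}

/* Format a date as YYYY-MM-DD into buf (at least 11 bytes); returns buf. */
char *
format_date_sketch(char *buf, unsigned int year, unsigned int month,
                   unsigned int day)
{
    char       *p = buf;

    p = put_digits_sketch(p, year, 4);
    *p++ = '-';
    p = put_digits_sketch(p, month, 2);
    *p++ = '-';
    p = put_digits_sketch(p, day, 2);
    *p = '\0';
    return buf;
}
```

Called with (2016, 2, 6), format_date_sketch fills the buffer with 2016-02-06.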
2016-02-05Fix small goof in comment.Robert Haas
Peter Geoghegan
2016-02-04Add num_nulls() and num_nonnulls() to count NULL arguments.Tom Lane
An example use-case is "CHECK(num_nonnulls(a,b,c) = 1)" to assert that exactly one of a,b,c isn't NULL. The functions are variadic, so they can also be pressed into service to count the number of null or nonnull elements in an array. Marko Tiikkaja, reviewed by Pavel Stehule
2016-02-03Extend sortsupport for text to more opclasses.Robert Haas
Have varlena.c expose an interface that allows the char(n), bytea, and bpchar types to piggyback on a now-generalized SortSupport for text. This pushes a little more knowledge of the bpchar/char(n) type into varlena.c than might be preferred, but that seems like the approach that creates least friction. Also speed things up for index builds that use text_pattern_ops or varchar_pattern_ops. This patch does quite a bit of renaming, but it seems likely to be worth it, so as to avoid future confusion about the fact that this code is now more generally used than the old names might have suggested. Peter Geoghegan, reviewed by Álvaro Herrera and Andreas Karlsson, with small tweaks by me.
2016-02-03Fix IsValidJsonNumber() to notice trailing non-alphanumeric garbage.Tom Lane
Commit e09996ff8dee3f70 was one brick shy of a load: it didn't insist that the detected JSON number be the whole of the supplied string. This allowed inputs such as "2016-01-01" to be misdetected as valid JSON numbers. Per bug #13906 from Dmitry Ryabov. In passing, be more wary of zero-length input (I'm not sure this can happen given current callers, but better safe than sorry), and do some minor cosmetic cleanup.
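The missing check amounts to requiring that the scan of the number ends exactly at the end of the input. Below is a hand-written sketch of a validator for the JSON number grammar that makes this explicit in its final line. It is an illustration with a hypothetical name, not the IsValidJsonNumber() implementation.

```c
#include <stdbool.h>
#include <stddef.h>

static bool
is_digit_sketch(char c)
{
    return c >= '0' && c <= '9';
}

/* Validate a length-counted string against the JSON number grammar. */
bool
is_json_number_sketch(const char *s, size_t len)
{
    size_t      i = 0;
    size_t      start;

    if (len == 0)
        return false;           /* be wary of zero-length input */

    if (s[i] == '-')
        i++;

    /* Integer part: a single 0, or a nonzero digit followed by digits. */
    if (i < len && s[i] == '0')
        i++;
    else if (i < len && s[i] >= '1' && s[i] <= '9')
    {
        while (i < len && is_digit_sketch(s[i]))
            i++;
    }
    else
        return false;

    /* Optional fraction: a dot followed by at least one digit. */
    if (i < len && s[i] == '.')
    {
        start = ++i;
        while (i < len && is_digit_sketch(s[i]))
            i++;
        if (i == start)
            return false;
    }

    /* Optional exponent: e or E, optional sign, at least one digit. */
    if (i < len && (s[i] == 'e' || s[i] == 'E'))
    {
        i++;
        if (i < len && (s[i] == '+' || s[i] == '-'))
            i++;
        start = i;
        while (i < len && is_digit_sketch(s[i]))
            i++;
        if (i == start)
            return false;
    }

    /* The number must be the whole input, not just a prefix of it. */
    return i == len;
}
```

With the final check in place, an input like 2016-01-01 is rejected: the scan stops after 2016, which is not the end of the string.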
2016-01-24Yet further adjust degree-based trig functions for more portability.Tom Lane
Buildfarm member cockatiel is still saying that cosd(60) isn't 0.5. What seems likely is that the subexpression (1.0 - cos(x)) isn't being rounded to double width before more arithmetic is done on it, so force that by storing it into a variable.
2016-01-23Still further adjust degree-based trig functions for more portability.Tom Lane
Indeed, the non-static declaration foreseen in my previous commit message is necessary. Per Noah Misch.
2016-01-23Further adjust degree-based trig functions for more portability.Tom Lane
The last round didn't do it. Per Noah Misch, the problem on at least some machines is that the compiler pre-evaluates trig functions having constant arguments using code slightly different from what will be used at runtime. Therefore, we must prevent the compiler from seeing constant arguments to any of the libm trig functions used in this code. The method used here might still fail if init_degree_constants() gets inlined into the call sites. That probably won't happen given the large number of call sites; but if it does, we could probably fix it by making init_degree_constants() non-static. I'll avoid that till proven necessary, though.
2016-01-23Adjust degree-based trig functions for more portability.Tom Lane
The buildfarm isn't very happy with the results of commit e1bd684a34c11139. To try to get the expected exact results everywhere: * Replace M_PI / 180 subexpressions with a precomputed constant, so that the compiler can't decide to rearrange that division with an adjacent operation. Hopefully this will fix failures to get exactly 0.5 from sind(30) and cosd(60). * Add scaling to ensure that tand(45) and cotd(45) give exactly 1; there was nothing particularly guaranteeing that before. * Replace minus zero by zero when tand() or cotd() would output that; many machines did so for tand(180) and cotd(270), but not all. We could alternatively deem both results valid, but that doesn't seem likely to be what users will want.
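The scaling idea from this series of trig adjustments can be sketched as follows: divide the computed sine by the sine of the reference angle, evaluated at run time by the same code, so that the reference angle itself comes out exact. Everything here is a hypothetical stand-in. In particular, sin_series() is a small Taylor-series substitute for the C library's sin(), used only so the sketch builds without linking the math library, and the volatile variables are this sketch's way of keeping the compiler from pre-evaluating constant arguments and of forcing intermediates to double width.

```c
/* Stand-in for the C library's sin(); adequate for small arguments only. */
static double
sin_series(double x)
{
    volatile double vx = x;     /* force evaluation at run time */
    double      xx = vx;
    double      term = xx;
    double      sum = xx;
    int         n;

    for (n = 1; n <= 10; n++)
    {
        term *= -xx * xx / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

/* Precomputed constant, so no M_PI / 180 division can be rearranged. */
static const double RADIANS_PER_DEGREE = 0.0174532925199432957692;

/* Non-static and non-const, so it is not treated as a known constant. */
double      degree_c_thirty = 30.0;

static double sin_30 = 0.0;
static int  degree_consts_set = 0;

static void
init_degree_constants_sketch(void)
{
    sin_30 = sin_series(degree_c_thirty * RADIANS_PER_DEGREE);
    degree_consts_set = 1;
}

/* Sine of x degrees for x in [0, 30], scaled so that 30 gives exactly 0.5. */
double
sind_0_to_30_sketch(double x)
{
    volatile double sin_x;      /* store to force rounding to double width */

    if (!degree_consts_set)
        init_degree_constants_sketch();

    sin_x = sin_series(x * RADIANS_PER_DEGREE);
    return (sin_x / sin_30) / 2.0;
}
```

At x = 30 the numerator and the denominator come from the same computation on the same input, so their ratio is 1 and the result is 0.5, regardless of how accurate the underlying sine is.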
2016-01-22Add trigonometric functions that work in degrees.Tom Lane
The implementations go to some lengths to deliver exact results for values where an exact result can be expected, such as sind(30) = 0.5 exactly. Dean Rasheed, reviewed by Michael Paquier
2016-01-22Improve cross-platform consistency of Inf/NaN handling in trig functions.Tom Lane
Ensure that the trig functions return NaN for NaN input regardless of what the underlying C library functions might do. Also ensure that an error is thrown for Inf (or otherwise out-of-range) input, except for atan/atan2 which should accept it. All these behaviors should now conform to the POSIX spec; previously, all our popular platforms deviated from that in one case or another. The main remaining platform dependency here is whether the C library might choose to throw a domain error for sin/cos/tan inputs that are large but less than infinity. (Doing so is not unreasonable, since once a single unit-in-the-last-place exceeds PI, there can be no significance at all in the result; however there doesn't seem to be any suggestion in POSIX that such an error is allowed.) We will report such errors if they are reported via "errno", but not if they are reported via "fetestexcept" which is the other mechanism sanctioned by POSIX. Some preliminary experiments with fetestexcept indicated that it might also report errors we could do without, such as complaining about underflow at an unreasonably large threshold. So let's skip that complexity for now. Dean Rasheed, reviewed by Michael Paquier
2016-01-22Remove new coupling between NAMEDATALEN and MAX_LEVENSHTEIN_STRLEN.Tom Lane
Commit e529cd4ffa605c6f introduced an Assert requiring NAMEDATALEN to be less than MAX_LEVENSHTEIN_STRLEN, which has been 255 for a long time. Since up to that instant we had always allowed NAMEDATALEN to be substantially more than that, this was ill-advised. It's debatable whether we need MAX_LEVENSHTEIN_STRLEN at all (versus putting a CHECK_FOR_INTERRUPTS into the loop), or whether it has to be so tight; but this patch takes the narrower approach of just not applying the MAX_LEVENSHTEIN_STRLEN limit to calls from the parser. Trusting the parser for this seems reasonable, first because the strings are limited to NAMEDATALEN which is unlikely to be hugely more than 256, and second because the maximum distance is tightly constrained by MAX_FUZZY_DISTANCE (though we'd forgotten to make use of that limit in one place). That means the cost is not really O(mn) but more like O(max(m,n)). Relaxing the limit for user-supplied calls is left for future research; given the lack of complaints to date, it doesn't seem very high priority. In passing, fix confusion between lengths-in-bytes and lengths-in-chars in comments and error messages. Per gripe from Kevin Day; solution suggested by Robert Haas. Back-patch to 9.5 where the unwanted restriction was introduced.