summaryrefslogtreecommitdiff
path: root/src/backend/utils/adt
AgeCommit message (Collapse)Author
2013-08-09Message punctuation and pluralization fixesPeter Eisentraut
2013-08-07Message style improvementsPeter Eisentraut
2013-08-03Make sure float4in/float8in accept all standard spellings of "infinity".Tom Lane
The C99 and POSIX standards require strtod() to accept all these spellings (case-insensitively): "inf", "+inf", "-inf", "infinity", "+infinity", "-infinity". However, pre-C99 systems might accept only some or none of these, and apparently Windows still doesn't accept "inf". To avoid surprising cross-platform behavioral differences, manually check for each of these spellings if strtod() fails. We were previously handling just "infinity" and "-infinity" that way, but since C99 is most of the world now, it seems likely that applications are expecting all these spellings to work. Per bug #8355 from Basil Peace. It turns out this fix won't actually resolve his problem, because Python isn't being this careful; but that doesn't mean we shouldn't be.
2013-07-31Fix regexp_matches() handling of zero-length matches.Tom Lane
We'd find the same match twice if it was of zero length and not immediately adjacent to the previous match. replace_text_regexp() got similar cases right, so adjust this search logic to match that. Note that even though the regexp_split_to_xxx() functions share this code, they did not display equivalent misbehavior, because the second match would be considered degenerate and ignored. Jeevan Chalke, with some cosmetic changes by me.
2013-07-29Add SQL Standard WITH ORDINALITY support for UNNEST (and any other SRF)Greg Stark
Author: Andrew Gierth, David Fetter Reviewers: Dean Rasheed, Jeevan Chalke, Stephen Frost
2013-07-25Change currtid functions to use an MVCC snapshot, not SnapshotNow.Robert Haas
This has a slight performance cost, but the only known consumers of these functions, known at the SQL level as currtid and currtid2, is pgsql-odbc; whose usage, we hope, is not sufficiently intensive to make this a problem. Per discussion.
2013-07-25Don't use SnapshotNow in get_actual_variable_range.Robert Haas
Instead, use the active snapshot. Per Tom Lane, this function is most interested in knowing the range of tuples our scan will actually see. This is another step towards full removal of SnapshotNow.
2013-07-24Fix booltestsel() for case where we have NULL stats but not MCV stats.Tom Lane
In a boolean column that contains mostly nulls, ANALYZE might not find enough non-null values to populate the most-common-values stats, but it would still create a pg_statistic entry with stanullfrac set. The logic in booltestsel() for this situation did the wrong thing for "col IS NOT TRUE" and "col IS NOT FALSE" tests, forgetting that null values would satisfy these tests (so that the true selectivity would be close to one, not close to zero). Per bug #8274. Fix by Andrew Gierth, some comment-smithing by me.
2013-07-23Move strip_implicit_coercions() from optimizer to nodeFuncs.c.Tom Lane
Use of this function has spread into the parser and rewriter, so it seems like time to pull it out of the optimizer and put it into the more central nodeFuncs module. This eliminates the need to #include optimizer/clauses.h in most of the calling files, demonstrating that this function was indeed a bit outside the normal code reference patterns.
2013-07-23Further hacking on ruleutils' new column-alias-assignment code.Tom Lane
After further thought about implicit coercions appearing in a joinaliasvars list, I realized that they represent an additional reason why we might need to reference the join output column directly instead of referencing an underlying column. Consider SELECT x FROM t1 LEFT JOIN t2 USING (x) where t1.x is of type date while t2.x is of type timestamptz. The merged output variable is of type timestamptz, but it won't go to null when t2 does, therefore neither t1.x nor t2.x is a valid substitute reference. The code in get_variable() actually gets this case right, since it knows it shouldn't look through a coercion, but we failed to ensure that the unqualified output column name would be globally unique. To fix, modify the code that trawls for a dangerous situation so that it actually scans through an unnamed join's joinaliasvars list to see if there are any non-simple-Var entries.
2013-07-23Change post-rewriter representation of dropped columns in joinaliasvars.Tom Lane
It's possible to drop a column from an input table of a JOIN clause in a view, if that column is nowhere actually referenced in the view. But it will still be there in the JOIN clause's joinaliasvars list. We used to replace such entries with NULL Const nodes, which is handy for generation of RowExpr expansion of a whole-row reference to the view. The trouble with that is that it can't be distinguished from the situation after subquery pull-up of a constant subquery output expression below the JOIN. Instead, replace such joinaliasvars with null pointers (empty expression trees), which can't be confused with pulled-up expressions. expandRTE() still emits the old convention, though, for convenience of RowExpr generation and to reduce the risk of breaking extension code. In HEAD and 9.3, this patch also fixes a problem with some new code in ruleutils.c that was failing to cope with implicitly-casted joinaliasvars entries, as per recent report from Feike Steenbergen. That oversight was because of an inadequate description of the data structure in parsenodes.h, which I've now corrected. There were some pre-existing oversights of the same ilk elsewhere, which I believe are now all fixed.
2013-07-22Silence compiler warning on an unused variableAlvaro Herrera
Also, tweak wording in comments (per Andres) and documentation (myself) to point out that it's the database's default tablespace that can be passed as 0, not DEFAULTTABLESPACE_OID. Robert Haas noticed the bug in the code, but didn't update the accompanying prose.
2013-07-22Add infrastructure for mapping relfilenodes to relation OIDs.Robert Haas
Future patches are expected to introduce logical replication that works by decoding WAL. WAL contains relfilenodes rather than relation OIDs, so this infrastructure will be needed to find the relation OID based on WAL contents. If logical replication does not make it into this release, we probably should consider reverting this, since it will add some overhead to DDL operations that create new relations. One additional index insert per pg_class row is not a large overhead, but it's more than zero. Another way of meeting the needs of logical replication would be to the relation OID to WAL, but that would burden DML operations, not only DDL. Andres Freund, with some changes by me. Design review, in earlier versions, by Álvaro Herrera.
2013-07-20Clean up new JSON API typedefsPeter Eisentraut
The new JSON API uses a bit of an unusual typedef scheme, where for example OkeysState is a pointer to okeysState. And that's not applied consistently either. Change that to the more usual PostgreSQL style where struct typedefs are upper case, and use pointers explicitly.
2013-07-18Move checking an explicit VARIADIC "any" argument into the parser.Andrew Dunstan
This is more efficient and simpler . It does mean that an untyped NULL can no longer be used in such cases, which should be mentioned in Release Notes, but doesn't seem a terrible loss. The workaround is to cast the NULL to some array type. Pavel Stehule, reviewed by Jeevan Chalke.
2013-07-17Fix end-of-loop optimization in pglz_find_match() function.Heikki Linnakangas
After the recent pglz optimization patch, the next/prev pointers in the hash table are never NULL, INVALID_ENTRY_PTR is used to represent invalid entries instead. The end-of-loop check in pglz_find_match() function didn't get the memo. The result was the same from a correctness point of view, but because the NULL-check would never fail, the tiny optimization turned into a pessimization. Reported by Stephen Frost, using Coverity scanner.
2013-07-16Implement the FILTER clause for aggregate function calls.Noah Misch
This is SQL-standard with a few extensions, namely support for subqueries and outer references in clause expressions. catversion bump due to change in Aggref and WindowFunc. David Fetter, reviewed by Dean Rasheed.
2013-07-08Fix bool abusePeter Eisentraut
path_encode's "closed" argument used to take three values: TRUE, FALSE, or -1, while being of type bool. Replace that with a three-valued enum for more clarity.
2013-07-05Expose the estimation of number of changed tuples since last analyzeMagnus Hagander
This value, now pg_stat_all_tables.n_mod_since_analyze, was already tracked and used by autovacuum, but not exposed to the user. Mark Kirkwood, review by Laurenz Albe
2013-07-04Get rid of pg_class.reltoastidxid.Fujii Masao
Treat TOAST index just the same as normal one and get the OID of TOAST index from pg_index but not pg_class.reltoastidxid. This change allows us to handle multiple TOAST indexes, and which is required infrastructure for upcoming REINDEX CONCURRENTLY feature. Patch by Michael Paquier, reviewed by Andres Freund and me.
2013-07-02Use an MVCC snapshot, rather than SnapshotNow, for catalog scans.Robert Haas
SnapshotNow scans have the undesirable property that, in the face of concurrent updates, the scan can fail to see either the old or the new versions of the row. In many cases, we work around this by requiring DDL operations to hold AccessExclusiveLock on the object being modified; in some cases, the existing locking is inadequate and random failures occur as a result. This commit doesn't change anything related to locking, but will hopefully pave the way to allowing lock strength reductions in the future. The major issue has held us back from making this change in the past is that taking an MVCC snapshot is significantly more expensive than using a static special snapshot such as SnapshotNow. However, testing of various worst-case scenarios reveals that this problem is not severe except under fairly extreme workloads. To mitigate those problems, we avoid retaking the MVCC snapshot for each new scan; instead, we take a new snapshot only when invalidation messages have been processed. The catcache machinery already requires that invalidation messages be sent before releasing the related heavyweight lock; else other backends might rely on locally-cached data rather than scanning the catalog at all. Thus, making snapshot reuse dependent on the same guarantees shouldn't break anything that wasn't already subtly broken. Patch by me. Review by Michael Paquier and Andres Freund.
2013-07-01Add timezone offset output option to to_char()Bruce Momjian
Add ability for to_char() to output the timezone's UTC offset (OF). We already have the ability to return the timezone abbeviation (TZ/tz). Per request from Andrew Dunstan
2013-07-01Optimize pglz compressor for small inputs.Heikki Linnakangas
The pglz compressor has a significant startup cost, because it has to initialize to zeros the history-tracking hash table. On a 64-bit system, the hash table was 64kB in size. While clearing memory is pretty fast, for very short inputs the relative cost of that was quite large. This patch alleviates that in two ways. First, instead of storing pointers in the hash table, store 16-bit indexes into the hist_entries array. That slashes the size of the hash table to 1/2 or 1/4 of the original, depending on the pointer width. Secondly, adjust the size of the hash table based on input size. For very small inputs, you don't need a large hash table to avoid collisions. Review by Amit Kapila.
2013-06-26Renovate display of non-ASCII messages on Windows.Noah Misch
GNU gettext selects a default encoding for the messages it emits in a platform-specific manner; it uses the Windows ANSI code page on Windows and follows LC_CTYPE on other platforms. This is inconvenient for PostgreSQL server processes, so realize consistent cross-platform behavior by calling bind_textdomain_codeset() on Windows each time we permanently change LC_CTYPE. This primarily affects SQL_ASCII databases and processes like the postmaster that do not attach to a database, making their behavior consistent with PostgreSQL on non-Windows platforms. Messages from SQL_ASCII databases use the encoding implied by the database LC_CTYPE, and messages from non-database processes use LC_CTYPE from the postmaster system environment. PlatformEncoding becomes unused, so remove it. Make write_console() prefer WriteConsoleW() to write() regardless of the encodings in use. In this situation, write() will invariably mishandle non-ASCII characters. elog.c has assumed that messages conform to the database encoding. While usually true, this does not hold for SQL_ASCII and MULE_INTERNAL. Introduce MessageEncoding to track the actual encoding of message text. The present consumers are Windows-specific code for converting messages to UTF16 for use in system interfaces. This fixes the appearance in Windows event logs and consoles of translated messages from SQL_ASCII processes like the postmaster. Note that SQL_ASCII inherently disclaims a strong notion of encoding, so non-ASCII byte sequences interpolated into messages by %s may yet yield a nonsensical message. MULE_INTERNAL has similar problems at present, albeit for a different reason: its lack of libiconv support or a conversion to UTF8. Consequently, one need no longer restart Windows with a different Windows ANSI code page to broadly test backend logging under a given language. Changing the user's locale ("Format") is enough. Several accounts can simultaneously run postmasters under different locales, all correctly logging localized messages to Windows event logs and consoles. Alexander Law and Noah Misch
2013-06-15Use WaitLatch, not pg_usleep, for delaying in pg_sleep().Tom Lane
This avoids platform-dependent behavior wherein pg_sleep() might fail to be interrupted by statement timeout, query cancel, SIGTERM, etc. Also, since there's no reason to wake up once a second any more, we can reduce the power consumption of a sleeping backend a tad. Back-patch to 9.3, since use of SA_RESTART for SIGALRM makes this a bigger issue than it used to be.
2013-06-12Avoid reading past datum end when parsing JSON.Noah Misch
Several loops in the JSON parser examined a byte in memory just before checking whether its address was in-bounds, so they could read one byte beyond the datum's allocation. A SIGSEGV is possible. New in 9.3, so no back-patch.
2013-06-12Improve updatability checking for views and foreign tables.Tom Lane
Extend the FDW API (which we already changed for 9.3) so that an FDW can report whether specific foreign tables are insertable/updatable/deletable. The default assumption continues to be that they're updatable if the relevant executor callback function is supplied by the FDW, but finer granularity is now possible. As a test case, add an "updatable" option to contrib/postgres_fdw. This patch also fixes the information_schema views, which previously did not think that foreign tables were ever updatable, and fixes view_is_auto_updatable() so that a view on a foreign table can be auto-updatable. initdb forced due to changes in information_schema views and the functions they rely on. This is a bit unfortunate to do post-beta1, but if we don't change this now then we'll have another API break for FDWs when we do change it. Dean Rasheed, somewhat editorialized on by Tom Lane
2013-06-12Fix unescaping of JSON Unicode escapes, especially for non-UTF8.Andrew Dunstan
Per discussion on -hackers. We treat Unicode escapes when unescaping them similarly to the way we treat them in PostgreSQL string literals. Escapes in the ASCII range are always accepted, no matter what the database encoding. Escapes for higher code points are only processed in UTF8 databases, and attempts to process them in other databases will result in an error. \u0000 is never unescaped, since it would result in an impermissible null byte.
2013-06-08Handle Unicode surrogate pairs correctly when processing JSON.Andrew Dunstan
In 9.2, Unicode escape sequences are not analysed at all other than to make sure that they are in the form \uXXXX. But in 9.3 many of the new operators and functions try to turn JSON text values into text in the server encoding, and this includes de-escaping Unicode escape sequences. This processing had not taken into account the possibility that this might contain a surrogate pair to designate a character outside the BMP. That is now handled correctly. This also enforces correct use of surrogate pairs, something that is not done by the type's input routines. This fact is noted in the docs.
2013-06-03Additional spelling correctionsStephen Frost
A few more minor spelling corrections, no functional changes. Thom Brown
2013-06-01Minor spelling fixesStephen Frost
Fix a few spelling mistakes. Per bug report #8193 from Lajos Veres.
2013-05-31Don't emit non-canonical empty arrays in array_remove().Noah Misch
Dean Rasheed
2013-05-29pgindent run for release 9.3Bruce Momjian
This is the first run of the Perl-based pgindent script. Also update pgindent instructions.
2013-05-16Fix crash when trying to display a NOTIFY rule action.Tom Lane
Fixes oversight in commit 2ffa740be9d96a3743ecb7e42391c53d0760c65a. Per report from Josh Kupershmidt. I think we've broken this case before, so let's add a regression test this time.
2013-05-11Fix to_number() to correctly ignore thousands separator when it's '.'.Tom Lane
The existing code in NUM_numpart_from_char has hard-wired logic to treat '.' as decimal point, even when we're using a locale-aware format string and the locale says that '.' is the thousands separator. This results in clearly wrong answers in FM mode (where we must be able to identify the decimal point location), as per bug report from Patryk Kordylewski. Since the initialization code in NUM_prepare_locale already sets up Np->decimal as either the locale decimal-point string or "." depending on which decimal-point format code was used, there's really no need to have any extra logic at all in NUM_numpart_from_char: we only need to test for a match to Np->decimal. (Note: AFAICS there's nothing in here that explicitly checks for thousands separators --- rather, any unmatched character is silently skipped over. That's pretty bogus IMO but it's not the issue being complained of.) This is a longstanding bug, but it's possible that some existing apps are depending on '.' being recognized as decimal point even when using a D format code. Hence, no back-patch. We should probably list this as a potential incompatibility in the 9.3 release notes.
2013-05-10Guard against input_rows == 0 in estimate_num_groups().Tom Lane
This case doesn't normally happen, because the planner usually clamps all row estimates to at least one row; but I found that it can arise when dealing with relations excluded by constraints. Without a defense, estimate_num_groups() can return zero, which leads to divisions by zero inside the planner as well as assertion failures in the executor. An alternative fix would be to change set_dummy_rel_pathlist() to make the size estimate for a dummy relation 1 row instead of 0, but that seemed pretty ugly; and probably someday we'll want to drop the convention that the minimum rowcount estimate is 1 row. Back-patch to 8.4, as the problem can be demonstrated that far back.
2013-05-06Move materialized views' is-populated status into their pg_class entries.Tom Lane
Previously this state was represented by whether the view's disk file had zero or nonzero size, which is problematic for numerous reasons, since it's breaking a fundamental assumption about heap storage. This was done to allow unlogged matviews to revert to unpopulated status after a crash despite our lack of any ability to update catalog entries post-crash. However, this poses enough risk of future problems that it seems better to not support unlogged matviews until we can find another way. Accordingly, revert that choice as well as a number of existing kluges forced by it in favor of creating a pg_class.relispopulated flag column.
2013-05-01Use correct length to convert json unicode escapes.Andrew Dunstan
Bug reported on IRC - fix due to Andrew Gierth.
2013-04-20Clean up references to SQL92Peter Eisentraut
In most cases, these were just references to the SQL standard in general. In a few cases, a contrast was made between SQL92 and later standards -- those have been kept unchanged.
2013-04-15Correct handling of NULL arguments in json funcs.Andrew Dunstan
Per gripe from Tom Lane.
2013-04-09Create a distinction between a populated matview and a scannable one.Kevin Grittner
The intent was that being populated would, long term, be just one of the conditions which could affect whether a matview was scannable; being populated should be necessary but not always sufficient to scan the relation. Since only CREATE and REFRESH currently determine the scannability, names and comments accidentally conflated these concepts, leading to confusion. Also add missing locking for the SQL function which allows a test for scannability, and fix a modularity violatiion. Per complaints from Tom Lane, although its not clear that these will satisfy his concerns. Hopefully this will at least better frame the discussion.
2013-04-09Support indexing of regular-expression searches in contrib/pg_trgm.Tom Lane
This works by extracting trigrams from the given regular expression, in generally the same spirit as the previously-existing support for LIKE searches, though of course the details are far more complicated. Currently, only GIN indexes are supported. We might be able to make it work with GiST indexes later. The implementation includes adding API functions to backend/regex/ to provide a view of the search NFA created from a regular expression. These functions are meant to be generic enough to be supportable in a standalone version of the regex library, should that ever happen. Alexander Korotkov, reviewed by Heikki Linnakangas and Tom Lane
2013-04-04Fix off by one error in JSON extract path code.Andrew Dunstan
Bug report by David Wheeler, diagnosis assistance from Tom Lane.
2013-04-03Avoid updating our PgBackendStatus entry when track_activities is off.Tom Lane
The point of turning off track_activities is to avoid this reporting overhead, but a thinko in commit 4f42b546fd87a80be30c53a0f2c897acb826ad52 caused pgstat_report_activity() to perform half of its updates anyway. Fix that, and also make sure that we clear all the now-disabled fields when transitioning to the non-reporting state.
2013-03-29Add new JSON processing functions and parser API.Andrew Dunstan
The JSON parser is converted into a recursive descent parser, and exposed for use by other modules such as extensions. The API provides hooks for all the significant parser event such as the beginning and end of objects and arrays, and providing functions to handle these hooks allows for fairly simple construction of a wide variety of JSON processing functions. A set of new basic processing functions and operators is also added, which use this API, including operations to extract array elements, object fields, get the length of arrays and the set of keys of a field, deconstruct an object into a set of key/value pairs, and create records from JSON objects and arrays of objects. Catalog version bumped. Andrew Dunstan, with some documentation assistance from Merlin Moncure.
2013-03-28Add sql_drop event for event triggersAlvaro Herrera
This event takes place just before ddl_command_end, and is fired if and only if at least one object has been dropped by the command. (For instance, DROP TABLE IF EXISTS of a table that does not in fact exist will not lead to such a trigger firing). Commands that drop multiple objects (such as DROP SCHEMA or DROP OWNED BY) will cause a single event to fire. Some firings might be surprising, such as ALTER TABLE DROP COLUMN. The trigger is fired after the drop has taken place, because that has been deemed the safest design, to avoid exposing possibly-inconsistent internal state (system catalogs as well as current transaction) to the user function code. This means that careful tracking of object identification is required during the object removal phase. Like other currently existing events, there is support for tag filtering. To support the new event, add a new pg_event_trigger_dropped_objects() set-returning function, which returns a set of rows comprising the objects affected by the command. This is to be used within the user function code, and is mostly modelled after the recently introduced pg_identify_object() function. Catalog version bumped due to the new function. Dimitri Fontaine and Álvaro Herrera Review by Robert Haas, Tom Lane
2013-03-21Fix "element <@ range" cost estimation.Heikki Linnakangas
The statistics-based cost estimation patch for range types broke that, by incorrectly assuming that the left operand of all range oeprators is a range. That lead to a "type x is not a range type" error. Because it took so long for anyone to notice, add a regression test for that case. We still don't do proper statistics-based cost estimation for that, so you just get a default constant estimate. We should look into implementing that, but this patch at least fixes the regression. Spotted by Tom Lane, when testing query from Josh Berkus.
2013-03-20Allow extracting machine-readable object identityAlvaro Herrera
Introduce pg_identify_object(oid,oid,int4), which is similar in spirit to pg_describe_object but instead produces a row of machine-readable information to uniquely identify the given object, without resorting to OIDs or other internal representation. This is intended to be used in the event trigger implementation, to report objects being operated on; but it has usefulness of its own. Catalog version bumped because of the new function.
2013-03-14Extend format() to handle field width and left/right alignment.Tom Lane
This change adds some more standard sprintf() functionality to format(). Pavel Stehule, reviewed by Dean Rasheed and Kyotaro Horiguchi
2013-03-14Add cost estimation of range @> and <@ operators.Heikki Linnakangas
The estimates are based on the existing lower bound histogram, and a new histogram of range lengths. Bump catversion, because the range length histogram now needs to be present in statistic slot kind 6, or you get an error on @> and <@ queries. (A re-ANALYZE would be enough to fix that, though) Alexander Korotkov, with some refactoring by me.