summaryrefslogtreecommitdiff
path: root/src/backend/utils/adt
AgeCommit message (Collapse)Author
2012-11-21Speed up operations on numeric, mostly by avoiding palloc() overhead.Heikki Linnakangas
In many functions, a NumericVar was initialized from an input Numeric, to be passed as input to a calculation function. When the NumericVar is not modified, the digits array of the NumericVar can point directly to the digits array in the original Numeric, and we can avoid a palloc() and memcpy(). Add init_var_from_num() function to initialize a var like that. Remove dscale argument from get_str_from_var(), as all the callers just passed the dscale of the variable. That means that the rounding it used to do was not actually necessary, and get_str_from_var() no longer scribbles on its input. That makes it safer in general, and allows us to use the new init_var_from_num() function in e.g numeric_out(). Also modified numericvar_to_int8() to no scribble on its input either. It creates a temporary copy to avoid that. To compensate, the callers no longer need to create a temporary copy, so the net # of pallocs is the same, but this is nicer. In the passing, use a constant for the number 10 in get_str_from_var_sci(), when calculating 10^exponent. Saves a palloc() and some cycles to convert integer 10 to numeric. Original patch by Kyotaro HORIGUCHI, with further changes by me. Reviewed by Pavel Stehule.
2012-11-19Improve handling of INT_MIN / -1 and related cases.Tom Lane
Some platforms throw an exception for this division, rather than returning a necessarily-overflowed result. Since we were testing for overflow after the fact, an exception isn't nice. We can avoid the problem by treating division by -1 as negation. Add some regression tests so that we'll find out if any compilers try to optimize away the overflow check conditions. This ought to be back-patched, but I'm going to see what the buildfarm reports about the regression tests first. Per discussion with Xi Wang, though this is different from the patch he submitted.
2012-11-14Fix the int8 and int2 cases of (minimum possible integer) % (-1).Tom Lane
The correct answer for this (or any other case with arg2 = -1) is zero, but some machines throw a floating-point exception instead of behaving sanely. Commit f9ac414c35ea084ff70c564ab2c32adb06d5296f dealt with this in int4mod, but overlooked the fact that it also happens in int8mod (at least on my Linux x86_64 machine). Protect int2mod as well; it's not clear whether any machines fail there (mine does not) but since the test is so cheap it seems better safe than sorry. While at it, simplify the original guard in int4mod: we need only check for arg2 == -1, we don't need to check arg1 explicitly. Xi Wang, with some editing by me.
2012-11-13Fix memory leaks in record_out() and record_send().Tom Lane
record_out() leaks memory: it fails to free the strings returned by the per-column output functions, and also is careless about detoasted values. This results in a query-lifespan memory leakage when returning composite values to the client, because printtup() runs the output functions in the query-lifespan memory context. Fix it to handle these issues the same way printtup() does. Also fix a similar leakage in record_send(). (At some point we might want to try to run output functions in shorter-lived memory contexts, so that we don't need a zero-leakage policy for them. But that would be a significantly more invasive patch, which doesn't seem like material for back-patching.) In passing, use appendStringInfoCharMacro instead of appendStringInfoChar in the innermost data-copying loop of record_out, to try to shave a few cycles from this function's runtime. Per trouble report from Carlos Henrique Reimer. Back-patch to all supported versions.
2012-11-07Make the streaming replication protocol messages architecture-independent.Heikki Linnakangas
We used to send structs wrapped in CopyData messages, which works as long as the client and server agree on things like endianess, timestamp format and alignment. That's good enough for running a standby server, which has to run on the same platform anyway, but it's useful for tools like pg_receivexlog to work across platforms. This breaks protocol compatibility of streaming replication, but we never promised that to be compatible across versions, anyway.
2012-10-24Tweak genericcostestimate's fudge factor for index size.Tom Lane
To provide some bias against using a large index when a small one would do as well, genericcostestimate adds a "fudge factor", which for a long time was random_page_cost * index_pages/10000. However, this can grow to be the dominant term in indexscan cost estimates when the index involved is large enough, a behavior that was never intended. Change to a ln(1 + n/10000) formulation, which has nearly the same behavior up to a few hundred pages but tails off significantly thereafter. (A log curve seems correct on first principles, since what we're trying to account for here is index descent costs, which are typically logarithmic.) Per bug #7619 from Niko Kiirala. Possibly this change should get back-patched, but I'm hesitant to mess with cost estimates in stable branches.
2012-10-19Fix ruleutils to print "INSERT INTO foo DEFAULT VALUES" correctly.Tom Lane
Per bug #7615 from Marko Tiikkaja. Apparently nobody ever tried this case before ...
2012-10-12Fix oversight in new code for printing rangetable aliases.Tom Lane
In commit 11e131854f8231a21613f834c40fe9d046926387, I missed the case of a CTE RTE that doesn't have a user-defined alias, but does have an alias assigned by set_rtable_names(). Per report from Peter Eisentraut. While at it, refactor slightly to reduce code duplication.
2012-10-07Quiet a few MSC compiler warnings.Andrew Dunstan
2012-10-03Avoid planner crash/Assert failure with joins to unflattened subqueries.Tom Lane
examine_simple_variable supposed that any RTE_SUBQUERY rel it gets pointed at must have been planned already. However, this isn't a safe assumption because we must do selectivity estimation while generating indexscan paths, and that code might look at join clauses involving a rel that the loop in set_base_rel_sizes() hasn't reached yet. The simplest fix is to play dumb in such a situation, that is give up trying to extract any stats for the Var. This could possibly be improved by making a separate pass over the RTE list to plan each unflattened subquery before we start the main planning work --- but that would be pretty invasive and it doesn't seem worth it, for now at least. (We couldn't just break set_base_rel_sizes() into two loops: the prescan would need to handle all subquery rels in the query, not only those in the current join subproblem.) This bug was introduced in commit 1cb108efb0e60d87e4adec38e7636b6e8efbeb57, although I think that subsequent changes may have exposed it more than it was originally. Per bug #7580 from Maxim Boguk.
2012-10-02Fix access past end of string in date parsing.Heikki Linnakangas
This affects date_in(), and a couple of other funcions that use DecodeDate(). Hitoshi Harada
2012-09-27Have pg_terminate/cancel_backend not ERROR on non-existent processesAlvaro Herrera
This worked fine for superusers, but not for ordinary users trying to cancel their own processes. Tweak the order the checks are done in so that we correctly return SIGNAL_BACKEND_ERROR (which current callers know to ignore without erroring out) so that an ordinary user can loop through a resultset without fearing that a process might exit in the middle of said looping -- causing the remaining processes to go unsignalled. Incidentally, the last in-core caller of IsBackendPid() is now gone. However, the function is exported and must remain in place, because there are plenty of callers in external modules. Author: Josh Kupershmidt Reviewed by Noah Misch
2012-09-21Improve ruleutils.c's heuristics for dealing with rangetable aliases.Tom Lane
The previous scheme had bugs in some corner cases involving tables that had been renamed since a view was made. This could result in dumped views that failed to reload or reloaded incorrectly, as seen in bug #7553 from Lloyd Albin, as well as in some pgsql-hackers discussion back in January. Also, its behavior for printing EXPLAIN plans was sometimes confusing because of willingness to use the same alias for multiple RTEs (it was Ashutosh Bapat's complaint about that aspect that started the January thread). To fix, ensure that each RTE in the query has a unique unqualified alias, by modifying the alias if necessary (we add "_" and digits as needed to create a non-conflicting name). Then we can just print its variables with that alias, avoiding the confusing and bug-prone scheme of sometimes schema-qualifying variable names. In EXPLAIN, it proves to be expedient to take the further step of only assigning such aliases to RTEs that are actually referenced in the query, since the planner has a habit of generating extra RTEs with the same alias in situations such as inheritance-tree expansion. Although this fixes a bug of very long standing, I'm hesitant to back-patch such a noticeable behavioral change. My experiments while creating a regression test convinced me that actually incorrect output (as opposed to confusing output) occurs only in very narrow cases, which is backed up by the lack of previous complaints from the field. So we may be better off living with it in released branches; and in any case it'd be smart to let this ripen awhile in HEAD before we consider back-patching it.
2012-09-18Fix array_typanalyze to work for domains over arrays.Tom Lane
Not sure how we missed this case, but we did. Per bug #7551 from Diego de Lima.
2012-09-06Allow embedded spaces without quoting in unix_socket_directories entries.Tom Lane
This fix removes an unnecessary incompatibility with the old behavior of the unix_socket_directory parameter. Since pathnames with embedded spaces are fairly popular on some platforms, the incompatibility could be significant in practice. We'll still strip unquoted leading/trailing spaces, however. No docs update since the documentation already implied that it worked like this. Per bug #7514 from Murray Cumming.
2012-09-03Fix to_date() and to_timestamp() to allow specification of the day ofBruce Momjian
the week via ISO or Gregorian designations. The fix is to store the day-of-week consistently as 1-7, Sunday = 1. Fixes bug reported by Marc Munro
2012-08-31Make configure probe for mbstowcs_l as well as wcstombs_l.Tom Lane
We previously supposed that any given platform would supply both or neither of these functions, so that one configure test would be sufficient. It now appears that at least on AIX this is not the case ... which is likely an AIX bug, but nonetheless we need to cope with it. So use separate tests. Per bug #6758; thanks to Andrew Hastie for doing the followup testing needed to confirm what was happening. Backpatch to 9.1, where we began using these functions.
2012-08-30Split tuple struct defs from htup.h to htup_details.hAlvaro Herrera
This reduces unnecessary exposure of other headers through htup.h, which is very widely included by many files. I have chosen to move the function prototypes to the new file as well, because that means htup.h no longer needs to include tupdesc.h. In itself this doesn't have much effect in indirect inclusion of tupdesc.h throughout the tree, because it's also required by execnodes.h; but it's something to explore in the future, and it seemed best to do the htup.h change now while I'm busy with it.
2012-08-30Fix division by zero in the new range type histogram creation.Heikki Linnakangas
Report and analysis by Matthias.
2012-08-30Improve EXPLAIN's ability to cope with LATERAL references in plans.Tom Lane
push_child_plan/pop_child_plan didn't bother to adjust the "ancestors" list of parent plan nodes when descending to a child plan node. I think this was okay when it was written, but it's not okay in the presence of LATERAL references, since a subplan node could easily be returning a LATERAL value back up to the same nestloop node that provides the value. Per changed regression test results, the omission led to failure to interpret Param nodes that have perfectly good interpretations.
2012-08-28Split heapam_xlog.h from heapam.hAlvaro Herrera
The heapam XLog functions are used by other modules, not all of which are interested in the rest of the heapam API. With this, we let them get just the XLog stuff in which they are interested and not pollute them with unrelated includes. Also, since heapam.h no longer requires xlog.h, many files that do include heapam.h no longer get xlog.h automatically, including a few headers. This is useful because heapam.h is getting pulled in by execnodes.h, which is in turn included by a lot of files.
2012-08-28remove catcache.h from syscache.hAlvaro Herrera
Instead, place a forward struct declaration for struct catclist in syscache.h. This reduces header proliferation somewhat.
2012-08-27Collect and use histograms of lower and upper bounds for range types.Heikki Linnakangas
This enables selectivity estimation of the <<, >>, &<, &> and && operators, as well as the normal inequality operators: <, <=, >=, >. "range @> element" is also supported, but the range-variant @> and <@ operators are not, because they cannot be sensibly estimated with lower and upper bound histograms alone. We would need to make some assumption about the lengths of the ranges for that. Alexander's patch included a separate histogram of lengths for that, but I left that out of the patch for simplicity. Hopefully that will be added as a followup patch. The fraction of empty ranges is also calculated and used in estimation. Alexander Korotkov, heavily modified by me.
2012-08-25Allow text timezone designations, e.g. "America/Chicago", when using theBruce Momjian
ISO "T" timestamptz format.
2012-08-23Fix cascading privilege revoke to notice when privileges are still held.Tom Lane
If we revoke a grant option from some role X, but X still holds the option via another grant, we should not recursively revoke the privilege from role(s) Y that X had granted it to. This was supposedly fixed as one aspect of commit 4b2dafcc0b1a579ef5daaa2728223006d1ff98e9, but I must not have tested it, because in fact that code never worked: it forgot to shift the grant-option bits back over when masking the bits being revoked. Per bug #6728 from Daniel German. Back-patch to all active branches, since this has been wrong since 8.0.
2012-08-19Allow OLD and NEW in multi-row VALUES within rules.Tom Lane
Now that we have LATERAL, it's fairly painless to allow this case, which was left as a TODO in the original multi-row VALUES implementation.
2012-08-17Check LIBXML_VERSION instead of testing in configure script.Tom Lane
We had put a test for libxml2's xmlStructuredErrorContext variable in configure, but of course that doesn't work on Windows builds. The next best alternative seems to be to test the LIBXML_VERSION symbol provided by xmlversion.h. Per report from Talha Bin Rizwan, though this fixes it in a different way than his proposed patch.
2012-08-16Suppress possibly-uninitialized-variable warning.Tom Lane
2012-08-16Add SP-GiST support for range types.Heikki Linnakangas
The implementation is a quad-tree, largely copied from the quad-tree implementation for points. The lower and upper bound of ranges are the 2d coordinates, with some extra code to handle empty ranges. I left out the support for adjacent operator, -|-, from the original patch. Not because there was necessarily anything wrong with it, but it was more complicated than the other operators, and I only have limited time for reviewing. That will follow as a separate patch. Alexander Korotkov, reviewed by Jeff Davis and me.
2012-08-15On second thought, explain why date_trunc("week") on interval values isBruce Momjian
not supported in the error message, rather than the docs.
2012-08-14Prevent access to external files/URLs via XML entity references.Tom Lane
xml_parse() would attempt to fetch external files or URLs as needed to resolve DTD and entity references in an XML value, thus allowing unprivileged database users to attempt to fetch data with the privileges of the database server. While the external data wouldn't get returned directly to the user, portions of it could be exposed in error messages if the data didn't parse as valid XML; and in any case the mere ability to check existence of a file might be useful to an attacker. The ideal solution to this would still allow fetching of references that are listed in the host system's XML catalogs, so that documents can be validated according to installed DTDs. However, doing that with the available libxml2 APIs appears complex and error-prone, so we're not going to risk it in a security patch that necessarily hasn't gotten wide review. So this patch merely shuts off all access, causing any external fetch to silently expand to an empty string. A future patch may improve this. In HEAD and 9.2, also suppress warnings about undefined entities, which would otherwise occur as a result of not loading referenced DTDs. Previous branches don't show such warnings anyway, due to different error handling arrangements. Credit to Noah Misch for first reporting the problem, and for much work towards a solution, though this simplistic approach was not his preference. Also thanks to Daniel Veillard for consultation. Security: CVE-2012-3489
2012-08-10Support having multiple Unix-domain sockets per postmaster.Tom Lane
Replace unix_socket_directory with unix_socket_directories, which is a list of socket directories, and adjust postmaster's code to allow zero or more Unix-domain sockets to be created. This is mostly a straightforward change, but since the Unix sockets ought to be created after the TCP/IP sockets for safety reasons (better chance of detecting a port number conflict), AddToDataDirLockFile needs to be fixed to support out-of-order updates of data directory lockfile lines. That's a change that had been foreseen to be necessary someday anyway. Honza Horak, reviewed and revised by Tom Lane
2012-08-08Add additional C comments for to_date/to_char() fixes.Bruce Momjian
2012-08-07Implement SQL-standard LATERAL subqueries.Tom Lane
This patch implements the standard syntax of LATERAL attached to a sub-SELECT in FROM, and also allows LATERAL attached to a function in FROM, since set-returning function calls are expected to be one of the principal use-cases. The main change here is a rewrite of the mechanism for keeping track of which relations are visible for column references while the FROM clause is being scanned. The parser "namespace" lists are no longer lists of bare RTEs, but are lists of ParseNamespaceItem structs, which carry an RTE pointer as well as some visibility-controlling flags. Aside from supporting LATERAL correctly, this lets us get rid of the ancient hacks that required rechecking subqueries and JOIN/ON and function-in-FROM expressions for invalid references after they were initially parsed. Invalid column references are now always correctly detected on sight. In passing, remove assorted parser error checks that are now dead code by virtue of our having gotten rid of add_missing_from, as well as some comments that are obsolete for the same reason. (It was mainly add_missing_from that caused so much fudging here in the first place.) The planner support for this feature is very minimal, and will be improved in future patches. It works well enough for testing purposes, though. catversion bump forced due to new field in RangeTblEntry.
2012-08-07Fix to_char(), to_date(), and to_timestamp() to handle negative/BCBruce Momjian
century specifications just like positive/AD centuries. Previously the behavior was either wrong or inconsistent with positive/AD handling. Centuries without years now always assume the first year of the century, which is now documented.
2012-08-07Fix redundant wordingAlvaro Herrera
2012-08-06Make strings identicalAlvaro Herrera
2012-08-03Fix bugs with parsing signed hh:mm and hh:mm:ss fields in interval input.Tom Lane
DecodeInterval() failed to honor the "range" parameter (the special SQL syntax for indicating which fields appear in the literal string) if the time was signed. This seems inappropriate, so make it work like the not-signed case. The inconsistency was introduced in my commit f867339c0148381eb1d01f93ab5c79f9d10211de, which as noted in its log message was only really focused on making SQL-compliant literals work per spec. Including a sign here is not per spec, but if we're going to allow it then it's reasonable to expect it to work like the not-signed case. Also, remove bogus setting of tmask, which caused subsequent processing to think that what had been given was a timezone and not an hh:mm(:ss) field, thus confusing checks for redundant fields. This seems to be an aboriginal mistake in Lockhart's commit 2cf1642461536d0d8f3a1cf124ead0eac04eb760. Add regression test cases to illustrate the changed behaviors. Back-patch as far as 8.4, where support for spec-compliant interval literals was added. Range problem reported and diagnosed by Amit Kapila, tmask problem by me.
2012-07-24Change syntax of new CHECK NO INHERIT constraintsAlvaro Herrera
The initially implemented syntax, "CHECK NO INHERIT (expr)" was not deemed very good, so switch to "CHECK (expr) NO INHERIT" instead. This way it looks similar to SQL-standards compliant constraint attribute. Backport to 9.2 where the new syntax and feature was introduced. Per discussion.
2012-07-18Refactor the way code is shared between some range type functions.Heikki Linnakangas
Functions like range_eq, range_before etc. are exposed at the SQL-level, but they're also used internally by the GiST consistent support function. The code sharing was done by a hack, TrickFunctionCall2, which relied on the knowledge that all the functions used fn_extra the same way. This commit splits the functions into internal versions that take a TypeCacheEntry as argument, and thin wrappers to expose the functions at the SQL-level. The internal versions can then be called directly and in a less hacky way from the GiST consistent function. This is just cosmetic, but backpatch to 9.2 anyway, to avoid having a different version of this code in the 9.2 branch. That would make backpatching fixes in this area more difficult. Alexander Korotkov
2012-07-18Syntax support and documentation for event triggers.Robert Haas
They don't actually do anything yet; that will get fixed in a follow-on commit. But this gets the basic infrastructure in place, including CREATE/ALTER/DROP EVENT TRIGGER; support for COMMENT, SECURITY LABEL, and ALTER EXTENSION .. ADD/DROP EVENT TRIGGER; pg_dump and psql support; and documentation for the anticipated initial feature set. Dimitri Fontaine, with review and a bunch of additional hacking by me. Thom Brown extensively reviewed earlier versions of this patch set, but there's not a whole lot of that code left in this commit, as it turns out.
2012-07-16Remove unreachable codePeter Eisentraut
The Solaris Studio compiler warns about these instances, unlike more mainstream compilers such as gcc. But manual inspection showed that the code is clearly not reachable, and we hope no worthy compiler will complain about removing this code.
2012-07-15Prevent corner-case core dump in rfree().Tom Lane
rfree() failed to cope with the case that pg_regcomp() had initialized the regex_t struct but then failed to allocate any memory for re->re_guts (ie, the first malloc call in pg_regcomp() failed). It would try to touch the guts struct anyway, and thus dump core. This is a sufficiently narrow corner case that it's not surprising it's never been seen in the field; but still a bug is a bug, so patch all active branches. Noted while investigating whether we need to call pg_regfree after a failure return from pg_regcomp. Other than this bug, it turns out we don't, so adjust comments appropriately.
2012-07-12Avoid extra newlines in XML mapping in table forest modePeter Eisentraut
found by P. Broennimann
2012-07-11Add array_remove() and array_replace() functions.Tom Lane
These functions support removing or replacing array element value(s) matching a given search value. Although intended mainly to support a future array-foreign-key feature, they seem useful in their own right. Marco Nenciarini and Gabriele Bartolini, reviewed by Alex Hunsaker
2012-07-10Re-implement extraction of fixed prefixes from regular expressions.Tom Lane
To generate btree-indexable conditions from regex WHERE conditions (such as WHERE indexed_col ~ '^foo'), we need to be able to identify any fixed prefix that a regex might have; that is, find any string that must be a prefix of all strings satisfying the regex. We used to do that with entirely ad-hoc code that looked at the source text of the regex. It didn't know very much about regex syntax, which mostly meant that it would fail to identify some optimizable cases; but Viktor Rosenfeld reported that it would produce actively wrong answers for quantified parenthesized subexpressions, such as '^(foo)?bar'. Rather than trying to extend the ad-hoc code to cover this, let's get rid of it altogether in favor of identifying prefixes by examining the compiled form of a regex. To do this, I've added a new entry point "pg_regprefix" to the regex library; hopefully it is defined in a sufficiently general fashion that it can remain in the library when/if that code gets split out as a standalone project. Since this bug has been there for a very long time, this fix needs to get back-patched. However it depends on some other recent commits (particularly the addition of wchar-to-database-encoding conversion), so I'll commit this separately and then go to work on back-porting the necessary fixes.
2012-07-09Refactor pattern_fixed_prefix() to avoid dealing in incomplete patterns.Tom Lane
Previously, pattern_fixed_prefix() was defined to return whatever fixed prefix it could extract from the pattern, plus the "rest" of the pattern. That definition was sensible for LIKE patterns, but not so much for regexes, where reconstituting a valid pattern minus the prefix could be quite tricky (certainly the existing code wasn't doing that correctly). Since the only thing that callers ever did with the "rest" of the pattern was to pass it to like_selectivity() or regex_selectivity(), let's cut out the middle-man and just have pattern_fixed_prefix's subroutines do this directly. Then pattern_fixed_prefix can return a simple selectivity number, and the question of how to cope with partial patterns is removed from its API specification. While at it, adjust the API spec so that callers who don't actually care about the pattern's selectivity (which is a lot of them) can pass NULL for the selectivity pointer to skip doing the work of computing a selectivity estimate. This patch is only an API refactoring that doesn't actually change any processing, other than allowing a little bit of useless work to be skipped. However, it's necessary infrastructure for my upcoming fix to regex prefix extraction, because after that change there won't be any simple way to identify the "rest" of the regex, not even to the low level of fidelity needed by regex_selectivity. We can cope with that if regex_fixed_prefix and regex_selectivity communicate directly, but not if we have to work within the old API. Hence, back-patch to all active branches.
2012-07-08Fix planner to pass correct collation to operator selectivity estimators.Tom Lane
We can do this without creating an API break for estimation functions by passing the collation using the existing fmgr functionality for passing an input collation as a hidden parameter. The need for this was foreseen at the outset, but we didn't get around to making it happen in 9.1 because of the decision to sort all pg_statistic histograms according to the database's default collation. That meant that selectivity estimators generally need to use the default collation too, even if they're estimating for an operator that will do something different. The reason it's suddenly become more interesting is that regexp interpretation also uses a collation (for its LC_TYPE not LC_COLLATE property), and we no longer want to use the wrong collation when examining regexps during planning. It's not that the selectivity estimate is likely to change much from this; rather that we are thinking of caching compiled regexps during planner estimation, and we won't get the intended benefit if we cache them with a different collation than the executor will use. Back-patch to 9.1, both because the regexp change is likely to get back-patched and because we might as well get this right in all collation-supporting branches, in case any third-party code wants to rely on getting the collation. The patch turns out to be minuscule now that I've done it ...
2012-07-02Assorted message style improvementsPeter Eisentraut
2012-07-02Fix to_date's handling of year 519.Tom Lane
A thinko in commit 029dfdf1157b6d837a7b7211cd35b00c6bcd767c caused the year 519 to be handled differently from either adjacent year, which was not the intention AFAICS. Report and diagnosis by Marc Cousin. In passing, remove redundant re-tests of year value.