user/sven/postgresql.git

Age	Commit message (Collapse)	Author
2011-08-21	Fix trigger WHEN conditions when both BEFORE and AFTER triggers exist.	Tom Lane
	Due to tuple-slot mismanagement, evaluation of WHEN conditions for AFTER ROW UPDATE triggers could crash if there had been a BEFORE ROW trigger fired for the same update. Fix by not trying to overload the use of estate->es_trig_tuple_slot. Per report from Yoran Heling. Back-patch to 9.0, when trigger WHEN conditions were introduced.
2011-08-02	Avoid integer overflow when LIMIT + OFFSET >= 2^63.	Heikki Linnakangas
	This fixes bug #6139 reported by Hitoshi Harada.
2011-07-04	Move Trigger and TriggerDesc structs out of rel.h into a new reltrigger.h	Alvaro Herrera
	This lets us stop including rel.h into execnodes.h, which is a widely used header.
2011-07-03	Fix bugs in relpersistence handling during table creation.	Robert Haas
	Unlike the relistemp field which it replaced, relpersistence must be set correctly quite early during the table creation process, as we rely on it quite early on for a number of purposes, including security checks. Normally, this is set based on whether the user enters CREATE TABLE, CREATE UNLOGGED TABLE, or CREATE TEMPORARY TABLE, but a relation may also be made implicitly temporary by creating it in pg_temp. This patch fixes the handling of that case, and also disables creation of unlogged tables in temporary tablespace (such table indeed skip WAL-logging, but we reject an explicit specification) and creation of relations in the temporary schemas of other sessions (which is not very sensible, and didn't work right anyway). Report by Amit Khandekar.
2011-06-29	Move the PredicateLockRelation() call from nodeSeqscan.c to heapam.c. It's	Heikki Linnakangas
	more consistent that way, since all the other PredicateLock* calls are made in various heapam.c and index AM functions. The call in nodeSeqscan.c was unnecessarily aggressive anyway, there's no need to try to lock the relation every time a tuple is fetched, it's enough to do it once. This has the user-visible effect that if a seq scan is initialized in the executor, but never executed, we now acquire the predicate lock on the heap relation anyway. We could avoid that by taking the lock on the first heap_getnext() call instead, but it doesn't seem worth the trouble given that it feels more natural to do it in heap_beginscan(). Also, remove the retail PredicateLockTuple() calls from heap_getnext(). In a seqscan, started with heap_begin(), we're holding a whole-relation predicate lock on the heap so there's no need to lock the tuples individually. Kevin Grittner and me
2011-06-29	Grab predicate locks on matching tuples in a lossy bitmap heap scan.	Heikki Linnakangas
	Non-lossy case was already handled correctly. Kevin Grittner
2011-06-27	Avoid having two copies of the HOT-chain search logic.	Robert Haas
	It's been like this since HOT was originally introduced, but the logic is complex enough that this is a recipe for bugs, as we've already found out with SSI. So refactor heap_hot_search_buffer() so that it can satisfy the needs of index_getnext(), and make index_getnext() use that rather than duplicating the logic. This change was originally proposed by Heikki Linnakangas as part of a larger refactoring oriented towards allowing index-only scans. I extracted and adjusted this part, since it seems to have independent merit. Review by Jeff Davis.
2011-06-20	Remove extra copying of TupleDescs for heap_create_with_catalog	Alvaro Herrera
	Some callers were creating copies of tuple descriptors to pass to that function, stating in code comments that it was necessary because it modified the passed descriptor. Code inspection reveals this not to be true, and indeed not all callers are passing copies in the first place. So remove the extra ones and the misleading comments about this behavior as well.
2011-06-16	Remove another no-longer-needed inclusion of predicate.h.	Tom Lane

2011-06-15	Make non-MVCC snapshots exempt from predicate locking. Scans with non-MVCC	Heikki Linnakangas
	snapshots, like in REINDEX, are basically non-transactional operations. The DDL operation itself might participate in SSI, but there's separate functions for that. Kevin Grittner and Dan Ports, with some changes by me.
2011-06-09	Pgindent run before 9.1 beta2.	Bruce Momjian

2011-06-02	Disallow SELECT FOR UPDATE/SHARE on sequences.	Tom Lane
	We can't allow this because such an operation stores its transaction XID into the sequence tuple's xmax. Because VACUUM doesn't process sequences (and we don't want it to start doing so), such an xmax value won't get frozen, meaning it will eventually refer to nonexistent pg_clog storage, and even wrap around completely. Since the row lock is ignored by nextval and setval, the usefulness of the operation is highly debatable anyway. Per reports of trouble with pgpool 3.0, which had ill-advisedly started using such commands as a form of locking. In HEAD, also disallow SELECT FOR UPDATE/SHARE on toast tables. Although this does work safely given the current implementation, there seems no good reason to allow it. I refrained from changing that behavior in back branches, however.
2011-06-01	Allow hash joins to be interrupted while searching hash table for match.	Tom Lane
	Per experimentation with a recent example, in which unreasonable amounts of time could elapse before the backend would respond to a query-cancel. This might be something to back-patch, but the patch doesn't apply cleanly because this code was rewritten for 9.1. Given the lack of field complaints I won't bother for now. Cédric Villemain
2011-05-23	Install defenses against overflow in BuildTupleHashTable().	Tom Lane
	The planner can sometimes compute very large values for numGroups, and in cases where we have no alternative to building a hashtable, such a value will get fed directly to BuildTupleHashTable as its nbuckets parameter. There were two ways in which that could go bad. First, BuildTupleHashTable declared the parameter as "int" but most callers were passing "long"s, so on 64-bit machines undetected overflow could occur leading to a bogus negative value. The obvious fix for that is to change the parameter to "long", which is what I've done in HEAD. In the back branches that seems a bit risky, though, since third-party code might be calling this function. So for them, just put in a kluge to treat negative inputs as INT_MAX. Second, hash_create can go nuts with extremely large requested table sizes (notably, my_log2 becomes an infinite loop for inputs larger than LONG_MAX/2). What seems most appropriate to avoid that is to bound the initial table size request to work_mem. This fixes bug #6035 reported by Daniel Schreiber. Although the reported case only occurs back to 8.4 since it involves WITH RECURSIVE, I think it's a good idea to install the defenses in all supported branches.
2011-05-21	Reset per-tuple memory context between every row in a scan node, even when	Heikki Linnakangas
	there's no quals or projections. Currently this only matters for foreign scans, as none of the other scan nodes litter the per-tuple memory context when there's no quals or projections.
2011-04-25	Refactor broken CREATE TABLE IF NOT EXISTS support.	Robert Haas
	Per bug #5988, reported by Marko Tiikkaja, and further analyzed by Tom Lane, the previous coding was broken in several respects: even if the target table already existed, a subsequent CREATE TABLE IF NOT EXISTS might try to add additional constraints or sequences-for-serial specified in the new CREATE TABLE statement. In passing, this also fixes a minor information leak: it's no longer possible to figure out whether a schema to which you don't have CREATE access contains a sequence named like "x_y_seq" by attempting to create a table in that schema called "x" with a serial column called "y". Some more refactoring of this code in the future might be warranted, but that will need to wait for a later major release.
2011-04-22	Make a code-cleanup pass over the collations patch.	Tom Lane
	This patch is almost entirely cosmetic --- mostly cleaning up a lot of neglected comments, and fixing code layout problems in places where the patch made lines too long and then pgindent did weird things with that. I did find a bug-of-omission in equalTupleDescs().
2011-04-12	Pass collations to functions in FunctionCallInfoData, not FmgrInfo.	Tom Lane
	Since collation is effectively an argument, not a property of the function, FmgrInfo is really the wrong place for it; and this becomes critical in cases where a cached FmgrInfo is used for varying purposes that might need different collation settings. Fix by passing it in FunctionCallInfoData instead. In particular this allows a clean fix for bug #5970 (record_cmp not working). This requires touching a bit more code than the original method, but nobody ever thought that collations would not be an invasive patch...
2011-04-11	Clean up most -Wunused-but-set-variable warnings from gcc 4.6	Peter Eisentraut
	This warning is new in gcc 4.6 and part of -Wall. This patch cleans up most of the noise, but there are some still warnings that are trickier to remove.
2011-04-10	pgindent run before PG 9.1 beta 1.	Bruce Momjian

2011-03-27	Fix check_exclusion_constraint() to insert correct collations in ScanKeys.	Tom Lane

2011-03-26	Clean up cruft around collation initialization for tupdescs and scankeys.	Tom Lane
	I found actual bugs in GiST and plpgsql; the rest of this is cosmetic but meant to decrease the odds of future bugs of omission.
2011-03-25	Pass collation to makeConst() instead of looking it up internally.	Tom Lane
	In nearly all cases, the caller already knows the correct collation, and in a number of places, the value the caller has handy is more correct than the default for the type would be. (In particular, this patch makes it significantly less likely that eval_const_expressions will result in changing the exposed collation of an expression.) So an internal lookup is both expensive and wrong.
2011-03-24	Fix handling of collation in SQL-language functions.	Tom Lane
	Ensure that parameter symbols receive collation from the function's resolved input collation, and fix inlining to behave properly. BTW, this commit lays about 90% of the infrastructure needed to support use of argument names in SQL functions. Parsing of parameters is now done via the parser-hook infrastructure ... we'd just need to supply a column-ref hook ...
2011-03-19	Revise collation derivation method and expression-tree representation.	Tom Lane
	All expression nodes now have an explicit output-collation field, unless they are known to only return a noncollatable data type (such as boolean or record). Also, nodes that can invoke collation-aware functions store a separate field that is the collation value to pass to the function. This avoids confusion that arises when a function has collatable inputs and noncollatable output type, or vice versa. Also, replace the parser's on-the-fly collation assignment method with a post-pass over the completed expression tree. This allows us to use a more complex (and hopefully more nearly spec-compliant) assignment rule without paying for it in extra storage in every expression node. Fix assorted bugs in the planner's handling of collations by making collation one of the defining properties of an EquivalenceClass and by converting CollateExprs into discardable RelabelType nodes during expression preprocessing.
2011-03-11	Split CollateClause into separate raw and analyzed node types.	Tom Lane
	CollateClause is now used only in raw grammar output, and CollateExpr after parse analysis. This is for clarity and to avoid carrying collation names in post-analysis parse trees: that's both wasteful and possibly misleading, since the collation's name could be changed while the parsetree still exists. Also, clean up assorted infelicities and omissions in processing of the node type.
2011-02-28	Rearrange snapshot handling to make rule expansion more consistent.	Tom Lane
	With this patch, portals, SQL functions, and SPI all agree that there should be only a CommandCounterIncrement between the queries that are generated from a single SQL command by rule expansion. Fetching a whole new snapshot now happens only between original queries. This is equivalent to the existing behavior of EXPLAIN ANALYZE, and it was judged to be the best choice since it eliminates one source of concurrency hazards for rules. The patch should also make things marginally faster by reducing the number of snapshot push/pop operations. The patch removes pg_parse_and_rewrite(), which is no longer used anywhere. There was considerable discussion about more aggressive refactoring of the query-processing functions exported by postgres.c, but for the moment nothing more has been done there. I also took the opportunity to refactor snapmgr.c's API slightly: the former PushUpdatedSnapshot() has been split into two functions. Marko Tiikkaja, reviewed by Steve Singer and Tom Lane
2011-02-27	Refactor the executor's API to support data-modifying CTEs better.	Tom Lane
	The originally committed patch for modifying CTEs didn't interact well with EXPLAIN, as noted by myself, and also had corner-case problems with triggers, as noted by Dean Rasheed. Those problems show it is really not practical for ExecutorEnd to call any user-defined code; so split the cleanup duties out into a new function ExecutorFinish, which must be called between the last ExecutorRun call and ExecutorEnd. Some Asserts have been added to these functions to help verify correct usage. It is no longer necessary for callers of the executor to call AfterTriggerBeginQuery/AfterTriggerEndQuery for themselves, as this is now done by ExecutorStart/ExecutorFinish respectively. If you really need to suppress that and do it for yourself, pass EXEC_FLAG_SKIP_TRIGGERS to ExecutorStart. Also, refactor portal commit processing to allow for the possibility that PortalDrop will invoke user-defined code. I think this is not actually necessary just yet, since the portal-execution-strategy logic forces any non-pure-SELECT query to be run to completion before we will consider committing. But it seems like good future-proofing.
2011-02-25	Fix order of shutdown processing when CTEs contain inter-references.	Tom Lane
	We need ExecutorEnd to run the ModifyTable nodes to completion in reverse order of initialization, not forward order. Easily done by constructing the list back-to-front.
2011-02-25	Support data-modifying commands (INSERT/UPDATE/DELETE) in WITH.	Tom Lane
	This patch implements data-modifying WITH queries according to the semantics that the updates all happen with the same command counter value, and in an unspecified order. Therefore one WITH clause can't see the effects of another, nor can the outer query see the effects other than through the RETURNING values. And attempts to do conflicting updates will have unpredictable results. We'll need to document all that. This commit just fixes the code; documentation updates are waiting on author. Marko Tiikkaja and Hitoshi Harada
2011-02-21	Remove ExecRemoveJunk(), which is no longer used anywhere.	Tom Lane
	This was a leftover from the pre-8.1 design of junkfilters. It doesn't seem to have any reason to live, since it's merely a combination of two easy function calls, and not a well-designed combination at that (it encourages callers to leak the result tuple).
2011-02-21	Fix dangling-pointer problem in before-row update trigger processing.	Tom Lane
	ExecUpdate checked for whether ExecBRUpdateTriggers had returned a new tuple value by seeing if the returned tuple was pointer-equal to the old one. But the "old one" was in estate->es_junkFilter's result slot, which would be scribbled on if we had done an EvalPlanQual update in response to a concurrent update of the target tuple; therefore we were comparing a dangling pointer to a live one. Given the right set of circumstances we could get a false match, resulting in not forcing the tuple to be stored in the slot we thought it was stored in. In the case reported by Maxim Boguk in bug #5798, this led to "cannot extract system attribute from virtual tuple" failures when trying to do "RETURNING ctid". I believe there is a very-low-probability chance of more serious errors, such as generating incorrect index entries based on the original rather than the trigger-modified version of the row. In HEAD, change all of ExecBRInsertTriggers, ExecIRInsertTriggers, ExecBRUpdateTriggers, and ExecIRUpdateTriggers so that they continue to have similar APIs. In the back branches I just changed ExecBRUpdateTriggers, since there is no bug in the ExecBRInsertTriggers case.
2011-02-20	Implement an API to let foreign-data wrappers actually be functional.	Tom Lane
	This commit provides the core code and documentation needed. A contrib module test case will follow shortly. Shigeru Hanada, Jan Urbanski, Heikki Linnakangas
2011-02-09	Fix improper matching of resjunk column names for FOR UPDATE in subselect.	Tom Lane
	Flattening of subquery range tables during setrefs.c could lead to the rangetable indexes in PlanRowMark nodes not matching up with the column names previously assigned to the corresponding resjunk ctid (resp. tableoid or wholerow) columns. Typical symptom would be either a "cannot extract system attribute from virtual tuple" error or an Assert failure. This wasn't a problem before 9.0 because we didn't support FOR UPDATE below the top query level, and so the final flattening could never renumber an RTE that was relevant to FOR UPDATE. Fix by using a plan-tree-wide unique number for each PlanRowMark to label the associated resjunk columns, so that the number need not change during flattening. Per report from David Johnston (though I'm darned if I can see how this got past initial testing of the relevant code). Back-patch to 9.0.
2011-02-08	Per-column collation support	Peter Eisentraut
	This adds collation support for columns and domains, a COLLATE clause to override it per expression, and B-tree index support. Peter Eisentraut reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
2011-02-08	Implement genuine serializable isolation level.	Heikki Linnakangas
	Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, ie. it sometimes aborts transactions even though there is no anomaly. To track reads we implement predicate locking, see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment. A predicate lock doesn't conflict with any regular locks or with another predicate locks in the normal sense. They're only used by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for for other transactions. Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with it have completed. That means that we need to remember an unbounded amount of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool. We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur. Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had. Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen
2011-02-01	Fix wrong error reports in 'number of array dimensions exceeds the	Itagaki Takahiro
	maximum allowed' messages, that have reported one-less dimensions. Alexey Klyukin
2011-01-12	Fix PlanRowMark/ExecRowMark structures to handle inheritance correctly.	Tom Lane
	In an inherited UPDATE/DELETE, each target table has its own subplan, because it might have a column set different from other targets. This means that the resjunk columns we add to support EvalPlanQual might be at different physical column numbers in each subplan. The EvalPlanQual rewrite I did for 9.0 failed to account for this, resulting in possible misbehavior or even crashes during concurrent updates to the same row, as seen in a recent report from Gordon Shannon. Revise the data structure so that we track resjunk column numbers separately for each subplan. I also chose to move responsibility for identifying the physical column numbers back to executor startup, instead of assuming that numbers derived during preprocess_targetlist would stay valid throughout subsequent massaging of the plan. That's a bit slower, so we might want to consider undoing it someday; but it would complicate the patch considerably and didn't seem justifiable in a bug fix that has to be back-patched to 9.0.
2011-01-01	Basic foreign table support.	Robert Haas
	Foreign tables are a core component of SQL/MED. This commit does not provide a working SQL/MED infrastructure, because foreign tables cannot yet be queried. Support for foreign table scans will need to be added in a future patch. However, this patch creates the necessary system catalog structure, syntax support, and support for ancillary operations such as COMMENT and SECURITY LABEL. Shigeru Hanada, heavily revised by Robert Haas
2011-01-01	Stamp copyrights for year 2011.	Bruce Momjian

2010-12-30	Move symbols for ExecMergeJoin's state machine into nodeMergejoin.c.	Tom Lane
	There's no reason for these values to be known anywhere else. After doing this, executor/execdefs.h is vestigial and can be removed.
2010-12-30	Support RIGHT and FULL OUTER JOIN in hash joins.	Tom Lane
	This is advantageous first because it allows us to hash the smaller table regardless of the outer-join type, and second because hash join can be more flexible than merge join in dealing with arbitrary join quals in a FULL join. For merge join all the join quals have to be mergejoinable, but hash join will work so long as there's at least one hashjoinable qual --- the others can be any condition. (This is true essentially because we don't keep per-inner-tuple match flags in merge join, while hash join can do so.) To do this, we need a has-it-been-matched flag for each tuple in the hashtable, not just one for the current outer tuple. The key idea that makes this practical is that we can store the match flag in the tuple's infomask, since there are lots of bits there that are of no interest for a MinimalTuple. So we aren't increasing the size of the hashtable at all for the feature. To write this without turning the hash code into even more of a pile of spaghetti than it already was, I rewrote ExecHashJoin in a state-machine style, similar to ExecMergeJoin. Other than that decision, it was pretty straightforward.
2010-12-21	Fix typos.	Robert Haas
	Andreas Karlsson
2010-12-13	Generalize concept of temporary relations to "relation persistence".	Robert Haas
	This commit replaces pg_class.relistemp with pg_class.relpersistence; and also modifies the RangeVar node type to carry relpersistence rather than istemp. It also removes removes rd_istemp from RelationData and instead performs the correct computation based on relpersistence. For clarity, we add three new macros: RelationNeedsWAL(), RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we can clarify the purpose of each check that previous depended on rd_istemp. This is intended as infrastructure for the upcoming unlogged tables patch, as well as for future possible work on global temporary tables.
2010-12-02	Create core infrastructure for KNNGIST.	Tom Lane
	This is a heavily revised version of builtin_knngist_core-0.9. The ordering operators are no longer mixed in with actual quals, which would have confused not only humans but significant parts of the planner. Instead, ordering operators are carried separately throughout planning and execution. Since the API for ambeginscan and amrescan functions had to be changed anyway, this commit takes the opportunity to rationalize that a bit. RelationGetIndexScan no longer forces a premature index_rescan call; instead, callers of index_beginscan must call index_rescan too. Aside from making the AM-side initialization logic a bit less peculiar, this has the advantage that we do not make a useless extra am_rescan call when there are runtime key values. AMs formerly could not assume that the key values passed to amrescan were actually valid; now they can. Teodor Sigaev and Tom Lane
2010-12-01	Prevent inlining a SQL function with multiple OUT parameters.	Tom Lane
	There were corner cases in which the planner would attempt to inline such a function, which would result in a failure at runtime due to loss of information about exactly what the result record type is. Fix by disabling inlining when the function's recorded result type is RECORD. There might be some sub-cases where inlining could still be allowed, but this is a simple and backpatchable fix, so leave refinements for another day. Per bug #5777 from Nate Carson. Back-patch to all supported branches. 8.1 happens to avoid a core-dump here, but it still does the wrong thing.
2010-11-18	Dept of second thoughts: don't try to push LIMIT below a SRF.	Tom Lane
	If we have Limit->Result->Sort, the Result might be projecting a tlist that contains a set-returning function. If so, it's possible for the SRF to sometimes return zero rows, which means we could need to fetch more than N rows from the Sort in order to satisfy LIMIT N. So top-N sorting cannot be used in this scenario.
2010-11-18	Further fallout from the MergeAppend patch.	Tom Lane
	Fix things so that top-N sorting can be used in child Sort nodes of a MergeAppend node, when there is a LIMIT and no intervening joins or grouping. Actually doing this on the executor side isn't too bad, but it's a bit messier to get the planner to cost it properly. Per gripe from Robert Haas. In passing, fix an oversight in the original top-N-sorting patch: query_planner should not assume that a LIMIT can be used to make an explicit sort cheaper when there will be grouping or aggregation in between. Possibly this should be back-patched, but I'm not sure the mistake is serious enough to be a real problem in practice.
2010-11-01	Avoid using a local FunctionCallInfoData struct in ExecMakeFunctionResult	Tom Lane
	and related routines. We already had a redundant FunctionCallInfoData struct in FuncExprState, but were using that copy only in set-returning-function cases, to avoid keeping function evaluation state in the expression tree for the benefit of plpgsql's "simple expression" logic. But of course that didn't work anyway. Given the recent fixes in plpgsql there is no need to have two separate behaviors here. Getting rid of the local FunctionCallInfoData structs should make things a little faster (because we don't need to do InitFunctionCallInfoData each time), and it also makes for a noticeable reduction in stack space consumption during recursive calls.
2010-10-14	Support MergeAppend plans, to allow sorted output from append relations.	Tom Lane
	This patch eliminates the former need to sort the output of an Append scan when an ordered scan of an inheritance tree is wanted. This should be particularly useful for fast-start cases such as queries with LIMIT. Original patch by Greg Stark, with further hacking by Hans-Jurgen Schonig, Robert Haas, and Tom Lane.