user/sven/postgresql.git

Age	Commit message (Collapse)	Author
2012-05-09	Improve control logic for bgwriter hibernation mode.	Tom Lane
	Commit 6d90eaaa89a007e0d365f49d6436f35d2392cfeb added a hibernation mode to the bgwriter to reduce the server's idle-power consumption. However, its interaction with the detailed behavior of BgBufferSync's feedback control loop wasn't very well thought out. That control loop depends primarily on the rate of buffer allocation, not the rate of buffer dirtying, so the hibernation mode has to be designed to operate only when no new buffer allocations are happening. Also, the check for whether the system is effectively idle was not quite right and would fail to detect a constant low level of activity, thus allowing the bgwriter to go into hibernation mode in a way that would let the cycle time vary quite a bit, possibly further confusing the feedback loop. To fix, move the wakeup support from MarkBufferDirty and SetBufferCommitInfoNeedsSave into StrategyGetBuffer, and prevent the bgwriter from entering hibernation mode unless no buffer allocations have happened recently. In addition, fix the delaying logic to remove the problem of possibly not responding to signals promptly, which was basically caused by trying to use the process latch's is_set flag for multiple purposes. I can't prove it but I'm suspicious that that hack was responsible for the intermittent "postmaster does not shut down" failures we've been seeing in the buildfarm lately. In any case it did nothing to improve the readability or robustness of the code. In passing, express the hibernation sleep time as a multiplier on BgWriterDelay, not a constant. I'm not sure whether there's any value in exposing the longer sleep time as an independently configurable setting, but we can at least make it act like this for little extra code.
2012-05-09	Rename BgWriterShmem/Request to CheckpointerShmem/Request	Simon Riggs

2012-05-09	Rename BgWriterCommLock to CheckpointerCommLock	Simon Riggs

2012-05-08	Fix an issue in recent walwriter hibernation patch.	Tom Lane
	Users of asynchronous-commit mode expect there to be a guaranteed maximum delay before an async commit's WAL records get flushed to disk. The original version of the walwriter hibernation patch broke that. Add an extra shared-memory flag to allow async commits to kick the walwriter out of hibernation mode, without adding any noticeable overhead in cases where no action is needed.
2012-05-08	Reduce idle power consumption of walwriter and checkpointer processes.	Tom Lane
	This patch modifies the walwriter process so that, when it has not found anything useful to do for many consecutive wakeup cycles, it extends its sleep time to reduce the server's idle power consumption. It reverts to normal as soon as it's done any successful flushes. It's still true that during any async commit, backends check for completed, unflushed pages of WAL and signal the walwriter if there are any; so that in practice the walwriter can get awakened and returned to normal operation sooner than the sleep time might suggest. Also, improve the checkpointer so that it uses a latch and a computed delay time to not wake up at all except when it has something to do, replacing a previous hardcoded 0.5 sec wakeup cycle. This also is primarily useful for reducing the server's power consumption when idle. In passing, get rid of the dedicated latch for signaling the walwriter in favor of using its procLatch, since that comports better with possible generic signal handlers using that latch. Also, fix a pre-existing bug with failure to save/restore errno in walwriter's signal handlers. Peter Geoghegan, somewhat simplified by Tom
2012-05-07	Remove strdup, strtol, strtoul from libpgport	Peter Eisentraut
	These should not be needed anymore, at least after the recent port removals. So let's see whether we can do without them.
2012-05-04	Overdue code review for transaction-level advisory locks patch.	Tom Lane
	Commit 62c7bd31c8878dd45c9b9b2429ab7a12103f3590 had assorted problems, most visibly that it broke PREPARE TRANSACTION in the presence of session-level advisory locks (which should be ignored by PREPARE), as per a recent complaint from Stephen Rees. More abstractly, the patch made the LockMethodData.transactional flag not merely useless but outright dangerous, because in point of fact that flag no longer tells you anything at all about whether a lock is held transactionally. This fix therefore removes that flag altogether. We now rely entirely on the convention already in use in lock.c that transactional lock holds must be owned by some ResourceOwner, while session holds are never so owned. Setting the locallock struct's owner link to NULL thus denotes a session hold, and there is no redundant marker for that. PREPARE TRANSACTION now works again when there are session-level advisory locks, and it is also able to transfer transactional advisory locks to the prepared transaction, but for implementation reasons it throws an error if we hold both types of lock on a single lockable object. Perhaps it will be worth improving that someday. Assorted other minor cleanup and documentation editing, as well. Back-patch to 9.1, except that in the 9.1 branch I did not remove the LockMethodData.transactional flag for fear of causing an ABI break for any external code that might be examining those structs.
2012-05-03	Remove BSD/OS (BSDi) port. There are no known users upgrading to	Bruce Momjian
	Postgres 9.2, and perhaps no existing users either.
2012-05-02	Add missing parenthesis in comment.	Robert Haas

2012-05-02	Avoid repeated CLOG access from heap_hot_search_buffer.	Robert Haas
	At the time we check whether the tuple is dead to all running transactions, we've already verified that it isn't visible to our scan, setting hint bits if appropriate. So there's no need to recheck CLOG for the all-dead test we do just a moment later. So, add HeapTupleIsSurelyDead() to test the appropriate condition under the assumption that all relevant hit bits are already set. Review by Tom Lane.
2012-05-02	More duplicate word removal.	Robert Haas

2012-05-02	Remove duplicate words in comments.	Heikki Linnakangas
	Found these with grep -r "for for ".
2012-05-01	Remove dead ports	Peter Eisentraut
	Remove the following ports: - dgux - nextstep - sunos4 - svr4 - ultrix4 - univel These are obsolete and not worth rescuing. In most cases, there is circumstantial evidence that they wouldn't work anymore anyway.
2012-04-30	Converge all SQL-level statistics timing values to float8 milliseconds.	Tom Lane
	This patch adjusts the core statistics views to match the decision already taken for pg_stat_statements, that values representing elapsed time should be represented as float8 and measured in milliseconds. By using float8, we are no longer tied to a specific maximum precision of timing data. (Internally, it's still microseconds, but we could now change that without needing changes at the SQL level.) The columns affected are pg_stat_bgwriter.checkpoint_write_time pg_stat_bgwriter.checkpoint_sync_time pg_stat_database.blk_read_time pg_stat_database.blk_write_time pg_stat_user_functions.total_time pg_stat_user_functions.self_time pg_stat_xact_user_functions.total_time pg_stat_xact_user_functions.self_time The first four of these are new in 9.2, so there is no compatibility issue from changing them. The others require a release note comment that they are now double precision (and can show a fractional part) rather than bigint as before; also their underlying statistics functions now match the column definitions, instead of returning bigint microseconds.
2012-04-30	Mark ReThrowError() with attribute noreturn	Peter Eisentraut
	All related functions were already so marked.
2012-04-29	Rename I/O timing statistics columns to blk_read_time and blk_write_time.	Tom Lane
	This seems more consistent with the pre-existing choices for names of other statistics columns. Rename assorted internal identifiers to match.
2012-04-29	Rename track_iotiming GUC to track_io_timing.	Tom Lane
	This spelling seems significantly more readable to me.
2012-04-29	Change return type of ExceptionalCondition to void and mark it noreturn	Peter Eisentraut
	In ancient times, it was thought that this wouldn't work because of TrapMacro/AssertMacro, but changing those to use a comma operator appears to work without compiler warnings.
2012-04-26	Prevent index-only scans from returning wrong answers under Hot Standby.	Robert Haas
	The alternative of disallowing index-only scans in HS operation was discussed, but the consensus was that it was better to treat marking a page all-visible as a recovery conflict for snapshots that could still fail to see XIDs on that page. We may in the future try to soften this, so that we simply force index scans to do heap fetches in cases where this may be an issue, rather than throwing a hard conflict.
2012-04-25	Fix planner's handling of RETURNING lists in writable CTEs.	Tom Lane
	setrefs.c failed to do "rtoffset" adjustment of Vars in RETURNING lists, which meant they were left with the wrong varnos when the RETURNING list was in a subquery. That was never possible before writable CTEs, of course, but now it's broken. The executor fails to notice any problem because ExecEvalVar just references the ecxt_scantuple for any normal varno; but EXPLAIN breaks when the varno is wrong, as illustrated in a recent complaint from Bartosz Dmytrak. Since the eventual rtoffset of the subquery is not known at the time we are preparing its plan node, the previous scheme of executing set_returning_clause_references() at that time cannot handle this adjustment. Fortunately, it turns out that we don't really need to do it that way, because all the needed information is available during normal setrefs.c execution; we just have to dig it out of the ModifyTable node. So, do that, and get rid of the kluge of early setrefs processing of RETURNING lists. (This is a little bit of a cheat in the case of inherited UPDATE/DELETE, because we are not passing a "root" struct that corresponds exactly to what the subplan was built with. But that doesn't matter, and anyway this is less ugly than early setrefs processing was.) Back-patch to 9.1, where the problem became possible to hit.
2012-04-25	Remove prototype for nonexistent function.	Robert Haas

2012-04-23	Lots of doc corrections.	Robert Haas
	Josh Kupershmidt
2012-04-20	Recast "ONLY" column CHECK constraints as NO INHERIT	Alvaro Herrera
	The original syntax wasn't universally loved, and it didn't allow its usage in CREATE TABLE, only ALTER TABLE. It now works everywhere, and it also allows using ALTER TABLE ONLY to add an uninherited CHECK constraint, per discussion. The pg_constraint column has accordingly been renamed connoinherit. This commit partly reverts some of the changes in 61d81bd28dbec65a6b144e0cd3d0bfe25913c3ac, particularly some pg_dump and psql bits, because now pg_get_constraintdef includes the necessary NO INHERIT within the constraint definition. Author: Nikhil Sontakke Some tweaks by me
2012-04-19	Revise parameterized-path mechanism to fix assorted issues.	Tom Lane
	This patch adjusts the treatment of parameterized paths so that all paths with the same parameterization (same set of required outer rels) for the same relation will have the same rowcount estimate. We cache the rowcount estimates to ensure that property, and hopefully save a few cycles too. Doing this makes it practical for add_path_precheck to operate without a rowcount estimate: it need only assume that paths with different parameterizations never dominate each other, which is close enough to true anyway for coarse filtering, because normally a more-parameterized path should yield fewer rows thanks to having more join clauses to apply. In add_path, we do the full nine yards of comparing rowcount estimates along with everything else, so that we can discard parameterized paths that don't actually have an advantage. This fixes some issues I'd found with add_path rejecting parameterized paths on the grounds that they were more expensive than not-parameterized ones, even though they yielded many fewer rows and hence would be cheaper once subsequent joining was considered. To make the same-rowcounts assumption valid, we have to require that any parameterized path enforce all join clauses that could be obtained from the particular set of outer rels, even if not all of them are useful for indexing. This is required at both base scans and joins. It's a good thing anyway since the net impact is that join quals are checked at the lowest practical level in the join tree. Hence, discard the original rather ad-hoc mechanism for choosing parameterization joinquals, and build a better one that has a more principled rule for when clauses can be moved. The original rule was actually buggy anyway for lack of knowledge about which relations are part of an outer join's outer side; getting this right requires adding an outer_relids field to RestrictInfo.
2012-04-18	Finish rename of FastPathStrongLocks to FastPathStrongRelationLocks.	Robert Haas
	Commit 8e5ac74c1249820ca55481223a95b9124b4a4f95 tried to do this renaming, but I relied on gcc to tell me where I needed to make changes, instead of grep. Noted by Jeff Davis.
2012-04-18	Tighten up error recovery for fast-path locking.	Robert Haas
	The previous code could cause a backend crash after BEGIN; SAVEPOINT a; LOCK TABLE foo (interrupted by ^C or statement timeout); ROLLBACK TO SAVEPOINT a; LOCK TABLE foo, and might have leaked strong-lock counts in other situations. Report by Zoltán Böszörményi; patch review by Jeff Davis.
2012-04-14	pg_size_pretty(numeric)	Robert Haas
	The output of the new pg_xlog_location_diff function is of type numeric, since it could theoretically overflow an int8 due to signedness; this provides a convenient way to format such values. Fujii Masao, with some beautification by me.
2012-04-13	Rename bytea_agg to string_agg and add delimiter argument	Peter Eisentraut
	Per mailing list discussion, we would like to keep the bytea functions parallel to the text functions, so rename bytea_agg to string_agg, which already exists for text. Also, to satisfy the rule that we don't want aggregate functions of the same name with a different number of arguments, add a delimiter argument, just like string_agg for text already has.
2012-04-08	Do stack-depth checking in all postmaster children.	Heikki Linnakangas
	We used to only initialize the stack base pointer when starting up a regular backend, not in other processes. In particular, autovacuum workers can run arbitrary user code, and without stack-depth checking, infinite recursion in e.g an index expression will bring down the whole cluster. The comment about PL/Java using set_stack_base() is not yet true. As the code stands, PL/java still modifies the stack_base_ptr variable directly. However, it's been discussed in the PL/Java mailing list that it should be changed to use the function, because PL/Java is currently oblivious to the register stack used on Itanium. There's another issues with PL/Java, namely that the stack base pointer it sets is not really the base of the stack, it could be something close to the bottom of the stack. That's a separate issue that might need some further changes to this code, but that's a different story. Backpatch to all supported releases.
2012-04-06	Dept of second thoughts: improve the API for AnalyzeForeignTable.	Tom Lane
	If we make the initially-called function return the table physical-size estimate, acquire_inherited_sample_rows will be able to use that to allocate numbers of samples among child tables, when the day comes that we want to support foreign tables in inheritance trees.
2012-04-06	Allow statistics to be collected for foreign tables.	Tom Lane
	ANALYZE now accepts foreign tables and allows the table's FDW to control how the sample rows are collected. (But only manual ANALYZEs will touch foreign tables, for the moment, since among other things it's not very clear how to handle remote permissions checks in an auto-analyze.) contrib/file_fdw is extended to support this. Etsuro Fujita, reviewed by Shigeru Hanada, some further tweaking by me.
2012-04-06	Add DROP INDEX CONCURRENTLY [IF EXISTS], uses ShareUpdateExclusiveLock	Simon Riggs

2012-04-05	checkopint -> checkpoint	Robert Haas
	Report by Guillaume Lelarge.
2012-04-05	Publish checkpoint timing information to pg_stat_bgwriter.	Robert Haas
	Greg Smith, Peter Geoghegan, and Robert Haas
2012-04-05	Expose track_iotiming data via the statistics collector.	Robert Haas
	Ants Aasma's original patch to add timing information for buffer I/O requests exposed this data at the relation level, which was judged too costly. I've here exposed it at the database level instead.
2012-04-03	Add support for renaming domain constraints	Peter Eisentraut

2012-03-31	Add PGDLLIMPORT to ScanKeywords and NumScanKeywords.	Tom Lane
	Per buildfarm, this is now needed by contrib/pg_stat_statements.
2012-03-29	Inherit max_safe_fds to child processes in EXEC_BACKEND mode.	Heikki Linnakangas
	Postmaster sets max_safe_fds by testing how many open file descriptors it can open, and that is normally inherited by all child processes at fork(). Not so on EXEC_BACKEND, ie. Windows, however. Because of that, we effectively ignored max_files_per_process on Windows, and always assumed a conservative default of 32 simultaneous open files. That could have an impact on performance, if you need to access a lot of different files in a query. After this patch, the value is passed to child processes by save/restore_backend_variables() among many other global variables. It has been like this forever, but given the lack of complaints about it, I'm not backpatching this.
2012-03-28	Remove now redundant pgpipe code.	Andrew Dunstan

2012-03-27	Add some infrastructure for contrib/pg_stat_statements.	Tom Lane
	Add a queryId field to Query and PlannedStmt. This is not used by the core backend, except for being copied around at appropriate times. It's meant to allow plug-ins to track a particular query forward from parse analysis to execution. The queryId is intentionally not dumped into stored rules (and hence this commit doesn't bump catversion). You could argue that choice either way, but it seems better that stored rule strings not have any dependency on plug-ins that might or might not be present. Also, add a post_parse_analyze_hook that gets invoked at the end of parse analysis (but only for top-level analysis of complete queries, not cases such as analyzing a domain's default-value expression). This is mainly meant to be used to compute and assign a queryId, but it could have other applications. Peter Geoghegan
2012-03-27	New GUC, track_iotiming, to track I/O timings.	Robert Haas
	Currently, the only way to see the numbers this gathers is via EXPLAIN (ANALYZE, BUFFERS), but the plan is to add visibility through the stats collector and pg_stat_statements in subsequent patches. Ants Aasma, reviewed by Greg Smith, with some further changes by me.
2012-03-26	Code cleanup for heap_freeze_tuple.	Robert Haas
	It used to be case that lazy vacuum could call this function with only a shared lock on the buffer, but neither lazy vacuum nor any other code path does that any more. Simplify the code accordingly and clean up some related, obsolete comments.
2012-03-25	Replace empty locale name with implied value in CREATE DATABASE and initdb.	Tom Lane
	setlocale() accepts locale name "" as meaning "the locale specified by the process's environment variables". Historically we've accepted that for Postgres' locale settings, too. However, it's fairly unsafe to store an empty string in a new database's pg_database.datcollate or datctype fields, because then the interpretation could vary across postmaster restarts, possibly resulting in index corruption and other unpleasantness. Instead, we should expand "" to whatever it means at the moment of calling CREATE DATABASE, which we can do by saving the value returned by setlocale(). For consistency, make initdb set up the initial lc_xxx parameter values the same way. initdb was already doing the right thing for empty locale names, but it did not replace non-empty names with setlocale results. On a platform where setlocale chooses to canonicalize the spellings of locale names, this would result in annoying inconsistency. (It seems that popular implementations of setlocale don't do such canonicalization, which is a pity, but the POSIX spec certainly allows it to be done.) The same risk of inconsistency leads me to not venture back-patching this, although it could certainly be seen as a longstanding bug. Per report from Jeff Davis, though this is not his proposed patch.
2012-03-24	Fix planner's handling of outer PlaceHolderVars within subqueries.	Tom Lane
	For some reason, in the original coding of the PlaceHolderVar mechanism I had supposed that PlaceHolderVars couldn't propagate into subqueries. That is of course entirely possible. When it happens, we need to treat an outer-level PlaceHolderVar much like an outer Var or Aggref, that is SS_replace_correlation_vars() needs to replace the PlaceHolderVar with a Param, and then when building the finished SubPlan we have to provide the PlaceHolderVar expression as an actual parameter for the SubPlan. The handling of the contained expression is a bit delicate but it can be treated exactly like an Aggref's expression. In addition to the missing logic in subselect.c, prepjointree.c was failing to search subqueries for PlaceHolderVars that need their relids adjusted during subquery pullup. It looks like everyplace else that touches PlaceHolderVars got it right, though. Per report from Mark Murawski. In 9.1 and HEAD, queries affected by this oversight would fail with "ERROR: Upper-level PlaceHolderVar found where not expected". But in 9.0 and 8.4, you'd silently get possibly-wrong answers, since the value transmitted into the subquery wouldn't go to null when it should.
2012-03-23	Code review for protransform patches.	Tom Lane
	Fix loss of previous expression-simplification work when a transform function fires: we must not simply revert to untransformed input tree. Instead build a dummy FuncExpr node to pass to the transform function. This has the additional advantage of providing a simpler, more uniform API for transform functions. Move documentation to a somewhat less buried spot, relocate some poorly-placed code, be more wary of null constants and invalid typmod values, add an opr_sanity check on protransform function signatures, and some other minor cosmetic adjustments. Note: although this patch touches pg_proc.h, no need for catversion bump, because the changes are cosmetic and don't actually change the intended catalog contents.
2012-03-21	Clean up compiler warnings from unused variables with asserts disabled	Peter Eisentraut
	For those variables only used when asserts are enabled, use a new macro PG_USED_FOR_ASSERTS_ONLY, which expands to __attribute__((unused)) when asserts are not enabled.
2012-03-19	Restructure SELECT INTO's parsetree representation into CreateTableAsStmt.	Tom Lane
	Making this operation look like a utility statement seems generally a good idea, and particularly so in light of the desire to provide command triggers for utility statements. The original choice of representing it as SELECT with an IntoClause appendage had metastasized into rather a lot of places, unfortunately, so that this patch is a great deal more complicated than one might at first expect. In particular, keeping EXPLAIN working for SELECT INTO and CREATE TABLE AS subcommands required restructuring some EXPLAIN-related APIs. Add-on code that calls ExplainOnePlan or ExplainOneUtility, or uses ExplainOneQuery_hook, will need adjustment. Also, the cases PREPARE ... SELECT INTO and CREATE RULE ... SELECT INTO, which formerly were accepted though undocumented, are no longer accepted. The PREPARE case can be replaced with use of CREATE TABLE AS EXECUTE. The CREATE RULE case doesn't seem to have much real-world use (since the rule would work only once before failing with "table already exists"), so we'll not bother with that one. Both SELECT INTO and CREATE TABLE AS still return a command tag of "SELECT nnnn". There was some discussion of returning "CREATE TABLE nnnn", but for the moment backwards compatibility wins the day. Andres Freund and Tom Lane
2012-03-16	Revisit handling of UNION ALL subqueries with non-Var output columns.	Tom Lane
	In commit 57664ed25e5dea117158a2e663c29e60b3546e1c I tried to fix a bug reported by Teodor Sigaev by making non-simple-Var output columns distinct (by wrapping their expressions with dummy PlaceHolderVar nodes). This did not work too well. Commit b28ffd0fcc583c1811e5295279e7d4366c3cae6c fixed some ensuing problems with matching to child indexes, but per a recent report from Claus Stadler, constraint exclusion of UNION ALL subqueries was still broken, because constant-simplification didn't handle the injected PlaceHolderVars well either. On reflection, the original patch was quite misguided: there is no reason to expect that EquivalenceClass child members will be distinct. So instead of trying to make them so, we should ensure that we can cope with the situation when they're not. Accordingly, this patch reverts the code changes in the above-mentioned commits (though the regression test cases they added stay). Instead, I've added assorted defenses to make sure that duplicate EC child members don't cause any problems. Teodor's original problem ("MergeAppend child's targetlist doesn't match MergeAppend") is addressed more directly by revising prepare_sort_from_pathkeys to let the parent MergeAppend's sort list guide creation of each child's sort list. In passing, get rid of add_sort_column; as far as I can tell, testing for duplicate sort keys at this stage is dead code. Certainly it doesn't trigger often enough to be worth expending cycles on in ordinary queries. And keeping the test would've greatly complicated the new logic in prepare_sort_from_pathkeys, because comparing pathkey list entries against a previous output array requires that we not skip any entries in the list. Back-patch to 9.1, like the previous patches. The only known issue in this area that wasn't caused by the ill-advised previous patches was the MergeAppend planning failure, which of course is not relevant before 9.1. It's possible that we need some of the new defenses against duplicate child EC entries in older branches, but until there's some clear evidence of that I'm going to refrain from back-patching further.
2012-03-16	Add comments explaining why our Itanium spinlock implementation is safe.	Heikki Linnakangas

2012-03-15	Add const qualifier to tzn returned by timestamp2tm()	Peter Eisentraut
	The tzn value might come from tm->tm_zone, which libc declares as const, so it's prudent that the upper layers know about this as well.