path: root/src/include
2011-06-10Use "transient" files for blind writes, take 2Alvaro Herrera
"Blind writes" are a mechanism to push buffers down to disk when evicting them; since they may belong to different databases than the one a backend is connected to, the backend does not necessarily have a relation to link them to, and thus no way to blow them away. We were keeping those files open indefinitely, which would cause a problem if the underlying table was deleted, because the operating system would not be able to reclaim the disk space used by those files. To fix, have bufmgr mark such files as transient to smgr; the lower layer is allowed to close the file descriptor when the current transaction ends. We must be careful to have any other access of the file to remove the transient markings, to prevent unnecessary expensive system calls when evicting buffers belonging to our own database (which files we're likely to require again soon.) This commit fixes a bug in the previous one, which neglected to cleanly handle the LRU ring that fd.c uses to manage open files, and caused an unacceptable failure just before beta2 and was thus reverted.
2011-06-10  Small comment fixes and enhancements.  (Heikki Linnakangas)
2011-06-09  Tag 9.1beta2.  [tag: REL9_1_BETA2]  (Tom Lane)
2011-06-09  Revert "Use "transient" files for blind writes"  (Alvaro Herrera)
This reverts commit 54d9e8c6c19cbefa8fb42ed3442a0a5327590ed3, which caused a failure on the buildfarm. Not a good thing to have just before a beta release.
2011-06-09Use "transient" files for blind writesAlvaro Herrera
"Blind writes" are a mechanism to push buffers down to disk when evicting them; since they may belong to different databases than the one a backend is connected to, the backend does not necessarily have a relation to link them to, and thus no way to blow them away. We were keeping those files open indefinitely, which would cause a problem if the underlying table was deleted, because the operating system would not be able to reclaim the disk space used by those files. To fix, have bufmgr mark such files as transient to smgr; the lower layer is allowed to close the file descriptor when the current transaction ends. We must be careful to have any other access of the file to remove the transient markings, to prevent unnecessary expensive system calls when evicting buffers belonging to our own database (which files we're likely to require again soon.)
2011-06-09  Pgindent run before 9.1 beta2.  (Bruce Momjian)
2011-06-08  Make DDL operations play nicely with Serializable Snapshot Isolation.  (Heikki Linnakangas)
Truncating or dropping a table is treated like deletion of all tuples, and checked for conflicts accordingly. If a table is clustered or rewritten by ALTER TABLE, all predicate locks on the heap are promoted to relation-level locks, because the tuple or page ids of any existing tuples will change and won't be valid after rewriting the table. Arguably ALTER TABLE should be treated like a mass-UPDATE of every row, but if you e.g. change the datatype of a column, you could also argue that it's just a change to the physical layout, not a logical change. Reindexing promotes all locks on the index to a relation-level lock on the heap.

Kevin Grittner, with a lot of cosmetic changes by me.
2011-06-03  Fix failure to check whether a rowtype's component types are sortable.  (Tom Lane)
The existence of a btree opclass accepting composite types caused us to assume that every composite type is sortable. This isn't true, of course; we need to check whether the column types are all sortable. There was logic for this for the case of array comparison (ie, check that the element type is sortable), but we missed the point for rowtypes. Per Teodor's report of an ANALYZE failure for an unsortable composite type.

Rather than just add some more ad-hoc logic for this, I moved knowledge of the issue into typcache.c. The typcache will now only report array_eq, record_cmp, and friends as usable operators if the array or composite type will work with those functions. Unfortunately we don't have enough info to do this for anonymous RECORD types; in that case, just assume it will work, and take the runtime failure as before if it doesn't.

This patch might be a candidate for back-patching at some point, but given the lack of complaints from the field, I'd rather just test it in HEAD for now.

Note: most of the places touched in this patch will need further work when we get around to supporting hashing of record types.
2011-06-03  SSI comment fixes and enhancements.  (Heikki Linnakangas)
Notably, document that the conflict-out flag actually means that the transaction has a conflict out to a transaction that committed before the flagged transaction.

Kevin Grittner
2011-06-02  Looks like we can't declare getpeereid on Windows anyway.  (Tom Lane)
... for lack of the uid_t and gid_t typedefs. Per buildfarm.
2011-06-02  Implement getpeereid() as a src/port compatibility function.  (Tom Lane)
This unifies a bunch of ugly #ifdef's in one place. Per discussion, we only need this where HAVE_UNIX_SOCKETS, so no need to cover Windows. Marko Kreen, some adjustment by Tom Lane
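The compatibility function has the BSD signature getpeereid(int, uid_t *, gid_t *). A hedged sketch of how such a wrapper can be written on Linux, where the kernel exposes the peer's credentials through SO_PEERCRED (the function name is suffixed to mark it as illustrative):

    #define _GNU_SOURCE             /* struct ucred / SO_PEERCRED on glibc */
    #include <sys/socket.h>
    #include <sys/types.h>

    int
    getpeereid_sketch(int sock, uid_t *euid, gid_t *egid)
    {
        struct ucred cred;
        socklen_t len = sizeof(cred);

        if (getsockopt(sock, SOL_SOCKET, SO_PEERCRED, &cred, &len) != 0)
            return -1;              /* errno set by getsockopt */
        *euid = cred.uid;
        *egid = cred.gid;
        return 0;
    }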
2011-05-31  Replace use of credential control messages with getsockopt(LOCAL_PEERCRED).  (Tom Lane)
It turns out the reason we hadn't found out about the portability issues with our credential-control-message code is that almost no modern platforms use that code at all; the ones that used to need it now offer getpeereid(), which we choose first. The last holdout was NetBSD, and they added getpeereid() as of 5.0.

So far as I can tell, the only live platform on which that code was being exercised was Debian/kFreeBSD, ie, FreeBSD kernel with Linux userland --- since glibc doesn't provide getpeereid(), we fell back to the control message code. However, the FreeBSD kernel provides a LOCAL_PEERCRED socket parameter that's functionally equivalent to Linux's SO_PEERCRED. That is both much simpler to use than control messages, and superior because it doesn't require receiving a message from the other end at just the right time.

Therefore, add code to use LOCAL_PEERCRED when necessary, and rip out all the credential-control-message code in the backend. (libpq still has such code so that it can still talk to pre-9.1 servers ... but eventually we can get rid of it there too.) Clean up related autoconf probes, too.

This means that libpq's requirepeer parameter now works on exactly the same platforms where the backend supports peer authentication, so adjust the documentation accordingly.
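For reference, a hedged sketch of the LOCAL_PEERCRED path mentioned above, as it looks on a FreeBSD kernel; the function name is illustrative and the exact headers vary by platform:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/un.h>             /* LOCAL_PEERCRED */
    #include <sys/ucred.h>          /* struct xucred, XUCRED_VERSION */

    int
    peer_uid_sketch(int sock, uid_t *uid)
    {
        struct xucred cred;
        socklen_t len = sizeof(cred);

        /* level 0 is the "local" (AF_UNIX) protocol level on FreeBSD */
        if (getsockopt(sock, 0, LOCAL_PEERCRED, &cred, &len) != 0)
            return -1;
        if (cred.cr_version != XUCRED_VERSION)
            return -1;              /* unexpected structure layout */
        *uid = cred.cr_uid;
        return 0;
    }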
2011-05-30  Fix VACUUM so that it always updates pg_class.reltuples/relpages.  (Tom Lane)
When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in a recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all.

Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good.

This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command.

Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate.

Back-patch to 8.4, where the visibility map was introduced.
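An illustrative sketch of the moving-average estimate described above (the committed formula differs in detail): pages VACUUM did not visit are assumed to keep the old tuple density, and the new reltuples blends that with what was actually counted.

    #include <math.h>

    double
    estimate_reltuples_sketch(double old_rel_pages, double old_rel_tuples,
                              double total_pages, double scanned_pages,
                              double scanned_tuples)
    {
        double old_density;
        double unscanned_pages;

        if (scanned_pages >= total_pages)
            return scanned_tuples;      /* scanned the whole table: exact */
        if (old_rel_pages <= 0)
            return scanned_tuples;      /* no history to extrapolate from */

        old_density = old_rel_tuples / old_rel_pages;
        unscanned_pages = total_pages - scanned_pages;
        return floor(scanned_tuples + old_density * unscanned_pages + 0.5);
    }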
2011-05-30  The row-version chaining in Serializable Snapshot Isolation was still wrong.  (Heikki Linnakangas)
On further analysis, it turns out that there is no need to duplicate predicate locks to the new row version at update; the lock on the version that the transaction saw as visible is enough. However, there was a different bug in the code that checks for dangerous structures when a new rw-conflict happens. Fix that bug, and remove all the row-version chaining related code.

Kevin Grittner & Dan Ports, with some comment editorialization by me.
2011-05-23  Improve hash_array() logic for combining hash values.  (Robert Haas)
The new logic is less vulnerable to transpositions. This invalidates the contents of hash indexes built with the old functions; hence, bump catversion. Dean Rasheed
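An order-sensitive combiner in the spirit of the change (a sketch, not claimed to be the exact committed code): multiplying the running hash by a prime before adding each element's hash makes transposed elements produce different results, unlike a plain XOR.

    #include <stdint.h>

    uint32_t
    combine_element_hashes(const uint32_t *hashes, int n)
    {
        uint32_t result = 0;

        for (int i = 0; i < n; i++)
            result = result * 31 + hashes[i];   /* 31 * r == (r << 5) - r */

        return result;
    }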
2011-05-23  Install defenses against overflow in BuildTupleHashTable().  (Tom Lane)
The planner can sometimes compute very large values for numGroups, and in cases where we have no alternative to building a hashtable, such a value will get fed directly to BuildTupleHashTable as its nbuckets parameter. There were two ways in which that could go bad.

First, BuildTupleHashTable declared the parameter as "int" but most callers were passing "long"s, so on 64-bit machines undetected overflow could occur leading to a bogus negative value. The obvious fix for that is to change the parameter to "long", which is what I've done in HEAD. In the back branches that seems a bit risky, though, since third-party code might be calling this function. So for them, just put in a kluge to treat negative inputs as INT_MAX.

Second, hash_create can go nuts with extremely large requested table sizes (notably, my_log2 becomes an infinite loop for inputs larger than LONG_MAX/2). What seems most appropriate to avoid that is to bound the initial table size request to work_mem.

This fixes bug #6035 reported by Daniel Schreiber. Although the reported case only occurs back to 8.4 since it involves WITH RECURSIVE, I think it's a good idea to install the defenses in all supported branches.
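A minimal sketch of the back-branch kluge described above: a "long" bucket count that does not fit in an int (or that already overflowed to a negative value) is treated as INT_MAX.

    #include <limits.h>

    int
    clamp_nbuckets_sketch(long nbuckets)
    {
        if (nbuckets < 0 || nbuckets > INT_MAX)
            return INT_MAX;
        return (int) nbuckets;
    }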
2011-05-21  Pull up isReset flag from AllocSetContext to MemoryContext struct.  (Heikki Linnakangas)
This avoids the overhead of one function call when calling MemoryContextReset(), and it seems like the isReset optimization would be applicable to any new memory context we might invent in the future anyway. This buys back the overhead I just added in the previous patch to always call MemoryContextReset() in ExecScan, even when there are no quals or projections.
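A sketch of the fast path the pulled-up flag enables; the structure and names here are illustrative, not the actual MemoryContext definition. Resetting a context that has had no allocations since its last reset returns immediately, skipping the per-context reset routine.

    #include <stdbool.h>

    typedef struct context_sketch
    {
        bool isReset;                   /* nothing allocated since last reset? */
        void (*reset_impl)(struct context_sketch *cxt);
    } context_sketch;

    void
    context_reset_sketch(context_sketch *cxt)
    {
        if (cxt->isReset)
            return;                     /* skip the function call entirely */
        cxt->reset_impl(cxt);
        cxt->isReset = true;
    }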
2011-05-13  More cleanup of FOREIGN TABLE permissions handling.  (Robert Haas)
This commit fixes psql, pg_dump, and the information schema to be consistent with the backend changes which I made as part of commit be90032e0d1cf473bdd99aee94218218f59f29f1, and also includes a related documentation tweak. Shigeru Hanada, with slight adjustment.
2011-05-11  Split PGC_S_DEFAULT into two values, for true boot_val vs computed default.  (Tom Lane)
Failure to distinguish these cases is the real cause behind the recent reports of Windows builds crashing on 'infinity'::timestamp, which was directly due to failure to establish a value of timezone_abbreviations in postmaster child processes. The postmaster had the desired value, but write_one_nondefault_variable() didn't transmit it to backends.

To fix that, invent a new value PGC_S_DYNAMIC_DEFAULT, and be sure to use that or PGC_S_ENV_VAR (as appropriate) for "default" settings that are computed during initialization. (We need both because there's at least one variable that could receive a value from either source.)

This commit also fixes ProcessConfigFile's failure to restore the correct default value for certain GUC variables if they are set in postgresql.conf and then removed/commented out of the file. We have to recompute and reinstall the value for any GUC variable that could have received a value from PGC_S_DYNAMIC_DEFAULT or PGC_S_ENV_VAR sources, and there were a number of oversights. (That whole thing is a crock that needs to be redesigned, but not today.)

However, I intentionally didn't make it work "exactly right" for the cases of timezone and log_timezone. The exactly right behavior would involve running select_default_timezone, which we'd have to do independently in each postgres process, causing the whole database to become entirely unresponsive for as much as several seconds. That didn't seem like a good idea, especially since the variable's removal from postgresql.conf might be just an accidental edit. Instead the behavior is to adopt the previously active setting as if it were default.

Note that this patch creates an ABI break for extensions that use any of the PGC_S_XXX constants; they'll need to be recompiled.
2011-04-28  Use a macro variable PG_PRINTF_ATTRIBUTE for the style used for checking printf type functions.  (Andrew Dunstan)
The style is set to "printf" for backwards compatibility everywhere except on Windows, where it is set to "gnu_printf", which eliminates hundreds of false error messages from modern versions of gcc arising from %m and %ll{d,u} formats.
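A simplified sketch of the mechanism: the archetype name handed to gcc's format-checking attribute is chosen per platform, so %m and %ll{d,u} pass the checks wherever the compiler understands the GNU printf dialect. The real definition comes from configure/pg_config.h; the #ifdef below is only illustrative.

    #ifdef WIN32
    #define PG_PRINTF_ATTRIBUTE gnu_printf
    #else
    #define PG_PRINTF_ATTRIBUTE printf
    #endif

    extern void log_sketch(int level, const char *fmt, ...)
        __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));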
2011-04-27  Tag 9.1beta1.  [tag: REL9_1_BETA1]  (Tom Lane)
2011-04-27  Revert "Force use of "%I64d" format for 64 bit ints on MinGW."  (Andrew Dunstan)
This reverts commit 52d01c2f52c462d29ae0fdfa44c3cae129148a6d. The UINT64_FORMAT bit broke the buildfarm, so I'm reverting the whole thing pending further investigation.
2011-04-27  Force use of "%I64d" format for 64 bit ints on MinGW.  (Andrew Dunstan)
Both this and "%lld" work, but the compiler's format checking doesn't like "%lld", so we get all sorts of spurious warnings.
2011-04-25  Refactor broken CREATE TABLE IF NOT EXISTS support.  (Robert Haas)
Per bug #5988, reported by Marko Tiikkaja, and further analyzed by Tom Lane, the previous coding was broken in several respects: even if the target table already existed, a subsequent CREATE TABLE IF NOT EXISTS might try to add additional constraints or sequences-for-serial specified in the new CREATE TABLE statement.

In passing, this also fixes a minor information leak: it's no longer possible to figure out whether a schema to which you don't have CREATE access contains a sequence named like "x_y_seq" by attempting to create a table in that schema called "x" with a serial column called "y".

Some more refactoring of this code in the future might be warranted, but that will need to wait for a later major release.
2011-04-25  Remove partial and undocumented GRANT .. FOREIGN TABLE support.  (Robert Haas)
Instead, foreign tables are treated just like views: permissions can be granted using GRANT privilege ON [TABLE] foreign_table_name TO role, and revoked similarly. GRANT/REVOKE .. FOREIGN TABLE is no longer supported, just as we don't support GRANT/REVOKE .. VIEW. The set of accepted permissions for foreign tables is now identical to the set for regular tables and views.

Per report from Thom Brown, and subsequent discussion.
2011-04-25  Assorted minor changes to silence Windows compiler warnings.  (Andrew Dunstan)
Mostly to do with macro redefinitions or object signedness.
2011-04-25  Add postmaster/postgres undocumented -b option for binary upgrades.  (Bruce Momjian)
This option turns off autovacuum, prevents non-super-user connections, and enables oid setting hooks in the backend. The code continues to use the old autovacuum disable settings for servers with earlier catalog versions. This includes a catalog version bump to identify servers that support the -b option.
2011-04-24  Improve cost estimation for aggregates and window functions.  (Tom Lane)
The previous coding failed to account properly for the costs of evaluating the input expressions of aggregates and window functions, as seen in a recent gripe from Claudio Freire. (I said at the time that it wasn't counting these costs at all; but on closer inspection, it was effectively charging these costs once per output tuple. That is completely wrong for aggregates, and not exactly right for window functions either.) There was also a hard-wired assumption that aggregates and window functions had procost 1.0, which is now fixed to respect the actual cataloged costs. The costing of WindowAgg is still pretty bogus, since it doesn't try to estimate the effects of spilling data to disk, but that seems like a separate issue.
2011-04-23  Fix char2wchar/wchar2char to support collations properly.  (Tom Lane)
These functions should take a pg_locale_t, not a collation OID, and should call mbstowcs_l/wcstombs_l where available. Where those functions are not available, temporarily select the correct locale with uselocale().

This change removes the bogus assumption that all locales selectable in a given database have the same wide-character conversion method; in particular, the collate.linux.utf8 regression test now passes with LC_CTYPE=C, so long as the database encoding is UTF8.

I decided to move the char2wchar/wchar2char functions out of mbutils.c and into pg_locale.c, because they work on wchar_t not pg_wchar_t and thus don't really belong with the mbutils.c functions. Keeping them where they were would have required importing pg_locale_t into pg_wchar.h somehow, which did not seem like a good plan.
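A sketch of the uselocale() fallback described above (function name illustrative, error handling omitted): when mbstowcs_l() is unavailable, the calling thread's locale is switched temporarily around a plain mbstowcs().

    #include <locale.h>
    #include <stdlib.h>
    #include <wchar.h>

    size_t
    char2wchar_sketch(wchar_t *to, size_t tolen, const char *from, locale_t loc)
    {
        locale_t save = uselocale(loc);     /* select the collation's LC_CTYPE */
        size_t result = mbstowcs(to, from, tolen);

        uselocale(save);                    /* restore the previous locale */
        return result;
    }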
2011-04-22  Make GIN and GIST pass the index collation to all their support functions.  (Tom Lane)
Experimentation with contrib/btree_gist shows that the majority of the GIST support functions potentially need collation information. Safest policy seems to be to pass it to all of them, instead of making assumptions about which ones could possibly need it.
2011-04-20  Allow ALTER TABLE name {OF type | NOT OF}.  (Robert Haas)
This syntax allows a standalone table to be made into a typed table, or a typed table to be made standalone. This is possibly a mildly useful feature in its own right, but the real motivation for this change is that we need it to make pg_upgrade work with typed tables. This doesn't actually fix that problem, but it's necessary infrastructure. Noah Misch
2011-04-19  Avoid changing an index's indcheckxmin horizon during REINDEX.  (Tom Lane)
There can never be a need to push the indcheckxmin horizon forward, since any HOT chains that are actually broken with respect to the index must pre-date its original creation. So we can just avoid changing pg_index altogether during a REINDEX operation.

This offers a cleaner solution than my previous patch for the problem found a few days ago that we mustn't try to update pg_index while we are reindexing it. System catalog indexes will always be created with indcheckxmin = false during initdb, and with this modified code we should never try to change their pg_index entries. This avoids special-casing system catalogs as the former patch did, and should provide a performance benefit for many cases where REINDEX formerly caused an index to be considered unusable for a short time.

Back-patch to 8.3 to cover all versions containing HOT. Note that this patch changes the API for index_build(), but I believe it is unlikely that any add-on code is calling that directly.
2011-04-18  Fix handling of collations in multi-row VALUES constructs.  (Tom Lane)
Per spec we ought to apply select_common_collation() across the expressions in each column of the VALUES table. The original coding was just taking the first row and assuming it was representative. This patch adds a field to struct RangeTblEntry to carry the resolved collations, so initdb is forced for changes in stored rule representation.
2011-04-16  Simplify reindex_relation's API.  (Tom Lane)
For what seem entirely historical reasons, a bitmask "flags" argument was recently added to reindex_relation without subsuming its existing boolean argument into that bitmask. This seems a bit bizarre, so fold them together.
2011-04-16  Clean up collation processing in prepunion.c.  (Tom Lane)
This area was a few bricks shy of a load, and badly under-commented too. We have to ensure that the generated targetlist entries for a set-operation node expose the correct collation for each entry, since higher-level processing expects the tlist to reflect the true ordering of the plan's output. This hackery wouldn't be necessary if SortGroupClause carried collation info ... but making it do so would inject more pain in the parser than would be saved here. Still, we might want to rethink that sometime.
2011-04-12  Pass collations to functions in FunctionCallInfoData, not FmgrInfo.  (Tom Lane)
Since collation is effectively an argument, not a property of the function, FmgrInfo is really the wrong place for it; and this becomes critical in cases where a cached FmgrInfo is used for varying purposes that might need different collation settings. Fix by passing it in FunctionCallInfoData instead. In particular this allows a clean fix for bug #5970 (record_cmp not working). This requires touching a bit more code than the original method, but nobody ever thought that collations would not be an invasive patch...
2011-04-11  Fix RI_Initial_Check to use a COLLATE clause when needed in its query.  (Tom Lane)
If the referencing and referenced columns have different collations, the parser will be unable to resolve which collation to use unless it's helped out in this way. The effects are sometimes masked, if we end up using a non-collation-sensitive plan; but if we do use a mergejoin we'll see a failure, as recently noted by Robert Haas. The SQL spec states that the referenced column's collation should be used to resolve RI checks, so that's what we do. Note however that we currently don't append a COLLATE clause when writing a query that examines only the referencing column. If we ever support collations that have varying notions of equality, that will have to be changed. For the moment, though, it's preferable to leave it off so that we can use a normal index on the referencing column.
2011-04-11  Teach pattern_fixed_prefix() about collations.  (Tom Lane)
This is necessary, not optional, now that ILIKE and regexes are collation aware --- else we might derive a wrong comparison constant for index optimized pattern matches.
2011-04-11  Fix the size of predicate lock manager's shared memory hash tables at creation.  (Heikki Linnakangas)
This way they don't compete with the regular lock manager for the slack shared memory, making the behavior more predictable.
2011-04-10  Add some more mapping macros for Microsoft wide-character API.  (Tom Lane)
Per buildfarm.
2011-04-10  Teach regular expression operators to honor collations.  (Tom Lane)
This involves getting the character classification and case-folding functions in the regex library to use the collations infrastructure. Most of this work had been done already in connection with the upper/lower and LIKE logic, so it was a simple matter of transposition. While at it, split out these functions into a separate source file regc_pg_locale.c, so that they can be correctly labeled with the Postgres project's license rather than the Scriptics license. These functions are 100% Postgres-written code whereas what remains in regc_locale.c is still mostly not ours, so lumping them both under the same copyright notice was getting more and more misleading.
2011-04-10  pgindent run before PG 9.1 beta 1.  (Bruce Momjian)
2011-04-10  Add collation support on Windows (MSVC build)  (Peter Eisentraut)
There is not yet support in initdb to populate the pg_collation catalog, but if that is done manually, the rest should work.
2011-04-08  Avoid an unnecessary syscache lookup in parse_coerce.c.  (Tom Lane)
All the other fields of the constant are being extracted from the syscache entry we already have, so handle collation similarly. (There don't seem to be any other uses for the new function at the moment.)
2011-04-07  Revise the API for GUC variable assign hooks.  (Tom Lane)
The previous functions of assign hooks are now split between check hooks and assign hooks, where the former can fail but the latter shouldn't. Aside from being conceptually clearer, this approach exposes the "canonicalized" form of the variable value to guc.c without having to do an actual assignment.

And that lets us fix the problem recently noted by Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus log messages about "parameter "wal_buffers" cannot be changed without restarting the server". There may be some speed advantage too, because this design lets hook functions avoid re-parsing variable values when restoring a previous state after a rollback (they can store a pre-parsed representation of the value instead).

This patch also resolves a longstanding annoyance about custom error messages from variable assign hooks: they should modify, not appear separately from, guc.c's own message about "invalid parameter value".
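A sketch of the split, following the general shape of the new hook API (the details shown here are illustrative): the check hook may reject or canonicalize the proposed value, while the assign hook applies an already-validated value and has no failure path.

    #include <stdbool.h>

    typedef bool (*check_hook_sketch)(int *newval, void **extra, int source);
    typedef void (*assign_hook_sketch)(int newval, void *extra);

    static bool
    check_some_guc(int *newval, void **extra, int source)
    {
        if (*newval < 0)
            return false;           /* reject: guc.c reports the error */
        /* a check hook may also canonicalize *newval here */
        return true;
    }

    static void
    assign_some_guc(int newval, void *extra)
    {
        /* apply the validated value; must not fail or throw */
    }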
2011-04-05  Add casts from int4 and int8 to numeric.  (Robert Haas)
Joey Adams, per gripe from Ramanujam. Review by myself and Tom Lane.
2011-04-04  Avoid assuming there will be only 3 states for synchronous_commit.  (Simon Riggs)
Also avoid hardcoding the current default state under the name "on"; replace it with a meaningful name that reflects its behaviour. Coding only, no change in behaviour.
2011-04-04  Merge synchronous_replication setting into synchronous_commit.  (Robert Haas)
This means one less thing to configure when setting up synchronous replication, and also avoids some ambiguity around what the behavior should be when the settings of these variables conflict. Fujii Masao, with additional hacking by me.
2011-04-03  Rearrange "add column" logic to merge columns at exec time.  (Robert Haas)
The previous coding set attinhcount too high in some cases, resulting in an undumpable, undroppable column. Per bug #5856, reported by Naoya Anzai. See also commit 31b6fc06d83c6de3644c8f2921eb7de0eb92fac3, which fixes a similar bug in ALTER TABLE .. ADD CONSTRAINT. Patch by Noah Misch.
2011-04-03  Avoid possible hang during smart shutdown.  (Robert Haas)
If a smart shutdown occurs just as a child is starting up, and the child subsequently becomes a walsender, there is a race condition: the postmaster might count the extant backends, determine that there is one normal backend, and wait for it to die off. Had the walsender transition already occurred before the postmaster counted, it would have proceeded with the shutdown. To fix this, have each child that transforms into a walsender kick the postmaster just after doing so, so that the state machine is certain to advance.

Fujii Masao