This includes two new kinds of postmaster processes, walsenders and
walreceiver. Walreceiver is responsible for connecting to the primary server
and streaming WAL to disk, while walsender runs in the primary server and
streams WAL from disk to the client.
Documentation still needs work, but the basics are there. We will probably
pull the replication section to a new chapter later on, as well as the
sections describing file-based replication. But let's do that as a separate
patch, so that it's easier to see what has been added/changed. This patch
also adds a new section to the chapter about FE/BE protocol, documenting the
protocol used by walsender/walreceiver.
Bump catalog version because of two new functions,
pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for
monitoring the progress of replication.
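For example, on a standby the new functions can be queried directly (an illustrative use; the source only states that they exist for monitoring):

SELECT pg_last_xlog_receive_location() AS received,
       pg_last_xlog_replay_location() AS replayed;
-- Comparing the two shows how far WAL replay lags behind WAL receipt.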
Fujii Masao, with additional hacking by me
|
|
|
|
of this are to centralise the conflict code to allow further change,
as well as to allow passing the full reason for the conflict
through to the conflicting backends. Backend state alters how we
can handle different types of conflict, so this is now required.
As originally suggested by Heikki, this is no longer optional.
|
|
can upgrade clusters without renaming the tablespace directories. New
directory structure format is, e.g.:
$PGDATA/pg_tblspc/20981/PG_8.5_201001061/719849/83292814
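The components of that example path, read left to right, are the tablespace OID, a per-version directory named PG_<major version>_<catalog version>, the database OID, and the relfilenode. A hypothetical lookup of the first component:

-- Illustrative only: resolve the tablespace OID embedded in the path.
SELECT oid, spcname FROM pg_tablespace WHERE oid = 20981;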
|
|
|
|
Previously we only cancelled sessions that were in-transaction.
The simple fix is to just cancel all sessions without waiting. Doing
it this way avoids complicating common code paths, which would not
be worth the trouble just to cover this rare case.
Problem report and fix by Andres Freund, edited somewhat by me
|
|
This silences some warnings on Win64. Not using the proper SOCKET datatype
was actually wrong on Win32 as well, but didn't cause any warnings there.
Also add a PGINVALID_SOCKET define to indicate an invalid/nonexistent
socket, instead of using a hardcoded -1 value.
|
|
Previously, fastgetattr() and heap_getattr() tested their fourth argument
against a null pointer, but any attempt to use them with a literal-NULL
fourth argument evaluated to *(void *)0, resulting in a compiler error.
Remove these NULL tests to avoid leading future readers of this code to
believe that this has a chance of working. Also clean up related legacy
code in nocachegetattr(), heap_getsysattr(), and nocache_index_getattr().
The new coding standard is that any code which calls a getattr-type
function or macro which takes an isnull argument MUST pass a valid
boolean pointer. Per discussion with Bruce Momjian, Tom Lane, Alvaro
Herrera.
|
|
deletion, so that we attempt to unlink the correct filepath. unlink()
errors are ignorable there, so lack of a DatabasePath initialization step
did not cause visible problems until a related bug showed up on Solaris.
Code refactored from xact_redo_commit() to
ProcessCommittedInvalidationMessages() in inval.c. Recovery may replay
shared invalidation messages for many databases, so we cannot
SetDatabasePath() once as we do in normal backends. Read the databaseid
from the shared invalidation messages, then set DatabasePath
temporarily before calling RelationCacheInitFileInvalidate().
Problem report by Robert Treat, analysis and fix by me.
|
|
|
|
as required by SQL standard.
|
|
we're not going to support that anymore.
I did keep the 64-bit-CRC-with-32-bit-arithmetic code, since it has a
performance excuse to live. It's a bit moot since that's all ifdef'd
out, of course.
|
|
Add missing varlena header to TableSpaceOpts structure. And, per
Tom Lane, instead of calling tablespace_reloptions in CacheMemoryContext,
call it in the caller's memory context and copy the value over
afterwards, to reduce the chances of a session-lifetime memory leak.
|
|
provide a working 64-bit integer datatype. As recently noted, we've been
broken on such platforms since early in the 8.4 development cycle. Since
it took nearly two years for anyone to even notice, it seems that the
rationale for continuing to support such platforms has reached the point
of non-existence. Rather than thrashing around to try to make it work
again, we'll just admit up front that this no longer works.
Back-patch to 8.4 since that branch is also broken.
We should go around to remove INT64_IS_BUSTED support, but just in HEAD,
so that seems like material for a separate commit.
|
|
VACUUM FULL was renamed to VACUUM FULL INPLACE. Also added a new
option -i, --inplace for vacuumdb to perform FULL INPLACE vacuuming.
Since the new VACUUM FULL uses CLUSTER infrastructure, we cannot
use it for system tables. VACUUM FULL for system tables always
falls back to VACUUM FULL INPLACE silently.
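A quick sketch of the two forms (the table name is hypothetical):

-- Rewrite-based vacuum, built on the CLUSTER infrastructure:
VACUUM FULL mytable;
-- Old-style in-place compaction; system catalogs always get this form:
VACUUM FULL INPLACE mytable;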
Itagaki Takahiro, reviewed by Jeff Davis and Simon Riggs.
|
|
Preserve relfilenodes for views and composite types --- even though we
don't store data in them, they do consume relfilenodes.
Bump catalog version.
|
|
identify the new API.
|
|
Add support to pg_dump --binary-upgrade to preserve all relfilenodes,
for use by pg_migrator.
|
|
Move OIDCHARS to proper include file.
|
|
peculiar variant of UNION ALL, and so wouldn't likely get written directly
as-is, it's possible for it to arise as a result of simplification of
less-obviously-silly queries. In particular, now that we can do flattening
of subqueries that have constant outputs and are underneath an outer join,
it's possible for the case to result from simplification of queries of the
type exhibited in bug #5263. Back-patch to 8.4 to avoid a functionality
regression for this type of query.
|
|
This patch only supports seq_page_cost and random_page_cost as parameters,
but it provides the infrastructure to scalably support many more.
In particular, we may want to add support for effective_io_concurrency,
but I'm leaving that as future work for now.
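A minimal sketch, assuming a tablespace named fast_ssd:

-- Override the global cost settings for relations in this tablespace:
ALTER TABLESPACE fast_ssd SET (seq_page_cost = 0.5, random_page_cost = 1.0);
-- Revert to the global defaults:
ALTER TABLESPACE fast_ssd RESET (seq_page_cost, random_page_cost);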
Thanks to Tom Lane for design help and Alvaro Herrera for the review.
|
|
|
|
pg_attribute, by having genbki.pl derive the information from the various
catalog header files. This greatly simplifies modification of the
"bootstrapped" catalogs.
This patch finally kills genbki.sh and Gen_fmgrtab.sh; we now rely entirely on
Perl scripts for those build steps. To avoid creating a Perl build dependency
where there was not one before, the output files generated by these scripts
are now treated as distprep targets, ie, they will be built and shipped in
tarballs. But you will need a reasonably modern Perl (probably at least
5.6) if you want to build from a CVS pull.
The changes to the MSVC build process are untested, and may well break ---
we'll soon find out from the buildfarm.
John Naylor, based on ideas from Robert Haas and others
|
|
We can't use the same as before, since MSVC on Win64 doesn't
support inline assembly.
|
|
recovery instead of reading the backup history file. This is more robust,
as it stops you from prematurely starting up an inconsistent cluster if the
backup history file is lost for some reason, or if the base backup was
never finished with pg_stop_backup().
This also paves the way for a simpler streaming replication patch, which
doesn't need to care about backup history files anymore.
The backup history file is still created and archived as before, but it's
not used by the system anymore. It's just for informational purposes now.
Bump PG_CONTROL_VERSION as the location of the backup startpoint is now
written to a new field in pg_control, and catversion because initdb is
required.
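The user-visible backup procedure is unchanged; a sketch (the label is arbitrary):

SELECT pg_start_backup('nightly');
-- ... copy the data directory with external tools ...
SELECT pg_stop_backup();
-- The backup history file this produces is now purely informational;
-- recovery reads the backup start location from pg_control instead.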
Original patch by Fujii Masao per Simon's idea, with further fixes by me.
|
|
"column < constant", and the comparison value is in the first or last
histogram bin or outside the histogram entirely, try to fetch the actual
column min or max value using an index scan (if there is an index on the
column). If successful, replace the lower or upper histogram bound with
that value before carrying on with the estimate. This limits the
estimation error caused by moving min/max values when the comparison
value is close to the min or max. Per a complaint from Josh Berkus.
It is tempting to consider using this mechanism for mergejoinscansel as well,
but that would inject index fetches into main-line join estimation, not just
endpoint cases. I'm refraining from that until we can get a better handle
on the costs of doing this type of lookup.
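A sketch of the case this helps (table, column, and index are hypothetical):

CREATE INDEX orders_created_idx ON orders (created_at);
-- A comparison value near the column's current maximum used to be estimated
-- against a possibly stale histogram bound; the planner can now probe the
-- index for the actual max before estimating:
EXPLAIN SELECT * FROM orders WHERE created_at > now() - interval '1 hour';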
|
|
|
|
|
|
Tsutomu Yamada
|
|
Tsutomu Yamada
|
|
8.2beta but never carried out. This avoids repetitive tests of whether the
argument is of scalar or composite type. Also, be a bit more paranoid about
composite arguments in some places where we previously weren't checking.
|
|
to be just a minor extension of the previous patch that made "x IS NULL"
indexable, because we can treat the IS NOT NULL condition as if it were
"x < NULL" or "x > NULL" (depending on the index's NULLS FIRST/LAST option),
just like IS NULL is treated like "x = NULL". Aside from any possible
usefulness in its own right, this is an important improvement for
index-optimized MAX/MIN aggregates: it is now reliably possible to get
a column's min or max value cheaply, even when there are a lot of nulls
cluttering the interesting end of the index.
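A brief illustration (table and index are hypothetical):

CREATE INDEX t_x_idx ON t (x);
-- Both of these can now be satisfied cheaply from the index endpoint,
-- even when x contains many nulls:
SELECT max(x) FROM t;
SELECT min(x) FROM t WHERE x IS NOT NULL;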
|
|
This is more in keeping with modern practice, and is a first step towards
porting to Win64 (which has sizeof(pointer) > sizeof(long)).
Tsutomu Yamada, Magnus Hagander, Tom Lane
|
|
decisions about when to auto-analyze.
The previous code depended on n_live_tuples + n_dead_tuples - last_anl_tuples,
where all three of these numbers could be bad estimates from ANALYZE itself.
Even worse, in the presence of a steady flow of HOT updates and matching
HOT-tuple reclamations, auto-analyze might never trigger at all, even if all
three numbers are exactly right, because n_dead_tuples could hold steady.
To fix, replace last_anl_tuples with an accurately tracked count of the total
number of committed tuple inserts + updates + deletes since the last ANALYZE
on the table. This can still be compared to the same threshold as before, but
it's much more trustworthy than the old computation. Tracking this requires
one more intra-transaction counter per modified table within backends, but no
additional memory space in the stats collector. There probably isn't any
measurable speed difference; if anything it might be a bit faster than before,
since I was able to eliminate some per-tuple arithmetic operations in favor of
adding sums once per (sub)transaction.
Also, simplify the logic around pgstat vacuum and analyze reporting messages
by not trying to fold VACUUM ANALYZE into a single pgstat message.
The original thought behind this patch was to allow scheduling of analyzes
on parent tables by artificially inflating their changes_since_analyze count.
I've left that for a separate patch since this change seems to stand on its
own merit.
|
|
find_inheritance_children(). This is a complete no-op in databases without
any inheritance. In databases where there are just a few entries in
pg_inherits, it could conceivably be a small loss. However, in databases with
many inheritance parents, it can be a big win.
|
|
and teach ANALYZE to compute such stats for tables that have subclasses.
Per my proposal of yesterday.
autovacuum still needs to be taught about running ANALYZE on parent tables
when their subclasses change, but the feature is useful even without that.
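A sketch of the workflow (the parent table name is hypothetical; the inherited flag distinguishes the new whole-tree statistics from the per-table ones):

ANALYZE parent_tbl;  -- now also gathers stats across parent_tbl's children
SELECT tablename, attname, inherited
  FROM pg_stats
 WHERE tablename = 'parent_tbl';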
|
|
use in binary upgrades.
Bump catalog version for detection by pg_migrator of new backend API.
|
|
Modify pg_dump --binary-upgrade and add backend support routines to
support the preservation of pg_type oids when doing a binary upgrade.
This allows user-defined composite types and arrays to be binary
upgraded.
|
|
choose an index name the same as it would do for an unnamed index constraint.
(My recent changes to the index naming logic have helped to ensure that this
will be a reasonable choice.) Per a suggestion from Peter.
A necessary side-effect is to promote CONCURRENTLY to type_func_name_keyword
status, ie, it can't be a table/column/index name anymore unless quoted.
This is not all bad, since we have heard more than once of people typing
CREATE INDEX CONCURRENTLY ON foo (...) and getting a normal index build of
an index named "concurrently", which was not what they wanted. Now this
syntax will result in a concurrent build of an index with system-chosen
name; which they can rename afterwards if they want something else.
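In other words (the table name is hypothetical):

-- Previously a normal build of an index literally named "concurrently";
-- now a concurrent build of an index with a system-chosen name:
CREATE INDEX CONCURRENTLY ON foo (f1);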
|
|
Index expression columns are now named after the FigureColname result for
their expressions, rather than always being "pg_expression_N". Digits are
appended to this name if needed to make the column name unique within the
index. (That happens for regular columns too, thus fixing the old problem
that CREATE INDEX fooi ON foo (f1, f1) fails. Before exclusion indexes
there was no real reason to do such a thing, but now maybe there is.)
Default names for indexes and associated constraints now include the column
names of all their columns, not only the first one as in previous practice.
(Of course, this will be truncated as needed to fit in NAMEDATALEN. Also,
pkey indexes retain the historical behavior of not naming specific columns
at all.)
An example of the results:
regression=# create table foo (f1 int, f2 text,
regression(# exclude (f1 with =, lower(f2) with =));
NOTICE: CREATE TABLE / EXCLUDE will create implicit index "foo_f1_lower_exclusion" for table "foo"
CREATE TABLE
regression=# \d foo_f1_lower_exclusion
Index "public.foo_f1_lower_exclusion"
 Column |  Type   | Definition
--------+---------+------------
 f1     | integer | f1
 lower  | text    | lower(f2)
btree, for table "public.foo"
|
|
contents, and PG_CONTROL_VERSION to reflect the fact that it changed
pg_control contents. (I see we did at least remember to change
XLOG_PAGE_MAGIC for the WAL contents changes.)
|
|
Enabled by recovery_connections = on (default) and forcing archive recovery using a
recovery.conf. Recovery processing now emulates the original transactions as they are
replayed, providing full locking and MVCC behaviour for read-only queries. Recovery
must enter a consistent state before connections are allowed, so there is a delay,
typically short, before connections succeed. Replay of recovering transactions can
conflict, and in some cases deadlock, with queries during recovery; these result in
query cancellation after max_standby_delay seconds have expired. Infrastructure
changes have minor effects on normal running, though they introduce four new types of
WAL record.
New test mode "make standbycheck" allows regression tests of static command behaviour
on a standby server while in recovery. Typical and extreme dynamic behaviours have
been checked via code inspection and manual testing. Few port-specific behaviours have
been used, though primary testing has been on Linux only so far.
This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
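Sketch of the behaviour on a standby (the table name is hypothetical):

-- Read-only queries run normally while recovery continues:
SELECT count(*) FROM accounts;
-- Anything that writes is rejected:
INSERT INTO accounts VALUES (1);
-- ERROR:  cannot execute INSERT in a read-only transaction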
Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.
Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
|
|
This was possibly linked to a deadlock-like situation in glibc syslog code
invoked by the ereport call in quickdie(). In any case, a signal handler
should not unblock its own signal unless there is a specific reason to.
|
|
This removes some duplicate code that recreated the identical workaround
when the newer signal API is missing.
|
|
Behaves more or less unchanged compared to Python 2, but the new language
variant is called plpython3u. Documentation describing the naming scheme
is included.
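A minimal function in the new variant, as a sketch:

CREATE FUNCTION pymax(a integer, b integer) RETURNS integer AS $$
  # the language name plpython3u selects the Python 3 interpreter
  return max(a, b)
$$ LANGUAGE plpython3u;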
|
|
and use it to extend contrib/pg_stat_statements to track utility commands.
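A hedged example of the effect (assumes the module is installed; the table name is hypothetical):

VACUUM my_table;
-- Utility statements now show up alongside ordinary queries:
SELECT query, calls FROM pg_stat_statements WHERE query LIKE 'VACUUM%';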
Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.
|
|
non-kluge method for controlling the order in which values are fed to an
aggregate function. At the same time eliminate the old implementation
restriction that DISTINCT was only supported for single-argument aggregates.
Possibly release-notable behavioral change: formerly, agg(DISTINCT x)
dropped null values of x unconditionally. Now, it does so only if the
agg transition function is strict; otherwise nulls are treated as DISTINCT
normally would, ie, you get one copy.
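For instance (table and columns are hypothetical):

-- Feed values to the aggregate in a well-defined order:
SELECT array_agg(name ORDER BY name DESC) FROM employees;
-- DISTINCT is no longer restricted to single-argument aggregates:
SELECT string_agg(DISTINCT dept, ', ') FROM employees;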
Andrew Gierth, reviewed by Hitoshi Harada
|
|
This patch also removes buffer-usage statistics from the track_counts
output, since this (or the global server statistics) is deemed to be a better
interface to this information.
Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.
|
|
we have to cope with the possibility that the declared result rowtype contains
dropped columns. This fails in 8.4, as per bug #5240.
While at it, be more paranoid about inserting binary coercions when inlining.
The pre-8.4 code did not really need to worry about that because it could not
inline at all in any case where an added coercion could change the behavior
of the function's statement. However, when inlining a SRF we allow sorting,
grouping, and set-ops such as UNION. In these cases, modifying one of the
targetlist entries that the sort/group/setop depends on could conceivably
change the behavior of the function's statement --- so don't inline when
such a case applies.
|
|
does a search for the user in the directory first, and then binds with
the DN found for this user.
This allows for LDAP logins in scenarios where the DN of the user cannot
be determined simply by prefix and suffix, such as the case where different
users are located in different containers.
The old way of authentication can be significantly faster, so it's kept
as an option.
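A sketch of a search+bind entry in pg_hba.conf (the server and base DN are placeholders):

# Search for the user under the base DN, then bind as the DN found:
host  all  all  0.0.0.0/0  ldap  ldapserver=ldap.example.com ldapbasedn="dc=example,dc=com" ldapsearchattribute=uid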
Robert Fleming and Magnus Hagander
|