summaryrefslogtreecommitdiff
path: root/src/include
AgeCommit message (Collapse)Author
2010-02-11Generic implementation of red-black binary tree. It's planned to use inTeodor Sigaev
several places, but for now only GIN uses it during index creation. Using self-balanced tree greatly speeds up index creation in corner cases with preordered data.
2010-02-09Fix up rickety handling of relation-truncation interlocks.Tom Lane
Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr relation entries, so that they will get reset to InvalidBlockNumber whenever an smgr-level flush happens. Because we now send smgr invalidation messages immediately (not at end of transaction) when a relation truncation occurs, this ensures that other backends will reset their values before they next access the relation. We no longer need the unreliable assumption that a VACUUM that's doing a truncation will hold its AccessExclusive lock until commit --- in fact, we can intentionally release that lock as soon as we've completed the truncation. This patch therefore reverts (most of) Alvaro's patch of 2009-11-10, as well as my marginal hacking on it yesterday. We can also get rid of assorted no-longer-needed relcache flushes, which are far more expensive than an smgr flush because they kill a lot more state. In passing this patch fixes smgr_redo's failure to perform visibility-map truncation, and cleans up some rather dubious assumptions in freespace.c and visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of date.
2010-02-08Create an official API function for C functions to use to check if they areTom Lane
being called as aggregates, and to get the aggregate transition state memory context if needed. Use it instead of poking directly into AggState and WindowAggState in places that shouldn't know so much. We should have done this in 8.4, probably, but better late than never. Revised version of a patch by Hitoshi Harada.
2010-02-08Add C comments that HEAP_MOVED_* define usage is only for pre-9.0 binaryBruce Momjian
upgrades.
2010-02-08Remove CatalogCacheFlushRelation, and the reloidattr infrastructure that wasTom Lane
needed by nothing else. The restructuring I just finished doing on cache management exposed to me how silly this routine was. Its function was to go into the catcache and blow away all entries related to a given relation when there was a relcache flush on that relation. However, there is no point in removing a catcache entry if the catalog row it represents is still valid --- and if it isn't valid, there must have been a catcache entry flush on it, because that's triggered directly by heap_update or heap_delete on the catalog row. So this routine accomplished nothing except to blow away valid cache entries that we'd very likely be wanting in the near future to help reconstruct the relcache entry. Dumb. On top of which, it required a subtle and easy-to-get-wrong attribute in syscache definitions, ie, the column containing the OID of the related relation if any. Removing that is a very useful maintenance simplification.
2010-02-08Remove old-style VACUUM FULL (which was known for a little while asTom Lane
VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity. Per discussion, the use case for this method of vacuuming is no longer large enough to justify maintaining it; not to mention that we don't wish to invest the work that would be needed to make it play nicely with Hot Standby. Aside from the code directly related to old-style VACUUM FULL, this commit removes support for certain WAL record types that could only be generated within VACUUM FULL, redirect-pointer removal in heap_page_prune, and nontransactional generation of cache invalidation sinval messages (the last being the sticking point for Hot Standby). We still have to retain all code that copes with finding HEAP_MOVED_OFF and HEAP_MOVED_IN flag bits on existing tuples. This can't be removed as long as we want to support in-place update from pre-9.0 databases.
2010-02-07Work around deadlock problems with VACUUM FULL/CLUSTER on system catalogs,Tom Lane
as per my recent proposal. First, teach IndexBuildHeapScan to not wait for INSERT_IN_PROGRESS or DELETE_IN_PROGRESS tuples to commit unless the index build is checking uniqueness/exclusion constraints. If it isn't, there's no harm in just indexing the in-doubt tuple. Second, modify VACUUM FULL/CLUSTER to suppress reverifying uniqueness/exclusion constraint properties while rebuilding indexes of the target relation. This is reasonable because these commands aren't meant to deal with corrupted-data situations. Constraint properties will still be rechecked when an index is rebuilt by a REINDEX command. This gets us out of the problem that new-style VACUUM FULL would often wait for other transactions while holding exclusive lock on a system catalog, leading to probable deadlock because those other transactions need to look at the catalogs too. Although the real ultimate cause of the problem is a debatable choice to release locks early after modifying system catalogs, changing that choice would require pretty serious analysis and is not something to be undertaken lightly or on a tight schedule. The present patch fixes the problem in a fairly reasonable way and should also improve the speed of VACUUM FULL/CLUSTER a little bit.
2010-02-07Create a "relation mapping" infrastructure to support changing the relfilenodesTom Lane
of shared or nailed system catalogs. This has two key benefits: * The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs. * We no longer have to use an unsafe reindex-in-place approach for reindexing shared catalogs. CLUSTER on nailed catalogs now works too, although I left it disabled on shared catalogs because the resulting pg_index.indisclustered update would only be visible in one database. Since reindexing shared system catalogs is now fully transactional and crash-safe, the former special cases in REINDEX behavior have been removed; shared catalogs are treated the same as non-shared. This commit does not do anything about the recently-discussed problem of deadlocks between VACUUM FULL/CLUSTER on a system catalog and other concurrent queries; will address that in a separate patch. As a stopgap, parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid such failures during the regression tests.
2010-02-04Restructure CLUSTER/newstyle VACUUM FULL/ALTER TABLE support so that swappingTom Lane
of old and new toast tables can be done either at the logical level (by swapping the heaps' reltoastrelid links) or at the physical level (by swapping the relfilenodes of the toast tables and their indexes). This is necessary infrastructure for upcoming changes to support CLUSTER/VAC FULL on shared system catalogs, where we cannot change reltoastrelid. The physical swap saves a few catalog updates too. We unfortunately have to keep the logical-level swap logic because in some cases we will be adding or deleting a toast table, so there's no possibility of a physical swap. However, that only happens as a consequence of schema changes in the table, which we do not need to support for system catalogs, so such cases aren't an obstacle for that. In passing, refactor the cluster support functions a little bit to eliminate unnecessarily-duplicated code; and fix the problem that while CLUSTER had been taught to rename the final toast table at need, ALTER TABLE had not.
2010-02-03Add a message type header to the CopyData messages sent from primaryHeikki Linnakangas
to standby in streaming replication. While we only have one message type at the moment, adding a message type header makes this easier to extend.
2010-02-03Assorted cleanups in preparation for using a map file to support alteringTom Lane
the relfilenode of currently-not-relocatable system catalogs. 1. Get rid of inval.c's dependency on relfilenode, by not having it emit smgr invalidations as a result of relcache flushes. Instead, smgr sinval messages are sent directly from smgr.c when an actual relation delete or truncate is done. This makes considerably more structural sense and allows elimination of a large number of useless smgr inval messages that were formerly sent even in cases where nothing was changing at the physical-relation level. Note that this reintroduces the concept of nontransactional inval messages, but that's okay --- because the messages are sent by smgr.c, they will be sent in Hot Standby slaves, just from a lower logical level than before. 2. Move setNewRelfilenode out of catalog/index.c, where it never logically belonged, into relcache.c; which is a somewhat debatable choice as well but better than before. (I considered catalog/storage.c, but that seemed too low level.) Rename to RelationSetNewRelfilenode. 3. Cosmetic cleanups of some other relfilenode manipulations.
2010-02-02Make RADIUS authentication use pg_getaddrinfo_all() to get address ofMagnus Hagander
the server. Gets rid of a fairly ugly hack for Solaris, and also provides hostname and IPV6 support.
2010-02-02Fold FindConversion() into FindConversionByName() and remove ACL check.Robert Haas
All callers of FindConversionByName() already do suitable permissions checking already apart from this function, but this is not just dead code removal: the unnecessary permissions check can actually lead to spurious failures - there's no reason why inability to execute the underlying function should prohibit renaming the conversion, for example. (The error messages in these cases were also rather poor: FindConversion would return InvalidOid, eventually leading to a complaint that the conversion "did not exist", which was not correct.) KaiGai Kohei
2010-02-01Tighten integrity checks on ALTER TABLE ... ALTER COLUMN ... RENAME.Robert Haas
When a column is renamed, we recursively rename the same column in all descendent tables. But if one of those tables also inherits that column from a table outside the inheritance hierarchy rooted at the named table, we must throw an error. The previous coding correctly prohibited the rename when the parent had inherited the column from elsewhere, but overlooked the case where the parent was OK but a child table also inherited the same column from a second, unrelated parent. For now, not backpatched due to lack of complaints from the field. KaiGai Kohei, with further changes by me. Reviewed by Bernd Helme and Tom Lane.
2010-02-01Augment EXPLAIN output with more details on Hash nodes.Robert Haas
We show the number of buckets, the number of batches (and also the original number if it has changed), and the peak space used by the hash table. Minor executor changes to track peak space used.
2010-02-01Revoke augmentation of WAL records for btree delete, per discussion.Simon Riggs
2010-02-01Add string_agg aggregate functions. The one argument version concatenatesItagaki Takahiro
the input values into a string. The two argument version also does the same thing, but inserts delimiters between elements. Original patch by Pavel Stehule, reviewed by David E. Wheeler and me.
2010-01-31Detect early deadlock in Hot Standby when Startup is already waiting. FirstSimon Riggs
stage of required deadlock detection to allow re-enabling max_standby_delay setting of -1, which is now essential in the absence of improved relation- specific conflict resoluton. Requested by Greg Stark et al.
2010-01-31Parenthesize this macro, just in case.Tom Lane
2010-01-29Augment WAL records for btree delete with GetOldestXmin() to reduceSimon Riggs
false positives during Hot Standby conflict processing. Simple patch to enhance conflict processing, following previous discussions. Controlled by parameter minimize_standby_conflicts = on | off, with default off allows measurement of performance impact to see whether it should be set on all the time.
2010-01-29Filter recovery conflicts based upon dboid from relfilenode of WALSimon Riggs
records for heap and btree. Minor change, mostly API changes to pass through the required values. This is a simple change though also provides the refactoring required for further enhancements to conflict processing using the relOid. Changes only have effect during Hot Standby.
2010-01-28Type table featurePeter Eisentraut
This adds the CREATE TABLE name OF type command, per SQL standard.
2010-01-28Add functions to reset the statistics counter for a single table/index orMagnus Hagander
a single function.
2010-01-28Define INADDR_NONE on Solaris when it's missing. Per a couple of buildfarmMagnus Hagander
members complaining.
2010-01-28Change a few remaining calls of XLogArchivingActive() to useHeikki Linnakangas
XLogIsNeeded() instead, to determine if an otherwise non-logged operation needs to be logged in WAL for standby servers. Fujii Masao
2010-01-27Make standby server continuously retry restoring the next WAL segment withHeikki Linnakangas
restore_command, if the connection to the primary server is lost. This ensures that the standby can recover automatically, if the connection is lost for a long time and standby falls behind so much that the required WAL segments have been archived and deleted in the master. This also makes standby_mode useful without streaming replication; the server will keep retrying restore_command every few seconds until the trigger file is found. That's the same basic functionality pg_standby offers, but without the bells and whistles. To implement that, refactor the ReadRecord/FetchRecord functions. The FetchRecord() function introduced in the original streaming replication patch is removed, and all the retry logic is now in a new function called XLogReadPage(). XLogReadPage() is now responsible for executing restore_command, launching walreceiver, and waiting for new WAL to arrive from primary, as required. This also changes the life cycle of walreceiver. When launched, it now only tries to connect to the master once, and exits if the connection fails, or is lost during streaming for any reason. The startup process detects the death, and re-launches walreceiver if necessary.
2010-01-27Add support for RADIUS authentication.Magnus Hagander
2010-01-26Remove the default_do_language parameter, instead making DO use a hardwiredTom Lane
default of "plpgsql". This is more reasonable than it was when the DO patch was written, because we have since decided that plpgsql should be installed by default. Per discussion, having a parameter for this doesn't seem useful enough to justify the risk of application breakage if the value is changed unexpectedly.
2010-01-25Add get_bit/set_bit functions for bit strings, paralleling those for bytea,Tom Lane
and implement OVERLAY() for bit strings and bytea. In passing also convert text OVERLAY() to a true built-in, instead of relying on a SQL function. Leonardo F, reviewed by Kevin Grittner
2010-01-23In HS, Startup process sets SIGALRM when waiting for buffer pin. IfSimon Riggs
woken by alarm we send SIGUSR1 to all backends requesting that they check to see if they are blocking Startup process. If so, they throw ERROR/FATAL as for other conflict resolutions. Deadlock stop gap removed. max_standby_delay = -1 option removed to prevent deadlock.
2010-01-22Fix several oversights in previous commit - attribute options patch.Robert Haas
I failed to 'cvs add' the new files and also neglected to bump catversion.
2010-01-22Replace ALTER TABLE ... SET STATISTICS DISTINCT with a more general mechanism.Robert Haas
Attributes can now have options, just as relations and tablespaces do, and the reloptions code is used to parse, validate, and store them. For simplicity and because these options are not performance critical, we store them in a separate cache rather than the main relcache. Thanks to Alex Hunsaker for the review.
2010-01-22PL/Python DO handlerPeter Eisentraut
Also cleaned up some redundancies between the primary error messages and the error context in PL/Python. Hannu Valtonen
2010-01-20Write a WAL record whenever we perform an operation without WAL-loggingHeikki Linnakangas
that would've been WAL-logged if archiving was enabled. If we encounter such records in archive recovery anyway, we know that some data is missing from the log. A WARNING is emitted in that case. Original patch by Fujii Masao, with changes by me.
2010-01-20Now that much of walreceiver has been pulled back into the postgresHeikki Linnakangas
binary, revert PGDLLIMPORT decoration of global variables. I'm not sure if there's any real harm from unnecessary PGDLLIMPORTs, but these are all internal variables that external modules really shouldn't be messing with. ThisTimeLineID still needs PGDLLIMPORT.
2010-01-20Rethink the way walreceiver is linked into the backend. Instead than shovingHeikki Linnakangas
walreceiver as whole into a dynamically loaded module, split the libpq-specific parts of it into dynamically loaded module and keep the rest in the main backend binary. Although Tom fixed the Windows compilation problems with the old walreceiver module already, this is a cleaner division of labour and makes the code more readable. There's also the prospect of adding new transport methods as pluggable modules in the future, which this patch makes easier, though for now the API between libpqwalreceiver and walreceiver process should be considered private. The libpq-specific module is now in src/backend/replication/libpqwalreceiver, and the part linked with postgres binary is in src/backend/replication/walreceiver.c.
2010-01-19Add pg_stat_reset_shared('bgwriter') to reset the cluster-wide sharedMagnus Hagander
statistics of the bgwriter. Greg Smith
2010-01-19Add pg_table_size() and pg_indexes_size() to provide more user-friendlyTom Lane
wrappers around the pg_relation_size() function. Bernd Helmle, reviewed by Greg Smith
2010-01-17Improve the handling of SET CONSTRAINTS commands by having them searchTom Lane
pg_constraint before searching pg_trigger. This allows saner handling of corner cases; in particular we now say "constraint is not deferrable" rather than "constraint does not exist" when the command is applied to a constraint that's inherently non-deferrable. Per a gripe several months ago from hubert depesz lubaczewski. To make this work without breaking user-defined constraint triggers, we have to add entries for them to pg_constraint. However, in return we can remove the pgconstrname column from pg_constraint, which represents a fairly sizable space savings. I also replaced the tgisconstraint column with tgisinternal; the old meaning of tgisconstraint can now be had by testing for nonzero tgconstraint, while there is no other way to get the old meaning of nonzero tgconstraint, namely that the trigger was internally generated rather than being user-created. In passing, fix an old misstatement in the docs and comments, namely that pg_trigger.tgdeferrable is exactly redundant with pg_constraint.condeferrable. Actually, we mark RI action triggers as nondeferrable even when they belong to a nominally deferrable FK constraint. The SET CONSTRAINTS code now relies on that instead of hard-coding a list of exception OIDs.
2010-01-16Teach standby conflict resolution to use SIGUSR1Simon Riggs
Conflict reason is passed through directly to the backend, so we can take decisions about the effect of the conflict based upon the local state. No specific changes, as yet, though this prepares for later work. CancelVirtualTransaction() sends signals while holding ProcArrayLock. Introduce errdetail_abort() to give message detail explaining that the abort was caused by conflict processing. Remove CONFLICT_MODE states in favour of using PROCSIG_RECOVERY_CONFLICT states directly, for clarity.
2010-01-16Huh, apparently on cygwin we HAVE_SIGPROCMASK, so both variants ofTom Lane
the BlockSig/UnBlockSig declaration have to be PGDLLIMPORT'ified. Per buildfarm results.
2010-01-16PGDLLIMPORT-ize the remaining variables needed by walreceiver.Tom Lane
2010-01-15Do parse analysis of an EXPLAIN's contained statement during the normalTom Lane
parse analysis phase, rather than at execution time. This makes parameter handling work the same as it does in ordinary plannable queries, and in particular fixes the incompatibility that Pavel pointed out with plpgsql's new handling of variable references. plancache.c gets a little bit grottier, but the alternatives seem worse.
2010-01-15Introduce Streaming Replication.Heikki Linnakangas
This includes two new kinds of postmaster processes, walsenders and walreceiver. Walreceiver is responsible for connecting to the primary server and streaming WAL to disk, while walsender runs in the primary server and streams WAL from disk to the client. Documentation still needs work, but the basics are there. We will probably pull the replication section to a new chapter later on, as well as the sections describing file-based replication. But let's do that as a separate patch, so that it's easier to see what has been added/changed. This patch also adds a new section to the chapter about FE/BE protocol, documenting the protocol used by walsender/walreceivxer. Bump catalog version because of two new functions, pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for monitoring the progress of replication. Fujii Masao, with additional hacking by me
2010-01-14Add point_ops opclass for GiST.Teodor Sigaev
2010-01-14First part of refactoring of code for ResolveRecoveryConflict. PurposesSimon Riggs
of this are to centralise the conflict code to allow further change, as well as to allow passing through the full reason for the conflict through to the conflicting backends. Backend state alters how we can handle different types of conflict so this is now required. As originally suggested by Heikki, no longer optional.
2010-01-12Please tablespace directories in their own subdirectory so pg_migratorBruce Momjian
can upgrade clusters without renaming the tablespace directories. New directory structure format is, e.g.: $PGDATA/pg_tblspc/20981/PG_8.5_201001061/719849/83292814
2010-01-10Some trivial adjustments in comments for struct RelationData.Tom Lane
2010-01-10During Hot Standby, fix drop database when sessions idle.Simon Riggs
Previously we only cancelled sessions that were in-transaction. Simple fix is to just cancel all sessions without waiting. Doing it this way avoids complicating common code paths, which would not be worth the trouble to cover this rare case. Problem report and fix by Andres Freund, edited somewhat by me
2010-01-10Create typedef pgsocket for storing socket descriptors.Magnus Hagander
This silences some warnings on Win64. Not using the proper SOCKET datatype was actually wrong on Win32 as well, but didn't cause any warnings there. Also create define PGINVALID_SOCKET to indicate an invalid/non-existing socket, instead of using a hardcoded -1 value.