user/sven/postgresql.git

Age	Commit message (Collapse)	Author
2012-01-12	Fix CLUSTER/VACUUM FULL for toast values owned by recently-updated rows.	Tom Lane
	In commit 7b0d0e9356963d5c3e4d329a917f5fbb82a2ef05, I made CLUSTER and VACUUM FULL try to preserve toast value OIDs from the original toast table to the new one. However, if we have to copy both live and recently-dead versions of a row that has a toasted column, those versions may well reference the same toast value with the same OID. The patch then led to duplicate-key failures as we tried to insert the toast value twice with the same OID. (The previous behavior was not very desirable either, since it would have silently inserted the same value twice with different OIDs. That wastes space, but what's worse is that the toast values inserted for already-dead heap rows would not be reclaimed by subsequent ordinary VACUUMs, since they go into the new toast table marked live not deleted.) To fix, check if the copied OID already exists in the new toast table, and if so, assume that it stores the desired value. This is reasonably safe since the only case where we will copy an OID from a previous toast pointer is when toast_insert_or_update was given that toast pointer and so we just pulled the data from the old table; if we got two different values that way then we have big problems anyway. We do have to assume that no other backend is inserting items into the new toast table concurrently, but that's surely safe for CLUSTER and VACUUM FULL. Per bug #6393 from Maxim Boguk. Back-patch to 9.0, same as the previous patch.
2012-01-07	Use __sync_lock_test_and_set() for spinlocks on ARM, if available.	Tom Lane
	Historically we've used the SWPB instruction for TAS() on ARM, but this is deprecated and not available on ARMv6 and later. Instead, make use of a GCC builtin if available. We'll still fall back to SWPB if not, so as not to break existing ports using older GCC versions. Eventually we might want to try using __sync_lock_test_and_set() on some other architectures too, but for now that seems to present only risk and not reward. Back-patch to all supported versions, since people might want to use any of them on more recent ARM chips. Martin Pitt
2011-12-12	Revert the behavior of inet/cidr functions to not unpack the arguments.	Heikki Linnakangas
	I forgot to change the functions to use the PG_GETARG_INET_PP() macro, when I changed DatumGetInetP() to unpack the datum, like Datum*P macros usually do. Also, I screwed up the definition of the PG_GETARG_INET_PP() macro, and didn't notice because it wasn't used. This fixes the memory leak when sorting inet values, as reported by Jochen Erwied and debugged by Andres Freund. Backpatch to 8.3, like the previous patch that broke it.
2011-12-01	Stamp 9.1.2.REL9_1_2	Tom Lane

2011-11-27	Ensure that whole-row junk Vars are always of composite type.	Tom Lane
	The EvalPlanQual machinery assumes that whole-row Vars generated for the outputs of non-table RTEs will be of composite types. However, for the case where the RTE is a function call returning a scalar type, we were doing the wrong thing, as a result of sharing code with a parser case where the function's scalar output is wanted. (Or at least, that's what that case has done historically; it does seem a bit inconsistent.) To fix, extend makeWholeRowVar's API so that it can support both use-cases. This fixes Belinda Cussen's report of crashes during concurrent execution of UPDATEs involving joins to the result of UNNEST() --- in READ COMMITTED mode, we'd run the EvalPlanQual machinery after a conflicting row update commits, and it was expecting to get a HeapTuple not a scalar datum from the "wholerowN" variable referencing the function RTE. Back-patch to 9.0 where the current EvalPlanQual implementation appeared. In 9.1 and up, this patch also fixes failure to attach the correct collation to the Var generated for a scalar-result case. An example: regression=# select upper(x.*) from textcat('ab', 'cd') x; ERROR: could not determine which collation to use for upper() function
2011-11-10	Fix server header file installation with vpath builds	Peter Eisentraut
	Several server header files would not be installed in vpath builds because they live in the build directory.
2011-11-08	Wrap appendrel member outputs in PlaceHolderVars in additional cases.	Tom Lane
	Add PlaceHolderVar wrappers as needed to make UNION ALL sub-select output expressions appear non-constant and distinct from each other. This makes the world safe for add_child_rel_equivalences to do what it does. Before, it was possible for that function to add identical expressions to different EquivalenceClasses, which logically should imply merging such ECs, which would be wrong; or to improperly add a constant to an EquivalenceClass, drastically changing its behavior. Per report from Teodor Sigaev. The only currently known consequence of this bug is "MergeAppend child's targetlist doesn't match MergeAppend" planner failures in 9.1 and later. I am suspicious that there may be other failure modes that could affect older release branches; but in the absence of any hard evidence, I'll refrain from back-patching further than 9.1.
2011-11-08	Make DatumGetInetP() unpack inet datums with a 1-byte header, and add	Heikki Linnakangas
	a new macro, DatumGetInetPP(), that does not. This brings these macros in line with other DatumGet*P() macros. Backpatch to 8.3, where 1-byte header varlenas were introduced.
2011-11-03	Fix handling of PlaceHolderVars in nestloop parameter management.	Tom Lane
	If we use a PlaceHolderVar from the outer relation in an inner indexscan, we need to reference the PlaceHolderVar as such as the value to be passed in from the outer relation. The previous code effectively tried to reconstruct the PHV from its component expression, which doesn't work since (a) the Vars therein aren't necessarily bubbled up far enough, and (b) it would be the wrong semantics anyway because of the possibility that the PHV is supposed to have gone to null at some point before the current join. Point (a) led to "variable not found in subplan target list" planner errors, but point (b) would have led to silently wrong answers. Per report from Roger Niederland.
2011-11-02	Derive oldestActiveXid at correct time for Hot Standby.	Simon Riggs
	There was a timing window between when oldestActiveXid was derived and when it should have been derived that only shows itself under heavy load. Move code around to ensure correct timing of derivation. No change to StartupSUBTRANS() code, which is where this failed. Bug report by Chris Redekop
2011-11-02	Fix timing of Startup CLOG and MultiXact during Hot Standby	Simon Riggs
	Patch by me, bug report by Chris Redekop, analysis by Florian Pflug
2011-11-01	Fix race condition with toast table access from a stale syscache entry.	Tom Lane
	If a tuple in a syscache contains an out-of-line toasted field, and we try to fetch that field shortly after some other transaction has committed an update or deletion of the tuple, there is a race condition: vacuum could come along and remove the toast tuples before we can fetch them. This leads to transient failures like "missing chunk number 0 for toast value NNNNN in pg_toast_2619", as seen in recent reports from Andrew Hammond and Tim Uckun. The design idea of syscache is that access to stale syscache entries should be prevented by relation-level locks, but that fails for at least two cases where toasted fields are possible: ANALYZE updates pg_statistic rows without locking out sessions that might want to plan queries on the same table, and CREATE OR REPLACE FUNCTION updates pg_proc rows without any meaningful lock at all. The least risky fix seems to be an idea that Heikki suggested when we were dealing with a related problem back in August: forcibly detoast any out-of-line fields before putting a tuple into syscache in the first place. This avoids the problem because at the time we fetch the parent tuple from the catalog, we should be holding an MVCC snapshot that will prevent removal of the toast tuples, even if the parent tuple is outdated immediately after we fetch it. (Note: I'm not convinced that this statement holds true at every instant where we could be fetching a syscache entry at all, but it does appear to hold true at the times where we could fetch an entry that could have a toasted field. We will need to be a bit wary of adding toast tables to low-level catalogs that don't have them already.) An additional benefit is that subsequent uses of the syscache entry should be faster, since they won't have to detoast the field. Back-patch to all supported versions. The problem is significantly harder to reproduce in pre-9.0 releases, because of their willingness to flush every entry in a syscache whenever the underlying catalog is vacuumed (cf CatalogCacheFlushRelation); but there is still a window for trouble.
2011-10-23	Don't trust deferred-unique indexes for join removal.	Tom Lane
	The uniqueness condition might fail to hold intra-transaction, and assuming it does can give incorrect query results. Per report from Marti Raudsepp, though this is not his proposed patch. Back-patch to 9.0, where both these features were introduced. In the released branches, add the new IndexOptInfo field to the end of the struct, to try to minimize ABI breakage for third-party code that may be examining that struct.
2011-10-09	Revert accidental change to pg_config_manual.h.	Robert Haas
	This was broken in commit 53dbc27c62d8e1b6c5253feba04a5094cb8fe046, which introduced unlogged tables. Fortunately, as debugging tools go, this one is pretty cheap, which is probably why it took nine months for someone to notice, but it's not intended to be enabled by default, so revert. Noted by Fujii Masao.
2011-10-05	Improve and simplify CREATE EXTENSION's management of GUC variables.	Tom Lane
	CREATE EXTENSION needs to transiently set search_path, as well as client_min_messages and log_min_messages. We were doing this by the expedient of saving the current string value of each variable, doing a SET LOCAL, and then doing another SET LOCAL with the previous value at the end of the command. This is a bit expensive though, and it also fails badly if there is anything funny about the existing search_path value, as seen in a recent report from Roger Niederland. Fortunately, there's a much better way, which is to piggyback on the GUC infrastructure previously developed for functions with SET options. We just open a new GUC nesting level, do our assignments with GUC_ACTION_SAVE, and then close the nesting level when done. This automatically restores the prior settings without a re-parsing pass, so (in principle anyway) there can't be an error. And guc.c still takes care of cleanup in event of an error abort. The CREATE EXTENSION code for this was modeled on some much older code in ri_triggers.c, which I also changed to use the better method, even though there wasn't really much risk of failure there. Also improve the comments in guc.c to reflect this additional usage.
2011-09-22	Stamp 9.1.1.REL9_1_1	Tom Lane

2011-09-08	Stamp 9.1.0.REL9_1_0	Tom Lane

2011-09-04	Fix #include problems in 9.1 branch.	Tom Lane
	Remove unnecessary and circular #include of syncrep.h from proc.h. Add htup.h to tablecmds.h so it will compile without prerequisites.
2011-09-01	setlocale() on Windows doesn't work correctly if the locale name contains	Heikki Linnakangas
	dots. I previously worked around this in initdb, mapping the known problematic locale names to aliases that work, but Hiroshi Inoue pointed out that that's not enough because even if you use one of the aliases, like "Chinese_HKG", setlocale(LC_CTYPE, NULL) returns back the long form, ie. "Chinese_Hong Kong S.A.R.". When we try to restore an old locale value by passing that value back to setlocale(), it fails. Note that you are affected by this bug also if you use one of those short-form names manually, so just reverting the hack in initdb won't fix it. To work around that, move the locale name mapping from initdb to a wrapper around setlocale(), so that the mapping is invoked on every setlocale() call. Also, add a few checks for failed setlocale() calls in the backend. These calls shouldn't fail, and if they do there isn't much we can do about it, but at least you'll get a warning. Backpatch to 9.1, where the initdb hack was introduced. The Windows bug affects older versions too if you set locale manually to one of the aliases, but given the lack of complaints from the field, I'm hesitent to backpatch.
2011-09-01	Move the line to undefine setlocale() macro on Win32 outside USE_REPL_SNPRINTF	Heikki Linnakangas
	ifdef block. It has nothing to do with whether the replacement snprintf function is used. It caused no live bug, because the replacement snprintf function is always used on Win32, but it was nevertheless misplaced.
2011-08-21	Fix trigger WHEN conditions when both BEFORE and AFTER triggers exist.	Tom Lane
	Due to tuple-slot mismanagement, evaluation of WHEN conditions for AFTER ROW UPDATE triggers could crash if there had been a BEFORE ROW trigger fired for the same update. Fix by not trying to overload the use of estate->es_trig_tuple_slot. Per report from Yoran Heling. Back-patch to 9.0, when trigger WHEN conditions were introduced.
2011-08-18	Tag 9.1rc1.REL9_1_RC1	Tom Lane

2011-08-16	Fix race condition in relcache init file invalidation.	Tom Lane
	The previous code tried to synchronize by unlinking the init file twice, but that doesn't actually work: it leaves a window wherein a third process could read the already-stale init file but miss the SI messages that would tell it the data is stale. The result would be bizarre failures in catalog accesses, typically "could not read block 0 in file ..." later during startup. Instead, hold RelCacheInitLock across both the unlink and the sending of the SI messages. This is more straightforward, and might even be a bit faster since only one unlink call is needed. This has been wrong since it was put in (in 2002!), so back-patch to all supported releases.
2011-08-10	Back-patch assorted latch-related fixes.	Tom Lane
	Fix a whole bunch of signal handlers that had been hacked to do things that might change errno, without adding the necessary save/restore logic for errno. Also make some minor fixes in unix_latch.c, and clean up bizarre and unsafe scheme for disowning the process's latch. While at it, rename the PGPROC latch field to procLatch for consistency with 9.2. Issues noted while reviewing a patch by Peter Geoghegan.
2011-08-09	Documentation improvement and minor code cleanups for the latch facility.	Tom Lane
	Improve the documentation around weak-memory-ordering risks, and do a pass of general editorialization on the comments in the latch code. Make the Windows latch code more like the Unix latch code where feasible; in particular provide the same Assert checks in both implementations. Fix poorly-placed WaitLatch call in syncrep.c. This patch resolves, for the moment, concerns around weak-memory-ordering bugs in latch-related code: we have documented the restrictions and checked that existing calls meet them. In 9.2 I hope that we will install suitable memory barrier instructions in SetLatch/ResetLatch, so that their callers don't need to be quite so careful.
2011-08-09	Fix nested PlaceHolderVar expressions that appear only in targetlists.	Tom Lane
	A PlaceHolderVar's expression might contain another, lower-level PlaceHolderVar. If the outer PlaceHolderVar is used, the inner one certainly will be also, and so we have to make sure that both of them get into the placeholder_list with correct ph_may_need values during the initial pre-scan of the query (before deconstruct_jointree starts). We did this correctly for PlaceHolderVars appearing in the query quals, but overlooked the issue for those appearing in the top-level targetlist; with the result that nested placeholders referenced only in the targetlist did not work correctly, as illustrated in bug #6154. While at it, add some error checking to find_placeholder_info to ensure that we don't try to create new placeholders after it's too late to do so; they have to all be created before deconstruct_jointree starts. Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.
2011-08-06	Clean up ill-advised attempt to invent a private set of Node tags.	Tom Lane
	Somebody thought it'd be cute to invent a set of Node tag numbers that were defined independently of, and indeed conflicting with, the main tag-number list. While this accidentally failed to fail so far, it would certainly lead to trouble as soon as anyone wanted to, say, apply copyObject to these node types. Clang was already complaining about the use of makeNode on these tags, and I think quite rightly so. Fix by pushing these node definitions into the mainstream, including putting replnodes.h where it belongs.
2011-08-02	Move CheckRecoveryConflictDeadlock() call to a safer place.	Tom Lane
	This kluge was inserted in a spot apparently chosen at random: the lock manager's state is not yet fully set up for the wait, and in particular LockWaitCancel hasn't been armed by setting lockAwaited, so the ProcLock will not get cleaned up if the ereport is thrown. This seems to not cause any observable problem in trivial test cases, because LockReleaseAll will silently clean up the debris; but I was able to cause failures with tests involving subtransactions. Fixes breakage induced by commit c85c941470efc44494fd7a5f426ee85fc65c268c. Back-patch to all affected branches.
2011-08-02	Fix incorrect initialization of ProcGlobal->startupBufferPinWaitBufId.	Tom Lane
	It was initialized in the wrong place and to the wrong value. With bad luck this could result in incorrect query-cancellation failures in hot standby sessions, should a HS backend be holding pin on buffer number 1 while trying to acquire a lock.
2011-07-23	Rethink behavior of CREATE OR REPLACE during CREATE EXTENSION.	Tom Lane
	The original implementation simply did nothing when replacing an existing object during CREATE EXTENSION. The folly of this was exposed by a report from Marc Munro: if the existing object belongs to another extension, we are left in an inconsistent state. We should insist that the object does not belong to another extension, and then add it to the current extension if not already a member.
2011-07-16	Add an errdetail_internal() ereport auxiliary routine.	Tom Lane
	This function supports untranslated detail messages, in the same way that errmsg_internal supports untranslated primary messages. We've needed this for some time IMO, but discussion of some cases in the SSI code provided the impetus to actually add it. Kevin Grittner, with minor adjustments by me
2011-07-12	Avoid listing ungrouped Vars in the targetlist of Agg-underneath-Window.	Tom Lane
	Regular aggregate functions in combination with, or within the arguments of, window functions are OK per spec; they have the semantics that the aggregate output rows are computed and then we run the window functions over that row set. (Thus, this combination is not really useful unless there's a GROUP BY so that more than one aggregate output row is possible.) The case without GROUP BY could fail, as recently reported by Jeff Davis, because sloppy construction of the Agg node's targetlist resulted in extra references to possibly-ungrouped Vars appearing outside the aggregate function calls themselves. See the added regression test case for an example. Fixing this requires modifying the API of flatten_tlist and its underlying function pull_var_clause. I chose to make pull_var_clause's API for aggregates identical to what it was already doing for placeholders, since the useful behaviors turn out to be the same (error, report node as-is, or recurse into it). I also tightened the error checking in this area a bit: if it was ever valid to see an uplevel Var, Aggref, or PlaceHolderVar here, that was a long time ago, so complain instead of ignoring them. Backpatch into 9.1. The failure exists in 8.4 and 9.0 as well, but seeing that it only occurs in a basically-useless corner case, it doesn't seem worth the risks of changing a function API in a minor release. There might be third-party code using pull_var_clause.
2011-07-08	Fix another oversight in logging of changes in postgresql.conf settings.	Tom Lane
	We were using GetConfigOption to collect the old value of each setting, overlooking the possibility that it didn't exist yet. This does happen in the case of adding a new entry within a custom variable class, as exhibited in bug #6097 from Maxim Boguk. To fix, add a missing_ok parameter to GetConfigOption, but only in 9.1 and HEAD --- it seems possible that some third-party code is using that function, so changing its API in a minor release would cause problems. In 9.0, create a near-duplicate function instead.
2011-07-07	Tag 9.1beta3.REL9_1_BETA3	Tom Lane

2011-07-07	SSI has a race condition, where the order of commit sequence numbers of	Heikki Linnakangas
	transactions might not match the order the work done in those transactions become visible to others. The logic in SSI, however, assumed that it does. Fix that by having two sequence numbers for each serializable transaction, one taken before a transaction becomes visible to others, and one after it. This is easier than trying to make the the transition totally atomic, which would require holding ProcArrayLock and SerializableXactHashLock at the same time. By using prepareSeqNo instead of commitSeqNo in a few places where commit sequence numbers are compared, we can make those comparisons err on the safe side when we don't know for sure which committed first. Per analysis by Kevin Grittner and Dan Ports, but this approach to fix it is different from the original patch.
2011-07-07	Reclassify replication-related GUC variables as "master" and "standby".	Tom Lane
	Per discussion, this structure seems more understandable than what was there before. Make config.sgml and postgresql.conf.sample agree. In passing do a bit of editorial work on the variable descriptions.
2011-07-06	Remove assumptions that not-equals operators cannot be in any opclass.	Tom Lane
	get_op_btree_interpretation assumed this in order to save some duplication of code, but it's not true in general anymore because we added <> support to btree_gist. (We still assume it for btree opclasses, though.) Also, essentially the same logic was baked into predtest.c. Get rid of that duplication by generalizing get_op_btree_interpretation so that it can be used by predtest.c. Per bug report from Denis de Bernardy and investigation by Jeff Davis, though I didn't use Jeff's patch exactly as-is. Back-patch to 9.1; we do not support this usage before that.
2011-07-03	Fix bugs in relpersistence handling during table creation.	Robert Haas
	Unlike the relistemp field which it replaced, relpersistence must be set correctly quite early during the table creation process, as we rely on it quite early on for a number of purposes, including security checks. Normally, this is set based on whether the user enters CREATE TABLE, CREATE UNLOGGED TABLE, or CREATE TEMPORARY TABLE, but a relation may also be made implicitly temporary by creating it in pg_temp. This patch fixes the handling of that case, and also disables creation of unlogged tables in temporary tablespace (such table indeed skip WAL-logging, but we reject an explicit specification) and creation of relations in the temporary schemas of other sessions (which is not very sensible, and didn't work right anyway). Report by Amit Khandekar.
2011-06-29	Move the PredicateLockRelation() call from nodeSeqscan.c to heapam.c. It's	Heikki Linnakangas
	more consistent that way, since all the other PredicateLock* calls are made in various heapam.c and index AM functions. The call in nodeSeqscan.c was unnecessarily aggressive anyway, there's no need to try to lock the relation every time a tuple is fetched, it's enough to do it once. This has the user-visible effect that if a seq scan is initialized in the executor, but never executed, we now acquire the predicate lock on the heap relation anyway. We could avoid that by taking the lock on the first heap_getnext() call instead, but it doesn't seem worth the trouble given that it feels more natural to do it in heap_beginscan(). Also, remove the retail PredicateLockTuple() calls from heap_getnext(). In a seqscan, started with heap_begin(), we're holding a whole-relation predicate lock on the heap so there's no need to lock the tuples individually. Kevin Grittner and me
2011-06-22	Remove pointless const qualifiers from function arguments in the SSI code.	Heikki Linnakangas
	As Tom Lane pointed out, "const Relation foo" doesn't guarantee that you can't modify the data the "foo" pointer points to. It just means that you can't change the pointer to point to something else within the function, which is not very useful.
2011-06-21	Fix bug introduced by recent SSI patch to merge ROLLED_BACK and	Heikki Linnakangas
	MARKED_FOR_DEATH flags into one. We still need the ROLLED_BACK flag to mark transactions that are in the process of being rolled back. To be precise, ROLLED_BACK now means that a transaction has already been discounted from the count of transactions with the oldest xmin, but not yet removed from the list of active transactions. Dan Ports
2011-06-16	pgindent run of recent SSI changes. Also, remove an unnecessary #include.	Heikki Linnakangas
	Kevin Grittner
2011-06-15	Rework parsing of ConstraintAttributeSpec to improve NOT VALID handling.	Tom Lane
	The initial commit of the ALTER TABLE ADD FOREIGN KEY NOT VALID feature failed to support labeling such constraints as deferrable. The best fix for this seems to be to fold NOT VALID into ConstraintAttributeSpec. That's a bit more general than the documented syntax, but it allows better-targeted syntax error messages. In addition, do some mostly-but-not-entirely-cosmetic code review for the whole NOT VALID patch.
2011-06-15	The rolled-back flag on serializable xacts was pointless and redundant with	Heikki Linnakangas
	the marked-for-death flag. It was only set for a fleeting moment while a transaction was being cleaned up at rollback. All the places that checked for the rolled-back flag should also check the marked-for-death flag, as both flags mean that the transaction will roll back. I also renamed the marked-for-death into "doomed", which is a lot shorter name.
2011-06-15	Make non-MVCC snapshots exempt from predicate locking. Scans with non-MVCC	Heikki Linnakangas
	snapshots, like in REINDEX, are basically non-transactional operations. The DDL operation itself might participate in SSI, but there's separate functions for that. Kevin Grittner and Dan Ports, with some changes by me.
2011-06-14	Renumber 2PC resource managers so that compared to 9.0, predicate lock rmgr	Heikki Linnakangas
	is added to the end, and existing resource managers keep their old ids. We're not going to guarantee on-disk compatibility for 2PC state files over major releases, but it seems better to avoid changing the ids them anyway. It will help anyone who might want to write external tools to inspect the state files to work with files from different versions, if nothing else. Per complaint from Tom Lane.
2011-06-10	Work around gcc 4.6.0 bug that breaks WAL replay.	Tom Lane
	ReadRecord's habit of using both direct references to tmpRecPtr and references to *RecPtr (which is pointing at tmpRecPtr) triggers an optimization bug in gcc 4.6.0, which apparently has forgotten about aliasing rules. Avoid the compiler bug, and make the code more readable to boot, by getting rid of the direct references. Improve the comments while at it. Back-patch to all supported versions, in case they get built with 4.6.0. Tom Lane, with some cosmetic suggestions from Alex Hunsaker
2011-06-10	Fix locking while setting flags in MySerializableXact.	Heikki Linnakangas
	Even if a flag is modified only by the backend owning the transaction, it's not safe to modify it without a lock. Another backend might be setting or clearing a different flag in the flags field concurrently, and that operation might be lost because setting or clearing a bit in a word is not atomic. Make did-write flag a simple backend-private boolean variable, because it was only set or tested in the owning backend (except when committing a prepared transaction, but it's not worthwhile to optimize for the case of a read-only prepared transaction). This also eliminates the need to add locking where that flag is set. Also, set the did-write flag when doing DDL operations like DROP TABLE or TRUNCATE -- that was missed earlier.
2011-06-10	Use "transient" files for blind writes, take 2	Alvaro Herrera
	"Blind writes" are a mechanism to push buffers down to disk when evicting them; since they may belong to different databases than the one a backend is connected to, the backend does not necessarily have a relation to link them to, and thus no way to blow them away. We were keeping those files open indefinitely, which would cause a problem if the underlying table was deleted, because the operating system would not be able to reclaim the disk space used by those files. To fix, have bufmgr mark such files as transient to smgr; the lower layer is allowed to close the file descriptor when the current transaction ends. We must be careful to have any other access of the file to remove the transient markings, to prevent unnecessary expensive system calls when evicting buffers belonging to our own database (which files we're likely to require again soon.) This commit fixes a bug in the previous one, which neglected to cleanly handle the LRU ring that fd.c uses to manage open files, and caused an unacceptable failure just before beta2 and was thus reverted.
2011-06-10	Small comment fixes and enhancements.	Heikki Linnakangas