summaryrefslogtreecommitdiff
path: root/src/backend/commands
AgeCommit message (Collapse)Author
2010-11-09Repair memory leakage while ANALYZE-ing complex index expressions.Tom Lane
The general design of memory management in Postgres is that intermediate results computed by an expression are not freed until the end of the tuple cycle. For expression indexes, ANALYZE has to re-evaluate each expression for each of its sample rows, and it wasn't bothering to free intermediate results until the end of processing of that index. This could lead to very substantial leakage if the intermediate results were large, as in a recent example from Jakub Ouhrabka. Fix by doing ResetExprContext for each sample row. This necessitates adding a datumCopy step to ensure that the final expression value isn't recycled too. Some quick testing suggests that this change adds at worst about 10% to the time needed to analyze a table with an expression index; which is annoying, but seems a tolerable price to pay to avoid unexpected out-of-memory problems. Back-patch to all supported branches.
2010-09-23Prevent show_session_authorization from crashing when session_authorizationTom Lane
hasn't been set. The only known case where this can happen is when show_session_authorization is invoked in an autovacuum process, which is possible if an index function calls it, as for example in bug #5669 from Andrew Geery. We could perhaps try to return a sensible value, such as the name of the cluster-owning superuser; but that seems like much more trouble than the case is worth, and in any case it could create new possible failure modes. Simply returning an empty string seems like the most appropriate fix. Back-patch to all supported versions, even those before autovacuum, just in case there's another way to provoke this crash.
2010-07-29Fix another longstanding problem in copy_relation_data: it was blithelyTom Lane
assuming that a local char[] array would be aligned on at least a word boundary. There are architectures on which that is pretty much guaranteed to NOT be the case ... and those arches also don't like non-aligned memory accesses, meaning that log_newpage() would crash if it ever got invoked. Even on Intel-ish machines there's a potential for a large performance penalty from doing I/O to an inadequately aligned buffer. So palloc it instead. Backpatch to 8.0 --- 7.4 doesn't have this code.
2010-07-29Fix possible page corruption by ALTER TABLE .. SET TABLESPACE.Robert Haas
If a zeroed page is present in the heap, ALTER TABLE .. SET TABLESPACE will set the LSN and TLI while copying it, which is wrong, and heap_xlog_newpage() will do the same thing during replay, so the corruption propagates to any standby. Note, however, that the bug can't be demonstrated unless archiving is enabled, since in that case we skip WAL logging altogether, and the LSN/TLI are not set. Back-patch to 8.0; prior releases do not have tablespaces. Analysis and patch by Jeff Davis. Adjustments for back-branches and minor wordsmithing by me.
2010-07-01Allow ALTER TABLE .. SET TABLESPACE to be interrupted.Robert Haas
Backpatch to 8.0, where tablespaces were introduced. Guillaume Lelarge
2010-03-25Prevent ALTER USER f RESET ALL from removing the settings that were put thereAlvaro Herrera
by a superuser -- "ALTER USER f RESET setting" already disallows removing such a setting. Apply the same treatment to ALTER DATABASE d RESET ALL when run by a database owner that's not superuser.
2010-01-24Fix assorted core dumps and Assert failures that could occur duringTom Lane
AbortTransaction or AbortSubTransaction, when trying to clean up after an error that prevented (sub)transaction start from completing: * access to TopTransactionResourceOwner that might not exist * assert failure in AtEOXact_GUC, if AtStart_GUC not called yet * assert failure or core dump in AfterTriggerEndSubXact, if AfterTriggerBeginSubXact not called yet Per testing by injecting elog(ERROR) at successive steps in StartTransaction and StartSubTransaction. It's not clear whether all of these cases could really occur in the field, but at least one of them is easily exposed by simple stress testing, as per my accidental discovery yesterday.
2009-12-29Previous fix for temporary file management broke returning a set fromHeikki Linnakangas
PL/pgSQL function within an exception handler. Make sure we use the right resource owner when we create the tuplestore to hold returned tuples. Simplify tuplestore API so that the caller doesn't need to be in the right memory context when calling tuplestore_put* functions. tuplestore.c automatically switches to the memory context used when the tuplestore was created. Tuplesort was already modified like this earlier. This patch also removes the now useless MemoryContextSwitch calls from callers. Report by Aleksei on pgsql-bugs on Dec 22 2009. Backpatch to 8.1, like the previous patch that broke this.
2009-12-09Prevent indirect security attacks via changing session-local state withinTom Lane
an allegedly immutable index function. It was previously recognized that we had to prevent such a function from executing SET/RESET ROLE/SESSION AUTHORIZATION, or it could trivially obtain the privileges of the session user. However, since there is in general no privilege checking for changes of session-local state, it is also possible for such a function to change settings in a way that might subvert later operations in the same session. Examples include changing search_path to cause an unexpected function to be called, or replacing an existing prepared statement with another one that will execute a function of the attacker's choosing. The present patch secures VACUUM, ANALYZE, and CREATE INDEX/REINDEX against these threats, which are the same places previously deemed to need protection against the SET ROLE issue. GUC changes are still allowed, since there are many useful cases for that, but we prevent security problems by forcing a rollback of any GUC change after completing the operation. Other cases are handled by throwing an error if any change is attempted; these include temp table creation, closing a cursor, and creating or deleting a prepared statement. (In 7.4, the infrastructure to roll back GUC changes doesn't exist, so we settle for rejecting changes of "search_path" in these contexts.) Original report and patch by Gurjeet Singh, additional analysis by Tom Lane. Security: CVE-2009-4136
2009-11-10Fix longstanding problems in VACUUM caused by untimely interruptionsAlvaro Herrera
In VACUUM FULL, an interrupt after the initial transaction has been recorded as committed can cause postmaster to restart with the following error message: PANIC: cannot abort transaction NNNN, it was already committed This problem has been reported many times. In lazy VACUUM, an interrupt after the table has been truncated by lazy_truncate_heap causes other backends' relcache to still point to the removed pages; this can cause future INSERT and UPDATE queries to error out with the following error message: could not read block XX of relation 1663/NNN/MMMM: read only 0 of 8192 bytes The window to this race condition is extremely narrow, but it has been seen in the wild involving a cancelled autovacuum process. The solution for both problems is to inhibit interrupts in both operations until after the respective transactions have been committed. It's not a complete solution, because the transaction could theoretically be aborted by some other error, but at least fixes the most common causes of both problems.
2009-10-27Fix AfterTriggerSaveEvent to use a test and elog, not just Assert, to checkTom Lane
that it's called within an AfterTriggerBeginQuery/AfterTriggerEndQuery pair. The RI cascade triggers suppress that overhead on the assumption that they are always run non-deferred, so it's possible to violate the condition if someone mistakenly changes pg_trigger to mark such a trigger deferred. We don't really care about supporting that, but throwing an error instead of crashing seems desirable. Per report from Marcelo Costa.
2009-09-03Disallow RESET ROLE and RESET SESSION AUTHORIZATION inside security-definerTom Lane
functions. This extends the previous patch that forbade SETting these variables inside security-definer functions. RESET is equally a security hole, since it would allow regaining privileges of the caller; furthermore it can trigger Assert failures and perhaps other internal errors, since the code is not expecting these variables to change in such contexts. The previous patch did not cover this case because assign hooks don't really have enough information, so move the responsibility for preventing this into guc.c. Problem discovered by Heikki Linnakangas. Security: no CVE assigned yet, extends CVE-2007-6600
2009-05-19Update relpages and reltuples estimates in stand-alone ANALYZE, even ifHeikki Linnakangas
there's no analyzable attributes or indexes. We also used to report 0 live and dead tuples for such tables, which messed with autovacuum threshold calculations. This fixes bug #4812 reported by George Su. Backpatch back to 8.1.
2009-02-27In CREATE CONVERSION, test that the given function is a valid conversionHeikki Linnakangas
function for the specified source and destination encodings. We do that by calling the function with an empty string. If it can't perform the requested conversion, it will throw an error. Backport to 7.4 - 8.3. Per bug report #4680 by Denis Afonin.
2009-02-24Repair a longstanding bug in CLUSTER and the rewriting variants of ALTERTom Lane
TABLE: if the command is executed by someone other than the table owner (eg, a superuser) and the table has a toast table, the toast table's pg_type row ends up with the wrong typowner, ie, the command issuer not the table owner. This is quite harmless for most purposes, since no interesting permissions checks consult the pg_type row. However, it could lead to unexpected failures if one later tries to drop the role that issued the command (in 8.1 or 8.2), or strange warnings from pg_dump afterwards (in 8.3 and up, which will allow the DROP ROLE because we don't create a "redundant" owner dependency for table rowtypes). Problem identified by Cott Lang. Back-patch to 8.1. The problem is actually far older --- the CLUSTER variant can be demonstrated in 7.0 --- but it's mostly cosmetic before 8.1 because we didn't track ownership dependencies before 8.1. Also, fixing it before 8.1 would require changing the call signature of heap_create_with_catalog(), which seems to carry a nontrivial risk of breaking add-on modules.
2009-01-06Fix logic in lazy vacuum to decide if it's worth trying to truncate the heap.Heikki Linnakangas
If the table was smaller than REL_TRUNCATE_FRACTION (= 16) pages, we always tried to acquire AccessExclusiveLock on it even if there was no empty pages at the end. Report by Simon Riggs. Back-patch all the way to 7.4.
2008-12-13Fix failure to ensure that a snapshot is available to datatype input functionsTom Lane
when they are invoked by the parser. We had been setting up a snapshot at plan time but really it needs to be done earlier, before parse analysis. Per report from Dmitry Koterov. Also fix two related problems discovered while poking at this one: exec_bind_message called datatype input functions without establishing a snapshot, and SET CONSTRAINTS IMMEDIATE could call trigger functions without establishing a snapshot. Backpatch to 8.2. The underlying problem goes much further back, but it is masked in 8.1 and before because we didn't attempt to invoke domain check constraints within datatype input. It would only be exposed if a C-language datatype input function used the snapshot; which evidently none do, or we'd have heard complaints sooner. Since this code has changed a lot over time, a back-patch is hardly risk-free, and so I'm disinclined to patch further than absolutely necessary.
2008-12-01Ensure that the contents of a holdable cursor don't depend on out-of-lineTom Lane
toasted values, since those could get dropped once the cursor's transaction is over. Per bug #4553 from Andrew Gierth. Back-patch as far as 8.1. The bug actually exists back to 7.4 when holdable cursors were introduced, but this patch won't work before 8.1 without significant adjustments. Given the lack of field complaints, it doesn't seem worth the work (and risk of introducing new bugs) to try to make a patch for the older branches.
2008-10-25Fix an old bug in after-trigger handling: AfterTriggerEndQuery took theTom Lane
address of afterTriggers->query_stack[afterTriggers->query_depth] and hung onto it through all its firings of triggers. However, if a trigger causes sufficiently many nested query executions, query_stack will get repalloc'd bigger, leaving AfterTriggerEndQuery --- and hence afterTriggerInvokeEvents --- using a stale pointer. So far as I can find, the only consequence of this error is to stomp on a couple of words of already-freed memory; which would lead to a failure only if that chunk had already gotten re-allocated for something else. So it's hard to exhibit a simple failure case, but this is surely a bug. I noticed this while working on my recent patch to reduce pending-trigger space usage. The present patch is mighty ugly, because it requires making afterTriggerInvokeEvents know about all the possible event lists it might get called on. Fortunately, this is only needed in back branches because CVS HEAD avoids the problem in a different way: afterTriggerInvokeEvents only touches the passed AfterTriggerEventList pointer once at startup. Back branches are stable enough that wiring in knowledge of all possible call usages doesn't seem like a killer problem. Back-patch to 8.0. 7.4's trigger code is completely different and doesn't seem to have the problem (it doesn't even use repalloc).
2008-10-07When a relation is moved to another tablespace, we can't assume that we canHeikki Linnakangas
use the old relfilenode in the new tablespace. There might be another relation in the new tablespace with the same relfilenode, so we must generate a fresh relfilenode in the new tablespace. The 8.3 patch to let deleted relation files linger as zero-length files until the next checkpoint made this more obvious: moving a relation from one table space another, and then back again, caused a collision with the lingering file. Back-patch to 8.1. The issue is present in 8.0 as well, but it doesn't seem worth fixing there, because we didn't have protection from OID collisions after OID wraparound before 8.1. Report by Guillaume Lelarge.
2008-09-11Initialize the minimum frozen Xid in vac_update_datfrozenxid usingAlvaro Herrera
GetOldestXmin() instead of RecentGlobalXmin; this is safer because we do not depend on the latter being correctly set elsewhere, and while it is more expensive, this code path is not performance-critical. This is a real risk for autovacuum, because it can execute whole cycles without doing a single vacuum, which would mean that RecentGlobalXmin would stay at its initialization value, FirstNormalTransactionId, causing a bogus value to be inserted in pg_database. This bug could explain some recent reports of failure to truncate pg_clog. At the same time, change the initialization of RecentGlobalXmin to InvalidTransactionId, and ensure that it's set to something else whenever it's going to be used. Using it as FirstNormalTransactionId in HOT page pruning could incur in data loss. InitPostgres takes care of setting it to a valid value, but the extra checks are there to prevent "special" backends from behaving in unusual ways. Per Tom Lane's detailed problem dissection in 29544.1221061979@sss.pgh.pa.us
2008-06-08ALTER AGGREGATE OWNER seems to have been missed by the last couple ofTom Lane
patches that dealt with object ownership. It wasn't updating pg_shdepend nor adjusting the aggregate's ACL. In 8.2 and up, fix this permanently by making it use AlterFunctionOwner_oid. In 8.1, the function code wasn't factored that way, so just copy and paste.
2008-05-27Back-patch the 8.3 fix that prohibits TRUNCATE, CLUSTER, and REINDEX when theTom Lane
current transaction has any open references to the target relation or index (implying it has an active query using the relation). Also back-patch the 8.2 fix that prohibits TRUNCATE and CLUSTER when there are pending AFTER-trigger events. Per suggestion from Heikki.
2008-05-09Fix an ancient oversight in change_varattnos_of_a_node: it neglected to updateTom Lane
varoattno along with varattno. This resulted in having Vars that were not seen as equal(), causing inheritance of the "same" constraint from different parent relations to fail. An example is create table pp1 (f1 int check (f1>0)); create table cc1 (f2 text, f3 int) inherits (pp1); create table cc2(f4 float) inherits(pp1,cc1); Backpatch as far as 7.4. (The test case still fails in 7.4, for reasons that I don't feel like investigating at the moment.) This is a backpatch commit only. The fix will be applied in HEAD as part of the upcoming pg_constraint patch.
2008-04-24Fix ALTER TABLE ADD COLUMN ... PRIMARY KEY so that the new column is correctlyTom Lane
checked to see if it's been initialized to all non-nulls. The implicit NOT NULL constraint was not being checked during the ALTER (in fact, not even if there was an explicit NOT NULL too), because ATExecAddColumn neglected to set the flag needed to make the test happen. This has been broken since the capability was first added, in 8.0. Brendan Jurd, per a report from Kaloyan Iliev.
2008-04-17Repair two places where SIGTERM exit could leave shared memory stateTom Lane
corrupted. (Neither is very important if SIGTERM is used to shut down the whole database cluster together, but there's a problem if someone tries to SIGTERM individual backends.) To do this, introduce new infrastructure macros PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care of transiently pushing an on_shmem_exit cleanup hook. Also use this method for createdb cleanup --- that wasn't a shared-memory-corruption problem, but SIGTERM abort of createdb could leave orphaned files lying around. Backpatch as far as 8.2. The shmem corruption cases don't exist in 8.1, and the createdb usage doesn't seem important enough to risk backpatching further.
2008-03-12Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponingTom Lane
pg_listener modifications commanded by LISTEN and UNLISTEN until the end of the current transaction. This allows us to hold the ExclusiveLock on pg_listener until after commit, with no greater risk of deadlock than there was before. Aside from fixing the race condition, this gets rid of a truly ugly kludge that was there before, namely having to ignore HeapTupleBeingUpdated failures during NOTIFY. There is a small potential incompatibility, which is that if a transaction issues LISTEN or UNLISTEN and then looks into pg_listener before committing, it won't see any resulting row insertion or deletion, where before it would have. It seems unlikely that anyone would be depending on that, though. This patch also disallows LISTEN and UNLISTEN inside a prepared transaction. That case had some pretty undesirable properties already, such as possibly allowing pg_listener entries to be made for PIDs no longer present, so disallowing it seems like a better idea than trying to maintain the behavior.
2008-02-11Repair VACUUM FULL bug introduced by HOT patch: the original way ofTom Lane
calculating a page's initial free space was fine, and should not have been "improved" by letting PageGetHeapFreeSpace do it. VACUUM FULL is going to reclaim LP_DEAD line pointers later, so there is no need for a guard against the page being too full of line pointers, and having one risks rejecting pages that are perfectly good move destinations. This also exposed a second bug, which is that the empty_end_pages logic assumed that any page with no live tuples would get entered into the fraged_pages list automatically (by virtue of having more free space than the threshold in the do_frag calculation). This assumption certainly seems risky when a low fillfactor has been chosen, and even without tunable fillfactor I think it could conceivably fail on a page with many unused line pointers. So fix the code to force do_frag true when notup is true, and patch this part of the fix all the way back. Per report from Tomas Szepe.
2008-02-07Some variants of ALTER OWNER tried to make the "object" field of theTom Lane
statement be a list of bare C strings, rather than String nodes, which is what they need to be for copyfuncs/equalfuncs to work. Fortunately these node types never go out to disk (if they did, we'd likely have noticed the problem sooner), so we can just fix it without creating a need for initdb. This bug has been there since 8.0, but 8.3 exposes it in a more common code path (Parse messages) than prior releases did. Per bug #3940 from Vladimir Kokovic.
2008-01-03Make standard maintenance operations (including VACUUM, ANALYZE, REINDEX,Tom Lane
and CLUSTER) execute as the table owner rather than the calling user, using the same privilege-switching mechanism already used for SECURITY DEFINER functions. The purpose of this change is to ensure that user-defined functions used in index definitions cannot acquire the privileges of a superuser account that is performing routine maintenance. While a function used in an index is supposed to be IMMUTABLE and thus not able to do anything very interesting, there are several easy ways around that restriction; and even if we could plug them all, there would remain a risk of reading sensitive information and broadcasting it through a covert channel such as CPU usage. To prevent bypassing this security measure, execution of SET SESSION AUTHORIZATION and SET ROLE is now forbidden within a SECURITY DEFINER context. Thanks to Itagaki Takahiro for reporting this vulnerability. Security: CVE-2007-6600
2007-10-25Ugly patch to make ALTER SEQUENCE OWNED BY not affect the currval() stateTom Lane
of the sequence. Since OWNED BY never existed before 8.2, this seems unlikely to create any compatibility issues. Other forms of ALTER SEQUENCE continue to do what they did before, namely update currval to match the sequence's actual last_val. That seems wrong on consideration, but we'll not change it in a minor release --- 8.3 will make that fix.
2007-09-29Disallow CLUSTER using an invalid index (that is, one left over from a failedTom Lane
CREATE INDEX CONCURRENTLY). Such an index might not have entries for every heap row and thus clustering with it would result in silent data loss. The scenario requires a pretty foolish DBA, but still ...
2007-09-24Reduce the size of memory allocations by lazy vacuum when processing a smallAlvaro Herrera
table, by allocating just enough for a hardcoded number of dead tuples per page. The current estimate is 200 dead tuples per page. Per reports from Jeff Amiel, Erik Jones and Marko Kreen, and subsequent discussion. CVS: ---------------------------------------------------------------------- CVS: Enter Log. Lines beginning with `CVS:' are removed automatically CVS: CVS: Committing in . CVS: CVS: Modified Files: CVS: commands/vacuumlazy.c CVS: ----------------------------------------------------------------------
2007-09-16Fix aboriginal mistake in lazy VACUUM's code for truncating awayTom Lane
no-longer-needed pages at the end of a table. We thought we could throw away pages containing HEAPTUPLE_DEAD tuples; but this is not so, because such tuples very likely have index entries pointing at them, and we wouldn't have removed the index entries. The problem only emerges in a somewhat unlikely race condition: the dead tuples have to have been inserted by a transaction that later aborted, and this has to have happened between VACUUM's initial scan of the page and then rechecking it for empty in count_nondeletable_pages. But that timespan will include an index-cleaning pass, so it's not all that hard to hit. This seems to explain a couple of previously unsolved bug reports.
2007-09-12Add a CHECK_FOR_INTERRUPTS call in the site where the vacuum delay pointAlvaro Herrera
was removed.
2007-09-10Make CLUSTER and REINDEX silently skip remote temp tables in theirAlvaro Herrera
database-wide editions. Per report from bitsandbytes88 <at> hotmail.com and subsequent discussion.
2007-09-10Remove the vacuum_delay_point call in count_nondeletable_pages, because we holdAlvaro Herrera
an exclusive lock on the table at this point, which we want to release as soon as possible. This is called in the phase of lazy vacuum where we truncate the empty pages at the end of the table. An alternative solution would be to lower the vacuum delay settings before starting the truncating phase, but this doesn't work very well in autovacuum due to the autobalancing code (which can cause other processes to change our cost delay settings). This case could be considered in the balancing code, but it is simpler this way.
2007-08-25Fix brain fade in DefineIndex(): it was continuing to access the table'sTom Lane
relcache entry after having heap_close'd it. This could lead to misbehavior if a relcache flush wiped out the cache entry meanwhile. In 8.2 there is a very real risk of CREATE INDEX CONCURRENTLY using the wrong relid for locking and waiting purposes. I think the bug is only cosmetic in 8.0 and 8.1, because their transgression is limited to using RelationGetRelationName(rel) in an ereport message immediately after heap_close, and there's no way (except with special debugging options) for a cache flush to occur in that interval. Not quite sure that it's cosmetic in 7.4, but seems best to patch anyway. Found by trying to run the regression tests with CLOBBER_CACHE_ALWAYS enabled. Maybe we should try to do that on a regular basis --- it's awfully slow, but perhaps some fast buildfarm machine could do it once in awhile.
2007-08-15Repair problems occurring when multiple RI updates have to be done to the sameTom Lane
row within one query: we were firing check triggers before all the updates were done, leading to bogus failures. Fix by making the triggers queued by an RI update go at the end of the outer query's trigger event list, thereby effectively making the processing "breadth-first". This was indeed how it worked pre-8.0, so the bug does not occur in the 7.x branches. Per report from Pavel Stehule.
2007-07-17Fix incorrect optimization of foreign-key checks. When an UPDATE on theTom Lane
referencing table does not change the tuple's FK column(s), we don't bother to check the PK table since the constraint was presumably already valid. However, the check is still necessary if the tuple was inserted by our own transaction, since in that case the INSERT trigger will conclude it need not make the check (since its version of the tuple has been deleted). We got this right for simple cases, but not when the insert and update are in different subtransactions of the current top-level transaction; in such cases the FK check would never be made at all. (Hence, problem dates back to 8.0 when subtransactions were added --- it's actually the subtransaction version of a bug fixed in 7.3.5.) Fix, and add regression test cases. Report and fix by Affan Salman.
2007-07-01Avoid memory leakage when a series of subtransactions invoke AFTER triggersTom Lane
that are fired at end-of-statement (as is the normal case for foreign keys, for example). In this situation the per-subxact deferred trigger context is always empty when subtransaction exit is reached; so we could free it, but were not doing so, leading to an intratransaction leak of 8K or more per subtransaction. Per off-list example from Viatcheslav Kalinin subsequent to bug #3418 (his original bug report omitted a foreign key constraint needed to cause this leak). Back-patch to 8.2; prior versions were not using per-subxact contexts for deferred triggers, so did not have this leak.
2007-06-20CREATE DOMAIN ... DEFAULT NULL failed because gram.y special-cases DEFAULTTom Lane
NULL and DefineDomain didn't. Bug goes all the way back to original coding of domains. Per bug #3396 from Sergey Burladyan.
2007-06-14Avoid having autovacuum run multiple ANALYZE commands in a single transaction,Alvaro Herrera
to prevent possible deadlock problems. Per request from Tom Lane.
2007-05-11Fix my oversight in enabling domains-of-domains: ALTER DOMAIN ADD CONSTRAINTTom Lane
needs to check the new constraint against columns of derived domains too. Also, make it error out if the domain to be modified is used within any composite-type columns. Eventually we should support that case, but it seems a bit painful, and not suitable for a back-patch. For the moment just let the user know we can't do it. Backpatch to 8.2, which is the only released version that allows nested domains. Possibly the other part should be back-patched further.
2007-04-26Fix dynahash.c to suppress hash bucket splits while a hash_seq_search() scanTom Lane
is in progress on the same hashtable. This seems the least invasive way to fix the recently-recognized problem that a split could cause the scan to visit entries twice or (with much lower probability) miss them entirely. The only field-reported problem caused by this is the "failed to re-find shared lock object" PANIC in COMMIT PREPARED reported by Michel Dorochevsky, which was caused by multiply visited entries. However, it seems certain that mdsync() is vulnerable to missing required fsync's due to missed entries, and I am fearful that RelationCacheInitializePhase2() might be at risk as well. Because of that and the generalized hazard presented by this bug, back-patch all the supported branches. Along the way, fix pg_prepared_statement() and pg_cursor() to not assume that the hashtables they are examining will stay static between calls. This is risky regardless of the newly noted dynahash problem, because hash_seq_search() has never promised to cope with deletion of table entries other than the just-returned one. There may be no bug here because the only supported way to call these functions is via ExecMakeTableFunctionResult() which will cycle them to completion before doing anything very interesting, but it seems best to get rid of the assumption. This affects 8.2 and HEAD only, since those functions weren't there earlier.
2007-04-12Cancel pending fsync requests during WAL replay of DROP DATABASE, per bugTom Lane
report from David Darville. Back-patch as far as 8.1, which may or may not have the problem but it seems a safe change anyway.
2007-03-14Fix a longstanding bug in VACUUM FULL's handling of update chains. The codeTom Lane
did not expect that a DEAD tuple could follow a RECENTLY_DEAD tuple in an update chain, but because the OldestXmin rule for determining deadness is a simplification of reality, it is possible for this situation to occur (implying that the RECENTLY_DEAD tuple is in fact dead to all observers, but this patch does not attempt to exploit that). The code would follow a chain forward all the way, but then stop before a DEAD tuple when backing up, meaning that not all of the chain got moved. This could lead to copying the chain multiple times (resulting in duplicate copies of the live tuple at its end), or leaving dangling index entries behind (which, aside from generating warnings from later vacuums, creates a risk of wrong query results or bogus duplicate-key errors once the heap slot the index entry points to is repopulated). The fix is to recheck HeapTupleSatisfiesVacuum while following a chain forward, and to stop if a DEAD tuple is reached. Each contiguous group of RECENTLY_DEAD tuples will therefore be copied as a separate chain. The patch also adds a couple of extra sanity checks to verify correct behavior. Per report and test case from Pavan Deolasee.
2007-03-08Fix vac_update_relstats to ensure it always sends a relcache inval message,Tom Lane
even if none of the fields in the pg_class row change. This behavior is necessary to ensure other backends flush rd_targblock values that might point to truncated-away pages. We got this right pre-8.2 but it was broken by overoptimistic change to not write out the pg_class row if unchanged. Per report from Pavan Deolasee.
2007-02-06Fix an error in the original coding of holdable cursors: PersistHoldablePortalTom Lane
thought that it didn't have to reposition the underlying tuplestore if the portal is atEnd. But this is not so, because tuplestores have separate read and write cursors ... and the read cursor hasn't moved from the start. This mistake explains bug #2970 from William Zhang. Note: the coding here is pretty inefficient, but given that no one has noticed this bug until now, I'd say hardly anyone uses the case where the cursor has been advanced before being persisted. So maybe it's not worth worrying about.
2007-02-02Repair failure to check that a table is still compatible with a previouslyTom Lane
made query plan. Use of ALTER COLUMN TYPE creates a hazard for cached query plans: they could contain Vars that claim a column has a different type than it now has. Fix this by checking during plan startup that Vars at relation scan level match the current relation tuple descriptor. Since at that point we already have at least AccessShareLock, we can be sure the column type will not change underneath us later in the query. However, since a backend's locks do not conflict against itself, there is still a hole for an attacker to exploit: he could try to execute ALTER COLUMN TYPE while a query is in progress in the current backend. Seal that hole by rejecting ALTER TABLE whenever the target relation is already open in the current backend. This is a significant security hole: not only can one trivially crash the backend, but with appropriate misuse of pass-by-reference datatypes it is possible to read out arbitrary locations in the server process's memory, which could allow retrieving database content the user should not be able to see. Our thanks to Jeff Trout for the initial report. Security: CVE-2007-0556