Age | Commit message (Collapse) | Author |
|
|
|
Recent versions of the Linux system header files cause xlogdefs.h to
believe that open_datasync should be the default sync method, whereas
formerly fdatasync was the default on Linux. open_datasync is a bad
choice, first because it doesn't actually outperform fdatasync (in fact
the reverse), and second because we try to use O_DIRECT with it, causing
failures on certain filesystems (e.g., ext4 with data=journal option).
This part of the patch is largely per a proposal from Marti Raudsepp.
More extensive changes are likely to follow in HEAD, but this is as much
change as we want to back-patch.
Also clean up confusing code and incorrect documentation surrounding the
fsync_writethrough option. Those changes shouldn't result in any actual
behavioral change, but I chose to back-patch them anyway to keep the
branches looking similar in this area.
In 9.0 and HEAD, also do some copy-editing on the WAL Reliability
documentation section.
Back-patch to all supported branches, since any of them might get used
on modern Linux versions.
|
|
temporary indexes are not WAL-logged. We used a constant LSN for temporary
indexes, on the assumption that we don't need to worry about concurrent page
splits in temporary indexes because they're only visible to the current
session. But that assumption is wrong, it's possible to insert rows and
split pages in the same session, while a scan is in progress. For example,
by opening a cursor and fetching some rows, and INSERTing new rows before
fetching some more.
Fix by generating fake increasing LSNs, used in place of real LSNs in
temporary GiST indexes.
|
|
|
|
Also do some further work in the back branches, where quite a bit wasn't
covered by Magnus' original back-patch.
|
|
|
|
SI invalidation events, rather than indirectly through the relcache.
In the previous coding, we had to flush a composite-type typcache entry
whenever we discarded the corresponding relcache entry. This caused problems
at least when testing with RELCACHE_FORCE_RELEASE, as shown in recent report
from Jeff Davis, and might result in real-world problems given the kind of
unexpected relcache flush that that test mechanism is intended to model.
The new coding decouples relcache and typcache management, which is a good
thing anyway from a structural perspective. The cost is that we have to
search the typcache linearly to find entries that need to be flushed. There
are a couple of ways we could avoid that, but at the moment it's not clear
it's worth any extra trouble, because the typcache contains very few entries
in typical operation.
Back-patch to 8.2, the same as some other recent fixes in this general area.
The patch could be carried back to 8.0 with some additional work, but given
that it's only hypothetical whether we're fixing any problem observable in
the field, it doesn't seem worth the work now.
|
|
correctly on 64-bit Intel Solaris. Per my proposal yesterday,
8.2 is where we will start considering this platform supported.
While this patch itself could easily go into older branches,
there's not a huge amount of point unless we also make some
significantly-more-invasive changes in the spinlock support.
|
|
look through join alias Vars to avoid breaking join queries, and
move the test to someplace where it will catch more possible ways
of calling a function. We still ought to throw away the whole thing
in favor of a data-type-based solution, but that's not feasible in
the back branches.
Completion of back-port of my patch of yesterday.
|
|
a pass-by-reference datatype with a nontrivial projection step.
We were using the same memory context for the projection operation as for
the temporary context used by the hashtable routines in execGrouping.c.
However, the hashtable routines feel free to reset their temp context at
any time, which'd lead to destroying input data that was still needed.
Report and diagnosis by Tao Ma.
Back-patch to 8.1, where the problem was introduced by the changes that
allowed us to work with "virtual" tuples instead of materializing intermediate
tuple values everywhere. The earlier code looks quite similar, but it doesn't
suffer the problem because the data gets copied into another context as a
result of having to materialize ExecProject's output tuple.
|
|
Windows, so that memory allocated by starting third party DLLs doesn't end
up conflicting.
The same functionality has been in 8.3 and 8.4 for almost a year, and seems
to have solved some of the more common shared memory errors on Windows.
|
|
being used in a PL/pgSQL FOR loop is closed was inadequate, as Tom Lane
pointed out. The bug affects FOR statement variants too, because you can
close an implicitly created cursor too by guessing the "<unnamed portal X>"
name created for it.
To fix that, "pin" the portal to prevent it from being dropped while it's
being used in a PL/pgSQL FOR loop. Backpatch all the way to 7.4 which is
the oldest supported version.
|
|
|
|
Windows, thanks to a feature in CRT called Parameter Validation.
Backpatch to 8.2, which is the oldest version supported on Windows. In
8.2 and 8.3 also backpatch the earlier change to use DEVNULL instead of
NULL_DEV #define for a /dev/null-like device. NULL_DEV was hard-coded to
"/dev/null" regardless of platform, which didn't work on Windows, while
DEVNULL works on all platforms. Restarting syslogger didn't work on
Windows on versions 8.3 and below because of that.
|
|
by a superuser -- "ALTER USER f RESET setting" already disallows removing such a
setting.
Apply the same treatment to ALTER DATABASE d RESET ALL when run by a database
owner that's not superuser.
|
|
|
|
|
|
an allegedly immutable index function. It was previously recognized that
we had to prevent such a function from executing SET/RESET ROLE/SESSION
AUTHORIZATION, or it could trivially obtain the privileges of the session
user. However, since there is in general no privilege checking for changes
of session-local state, it is also possible for such a function to change
settings in a way that might subvert later operations in the same session.
Examples include changing search_path to cause an unexpected function to
be called, or replacing an existing prepared statement with another one
that will execute a function of the attacker's choosing.
The present patch secures VACUUM, ANALYZE, and CREATE INDEX/REINDEX against
these threats, which are the same places previously deemed to need protection
against the SET ROLE issue. GUC changes are still allowed, since there are
many useful cases for that, but we prevent security problems by forcing a
rollback of any GUC change after completing the operation. Other cases are
handled by throwing an error if any change is attempted; these include temp
table creation, closing a cursor, and creating or deleting a prepared
statement. (In 7.4, the infrastructure to roll back GUC changes doesn't
exist, so we settle for rejecting changes of "search_path" in these contexts.)
Original report and patch by Gurjeet Singh, additional analysis by
Tom Lane.
Security: CVE-2009-4136
|
|
in a subtransaction stays open even if the subtransaction is aborted, so
any temporary files related to it must stay alive as well. With the patch,
we use ResourceOwners to track open temporary files and don't automatically
close them at subtransaction end (though in the normal case temporary files
are registered with the subtransaction resource owner and will therefore be
closed).
At end of top transaction, we still check that there's no temporary files
marked as close-at-end-of-transaction open, but that's now just a debugging
cross-check as the resource owner cleanup should've closed them already.
|
|
be part of multixacts, so allocate a slot for each prepared transaction in
the "oldest member" array in multixact.c. On PREPARE TRANSACTION, transfer
the oldest member value from the current backends slot to the prepared xact
slot. Also save and recover the value from the 2pc state file.
The symptom of the bug was that after a transaction prepared, a shared lock
still held by the prepared transaction was sometimes ignored by other
transactions.
Fix back to 8.1, where both 2PC and multixact were introduced.
|
|
compatibility with pre-Windows 2000 versions.
|
|
for the pg_regress part which did not support admin execution in 8.2.
|
|
In VACUUM FULL, an interrupt after the initial transaction has been recorded
as committed can cause postmaster to restart with the following error message:
PANIC: cannot abort transaction NNNN, it was already committed
This problem has been reported many times.
In lazy VACUUM, an interrupt after the table has been truncated by
lazy_truncate_heap causes other backends' relcache to still point to the
removed pages; this can cause future INSERT and UPDATE queries to error out
with the following error message:
could not read block XX of relation 1663/NNN/MMMM: read only 0 of 8192 bytes
The window to this race condition is extremely narrow, but it has been seen in
the wild involving a cancelled autovacuum process.
The solution for both problems is to inhibit interrupts in both operations
until after the respective transactions have been committed. It's not a
complete solution, because the transaction could theoretically be aborted by
some other error, but at least fixes the most common causes of both problems.
|
|
|
|
functions.
This extends the previous patch that forbade SETting these variables inside
security-definer functions. RESET is equally a security hole, since it
would allow regaining privileges of the caller; furthermore it can trigger
Assert failures and perhaps other internal errors, since the code is not
expecting these variables to change in such contexts. The previous patch
did not cover this case because assign hooks don't really have enough
information, so move the responsibility for preventing this into guc.c.
Problem discovered by Heikki Linnakangas.
Security: no CVE assigned yet, extends CVE-2007-6600
|
|
The original coding was not dealing specially with this file being a symlink,
with the end result that it was not installed in VPATH builds. Oddly enough,
the clean target does know about it ...
|
|
should use GinItemPointerGetBlockNumber/GinItemPointerGetOffsetNumber,
not ItemPointerGetBlockNumber/ItemPointerGetOffsetNumber, because the latter
will Assert() on ip_posid == 0, ie a "Min" pointer. (Thus, ItemPointerIsMin
has never worked at all, but it seems unused at present.) I'm not certain
that the case can occur in normal functioning, but it's blowing up on me
while investigating Tatsuo-san's data corruption problem. In any case it
seems like a problem waiting to bite someone.
Back-patch just in case this really is a problem for somebody in the field.
|
|
TupleTableSlots. We have functions for retrieving a minimal tuple from a slot
after storing a regular tuple in it, or vice versa; but these were implemented
by converting the internal storage from one format to the other. The problem
with that is it invalidates any pass-by-reference Datums that were already
fetched from the slot, since they'll be pointing into the just-freed version
of the tuple. The known problem cases involve fetching both a whole-row
variable and a pass-by-reference value from a slot that is fed from a
tuplestore or tuplesort object. The added regression tests illustrate some
simple cases, but there may be other failure scenarios traceable to the same
bug. Note that the added tests probably only fail on unpatched code if it's
built with --enable-cassert; otherwise the bug leads to fetching from freed
memory, which will not have been overwritten without additional conditions.
Fix by allowing a slot to contain both formats simultaneously; which turns out
not to complicate the logic much at all, if anything it seems less contorted
than before.
Back-patch to 8.2, where minimal tuples were introduced.
|
|
them from degrading badly when the input is sorted or nearly so. In this
scenario the tree is unbalanced to the point of becoming a mere linked list,
so insertions become O(N^2). The easiest and most safely back-patchable
solution is to stop growing the tree sooner, ie limit the growth of N. We
might later consider a rebalancing tree algorithm, but it's not clear that
the benefit would be worth the cost and complexity. Per report from Sergey
Burladyan and an earlier complaint from Heikki.
Back-patch to 8.2; older versions didn't have GIN indexes.
|
|
|
|
encoding conversion of any elog/ereport message being sent to the frontend.
This generalizes a patch that I put in last October, which suppressed
translation of only specific messages known to be associated with recursive
can't-translate-the-message behavior. As shown in bug #4680, we need a more
general answer in order to have some hope of coping with broken encoding
conversion setups. This approach seems a good deal less klugy anyway.
Patch in all supported branches.
|
|
TABLE: if the command is executed by someone other than the table owner (eg,
a superuser) and the table has a toast table, the toast table's pg_type row
ends up with the wrong typowner, ie, the command issuer not the table owner.
This is quite harmless for most purposes, since no interesting permissions
checks consult the pg_type row. However, it could lead to unexpected failures
if one later tries to drop the role that issued the command (in 8.1 or 8.2),
or strange warnings from pg_dump afterwards (in 8.3 and up, which will allow
the DROP ROLE because we don't create a "redundant" owner dependency for table
rowtypes). Problem identified by Cott Lang.
Back-patch to 8.1. The problem is actually far older --- the CLUSTER variant
can be demonstrated in 7.0 --- but it's mostly cosmetic before 8.1 because we
didn't track ownership dependencies before 8.1. Also, fixing it before 8.1
would require changing the call signature of heap_create_with_catalog(), which
seems to carry a nontrivial risk of breaking add-on modules.
|
|
|
|
encoding conversion functions. These are not can't-happen cases because
it's possible to create a conversion with the wrong conversion function
for the specified encoding pair. That would lead to an Assert crash in
an Assert-enabled build, or incorrect conversion otherwise, neither of
which is desirable. This would be a DOS issue if production databases
were customarily built with asserts enabled, but fortunately that's not so.
Per an observation by Heikki.
Back-patch to all supported branches.
|
|
OutputFunctionCall, and friends. This allows SPI-using functions to invoke
datatype I/O without concern for the possibility that a SPI-using function
will be called (which could be either the I/O function itself, or a function
used in a domain check constraint). It's a tad ugly, but not nearly as ugly
as what'd be needed to make this work via retail insertion of push/pop
operations in all the PLs.
This reverts my patch of 2007-01-30 that inserted some retail SPI_push/pop
calls into plpgsql; that approach only fixed plpgsql, and not any other PLs.
But the other PLs have the issue too, as illustrated by a recent gripe from
Christian Schröder.
Back-patch to 8.2, which is as far back as this solution will work. It's
also as far back as we need to worry about the domain-constraint case, since
earlier versions did not attempt to check domain constraints within datatype
input. I'm not aware of any old I/O functions that use SPI themselves, so
this should be sufficient for a back-patch.
|
|
when they are invoked by the parser. We had been setting up a snapshot at
plan time but really it needs to be done earlier, before parse analysis.
Per report from Dmitry Koterov.
Also fix two related problems discovered while poking at this one:
exec_bind_message called datatype input functions without establishing a
snapshot, and SET CONSTRAINTS IMMEDIATE could call trigger functions without
establishing a snapshot.
Backpatch to 8.2. The underlying problem goes much further back, but it is
masked in 8.1 and before because we didn't attempt to invoke domain check
constraints within datatype input. It would only be exposed if a C-language
datatype input function used the snapshot; which evidently none do, or we'd
have heard complaints sooner. Since this code has changed a lot over time,
a back-patch is hardly risk-free, and so I'm disinclined to patch further
than absolutely necessary.
|
|
outer join clauses. Given, say,
... from a left join b on a.a1 = b.b1 where a.a1 = 42;
we'll deduce a clause b.b1 = 42 and then mark the original join clause
redundant (we can't remove it completely for reasons I don't feel like
squeezing into this log entry). However the original implementation of
that wasn't bulletproof, because clause_selectivity() wouldn't honor
this_selec if given nonzero varRelid --- which in practice meant that
it worked as desired *except* when considering index scan quals. Which
resulted in bogus underestimation of the size of the indexscan result for
an inner indexscan in an outer join, and consequently a possibly bad
choice of indexscan vs. bitmap scan. Fix by introducing an explicit test
into clause_selectivity(). Also, to make sure we don't trigger that test
in corner cases, change the convention to be that this_selec > 1, not
this_selec = 1, means it's been marked redundant. Per trouble report from
Scara Maccai.
Back-patch to 8.2, where the problem was introduced.
|
|
toasted values, since those could get dropped once the cursor's transaction
is over. Per bug #4553 from Andrew Gierth.
Back-patch as far as 8.1. The bug actually exists back to 7.4 when holdable
cursors were introduced, but this patch won't work before 8.1 without
significant adjustments. Given the lack of field complaints, it doesn't seem
worth the work (and risk of introducing new bugs) to try to make a patch for
the older branches.
|
|
we extended the appendrel mechanism to support UNION ALL optimization. The
reason nobody noticed was that we are not actually using attr_needed data for
appendrel children; hence it seems more reasonable to rip it out than fix it.
Back-patch to 8.2 because an Assert failure is possible in corner cases.
Per examination of an example from Jim Nasby.
In HEAD, also get rid of AppendRelInfo.col_mappings, which is quite inadequate
to represent UNION ALL situations; depend entirely on translated_vars instead.
|
|
|
|
recursion when we are unable to convert a localized error message to the
client's encoding. We've been over this ground before, but as reported by
Ibrar Ahmed, it still didn't work in the case of conversion failures for
the conversion-failure message itself :-(. Fix by installing a "circuit
breaker" that disables attempts to localize this message once we get into
recursion trouble.
Patch all supported branches, because it is in fact broken in all of them;
though I had to add some missing translations to the older branches in
order to expose the failure in the particular test case I was using.
|
|
correctly set. As result, killtuple() marks as dead
wrong tuple on page. Bug was introduced by me while fixing
possible duplicates during GiST index scan.
|
|
|
|
forestalls potential overflow when the same table (or other object, but
usually tables) is accessed by very many successive queries within a single
transaction. Per report from Michael Milligan.
Back-patch to 8.0, which is as far back as the patch conveniently applies.
There have been no reports of overflow in pre-8.3 releases, but clearly the
risk existed all along. (Michael's report suggests that 8.3 may consume lock
counts faster than prior releases, but with no test case to look at it's hard
to be sure about that. Widening the counts seems a good future-proofing
measure in any event.)
|
|
at once and ItemPointers are collected in memory.
Remove tuple's killing by killtuple() if tuple was moved to another
page - it could produce unaceptable overhead.
Backpatch up to 8.1 because the bug was introduced by GiST's concurrency support.
|
|
parent, not only those with RangeTblRefs. We need them in ExecCheckRTPerms.
Report by Brendan O'Shea. Back-patch to 8.2, where pull_up_simple_union_all
was introduced.
|
|
|
|
|
|
current transaction has any open references to the target relation or index
(implying it has an active query using the relation). Also back-patch the
8.2 fix that prohibits TRUNCATE and CLUSTER when there are pending
AFTER-trigger events. Per suggestion from Heikki.
|
|
<craig@postnewspapers.com.au>.
It was my mistake, I missed limitation of number of held locks, now GIN doesn't
use continiuous locks, but still hold buffers pinned to prevent interference
with vacuum's deletion algorithm.
|