Age | Commit message (Collapse) | Author |
|
That's what I get for not fully retesting the final version of the patch.
The replace_allowed cross-check needs an additional special case for
bootstrapping.
|
|
RelationCacheInsert() ignored the possibility that hash_search(HASH_ENTER)
might find a hashtable entry already present for the same OID. However,
that can in fact occur during recursive relcache load scenarios. When it
did happen, we overwrote the pointer to the pre-existing Relation, causing
a session-lifespan leakage of that entire structure. As far as is known,
the pre-existing Relation would always have reference count zero by the
time we arrive back at the outer insertion, so add code that deletes the
pre-existing Relation if so. If by some chance its refcount is positive,
elog a WARNING and allow the pre-existing Relation to be leaked as before.
Also, AttrDefaultFetch() was sloppy about leaking the cstring form of the
pg_attrdef.adbin value it's copying into the relcache structure. This is
only a query-lifespan leakage, and normally not very significant, but it
adds up during CLOBBER_CACHE testing.
These bugs are of very ancient vintage, but I'll refrain from back-patching
since there's no evidence that these leaks amount to anything in ordinary
usage.
|
|
The fallback implementation involves acquiring and releasing a spinlock
variable that is otherwise unreferenced --- not even to the extent of
initializing it. This accidentally fails to fail on platforms where
spinlocks should be initialized to zeroes, but elsewhere it results in
a "stuck spinlock" failure during startup.
I griped about this last July, and put in a hack that worked for gcc
on HPPA, but didn't get around to fixing the general case. Per the
discussion back then, the best thing to do seems to be to initialize
dummy_spinlock in main.c.
|
|
Per testing with a compiler that whines about this.
|
|
The xl_heap_header_len structures in an XLOG_HEAP_UPDATE record aren't
necessarily aligned adequately. The regular replay function for these
records is aware of that, but decode.c didn't get the memo. I'm not
sure why the buildfarm failed to catch this; the test_decoding test
certainly blows up real good on my old HPPA box.
Also, I'm pretty sure that the address arithmetic was wrong for the
case of XLOG_HEAP_CONTAINS_OLD and not XLOG_HEAP_CONTAINS_NEW_TUPLE,
though this apparently can't happen when logical decoding is active.
|
|
transam/README explained how B-tree incomplete splits were tracked and
fixed after recovery, as an example of handling complex actions that need
multiple WAL records, but that's not how it works anymore. Explain the new
paradigm.
|
|
Several years ago we changed chr(int) so that if the database encoding is
UTF8, it would interpret its argument as a Unicode code point and expand it
into the appropriate multibyte sequence. However, we weren't sufficiently
careful about checking validity of the input. According to RFC3629, UTF8
disallows code points above U+10FFFF (note that the predecessor standard
RFC2279 was more liberal). Also, both versions of the UTF8 spec agree
that Unicode surrogate-pair codes should never appear in UTF8. Because
our encoding validity checks follow RFC3629, our failure to enforce these
restrictions in chr() means it could be used to produce text strings that
will be rejected when the database is dumped and reloaded. To ensure
consistency with the input functions, let's actually apply
pg_utf8_islegal() to the proposed output of chr().
Per discussion, this seems like too much of a behavioral change to
back-patch, but it's not too late to squeeze it into 9.4.
|
|
The decoding of prepared transaction commits accidentally used the XID of
the transaction performing the COMMIT PREPARED, not the XID of the prepared
transaction. Before bb38fb0d43c8d that lead to those transactions not being
decoded, afterwards to a assertion failure.
|
|
Let's complain about e.g an invalid path or permission problem sooner rather
than later. Before this patch, we would only try to open the output file
after receiving the first decoded message from the server.
|
|
Commit dd428c79 added dbId and tsId to the xl_xact_commit struct but missed
that prepared transaction commits reuse that struct. Fix that.
Because those fields were left unitialized, replaying a commit prepared WAL
record in a hot standby node would fail to remove the relcache init file.
That can lead to "could not open file" errors on the standby. Relcache init
file only needs to be removed when a system table/index is rewritten in the
transaction using two phase commit, so that should be rare in practice. In
HEAD, the incorrect dbId/tsId values are also used for filtering in logical
replication code, causing the transaction to always be filtered out.
Analysis and fix by Andres Freund. Backpatch to 9.0 where hot standby was
introduced.
|
|
In yesterday's commit 2dc4f011fd61501cce507be78c39a2677690d44b, I tried
to force buffering of stdout/stderr in initdb to be what it is by
default when the program is run interactively on Unix (since that's how
most manual testing is done). This tripped over the fact that Windows
doesn't support _IOLBF mode. We dealt with that a long time ago in
syslogger.c by falling back to unbuffered mode on Windows. Export that
solution in port.h and use it in initdb.
Back-patch to 8.4, like the previous commit.
|
|
Don't close stdout on SIGHUP. Also, when a SIGHUP is received, close the
file immediately, rather than only after receiving some more data from
the server. Rename a variable, to avoid mentally dealing with double
negatives (not unsynced means synced).
|
|
The proc array can contain duplicate XIDs, when a transaction is just being
prepared for two-phase commit. To cope, remove any duplicates in
txid_current_snapshot(). Also ignore duplicates in the input functions, so
that if e.g. you have an old pg_dump file that already contains duplicates,
it will be accepted.
Report and fix by Jan Wieck. Backpatch to all supported versions.
|
|
To lock a prepared transaction's shared memory entry, we used to mark it
with the XID of the backend. When the XID was no longer active according
to the proc array, the entry was implicitly considered as not locked
anymore. However, when preparing a transaction, the backend's proc array
entry was cleared before transfering the locks (and some other state) to
the prepared transaction's dummy PGPROC entry, so there was a window where
another backend could finish the transaction before it was in fact fully
prepared.
To fix, rewrite the locking mechanism of global transaction entries. Instead
of an XID, just have simple locked-or-not flag in each entry (we store the
locking backend's backend id rather than a simple boolean, but that's just
for debugging purposes). The backend is responsible for explicitly unlocking
the entry, and to make sure that that happens, install a callback to unlock
it on abort or process exit.
Backpatch to all supported versions.
|
|
Euler Taveira
|
|
Mingw-w64 headers map popen/pclose to _popen and _pclose, but we want to use
our popen wrapper rather than the Mingw-w64. #undef the Mingw's version.
|
|
|
|
Since this program may print to either stdout or stderr, the relative
ordering of its messages depends on the buffering behavior of those files.
Force stdout to be line-buffered and stderr to be unbuffered, ensuring
that the behavior will match standard Unix interactive behavior, even
when stdout and stderr are rerouted to a file.
Per complaint from Tomas Vondra. The particular case he pointed out is
new in HEAD, but issues of the same sort could arise in any branch with
other error messages, so back-patch to all branches.
I'm unsure whether we might not want to do this in other client programs
as well. For the moment, just fix initdb.
|
|
rd_replidindex should be managed the same as rd_oidindex, and rd_keyattr
and rd_idattr should be managed like rd_indexattr. Omissions in this area
meant that the bitmapsets computed for rd_keyattr and rd_idattr would be
leaked during any relcache flush, resulting in a slow but permanent leak in
CacheMemoryContext. There was also a tiny probability of relcache entry
corruption if we ran out of memory at just the wrong point in
RelationGetIndexAttrBitmap. Otherwise, the fields were not zeroed where
expected, which would not bother the code any AFAICS but could greatly
confuse anyone examining the relcache entry while debugging.
Also, create an API function RelationGetReplicaIndex rather than letting
non-relcache code be intimate with the mechanisms underlying caching of
that value (we won't even mention the memory leak there).
Also, fix a relcache flush hazard identified by Andres Freund:
RelationGetIndexAttrBitmap must not assume that rd_replidindex stays valid
across index_open.
The aspects of this involving rd_keyattr date back to 9.3, so back-patch
those changes.
|
|
Historically we've printed a complaint for a bad locale setting, but then
fallen back to the environment default. Per discussion, this is not such
a great idea, because rectifying an erroneous locale choice post-initdb
(perhaps long after data has been loaded) could be enormously expensive.
Better to complain and give the user a chance to double-check things.
The behavior was particularly bad if the bad setting came from environment
variables rather than a bogus command-line switch: in that case not only
was there a fallback to C/SQL_ASCII, but the printed complaint was quite
unhelpful. It's hard to be entirely sure what variables setlocale looked
at, but we can at least give a hint where the problem might be.
Per a complaint from Tomas Vondra.
|
|
When cache invalidations arrive while ri_LoadConstraintInfo() is busy
filling a new cache entry, InvalidateConstraintCacheCallBack() compares
the - not yet initialized - oidHashValue field with the to-be-invalidated
hash value. To fix, check whether the entry is already marked as invalid.
Andres Freund
|
|
Andres Freund
|
|
America/Metlakatla hasn't been in the IANA database all that long, so
some installations might not have it. It does seem worthwhile to test
with a fractional-minute GMT offset, but we can get that from almost
any pre-1900 date; I chose Europe/Paris, whose LMT offset from Greenwich
should be pretty darn well established.
Also, assuming that Mars/Mons_Olympus will never be in the IANA database
seems less than future-proof, so let's use a more fanciful location for
the bad-zone-name check.
Per complaint from Christoph Berg.
|
|
config.pl and buildenv.pl can be used to customize build settings when
using MSVC. They should never get committed into the common source tree.
Back-patch to 9.0; it looks like the rules were different in 8.4.
Michael Paquier
|
|
The leak is fairly small and rare, but a leak nevertheless.
Per Coverity report. Backpatch to 9.2, where pg_receivexlog was added.
pg_basebackup shares the code, but it always exits on error, so there is
no real leak.
|
|
|
|
The original coding for ALTER SYSTEM made a fundamentally bogus assumption
that postgresql.auto.conf could be sought relative to the main config file
if we hadn't yet determined the value of data_directory. This fails for
common arrangements with the config file elsewhere, as reported by
Christoph Berg.
The simplest fix is to not try to read postgresql.auto.conf until after
SelectConfigFiles has chosen (and locked down) the data_directory setting.
Because of the logic in ProcessConfigFile for handling resetting of GUCs
that've been removed from the config file, we cannot easily read the main
and auto config files separately; so this patch adopts a brute force
approach of reading the main config file twice during postmaster startup.
That's a tad ugly, but the actual time cost is likely to be negligible,
and there's no time for a more invasive redesign before beta.
With this patch, any attempt to set data_directory via ALTER SYSTEM
will be silently ignored. It would probably be better to throw an
error, but that can be dealt with later. This bug, however, would
prevent any testing of ALTER SYSTEM by a significant fraction of the
userbase, so it seems important to get it fixed before beta.
|
|
There's no longer much pressure to switch the default GIN opclass for
jsonb, but there was still some unhappiness with the name "jsonb_hash_ops",
since hashing is no longer a distinguishing property of that opclass,
and anyway it seems like a relatively minor detail. At the suggestion of
Heikki Linnakangas, we'll use "jsonb_path_ops" instead; that captures the
important characteristic that each index entry depends on the entire path
from the document root to the indexed value.
Also add a user-facing explanation of the implementation properties of
these two opclasses.
|
|
|
|
Per discussion, this seems like a more consistent choice of name.
FabrÃzio de Royes Mello, after a suggestion by Peter Eisentraut;
some additional documentation wordsmithing by me
|
|
When returning rows from a bitmap, as done with partial match queries, we
would get stuck in an infinite loop if the bitmap contained a lossy page
reference.
This bug is new in master, it was introduced by the patch to allow skipping
items refuted by other entries in GIN scans.
Report and fix by Alexander Korotkov
|
|
reserveFromBuffer() failed to consider the possibility that it needs to
more-than-double the current buffer size. Beyond that, it seems likely
that we'd someday need to worry about integer overflow of the buffer
length variable. Rather than reinvent the logic that's already been
debugged in stringinfo.c, let's go back to using that logic. We can
still have the same targeted API, but we'll rely on stringinfo.c to
manage reallocation.
Per report from Alexander Korotkov.
|
|
These functions were relying on typcategory to identify arrays and
composites, which is not reliable and not the normal way to do it.
Using typcategory to identify boolean, numeric types, and json itself is
also pretty questionable, though the code in those cases didn't seem to be
at risk of anything worse than wrong output. Instead, use the standard
lsyscache functions to identify arrays and composites, and rely on a direct
check of the type OID for the other cases.
In HEAD, also be sure to look through domains so that a domain is treated
the same as its base type for conversions to JSON. However, this is a
small behavioral change; given the lack of field complaints, we won't
back-patch it.
In passing, refactor so that there's only one copy of the code that decides
which conversion strategy to apply, not multiple copies that could (and
have) gotten out of sync.
|
|
Post-commit review identified a number of places where addition was
used instead of multiplication or memory wasn't zeroed where it should
have been. This commit also fixes one case where a structure member
was mis-initialized, and moves another memory allocation closer to
the place where the allocated storage is used for clarity.
Andres Freund
|
|
It's legal to configure wal_level=logical and max_replication_slots=0
simultaneously.
Andres Freund
|
|
This code really needs to be refactored so that there aren't so many copies
that can diverge. Not to mention that this whole approach is probably
wrong. But for the moment I'll just stick my finger in the dike.
Per report from Michael Paquier.
|
|
Dunno who had the cute idea of labeling jsonb as typcategory 'C',
but it is not a composite type. Label it 'U', since that's what
json is using.
|
|
Fix JSONB_MAX_ELEMS and JSONB_MAX_PAIRS macros to use CB_MASK in the
calculation. JENTRY_POSMASK happens to have the same value at the moment,
but that's just coincidental.
Refactor jsonb iterator functions, for readability.
Get rid of the JENTRY_ISFIRST flag. Whenever we handle JEntrys, we have
access to the whole array and have enough context information to know
which entry is the first. This frees up one bit in the JEntry header for
future use. While we're at it, shuffle the JEntry bits so that boolean
true and false go together, for aesthetic reasons.
Bump catalog version as this changes the on-disk format slightly.
|
|
Change the key representation so that values that would exceed 127 bytes
are hashed into short strings, and so that the original JSON datatype of
each value is recorded in the index. The hashing rule eliminates the major
objection to having this opclass be the default for jsonb, namely that it
could fail for plausible input data (due to GIN's restrictions on maximum
key length). Preserving datatype information doesn't really buy us much
right now, but it requires no extra space compared to the previous way,
and it might be useful later.
Also, change the consistency-checking functions to request recheck for
exists (jsonb ? text) and related operators. The original analysis that
this is an exactly checkable query was incorrect, since the index does
not preserve information about whether a key appears at top level in
the indexed JSON object. Add a test case demonstrating the problem.
Make some other, mostly cosmetic improvements to the code in jsonb_gin.c
as well.
catversion bump due to on-disk data format change in jsonb_ops indexes.
|
|
Move the functions around to group related functions together. Remove
binequal argument from lengthCompareJsonbStringValue, moving that
responsibility to lengthCompareJsonbPair. Fix typo in comment.
|
|
This speeds up text to jsonb parsing and hstore to jsonb conversions
somewhat.
|
|
Ensure that ecpg preprocessor output files are rebuilt when re-testing
after a change in the ecpg preprocessor itself, or a change in any of
several include files that get copied verbatim into the output files.
The lack of these dependencies was what created problems for Kevin Grittner
after the recent pgindent run. There's no way for --enable-depend to
discover these dependencies automatically, so we've gotta put them into
the Makefiles by hand.
While at it, reduce the amount of duplication in the ecpg invocations.
|
|
Per discussion, the old value of 128MB is ridiculously small on modern
machines; in fact, it's not even any larger than the default value of
shared_buffers, which it certainly should be. Increase to 4GB, which
is unlikely to be any worse than the old default for anyone, and should
be noticeably better for most. Eventually we might have an autotuning
scheme for this setting, but the recent attempt crashed and burned,
so for now just do this.
|
|
This reverts commit ee1e5662d8d8330726eaef7d3110cb7add24d058, as well as
a remarkably large number of followup commits, which were mostly concerned
with the fact that the implementation didn't work terribly well. It still
doesn't: we probably need some rather basic work in the GUC infrastructure
if we want to fully support GUCs whose default varies depending on the
value of another GUC. Meanwhile, it also emerged that there wasn't really
consensus in favor of the definition the patch tried to implement (ie,
effective_cache_size should default to 4 times shared_buffers). So whack
it all back to where it was. In a followup commit, I'll do what was
recently agreed to, which is to simply change the default to a higher
value.
|
|
Commit 4318daecc959886d001a6e79c6ea853e8b1dfb4b broke it. The change in
sub-second precision at extreme dates is normal. The inconsistent
truncation vs. rounding is essentially a bug, albeit a longstanding one.
Back-patch to 8.4, like the causative commit.
|
|
Previous commit was confused about the case we're handling: actually,
what the patch is dealing with is platforms that have optreset, *and*
have <getopt.h>, but the latter fails to declare the former. Because
we use a linking probe to set HAVE_INT_OPTRESET, we need to be sure we
have a declaration even if <getopt.h> doesn't think it exists.
|
|
Reportedly, some versions of mingw are like that, and it seems plausible
in general that older platforms might be that way. However, we'd
determined experimentally that just doing "extern int" conflicts with
the way Cygwin declares these variables, so explicitly exclude Cygwin.
Michael Paquier, tweaked by me to hopefully not break Cygwin
|
|
To-be-deleted list pages contain no useful information, as they are being
deleted, but we must still protect the writes from being torn by a crash
after a partial write. To do that, re-initialize the pages on WAL replay.
Jeff Janes caught this with a test program to test partial writes.
Backpatch to all supported versions.
|
|
Michael Paquier
|
|
If the server sends a long stream of data, and the server + network are
consistently fast enough to force the recv() loop in pqReadData() to
iterate until libpq's input buffer is full, then upon processing the last
incomplete message in each bufferload we'd usually double the buffer size,
due to supposing that we didn't have enough room in the buffer to finish
collecting that message. After filling the newly-enlarged buffer, the
cycle repeats, eventually resulting in an out-of-memory situation (which
would be reported misleadingly as "lost synchronization with server").
Of course, we should not enlarge the buffer unless we still need room
after discarding already-processed messages.
This bug dates back quite a long time: pqParseInput3 has had the behavior
since perhaps 2003, getCopyDataMessage at least since commit 70066eb1a1ad
in 2008. Probably the reason it's not been isolated before is that in
common environments the recv() loop would always be faster than the server
(if on the same machine) or faster than the network (if not); or at least
it wouldn't be slower consistently enough to let the buffer ramp up to a
problematic size. The reported cases involve Windows, which perhaps has
different timing behavior than other platforms.
Per bug #7914 from Shin-ichi Morita, though this is different from his
proposed solution. Back-patch to all supported branches.
|