|
|
|
Like Gather, we spawn multiple workers and run the same plan in each
one; however, Gather Merge is used when each worker produces the same
output ordering and we want to preserve that output ordering while
merging together the streams of tuples from various workers. (In a
way, Gather Merge is like a hybrid of Gather and MergeAppend.)
This works out to a win if it saves us from having to perform an
expensive Sort. In cases where only a small amount of data would need
to be sorted, it may actually be faster to use a regular Gather node
and then sort the results afterward, because Gather Merge sometimes
needs to wait synchronously for tuples whereas a pure Gather generally
doesn't.
Rushabh Lathia, reviewed and tested by Amit Kapila, Thomas Munro,
and Neha Sharma, and reviewed and revised by me.
Discussion: http://postgr.es/m/CAGPqQf09oPX-cQRpBKS0Gq49Z+m6KBxgxd_p9gX8CKk_d75HoQ@mail.gmail.com
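For illustration only, the order-preserving merge described above can be
sketched in plain C. SortedStream and merge_streams are hypothetical names
standing in for the per-worker tuple streams; nothing here is taken from the
executor code, and the linear scan for the smallest head is a simplification
(a heap would be the usual choice when there are many streams).

```c
#include <assert.h>
#include <stddef.h>

/* One already-sorted input, standing in for a single worker's output. */
typedef struct SortedStream
{
    const int  *items;          /* values in ascending order */
    size_t      count;          /* number of values */
    size_t      next;           /* index of the next unread value */
} SortedStream;

/*
 * Merge nstreams sorted streams into out, preserving ascending order.
 * out must have room for every item.  Returns the number of items written.
 */
size_t
merge_streams(SortedStream *streams, size_t nstreams, int *out)
{
    size_t      nout = 0;

    assert(out != NULL);

    for (;;)
    {
        SortedStream *best = NULL;

        /* Find the non-exhausted stream with the smallest head value. */
        for (size_t i = 0; i < nstreams; i++)
        {
            SortedStream *s = &streams[i];

            if (s->next >= s->count)
                continue;       /* this stream is exhausted */
            if (best == NULL ||
                s->items[s->next] < best->items[best->next])
                best = s;
        }

        if (best == NULL)
            break;              /* every stream is exhausted */

        out[nout++] = best->items[best->next++];
    }

    return nout;
}
```

Because each input is already sorted, no Sort step is needed on top of the
merge; the cost is that the merge cannot emit a value until it has seen the
head of every stream that still has data.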
|
|
Compilers that don't realize that elog(ERROR) doesn't return
complained that SlabRealloc() failed to return a value.
While at it, fix the rather muddled header comment for the function.
Per buildfarm.
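A minimal sketch of the pattern behind this kind of warning, under stated
assumptions: report_fatal is a hypothetical stand-in for elog(ERROR) that is
deliberately not declared noreturn, and checked_chunk_size is an invented
function, not SlabRealloc() itself.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical error reporter.  It never returns, but since it is not
 * marked noreturn the compiler has no way of knowing that.
 */
void
report_fatal(const char *msg)
{
    fprintf(stderr, "ERROR: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* Return the requested size if it matches fixed_size, else raise an error. */
size_t
checked_chunk_size(size_t requested, size_t fixed_size)
{
    assert(fixed_size > 0);

    if (requested == fixed_size)
        return requested;

    report_fatal("unsupported chunk size");
    return 0;                   /* not reached; keeps the compiler quiet */
}
```

Without the final return, a compiler that cannot see that report_fatal()
never returns may warn that control reaches the end of a non-void function.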
|
|
Compilers that don't realize that ereport(ERROR) doesn't return
complained that XmlTableGetValue() failed to return a value.
Also, make XmlTableFetchRow's non-USE_LIBXML case look more like
the other ones. As coded, it could lead to "unreachable code"
warnings with USE_LIBXML enabled.
Oversights in commit fcec6caaf. Per buildfarm.
|
|
Further fallout from commit c29aff959: there are some files that need
<float.h>, and were getting it from datatype/timestamp.h, but it was not
apparent in my (tgl's) testing because the requirement for <float.h>
exists only on certain Windows toolchains.
Report and patch by David Rowley.
Discussion: https://postgr.es/m/CAKJS1f-BHceaFzZScFapDV48gUVM2CAOBfhkgffdqXzFb+kwew@mail.gmail.com
|
|
Large chunks (those too large for any palloc freelist) are managed as
separate blocks. Formerly, realloc'ing or pfree'ing such a chunk required
O(N) time in a context with N blocks, since we had to traipse down the
singly-linked block list to locate the block's predecessor before we could
fix the list links. This can result in O(N^2) runtime in situations where
large numbers of such chunks are manipulated within one context. Cases
like that were not foreseen in the original design of aset.c, and indeed
didn't arise until fairly recently. But such problems can now occur in
reorderbuffer.c and in hash joining, both of which make repeated large
requests without scaling up their request size as they do so, and which
will free their requests in not-necessarily-LIFO order.
To fix, change the block list from singly-linked to doubly-linked.
This adds another 4 or 8 bytes to ALLOC_BLOCKHDRSZ, but that doesn't
seem like unacceptable overhead, since aset.c's blocks are normally
8K or more, and never less than 1K in current practice.
In passing, get rid of some redundant AllocChunkGetPointer() calls in
AllocSetRealloc (the compiler might be smart enough to optimize these
away anyway, but no need to assume that) and improve AllocSetCheck's
checking of block header fields.
Back-patch to 9.4 where reorderbuffer.c appeared. We could take this
further back, but currently there's no evidence that it would be useful.
Discussion: https://postgr.es/m/CAMkU=1x1hvue1XYrZoWk_omG0Ja5nBvTdvgrOeVkkeqs71CV8g@mail.gmail.com
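To illustrate why the doubly-linked list helps, here is a small standalone
sketch. Block, BlockList, block_list_push and block_list_unlink are
hypothetical names, not aset.c's actual structures. With a prev pointer in
each block, unlinking needs no walk down the list to find the predecessor.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Block
{
    struct Block *prev;         /* the extra pointer this design adds */
    struct Block *next;
    size_t      size;
} Block;

typedef struct BlockList
{
    Block      *head;
} BlockList;

/* Insert a block at the head of the list. */
void
block_list_push(BlockList *list, Block *block)
{
    assert(list != NULL && block != NULL);

    block->prev = NULL;
    block->next = list->head;
    if (list->head != NULL)
        list->head->prev = block;
    list->head = block;
}

/* Remove a block from anywhere in the list in constant time. */
void
block_list_unlink(BlockList *list, Block *block)
{
    assert(list != NULL && block != NULL);

    if (block->prev != NULL)
        block->prev->next = block->next;
    else
        list->head = block->next;   /* block was the head */

    if (block->next != NULL)
        block->next->prev = block->prev;

    block->prev = block->next = NULL;
}
```

Freeing N blocks in arbitrary order then costs O(N) in total, instead of the
O(N^2) that results when each removal has to search for the predecessor.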
|
|
libxml2 older than 2.9.1 does not have xmlXPathSetContextNode (it was
released in 2013, so even reasonably recent platforms have trouble). That function is fairly
trivial, so I have inlined it in the one added caller. This passes
tests on my machine; let's see what the buildfarm thinks about it.
Per joint complaint from Tom Lane and buildfarm.
|
|
XMLTABLE is defined by the SQL/XML standard as a feature that allows
turning XML-formatted data into relational form, so that it can be used
as a <table primary> in the FROM clause of a query.
This new construct provides significant simplicity and performance
benefit for XML data processing; what in a client-side custom
implementation was reported to take 20 minutes can be executed in 400ms
using XMLTABLE. (The same functionality was said to take 10 seconds
using nested PostgreSQL XPath function calls, and 5 seconds using
XMLReader under PL/Python).
The implemented syntax deviates slightly from what the standard
requires. First, the standard indicates that the PASSING clause is
optional and that multiple XML input documents may be given to it; we
make it mandatory and accept a single document only. Second, we don't
currently support specifying a default namespace.
This implementation relies on a new executor node based on a hardcoded
method table. (Because the grammar is fixed, there is no extensibility
in the current approach; further constructs can be implemented on top of
this such as JSON_TABLE, but they require changes to core code.)
Author: Pavel Stehule, Álvaro Herrera
Extensively reviewed by: Craig Ringer
Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com
|
|
David Rowley, reviewed by Amit Kapila
Discussion: http://postgr.es/m/CAKJS1f8gPEUPscj6kSqpveMnnx9_3ZypzwsKstv+8atx6VmjBg@mail.gmail.com
|
|
Third time's the charm.
|
|
This introduces a new generic SASL authentication method, similar to the
GSS and SSPI methods. The server first tells the client which SASL
authentication mechanism to use, and then the mechanism-specific SASL
messages are exchanged in AuthenticationSASLcontinue and PasswordMessage
messages. Only SCRAM-SHA-256 is supported at the moment, but this allows
adding more SASL mechanisms in the future, without changing the overall
protocol.
Support for channel binding, aka SCRAM-SHA-256-PLUS, is left for later.
The SASLprep algorithm, for pre-processing the password, is not yet
implemented. That could cause trouble if you use a password with
non-ASCII characters and a client library that does implement SASLprep.
That will hopefully be added later.
Authorization identities, as specified in the SCRAM-SHA-256 specification,
are ignored. SET SESSION AUTHORIZATION provides more or less the same
functionality, anyway.
If a user doesn't exist, perform a "mock" authentication, by constructing
an authentic-looking challenge on the fly. The challenge is derived from
a new system-wide random value, "mock authentication nonce", which is
created at initdb and stored in the control file. We go through these
motions so as not to reveal to unauthenticated users whether the user
exists.
Bumps PG_CONTROL_VERSION, because of the new field in control file.
Patch by Michael Paquier and Heikki Linnakangas, reviewed at different
stages by Robert Haas, Stephen Frost, David Steele, Aleksander Alekseev,
and many others.
Discussion: https://www.postgresql.org/message-id/CAB7nPqRbR3GmFYdedCAhzukfKrgBLTLtMvENOmPrVWREsZkF8g%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAB7nPqSMXU35g%3DW9X74HVeQp0uvgJxvYOuA4A-A3M%2B0wfEBv-w%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/55192AFE.6080106@iki.fi
|
|
Commit 19dc233c32f2900e57b8da4f41c0f662ab42e080 introduced these
comments. Michael Paquier noticed that one of them had a typo, but
a bigger problem is that they were not an accurate description of
what the code was doing.
Patch by me.
|
|
The following parameters are now updateable with ShareUpdateExclusiveLock:
effective_io_concurrency
parallel_workers
seq_page_cost
random_page_cost
n_distinct
n_distinct_inherited
Simon Riggs and Fabrízio Mello
|
|
These were introduced by me in f4e2d50c.
Reported-By: Tomas Vondra
Discussion: https://postgr.es/m/11adca69-be28-44bc-a801-64e6d53851e3@2ndquadrant.com
|
|
The syslogger will write out the current stderr and csvlog names, if
it's running and there are any, to a new file in the data directory
called "current_logfiles". We take care to remove this file when it
might no longer be valid (but not at shutdown). The function
pg_current_logfile() can be used to read the entries in the file.
Gilles Darold, reviewed and modified by Karl O. Pinc, Michael
Paquier, and me. Further review by Álvaro Herrera and Christoph Berg.
|
|
Currently, the whole row is shown without column names. Instead,
adopt a style similar to _bt_check_unique() in ExecFindPartition()
and show the failing key: (key1, ...) = (val1, ...).
Amit Langote, per a complaint from Simon Riggs. Reviewed by me;
I also adjusted the grammar in one of the comments.
Discussion: http://postgr.es/m/9f9dc7ae-14f0-4a25-5485-964d9bfc19bd@lab.ntt.co.jp
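As a rough sketch of the "(key1, ...) = (val1, ...)" style mentioned above,
the following standalone C builds such a string from parallel arrays of names
and values. format_key_detail and its helpers are hypothetical and are not
the code used in ExecFindPartition(); output that does not fit the buffer is
silently truncated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append s to buf, truncating if the buffer is too small. */
static void
append_str(char *buf, size_t buflen, size_t *used, const char *s)
{
    size_t      n = strlen(s);

    if (n > buflen - 1 - *used)
        n = buflen - 1 - *used;
    memcpy(buf + *used, s, n);
    *used += n;
    buf[*used] = '\0';
}

/* Append "(item1, item2, ...)" to buf. */
static void
append_list(char *buf, size_t buflen, size_t *used,
            const char *const *items, int nitems)
{
    append_str(buf, buflen, used, "(");
    for (int i = 0; i < nitems; i++)
    {
        if (i > 0)
            append_str(buf, buflen, used, ", ");
        append_str(buf, buflen, used, items[i]);
    }
    append_str(buf, buflen, used, ")");
}

/* Write "(name1, ...) = (value1, ...)" into buf. */
void
format_key_detail(char *buf, size_t buflen,
                  const char *const *names,
                  const char *const *values, int nkeys)
{
    size_t      used = 0;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    append_list(buf, buflen, &used, names, nkeys);
    append_str(buf, buflen, &used, " = ");
    append_list(buf, buflen, &used, values, nkeys);
}
```

Showing only the key columns, with their names, keeps the message short and
tells the reader which values drove the routing decision.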
|
|
Tomas Vondra
|
|
Likewise in RestoreSnapshot(). Do so by copying between the user buffer
and a stack buffer of known alignment. Back-patch to 9.6, where this
last applies cleanly. In master, the select_parallel test dies with
SIGBUS on "Oracle Solaris 10 1/13 s10s_u11wos_24a SPARC", building
32-bit with gcc 4.9.2. In 9.6 and 9.5, the buffers in question happen
to be sufficiently-aligned, and this change is mere insurance against
future 9.6 changes or extension code compromising that.
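The copy-through-an-aligned-local technique can be sketched as follows.
SerializedHeader, write_header and read_header are hypothetical names with an
invented field layout; this is not the snapshot serialization code. The point
is only that the caller's buffer is accessed through memcpy() and never by
dereferencing a possibly misaligned struct pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size header with 8-byte members. */
typedef struct SerializedHeader
{
    uint64_t    xmin;
    uint64_t    xmax;
    uint32_t    xcnt;
} SerializedHeader;

/* Serialize into dest, which may have any alignment. */
void
write_header(char *dest, uint64_t xmin, uint64_t xmax, uint32_t xcnt)
{
    SerializedHeader tmp;       /* stack copy with known alignment */

    assert(dest != NULL);

    memset(&tmp, 0, sizeof(tmp));   /* also clears padding bytes */
    tmp.xmin = xmin;
    tmp.xmax = xmax;
    tmp.xcnt = xcnt;
    memcpy(dest, &tmp, sizeof(tmp));
}

/* Deserialize from src, which may have any alignment. */
SerializedHeader
read_header(const char *src)
{
    SerializedHeader tmp;

    assert(src != NULL);

    memcpy(&tmp, src, sizeof(tmp));
    return tmp;
}
```

On strict-alignment hardware, casting an odd address to SerializedHeader *
and reading xmin through it can raise SIGBUS; the memcpy() form is safe at
any address.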
|
|
|
|
|
|
In the previous commit I'd made MemoryContextContains() use
GetMemoryChunkContext(), but that causes trouble when the passed
pointer isn't allocated in any memory context - that's probably
something we shouldn't do, but the previous commit isn't a place for a
"policy" change.
|
|
The README was written as a "historical account", and that style
hasn't aged particularly well. Rephrase it to describe the current
situation, instead of having various version specific comments.
This also updates the description of how allocated chunks are
associated with their corresponding context, the method of which has
changed in the preceding commit.
Author: Andres Freund
Discussion: https://postgr.es/m/20170228074420.aazv4iw6k562mnxg@alap3.anarazel.de
|
|
The new slab allocator needs different per-allocation information than
the classical aset.c. The definition in 58b25e981 wasn't sufficiently
careful on 32-bit platforms with 8-byte alignment, leading to buildfarm
failures. That's not entirely easy to fix by just adjusting the
definition.
As slab.c doesn't actually need the size part(s) of the common header
(all chunks are equally sized, after all), it seems better to instead
reduce the header to the part needed by all allocators, namely which
context an allocation belongs to. That has the advantage of reducing
the overhead of slab allocations, and also allows for more flexibility
in future allocators.
To avoid spreading the logic about accessing a chunk's context around,
centralize it in GetMemoryChunkContext(), which allows deleting a
good number of lines.
A followup commit will revise the mmgr/README portion about
StandardChunkHeader, and more.
Author: Andres Freund
Discussion: https://postgr.es/m/20170228074420.aazv4iw6k562mnxg@alap3.anarazel.de
|
|
PQerrorMessage() returns an error message with a trailing newline, but
in backend use (dblink, postgres_fdw, libpqwalreceiver), we want to have
the error message without that for emitting via ereport(). To simplify
that, add a function pchomp() that returns a pstrdup'ed string with the
trailing newline characters removed.
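A standalone approximation of what such a helper does, assuming malloc() in
place of the backend's pstrdup-style allocation. chomp_copy is a hypothetical
name; this is not the pchomp() source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a malloc'ed copy of in with all trailing newline characters
 * removed, or NULL if memory cannot be allocated.  The caller frees it.
 */
char *
chomp_copy(const char *in)
{
    size_t      n;
    char       *out;

    assert(in != NULL);

    n = strlen(in);
    while (n > 0 && in[n - 1] == '\n')
        n--;                    /* drop trailing newlines */

    out = malloc(n + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, in, n);
    out[n] = '\0';
    return out;
}
```

A caller would pass the library's error string through this before handing
it to its own error-reporting routine, so the message does not end in a
stray line break.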
|
|
The default general purpose aset.c style memory context is not a great
choice for allocations that are all going to be evenly sized,
especially when those objects aren't small, and have varying
lifetimes. There tends to be a lot of fragmentation, and larger
allocations always go directly to libc rather than having their cost
amortized over several pallocs.
These problems led to the introduction of ad-hoc slab allocators in
reorderbuffer.c. But it turns out that the simplistic implementation
leads to problems when a lot of objects are allocated and freed, as
aset.c is still the underlying implementation. Especially freeing can
easily run into O(n^2) behavior in aset.c.
While the O(n^2) behavior in aset.c can, and probably will, be
addressed, custom allocators for this behavior are more efficient
both in space and time.
This allocator is for evenly sized allocations, and supports both
cheap allocations and freeing, without fragmenting significantly. It
does so by allocating evenly sized blocks via malloc(), and carves
them into chunks that can be used for allocations. In order to
release blocks to the OS as early as possible, chunks are allocated
from the fullest block that still has free objects, increasing the
likelihood of a block being entirely unused.
A subsequent commit uses this in reorderbuffer.c, but a further
allocator is needed to resolve the performance problems triggering
this work.
There are likely further potential uses of this allocator besides
reorderbuffer.c.
There are potential further optimizations of the new slab.c, in
particular the array of freelists could be replaced by a more
intelligent structure - but for now this looks more than good enough.
Author: Tomas Vondra, editorialized by Andres Freund
Reviewed-By: Andres Freund, Petr Jelinek, Robert Haas, Jim Nasby
Discussion: https://postgr.es/m/d15dff83-0b37-28ed-0809-95a5cc7292ad@2ndquadrant.com
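The allocation strategy described above can be sketched in a few dozen lines
of standalone C. Everything here is hypothetical (Slab, SlabBlock, SlabChunk,
slab_alloc, slab_free and the two size constants) and much simpler than
slab.c: it finds the fullest block with a linear scan rather than an array of
freelists, and it uses tiny fixed sizes so the behavior is easy to follow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNKS_PER_BLOCK 4      /* assumed, kept tiny for illustration */
#define PAYLOAD_SIZE     64     /* assumed fixed allocation size */

typedef struct SlabBlock SlabBlock;

typedef struct SlabChunk
{
    SlabBlock  *block;          /* owning block */
    union
    {
        struct SlabChunk *next_free;    /* link while the chunk is free */
        long double align;              /* forces generous alignment */
        unsigned char data[PAYLOAD_SIZE];
    }           u;
} SlabChunk;

struct SlabBlock
{
    SlabBlock  *next;
    SlabChunk  *free_list;
    int         nfree;
    SlabChunk   chunks[CHUNKS_PER_BLOCK];
};

typedef struct Slab
{
    SlabBlock  *blocks;
    int         nblocks;
} Slab;

/* Allocate one chunk from the fullest block that still has a free chunk. */
void *
slab_alloc(Slab *slab)
{
    SlabBlock  *best = NULL;
    SlabChunk  *chunk;

    assert(slab != NULL);
    for (SlabBlock *b = slab->blocks; b != NULL; b = b->next)
        if (b->nfree > 0 && (best == NULL || b->nfree < best->nfree))
            best = b;

    if (best == NULL)
    {
        /* No block has room: get a new one and carve it into chunks. */
        best = malloc(sizeof(SlabBlock));
        if (best == NULL)
            return NULL;
        best->free_list = NULL;
        best->nfree = 0;
        for (int i = 0; i < CHUNKS_PER_BLOCK; i++)
        {
            best->chunks[i].block = best;
            best->chunks[i].u.next_free = best->free_list;
            best->free_list = &best->chunks[i];
            best->nfree++;
        }
        best->next = slab->blocks;
        slab->blocks = best;
        slab->nblocks++;
    }

    chunk = best->free_list;
    best->free_list = chunk->u.next_free;
    best->nfree--;
    return chunk->u.data;
}

/* Return a chunk; release its block once the block is entirely unused. */
void
slab_free(Slab *slab, void *ptr)
{
    SlabChunk  *chunk = (SlabChunk *) ((char *) ptr - offsetof(SlabChunk, u));
    SlabBlock  *block = chunk->block;

    chunk->u.next_free = block->free_list;
    block->free_list = chunk;
    if (++block->nfree == CHUNKS_PER_BLOCK)
    {
        SlabBlock **link = &slab->blocks;

        while (*link != block)
            link = &(*link)->next;
        *link = block->next;
        slab->nblocks--;
        free(block);
    }
}
```

Preferring the fullest block concentrates live chunks in a few blocks, so the
emptier blocks have a better chance of draining completely and being freed.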
|
|
An upcoming patch introduces a new type of memory context. To avoid
duplicating debugging infrastructure within aset.c, move useful pieces
to memdebug.[ch].
While touching aset.c, fix printf format code in AllocFree* debug
macros.
Author: Tomas Vondra
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/b3b2245c-b37a-e1e5-ebc4-857c914bc747@2ndquadrant.com
|
|
c.h #includes a number of core libc header files, such as <stdio.h>.
There's no point in re-including these after having read postgres.h,
postgres_fe.h, or c.h; so remove code that did so.
While at it, also fix some places that were ignoring our standard pattern
of "include postgres[_fe].h, then system header files, then other Postgres
header files". While there's not any great magic in doing it that way
rather than system headers last, it's silly to have just a few files
deviating from the general pattern. (But I didn't attempt to enforce this
globally, only in files I was touching anyway.)
I'd be the first to say that this is mostly compulsive neatnik-ism,
but over time it might save enough compile cycles to be useful.
|
|
If someone were to try to call one of the enum comparison functions
using DirectFunctionCallN, it would very likely seem to work, because
only in unusual cases does enum_cmp_internal() need to access the
typcache. But once such a case occurred, code like that would crash
with a null pointer dereference. To make an oversight of that sort
less likely to escape detection, add a non-bypassable Assert that
fcinfo->flinfo isn't NULL.
Discussion: https://postgr.es/m/25226.1487900067@sss.pgh.pa.us
|
|
Twiddle the replication-related code so that its timestamp variables
are declared TimestampTz, rather than the uninformative "int64" that
was previously used for meant-to-be-always-integer timestamps.
This resolves the int64-vs-TimestampTz declaration inconsistencies
introduced by commit 7c030783a, though in the opposite direction to
what was originally suggested.
This required including datatype/timestamp.h in a couple more places
than before. I decided it would be a good idea to slim down that
header by not having it pull in <float.h> etc, as those headers are
no longer at all relevant to its purpose. Unsurprisingly, a small number
of .c files turn out to have been depending on those inclusions, so add
them back in the .c files as needed.
Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us
Discussion: https://postgr.es/m/27694.1487456324@sss.pgh.pa.us
|
|
This is a basically mechanical removal of #ifdef HAVE_INT64_TIMESTAMP
tests and the negative-case controlled code.
Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us
|
|
We don't need it any more.
pg_controldata continues to report that date/time type storage is
"64-bit integers", but that's now a hard-wired behavior, not something
it sees in the data. This avoids breaking pg_upgrade, and perhaps other
utilities that inspect pg_control this way. Ditto for pg_resetwal.
I chose to remove the "bigint_timestamps" output column of
pg_control_init(), though, as that function hasn't been around long
and probably doesn't have ossified users.
Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us
|
|
Columns with array pseudotypes have not been identified as arrays, so
they have been rendered as strings in the json and jsonb conversion
routines. This change allows them to be rendered as json arrays, making
it possible to deal correctly with the anyarray columns in pg_stats.
|
|
neha khatri
|
|
|
|
Be specific about which pattern is being complained of, and avoid saying
"it's not supported in to_date", which is just confusing if the error is
actually coming out of to_timestamp. We can phrase it as "is only
supported in to_char", instead. Also, use the term "formatting field" not
"format pattern", because other error messages in the same file prefer that
terminology. (This isn't terribly consistent with the documentation, so
maybe we should change all these error messages?)
|
|
A new function dsa_allocate_extended now takes flags which indicate
that huge allocations should be permitted, that out-of-memory
conditions should not throw an error, and/or that the returned memory
should be zero-filled, just like MemoryContextAllocateExtended.
Commit 9acb85597f1223ac26a5b19a9345849c43d0ff54, which added
dsa_allocate0, was broken because it failed to account for the
possibility that dsa_allocate() might return InvalidDsaPointer.
This fixes that problem along the way.
Thomas Munro, with some comment changes by me.
Discussion: http://postgr.es/m/CA+Tgmobt7CcF_uQP2UQwWmu4K9qCHehMJP9_9m1urwP8hbOeHQ@mail.gmail.com
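The flag-driven shape of such an allocator can be sketched with malloc() as
the backing store. The names allocate_extended and allocate_zeroed, the
ALLOC_* flags and the MAX_NORMAL_ALLOC limit are all hypothetical; the real
dsa_allocate_extended works on shared memory areas and returns a dsa_pointer.
The sketch checks for allocation failure before zero-filling, which is the
ordering that the broken dsa_allocate0 lacked.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ALLOC_HUGE   0x01       /* permit requests above MAX_NORMAL_ALLOC */
#define ALLOC_NO_OOM 0x02       /* return NULL instead of failing hard */
#define ALLOC_ZERO   0x04       /* zero-fill the returned memory */

#define MAX_NORMAL_ALLOC ((size_t) 0x3fffffff)  /* assumed limit */

void *
allocate_extended(size_t size, int flags)
{
    void       *p;

    assert((flags & ~(ALLOC_HUGE | ALLOC_NO_OOM | ALLOC_ZERO)) == 0);

    if (size > MAX_NORMAL_ALLOC && (flags & ALLOC_HUGE) == 0)
    {
        fprintf(stderr, "invalid allocation request size %zu\n", size);
        abort();
    }

    p = malloc(size > 0 ? size : 1);
    if (p == NULL)
    {
        /* Handle failure first, so a NULL result is never zero-filled. */
        if (flags & ALLOC_NO_OOM)
            return NULL;
        fprintf(stderr, "out of memory\n");
        abort();
    }

    if (flags & ALLOC_ZERO)
        memset(p, 0, size);
    return p;
}

/* Convenience wrapper in the style of an "allocate0" function. */
void *
allocate_zeroed(size_t size)
{
    return allocate_extended(size, ALLOC_ZERO);
}
```

Folding the variants into one function with flags keeps the failure handling
in a single place, so wrappers cannot forget it.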
|
|
This does the same thing as dsa_allocate, except that the memory
is guaranteed to be zero-filled on return.
Dilip Kumar, adjusted by me.
|
|
In combination with 569174f1be92be93f5366212cc46960d28a5c5cd, which
taught the btree AM how to perform parallel index scans, this allows
parallel index scan plans on btree indexes. This infrastructure
should be general enough to support parallel index scans for other
index AMs as well, if someone updates them to support parallel
scans.
Amit Kapila, reviewed and tested by Anastasia Lubennikova, Tushar
Ahuja, and Haribabu Kommi, and me.
|
|
When min_parallel_relation_size was added, the only supported type
of parallel scan was a parallel sequential scan, but there are
pending patches for parallel index scan, parallel index-only scan,
and parallel bitmap heap scan. Those patches introduce two new
types of complications: first, what's relevant is not really the
total size of the relation but the portion of it that we will scan;
and second, index pages and heap pages shouldn't necessarily be
treated in exactly the same way. Typically, the number of index
pages will be quite small, but that doesn't necessarily mean that
a parallel index scan can't pay off.
Therefore, we introduce min_parallel_table_scan_size, which works
out a degree of parallelism for scans based on the number of table
pages that will be scanned (and which is therefore equivalent to
min_parallel_relation_size for parallel sequential scans) and also
min_parallel_index_scan_size which can be used to work out a degree
of parallelism based on the number of index pages that will be
scanned.
Amit Kapila and Robert Haas
Discussion: http://postgr.es/m/CAA4eK1KowGSYYVpd2qPpaPPA5R90r++QwDFbrRECTE9H_HvpOg@mail.gmail.com
Discussion: http://postgr.es/m/CAA4eK1+TnM4pXQbvn7OXqam+k_HZqb0ROZUMxOiL6DWJYCyYow@mail.gmail.com
|
|
xlog-switch becomes wal-switch, and xlog-insert becomes wal-insert.
|
|
This means pg_receivexlog becomes pg_receivewal, pg_resetxlog
becomes pg_resetwal, and pg_xlogdump becomes pg_waldump.
|
|
The S/390 members of the buildfarm are showing failures indicating
that they're having trouble with the rint() calls I added yesterday.
There's no good reason for that, and I wonder if it is a compiler bug
similar to the one we worked around in d9476b838. Try to fix it using
the same method as before, namely to store the result of rint() back
into a "double" variable rather than immediately converting to int64.
(This isn't entirely waving a dead chicken, since on machines with
wider-than-double float registers, the extra store forces a width
conversion. I don't know if S/390 is like that, but it seems worth
trying.)
In passing, merge duplicate ereport() calls in float8_timestamptz().
Per buildfarm.
|
|
When converting a float value to integer microseconds, we should be careful
to round the value to the nearest integer, typically with rint(); simply
assigning to an int64 variable will truncate, causing apparently off-by-one
values in cases that should work. Most places in the datetime code got
this right, but not these two.
float8_timestamptz() is new as of commit e511d878f (9.6). Previous
versions effectively depended on interval_mul() to do roundoff correctly,
which it does, so this fixes an accuracy regression in 9.6.
The problem in make_interval() dates to its introduction in 9.4. Aside
from being careful to round not truncate, let's incorporate the hours and
minutes inputs into the result with exact integer arithmetic, rather than
risk introducing roundoff error where there need not have been any.
float8_timestamptz() problem reported by Erik Nordström, though this is
not his proposed patch. make_interval() problem found by me.
Discussion: https://postgr.es/m/CAHuQZDS76jTYk3LydPbKpNfw9KbACmD=49dC4BrzHcfPv6yA1A@mail.gmail.com
|
|
When the new GUC wal_consistency_checking is set to a non-empty value,
it triggers recording of additional full-page images, which are
compared on the standby against the results of applying the WAL record
(without regard to those full-page images). Allowable differences
such as hints are masked out, and the resulting pages are compared;
any difference results in a FATAL error on the standby.
Kuntal Ghosh, based on earlier patches by Michael Paquier and Heikki
Linnakangas. Extensively reviewed and revised by Michael Paquier and
by me, with additional reviews and comments from Amit Kapila, Álvaro
Herrera, Simon Riggs, and Peter Eisentraut.
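The mask-then-compare step can be illustrated with a toy page format.
DEMO_PAGE_SIZE, MaskRange and pages_consistent are hypothetical; real pages
and the per-resource-manager masking rules are considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 32       /* toy size, for illustration only */

/* A byte range whose contents are allowed to differ (e.g. hint bits). */
typedef struct MaskRange
{
    size_t      offset;
    size_t      length;
} MaskRange;

/* Zero out every maskable range of a page copy. */
static void
apply_mask(unsigned char *page, const MaskRange *ranges, size_t nranges)
{
    for (size_t i = 0; i < nranges; i++)
    {
        assert(ranges[i].offset + ranges[i].length <= DEMO_PAGE_SIZE);
        memset(page + ranges[i].offset, 0, ranges[i].length);
    }
}

/* Compare a replayed page against a full-page image, ignoring masked bytes. */
bool
pages_consistent(const unsigned char *replayed, const unsigned char *image,
                 const MaskRange *ranges, size_t nranges)
{
    unsigned char a[DEMO_PAGE_SIZE];
    unsigned char b[DEMO_PAGE_SIZE];

    /* Work on copies so the originals are left untouched. */
    memcpy(a, replayed, DEMO_PAGE_SIZE);
    memcpy(b, image, DEMO_PAGE_SIZE);
    apply_mask(a, ranges, nranges);
    apply_mask(b, ranges, nranges);

    return memcmp(a, b, DEMO_PAGE_SIZE) == 0;
}
```

Masking both copies identically means that only differences outside the
allowed ranges can make the comparison fail.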
|
|
The problem with the original coding here is that we might receive (and
clear) a relcache invalidation signal for the target relation down inside
one of the index_open calls we're doing. Since the target is open, we
would not drop the relcache entry, just reset its rd_indexvalid and
rd_indexlist fields. But RelationGetIndexAttrBitmap() kept going, and
would eventually cache and return potentially-obsolete attribute bitmaps.
The case where this matters is where the inval signal was from a CREATE
INDEX CONCURRENTLY telling us about a new index on a formerly-unindexed
column. (In all other cases, the lock we hold on the target rel should
prevent any concurrent change in index state.) Even just returning the
stale attribute bitmap is not such a problem, because it shouldn't matter
during the transaction in which we receive the signal. What hurts is
caching the stale data, because it can survive into later transactions,
breaking CREATE INDEX CONCURRENTLY's expectation that later transactions
will not create new broken HOT chains. The upshot is that there's a window
for building corrupted indexes during CREATE INDEX CONCURRENTLY.
This patch fixes the problem by rechecking that the set of index OIDs
is still the same at the end of RelationGetIndexAttrBitmap() as it was
at the start. If not, we loop back and try again. That's a little
more than is strictly necessary to fix the bug --- in principle, we
could return the stale data but not cache it --- but it seems like a
bad idea on general principles for relcache to return data it knows
is stale.
There might be more hazards of the same ilk, or there might be a better
way to fix this one, but this patch definitely improves matters and seems
unlikely to make anything worse. So let's push it into today's releases
even as we continue to study the problem.
Pavan Deolasee and myself
Discussion: https://postgr.es/m/CABOikdM2MUq9cyZJi1KyLmmkCereyGp5JQ4fuwKoyKEde_mzkQ@mail.gmail.com
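The recheck-and-retry idea can be sketched in isolation. DemoRelation,
compute_index_summary and demo_concurrent_create_index are hypothetical, and
the "summary" is just a sum standing in for the attribute bitmaps; this is
not the relcache code. The hook simulates an invalidation arriving while the
indexes are being opened.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_INDEXES 8

typedef struct DemoRelation
{
    unsigned int index_oids[MAX_INDEXES];
    int         nindexes;
    /* Hypothetical hook, called while scanning; may change the list. */
    void        (*on_index_open) (struct DemoRelation *rel);
} DemoRelation;

/* Simulates a concurrent index creation noticed in mid-computation. */
void
demo_concurrent_create_index(DemoRelation *rel)
{
    assert(rel->nindexes < MAX_INDEXES);
    rel->index_oids[rel->nindexes++] = 30;
    rel->on_index_open = NULL;  /* fire only once */
}

/* Compute a value derived from the index list, retrying if it changed. */
unsigned int
compute_index_summary(DemoRelation *rel)
{
    for (;;)
    {
        unsigned int seen[MAX_INDEXES];
        int         nseen = rel->nindexes;
        unsigned int summary = 0;

        /* Remember the list of OIDs we started from. */
        memcpy(seen, rel->index_oids, (size_t) nseen * sizeof(unsigned int));

        for (int i = 0; i < nseen; i++)
        {
            if (rel->on_index_open != NULL)
                rel->on_index_open(rel);    /* list may change here */
            summary += seen[i];
        }

        /* Only trust the result if the list is still what we started with. */
        if (nseen == rel->nindexes &&
            memcmp(seen, rel->index_oids,
                   (size_t) nseen * sizeof(unsigned int)) == 0)
            return summary;
        /* Otherwise loop back and recompute from the current list. */
    }
}
```

The loop never returns, and so never lets a caller cache, a result computed
from a list that changed underneath it.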
|
|
Commit 665d1fad9 introduced rd_pkindex, and made RelationGetIndexList
responsible for updating it, but didn't bother to fix
RelationGetIndexList's header comment to say so.
|
|
Backpatch to all supported versions, where applicable, to make backpatching
of future fixes go more smoothly.
Josh Soref
Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com
|
|
There is no particularly good reason to limit this value to 1000,
so increase the limit to INT_MAX / 2, the same limit we use for
shared_buffers. It's not clear how much practical effect larger
settings will have, but there seems no harm in letting people try it.
Jim Nasby, less a comment change I stripped out.
Discussion: http://postgr.es/m/f6e58a22-030b-eb8a-5457-f62fb08d701c@BlueTreble.com
|
|
These were left out by mistake back when support for KOI8-U encoding was
added.
Extracted from Kyotaro Horiguchi's larger patch.
|
|
Doing so doesn't seem to be within the purpose of the per-user
connection limits, and has particularly unfortunate effects in
conjunction with parallel queries.
Backpatch to 9.6 where parallel queries were introduced.
David Rowley, reviewed by Robert Haas and Albe Laurenz.
|