|
estimate_hash_bucket_stats is defined to return zero to *mcv_freq if
it cannot obtain a value for the frequency of the most common value.
Its sole caller final_cost_hashjoin ignored this provision and would
blindly believe the zero value, resulting in computing zero for the
largest bucket size. In consequence, the safety check that intended
to prevent the largest bucket from exceeding get_hash_memory_limit()
was ineffective, allowing very silly plans to be chosen if statistics
were missing.
After fixing final_cost_hashjoin to disregard zero results for
mcv_freq, a second problem appeared: some cases that should use hash
joins failed to. This is because estimate_hash_bucket_stats was
unaware of the fact that ANALYZE won't store MCV statistics if it
doesn't find any multiply-occurring values. Thus the lack of an MCV
stats entry doesn't necessarily mean that we know nothing; we may
well know that the column is unique. The former coding returned zero
for *mcv_freq in this case, which was pretty close to correct, but now
final_cost_hashjoin doesn't believe it and disables the hash join.
So check to see if there is a HISTOGRAM stats entry; if so, ANALYZE
has in fact run for this column and must have found it to be unique.
In that case report the MCV frequency as 1 / rows, instead of claiming
ignorance.
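For illustration, the idea described above boils down to something like
this (a simplified sketch, not the actual selfuncs.c code; the variable
names are illustrative):

    if (!have_mcv_slot)
    {
        if (have_histogram_slot && rows > 0)
            *mcv_freq = 1.0 / rows;   /* ANALYZE ran and found the column unique */
        else
            *mcv_freq = 0.0;          /* truly no information available */
    }

and final_cost_hashjoin now treats a zero *mcv_freq as "unknown" rather
than as a real frequency estimate.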
Reporting a more accurate *mcv_freq in this case can also affect the
bucket-size skew adjustment further down in estimate_hash_bucket_stats,
causing hash-join cost estimates to change slightly. This affects
some plan choices in the core regression tests. The first diff in
join.out corresponds to a case where we have no stats and should not
risk a hash join, but the remaining changes are caused by producing
a better bucket-size estimate for unique join columns. Those are all
harmless changes so far as I can tell.
The existing behavior was introduced in commit 4867d7f62 in v11.
It appears from the commit log that disabling the bucket-size safety
check in the absence of statistics was intentional; but we've now seen
a case where the ensuing behavior is bad enough to make that seem like
a poor decision. In any case the lack of other problems with that
safety check after several years helps to justify enforcing it more
strictly. However, we won't risk back-patching this, in case any
applications are depending on the existing behavior.
Bug: #19363
Reported-by: Jinhui Lai <jinhui.lai@qq.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/2380165.1766871097@sss.pgh.pa.us
Discussion: https://postgr.es/m/19363-8dd32fc7600a1153@postgresql.org
|
|
When looking up statistical data about an expression, we failed to
look through PlaceHolderVar nodes, treating them as opaque. This
could prevent us from matching an expression to base columns, index
expressions, or extended statistics, as examine_variable() relies on
strict structural matching.
As a result, queries involving PlaceHolderVar nodes often fell back to
default selectivity estimates, potentially leading to poor plan
choices.
This patch updates examine_variable() to strip PlaceHolderVars before
analysis. This is safe during estimation because PlaceHolderVars are
transparent for the purpose of statistics lookup: they do not alter
the value distribution of the underlying expression.
To minimize performance overhead on this hot path, a lightweight
walker first checks for the presence of PlaceHolderVars. The more
expensive mutator is invoked only when necessary.
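The check-then-strip structure is roughly as follows (an illustrative
sketch; the function names here are made up, not the ones added to the
tree):

    /* cheap pre-check: does the expression contain any PlaceHolderVars? */
    static bool
    contains_phv_walker(Node *node, void *context)
    {
        if (node == NULL)
            return false;
        if (IsA(node, PlaceHolderVar))
            return true;
        return expression_tree_walker(node, contains_phv_walker, context);
    }

    /* only run the mutator when the walker found something */
    static Node *
    strip_phv_mutator(Node *node, void *context)
    {
        if (node == NULL)
            return NULL;
        if (IsA(node, PlaceHolderVar))
            return strip_phv_mutator((Node *) ((PlaceHolderVar *) node)->phexpr,
                                     context);
        return expression_tree_mutator(node, strip_phv_mutator, context);
    }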
There is one ensuing plan change in the regression tests, which is
expected and demonstrates the fix: the rowcount estimate becomes much
more accurate with this patch.
Back-patch to v18. Although this issue exists in earlier versions as
well, changes in v18 made it common enough to notice. Given the lack
of field reports for older versions, I am not back-patching further.
Reported-by: Haowu Ge <gehaowu@bitmoe.com>
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/62af586c-c270-44f3-9c5e-02c81d537e3d.gehaowu@bitmoe.com
Backpatch-through: 18
|
|
Here, Datum was used to pass around an opaque pointer between a group
of functions. But one might as well use void * for that; the use of
Datum doesn't achieve anything here and is just distracting.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/1c5d23cb-288b-4154-b1cd-191fe2301707%40eisentraut.org
|
|
pg_stat_get_backend_activity() calls pgstat_clip_activity() to ensure
that the reported query string is correctly truncated when it finishes
with an incomplete multi-byte sequence. However, the result returned by
the function was not what pgstat_clip_activity() generated, but the
non-truncated, original, contents from PgBackendStatus.st_activity_raw.
Oversight in 54b6cd589ac2, so backpatch all the way down.
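The oversight amounts to the following pattern (illustrative, not the
exact code):

    clipped = pgstat_clip_activity(beentry->st_activity_raw);

    /* before: returned the raw string, ignoring the clipped copy */
    PG_RETURN_TEXT_P(cstring_to_text(beentry->st_activity_raw));

    /* after: return what pgstat_clip_activity() produced */
    PG_RETURN_TEXT_P(cstring_to_text(clipped));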
Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2mDzwc48q2EK9tSXS6iJMJ35wvxNQnHX+rXjy5VgLvJQw@mail.gmail.com
Backpatch-through: 14
|
|
Author: Zizhen Qiao <zizhen_qiao@163.com>
Discussion: https://postgr.es/m/5ee635f9.49f7.19b4ed9e803.Coremail.zizhen_qiao@163.com
|
|
Previously logical decoding required wal_level to be set to 'logical'
at server start. This meant that users had to incur the overhead of
logical-level WAL logging even when no logical replication slots were
in use.
This commit adds functionality to automatically control logical
decoding availability based on logical replication slot presence. The
newly introduced module logicalctl.c allows logical decoding to be
dynamically activated when needed, even when wal_level is set to
'replica'.
When the first logical replication slot is created, the system
automatically increases the effective WAL level to maintain
logical-level WAL records. Conversely, after the last logical slot is
dropped or invalidated, it decreases back to 'replica' WAL level.
While activation occurs synchronously right after creating the first
logical slot, deactivation happens asynchronously through the
checkpointer process. This design avoids a race condition at the end
of recovery; a concurrent deactivation could happen while the startup
process enables logical decoding at the end of recovery, but WAL
writes are still not permitted until recovery fully completes. The
checkpointer will handle it after recovery is done. Asynchronous
deactivation also avoids excessive toggling of the logical decoding
status in workloads that repeatedly create and drop a single logical
slot. On the other hand, this lazy approach can delay changes to
effective_wal_level and the disabling of logical decoding, especially
when the checkpointer is busy with other tasks. We chose this lazy
approach in all deactivation paths to keep the implementation simple,
even though laziness is strictly required only for end-of-recovery
cases. Future work might address this limitation either by using a
dedicated worker instead of the checkpointer, or by implementing
synchronous waiting during slot drops if workloads are significantly
affected by the lazy deactivation of logical decoding.
The effective WAL level, determined internally by XLogLogicalInfo, is
allowed to change within a transaction until an XID is assigned. Once
an XID is assigned, the value becomes fixed for the remainder of the
transaction. This behavior ensures that the logging mode remains
consistent within a writing transaction, similar to the behavior of
GUC parameters.
A new read-only GUC parameter effective_wal_level is introduced to
monitor the actual WAL level in effect. This parameter reflects the
current operational WAL level, which may differ from the configured
wal_level setting.
Bump PG_CONTROL_VERSION, as this adds a new field to the CheckPoint struct.
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAD21AoCVLeLYq09pQPaWs+Jwdni5FuJ8v2jgq-u9_uFbcp6UbA@mail.gmail.com
|
|
off_t was previously used for offsets; it is 4 bytes on Windows, which
imposed a hard 2GB limit on file size in the backend code. The switch
also allows some simplification in these files, removing some casts
based on long, which is likewise 4 bytes on Windows.
This commit removes one comment introduced in db3c4c3a2d98, not relevant
anymore as pgoff_t is a safe 8-byte alternative on Windows.
This change is surprisingly not invasive, as the callers of
BufFileTell(), BufFileSeek() and BufFileTruncateFileSet() (worker.c,
tuplestore.c, etc.) track offsets in local structures that, for the
most part, just need to switch from off_t to pgoff_t.
The code still relies on a maximum file size of MAX_PHYSICAL_FILESIZE
(1GB). This change makes it possible to raise that maximum in the
future, or to make it adjustable on demand.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aUStrqoOCDRFAq1M@paquier.xyz
|
|
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/08cbaeb5-aaaf-47b6-9ed8-4f7455b0bc4b@iki.fi
|
|
Previously, if memory context logging was triggered repeatedly and
rapidly while a previous request was still being processed, it could
result in recursive calls to ProcessLogMemoryContextInterrupt().
This could lead to infinite recursion and potentially crash the process.
This commit adds a guard to prevent such recursion.
If ProcessLogMemoryContextInterrupt() is already in progress and
logging memory contexts, subsequent calls will exit immediately,
avoiding unintended recursive calls.
While this scenario is unlikely in practice, it's not impossible.
This change adds a safety check to prevent such failures.
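The guard is the usual static-flag pattern, roughly (a simplified
sketch, not the committed code):

    void
    ProcessLogMemoryContextInterrupt(void)
    {
        static bool in_progress = false;

        if (in_progress)
            return;             /* a request is already being processed */
        in_progress = true;

        PG_TRY();
        {
            /* ... log the memory contexts ... */
        }
        PG_FINALLY();
        {
            in_progress = false;
        }
        PG_END_TRY();
    }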
Back-patch to v14, where memory context logging was introduced.
Reported-by: Robert Haas <robertmhaas@gmail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Artem Gavrilov <artem.gavrilov@percona.com>
Discussion: https://postgr.es/m/CA+TgmoZMrv32tbNRrFTvF9iWLnTGqbhYSLVcrHGuwZvCtph0NA@mail.gmail.com
Backpatch-through: 14
|
|
This change makes pgstat_report_vacuum() more consistent with
pgstat_report_analyze(), which also uses a Relation. It enforces a
policy that callers of this routine should open and lock the relation
whose statistics are updated before calling it. We are unlikely to
ever have many callers of this routine in the tree, but it seems like
a good idea to make this requirement explicit in the long run.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aUEA6UZZkDCQFgSA@ip-10-97-1-34.eu-west-3.compute.internal
|
|
strlcpy() ensures that a target string is zero-terminated, so there is
no need to enforce it a second time in this code. This simplification
could have been done in 0eb23285a257.
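That is, the now-removed line was of this shape (illustrative):

    strlcpy(buf, src, sizeof(buf));
    buf[sizeof(buf) - 1] = '\0';    /* redundant: strlcpy() already terminated buf */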
Author: Feilong Meng <feelingmeng@foxmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/tencent_771178777C5BC17FCB7F7A1771CD1FFD5708@qq.com
|
|
ICU still depends on libc for compatibility with certain historical
behavior for single-byte encodings. Make the dependency explicit by
holding a locale_t object when required.
We should consider a better solution in the future, such as decoding
the text to UTF-32 and using u_tolower(). That would be a behavior
change and require additional infrastructure though; so for now, just
avoid the global LC_CTYPE dependency.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
|
|
Previously, libc's tolower() was always used for lowercasing
identifiers, regardless of the database locale (though only characters
beyond 127 in single-byte encodings were affected). Refactor to allow
each provider to supply its own implementation of identifier
downcasing.
For historical compatibility, when using a single-byte encoding, ICU
still relies on tolower().
One minor behavior change is that, before the database default locale
is initialized, identifiers are now downcased with ASCII semantics.
Previously, downcasing used the postmaster's LC_CTYPE setting from the
environment. While that could have some effect during
GUC processing, for example, it would have been fragile to rely on the
environment setting anyway. (Also, it only matters when the encoding
is single-byte.)
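"ASCII semantics" here means the classic range check, roughly:

    /* ASCII-only downcasing, usable before the database locale is known */
    if (ch >= 'A' && ch <= 'Z')
        ch += 'a' - 'A';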
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
|
|
In the wake of commit b45242fd3, bytea_sortsupport() still called out
to varstr_sortsupport(). Treating bytea as a kind of text/varchar
required varstr_sortsupport() to allow for the possibility of
NUL bytes, but only for C collation. This was confusing. For
better separation of concerns, create an independent sortsupport
implementation in bytea.c.
The heuristics for bytea_abbrev_abort() remain the same as for
varstr_abbrev_abort(). It's possible that the bytea case warrants
different treatment, but that is left for future investigation.
In passing, adjust some strange-looking comparisons in
varstr_abbrev_abort().
Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAJ7c6TP1bAbEhUJa6+rgceN6QJWMSsxhg1=mqfSN=Nb-n6DAKg@mail.gmail.com
|
|
This removes a never-used CacheInvalidateHeapTupleInplace() parameter.
It adds README content about inplace update visibility in logical
decoding. It rewrites other comments.
Back-patch to v18, where commit 243e9b40f1b2dd09d6e5bf91ebf6e822a2cd3704
first appeared. Since this removes a CacheInvalidateHeapTupleInplace()
parameter, expect a v18 ".abi-compliance-history" edit to follow. PGXN
contains no calls to that function.
Reported-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reported-by: Ilyasov Ian <ianilyasov@outlook.com>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Surya Poondla <s_poondla@apple.com>
Discussion: https://postgr.es/m/CA+renyU+LGLvCqS0=fHit-N1J-2=2_mPK97AQxvcfKm+F-DxJA@mail.gmail.com
Backpatch-through: 18
|
|
https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
|
|
Previously, like_fixed_prefix() used char-at-a-time logic, which
forced it to be too conservative for case-insensitive matching.
Introduce like_fixed_prefix_ci(), and use that for case-insensitive
pattern prefixes. It uses multibyte and locale-aware logic, along with
the new pg_iswcased() API introduced in 630706ced0.
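The core idea, shown in an ASCII-only form for brevity (the committed
code is multibyte- and locale-aware and also handles escape
characters):

    int     prefix_len = 0;

    for (const char *p = pattern; *p; p++)
    {
        if (*p == '%' || *p == '_')
            break;                              /* wildcard ends the fixed prefix */
        if (isupper((unsigned char) *p) || islower((unsigned char) *p))
            break;                              /* character has case forms: stop */
        prefix_len++;
    }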
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
|
|
Previously, pg_sync_replication_slots() would finish without synchronizing
slots that didn't meet requirements, rather than failing outright. This
could leave some failover slots unsynchronized if required catalog rows or
WAL segments were missing or at risk of removal, while the standby
continued removing needed data.
To address this, the function now waits for the primary slot to advance to
a position where all required data is available on the standby before
completing synchronization. It retries cyclically until all failover slots
that existed on the primary at the start of the call are synchronized.
Slots created after the function begins are not included. If the standby
is promoted during this wait, the function exits gracefully and the
temporary slots will be removed.
Author: Ajin Cherian <itsajin@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Yilin Zhang <jiezhilove@126.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com
|
|
Cumulative stats kinds gain the capability to write additional per-entry
data when flushing the stats at shutdown, and read this data when
loading back the stats at startup. This is useful, for example, for
variable-length data (like normalized query strings), as it becomes
possible to link the shared memory stats entries to data stored in a
different area, like a DSA segment.
Three new optional callbacks are added to PgStat_KindInfo, available to
variable-numbered stats kinds:
* to_serialized_data: writes auxiliary data for an entry.
* from_serialized_data: reads auxiliary data for an entry.
* finish: performs actions after read/write/discard operations. This is
invoked after processing all the entries of a kind, allowing extensions
to close file handles and clean up resources.
Stats kinds have the option to store this data in the existing pgstats
file, but can as well store it in one or more additional files whose
names can be built upon the entry keys. The new serialization
callbacks are invoked once an entry key has been read from or written
to the main stats file. A file descriptor to the main pgstats file is
available in the
arguments of the callbacks.
Author: Sami Imseih <samimseih@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com
|
|
The current list from the buildfarm includes quite a few typedef
names that it used to miss. The reason is a bit obscure, but it
seems likely to have something to do with our recent increased
use of palloc_object and palloc_array. In any case, this makes
the relevant struct declarations be much more nicely formatted,
so I'll take it. Install the current list and re-run pgindent
to update affected code.
Syncing with the current list also removes some obsolete
typedef names and fixes some alphabetization errors.
Discussion: https://postgr.es/m/1681301.1765742268@sss.pgh.pa.us
|
|
In commit b61aa76e4 I added an assumption in jsonb_object_agg_finalfn
that it'd be okay to apply uniqueifyJsonbObject repeatedly to a
JsonbValue. I should have studied that code more closely first,
because in skip_nulls mode it removed leading nulls by changing the
"pairs" array start pointer. This broke the data structure's
invariants in two ways: pairs no longer references a repalloc-able
chunk, and the distance from pairs to the end of its array is less
than parseState->size. So any subsequent addition of more pairs is
at high risk of clobbering memory and/or causing repalloc to crash.
Unfortunately, adding more pairs is exactly what will happen when the
aggregate is being used as a window function.
Fix by rewriting uniqueifyJsonbObject to not do that. The prior
coding had little to recommend it anyway.
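In essence, the previous coding did something like this (illustrative,
not verbatim):

    /* before: shifts the array start, so "pairs" no longer points to the
     * start of its palloc'd chunk and parseState->size no longer matches
     * the space remaining in the array */
    object->val.object.pairs += leading_nulls;
    object->val.object.nPairs -= leading_nulls;

whereas a safe variant keeps the allocation intact, for example by
sliding the surviving entries down:

    memmove(object->val.object.pairs,
            object->val.object.pairs + leading_nulls,
            (object->val.object.nPairs - leading_nulls) * sizeof(JsonbPair));
    object->val.object.nPairs -= leading_nulls;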
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/ec5e96fb-ee49-4e5f-8a09-3f72b4780538@gmail.com
|
|
In RI_ConstraintInfo, period_contained_by_oper and
period_intersect_oper can take either anyrange or anymultirange.
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://www.postgresql.org/message-id/CA%2BrenyWzDth%2BjqLZA2L2Cezs3wE%2BWX-5P8W2EOVx_zfFD%3Daicg%40mail.gmail.com
|
|
Similar to commit 75f49221c22, it is preferable to use
StaticAssertDecl() instead of StaticAssertStmt() when possible.
Discussion: https://www.postgresql.org/message-id/flat/CA%2BhUKGKvr0x_oGmQTUkx%3DODgSksT2EtgCA6LmGx_jQFG%3DsDUpg%40mail.gmail.com
|
|
Like commit 123661427b, these were discovered while reviewing
Aleksander Alekseev's proposed changes to pgindent.
|
|
True if character has multiple case forms. Will be a useful
multibyte-aware replacement for char_is_cased().
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
|
|
It's only useful for an ILIKE optimization for the libc provider using
a single-byte encoding and a non-C locale, but it creates significant
internal complexity.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
|
|
These functions took a ResourceOwner argument, but only checked if it
was NULL, and then used CurrentResourceOwner for the actual work.
Surely the intention was to use the passed-in resource owner. All
current callers passed CurrentResourceOwner or NULL, so this has no
consequences at the moment, but it's an accident waiting to happen for
future callers and extensions.
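The problem pattern, sketched (the specific calls involved may differ):

    /* before: the argument is only tested, never actually used */
    if (resowner != NULL)
        ResourceOwnerEnlarge(CurrentResourceOwner);

    /* after: use the owner the caller asked for */
    if (resowner != NULL)
        ResourceOwnerEnlarge(resowner);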
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAEze2Whnfv8VuRZaohE-Af+GxBA1SNfD_rXfm84Jv-958UCcJA@mail.gmail.com
Backpatch-through: 17
|
|
A comment in tuplesort.c was claiming that the code was defining
INITIAL_MEMTUPSIZE so that it *does not* exceed
ALLOCSET_SEPARATE_THRESHOLD, but the code actually ensures that we
purposefully *do* exceed ALLOCSET_SEPARATE_THRESHOLD for the initial
allocation of the tuples array, as per reasons detailed in the
commentary of grow_memtuples().
Also, there's not much need to repeat the mention about
ALLOCSET_SEPARATE_THRESHOLD in each location where INITIAL_MEMTUPSIZE is
used, so remove those comments.
Author: ChangAo Chen <cca5507@qq.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/tencent_6FA14F85D6B5B5291532D6789E07F4765C08%40qq.com
|
|
The idea is to further encourage the use of these new routines across
the tree, as they offer stronger type safety guarantees than palloc().
This batch of changes includes most of the trivial changes suggested by
the author for src/backend/.
A total of 334 files are updated here. Among these files, 48 have
their compiled output change slightly; this is caused by line number
changes, as the new allocation formulas are simpler, shaving around
100 lines of code in total.
Similar work has been done in 0c3c5c3b06a3 and 31d3847a37be.
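The change is mechanical; for example (FooState is a made-up type name
here):

    /* before */
    ItemPointer tids  = (ItemPointer) palloc(ntids * sizeof(ItemPointerData));
    FooState   *state = (FooState *) palloc0(sizeof(FooState));

    /* after: the macros keep the cast and the sizeof() tied to the type */
    ItemPointer tids  = palloc_array(ItemPointerData, ntids);
    FooState   *state = palloc0_object(FooState);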
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
|
|
Author: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://www.postgresql.org/message-id/CA%2BFpmFf-hWXtrC0Q3Cr_Xo78zuP_M_VC5xgWPOYOkwqOD0T8eg@mail.gmail.com
|
|
There were a number of useless casts in format arguments: either the
value being cast was already of the right type, or the code cast
between types instead of just using the right format placeholder to
begin with.
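Two examples of the kinds of cast removed (illustrative; nbytes stands
in for some size_t variable):

    /* the value is already a size_t: the cast adds nothing */
    elog(DEBUG1, "read %zu bytes", (size_t) nbytes);

    /* casting to long to match %ld, instead of just using %zu */
    elog(DEBUG1, "read %ld bytes", (long) nbytes);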
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/07fa29f9-42d7-4aac-8834-197918cbbab6%40eisentraut.org
|
|
Many of these are probably left over from before use of %zu/%zd was
portable.
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/07fa29f9-42d7-4aac-8834-197918cbbab6%40eisentraut.org
|
|
The error details updated in this commit can be reached in the
regression tests. They did not follow the project style; they should
be written as full sentences.
Some of the errors are switched to use an elog(), for cases that involve
paths that cannot be reached based on the previous state of the parser
processing the input data (array start, object end, etc.). The error
messages for these cases now use a more consistent style across the
board, with the state of the parser reported for debugging.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Michael Paquier <michael@paquier.xyz>
Co-authored-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/1353179.1764901790@sss.pgh.pa.us
|
|
The general case for converting to a JSONB numeric value is to run the
source datatype's output function and then numeric_in, but we can do
substantially better than that for integer and numeric source values.
This patch improves the speed of jsonb_agg by 30% for integer input,
and nearly 2X for numeric input.
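The contrast, roughly (a sketch of the idea rather than the committed
code):

    /* generic path: datatype output function, then numeric_in() */
    char   *str = OidOutputFunctionCall(outfuncoid, val);
    Numeric num = DatumGetNumeric(DirectFunctionCall3(numeric_in,
                                                      CStringGetDatum(str),
                                                      ObjectIdGetDatum(InvalidOid),
                                                      Int32GetDatum(-1)));

    /* integer fast path: skip the string round-trip entirely */
    Numeric num2 = int64_to_numeric(DatumGetInt64(val));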
Sadly, the obvious idea of using float4_numeric and float8_numeric
to speed up those cases doesn't work: they are actually slower than
the generic coerce-via-I/O method, and not by a small amount.
They might round off differently than this code has historically done,
too. Leave that alone pending possible changes in those functions.
We can also do better than the existing code for text/varchar/bpchar
source data; this optimization is similar to one that already exists
in the json_agg() code. That saves 20% or so for such inputs.
Also make a couple of other minor improvements, such as not giving
JSONTYPE_CAST its own special case outside the switch when it could
perfectly well be handled inside, and not using dubious string hacking
to detect infinity and NaN results.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/1060917.1753202222@sss.pgh.pa.us
|
|
The various variants of jsonb_agg() operate as follows,
for each aggregate input value:
1. Build a JsonbValue tree representation of the input value.
2. Flatten the JsonbValue tree into a Jsonb in on-disk format.
3. Iterate through the Jsonb, building a JsonbValue that is part
of the aggregate's state stored in aggcontext, but is otherwise
identical to what phase 1 built.
This is very slightly less silly than it sounds, because phase 1
involves calling non-JSONB code such as datatype output functions,
which are likely to leak memory, and we don't want to leak into the
aggcontext. Nonetheless, phases 2 and 3 are accomplishing exactly
nothing that is useful if we can make phase 1 put the JsonbValue
tree where we need it. We could probably do that with a bunch of
MemoryContextSwitchTo's, but what seems more robust is to give
pushJsonbValue the responsibility of building the JsonbValue tree
in a specified non-current memory context. The previous patch
created the infrastructure for that, and this patch simply makes
the aggregate functions use it and then rips out phases 2 and 3.
For me, this makes jsonb_agg() with a text column as input run
about 2X faster than before. It's not yet on par with json_agg(),
but this removes a whole lot of the difference.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/1060917.1753202222@sss.pgh.pa.us
|
|
Instead of passing "JsonbParseState **" to pushJsonbValue(),
pass a pointer to a JsonbInState, which will contain the
parseState stack pointer as well as other useful fields.
Also, instead of returning a JsonbValue pointer that is often
meaningless/ignored, return the top-level JsonbValue pointer
in the "result" field of the JsonbInState.
This involves a lot of (mostly mechanical) edits, but I think
the results are notationally cleaner and easier to understand.
Certainly the business with sometimes capturing the result of
pushJsonbValue() and sometimes not was bug-prone and incapable of
mechanical verification. In the new arrangement, JsonbInState.result
remains null until we've completed a valid sequence of pushes, so
that an incorrect sequence will result in a null-pointer dereference,
not mistaken use of a partial result.
However, this isn't simply an exercise in prettier notation.
The real reason for doing it is to provide a mechanism whereby
pushJsonbValue() can be told to construct the JsonbValue tree
in a context that is not CurrentMemoryContext. That happens
when a non-null "outcontext" is specified in the JsonbInState.
No callers exercise that option in this patch, but the next
patch in the series will make use of it.
I tried to improve the comments in this area too.
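The call pattern changes roughly like this (shape only; the details of
JsonbInState shown here follow the description above and may not match
the committed code exactly, and "aggcontext" is just an example of a
non-current context):

    /* before */
    JsonbParseState *pstate = NULL;
    JsonbValue *res;

    (void) pushJsonbValue(&pstate, WJB_BEGIN_ARRAY, NULL);
    /* ... push the elements ... */
    res = pushJsonbValue(&pstate, WJB_END_ARRAY, NULL);

    /* after */
    JsonbInState state = {0};

    state.outcontext = aggcontext;      /* optional: build the tree elsewhere */
    pushJsonbValue(&state, WJB_BEGIN_ARRAY, NULL);
    /* ... push the elements ... */
    pushJsonbValue(&state, WJB_END_ARRAY, NULL);
    /* state.result now holds the completed top-level JsonbValue */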
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/1060917.1753202222@sss.pgh.pa.us
|
|
The SQL standard says that corr() and friends should return NULL in
the mathematically-undefined case where all the inputs in one of
the columns have the same value. We were checking that by seeing
if the sums Sxx and Syy were zero, but that approach is very
vulnerable to roundoff error: if a sum is close to zero but not
exactly that, we'd come out with a pretty silly non-NULL result.
Instead, directly track whether the inputs are all equal by
remembering the common value in each column. Once we detect
that a new input is different from before, represent that by
storing NaN for the common value. (An objection to this scheme
is that if the inputs are all NaN, we will consider that they
were not all equal. But under IEEE float arithmetic rules,
one NaN is never equal to another, so this behavior is arguably
correct. Moreover it matches what we did before in such cases.)
Then, leave the sums at their exact value of zero for as long
as we haven't detected different input values.
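Per column, the tracking amounts to something like this (the field
names are illustrative, not the actual transition-state layout):

    if (state->n == 0)
        state->common_x = newval;               /* first input sets the candidate */
    else if (state->common_x != newval)
        state->common_x = get_float8_nan();     /* inputs differ; all-NaN input also
                                                 * ends up here, since NaN never
                                                 * compares equal */

    /* ... and in the final function ... */
    if (!isnan(state->common_x))
        PG_RETURN_NULL();                       /* all X inputs were identical */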
This solution requires the aggregate transition state to contain
8 float values not 6, which is not problematic, and it seems to add
less than 1% to the aggregates' runtime, which seems acceptable.
While we're here, improve corr()'s final function to cope with
overflow/underflow in the final calculation, and to clamp its
result to [-1, 1] in case of roundoff error.
Although this is arguably a bug fix, it requires a catversion bump
due to the change in aggregates' initial states, so it can't be
back-patched.
Patch written by me, but many of the ideas are due to Dean Rasheed,
who also did a good deal of testing.
Bug: #19340
Reported-by: Oleg Ivanov <o15611@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/19340-6fb9f6637f562092@postgresql.org
|
|
Due to an off-by-one error, the code failed to find matches at the
end of the haystack. Fix by rewriting the loop.
While at it, fix a comment that claimed that the function could find
a zero-length match. Such a match could send a caller into an endless
loop. However, zero-length matches only make sense with an empty
search string, and that case is explicitly excluded by all callers.
To make sure it stays that way, add an Assert and a comment.
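The bug class, in simplified form (not the actual function):

    /* simplified: offset of first match of needle (nlen > 0) in hay, or -1 */
    static int
    find_match(const char *hay, int hlen, const char *needle, int nlen)
    {
        for (int i = 0; i + nlen <= hlen; i++)  /* "i < hlen - nlen" here would
                                                 * miss a match ending exactly at
                                                 * the end of the haystack */
        {
            if (memcmp(hay + i, needle, nlen) == 0)
                return i;
        }
        return -1;
    }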
Bug: #19341
Reported-by: Adam Warland <adam.warland@infor.com>
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19341-1d9a22915edfec58@postgresql.org
Backpatch-through: 18
|
|
Newest versions of gcc are able to detect cases where code implicitly
casts away const by assigning the result of strchr() or a similar
function applied to a "const char *" value to a target variable
that's just "char *". This of course creates a hazard of not getting
a compiler warning about scribbling on a string one was not supposed
to, so fixing up such cases is good.
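For example (get_install_path() is a made-up function returning a
"const char *"):

    const char *path = get_install_path();
    char       *p = strchr(path, '/');      /* silently discards const */

    const char *q = strchr(path, '/');      /* keeps the qualification */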
This patch fixes a dozen or so places where we were doing that.
Most are trivial additions of "const" to the target variable,
since no actually-hazardous change was occurring. There is one
place in ecpg.trailer where we were indeed violating the intention
of not modifying a string passed in as "const char *". I believe
that's harmless rather than a live bug, but let's fix it by copying the
string before modifying it.
There is a remaining trouble spot in ecpg/preproc/variable.c,
which requires more complex surgery. I've left that out of this
commit because I want to study that code a bit more first.
We probably will want to back-patch this once compilers that detect
this pattern get into wider circulation, but for now I'm just
going to apply it to master to see what the buildfarm says.
Thanks to Bertrand Drouvot for finding a couple more spots than
I had.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/1324889.1764886170@sss.pgh.pa.us
|
|
Commit 76b78721ca introduced two new columns in pg_stat_replication_slots
to improve monitoring of slot synchronization. One of these columns was
named slotsync_skip_at, which is inconsistent with the naming convention
used for similar columns in other system views.
Columns that store timestamps of the most recent event typically use
'last_' in the column name (e.g., last_autovacuum, checksum_last_failure).
Renaming slotsync_skip_at to slotsync_last_skip aligns with this pattern,
making the purpose of the column clearer and improving overall consistency
across the views.
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Michael Banck <mbanck@gmx.net>
Discussion: https://postgr.es/m/20251128091552.GB13635@p46.dedyn.io;lightning.p46.dedyn.io
Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com
|
|
Some of the buildfarm members with some old gcc versions have been
complaining about an always-true test for a NULL pointer caused by a
combination of SOFT_ERROR_OCCURRED() and a local ErrorSaveContext
variable.
These warnings are taken care of by removing SOFT_ERROR_OCCURRED(),
switching to a direct variable check, like 56b1e88c804b.
Oversights in e1405aa5e3ac and 44eba8f06e55.
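The pattern in question, sketched:

    ErrorSaveContext escontext = {T_ErrorSaveContext};

    /* before: the macro first tests the pointer against NULL, and for the
     * address of a local variable that test is always true, hence the
     * warning */
    if (SOFT_ERROR_OCCURRED(&escontext))
        return false;

    /* after: test the flag directly */
    if (escontext.error_occurred)
        return false;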
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1341064.1764895052@sss.pgh.pa.us
|
|
These casts used to be required when Pointer was char *, but now it's
void * (commit 1b2bb5077e9), so they are not needed anymore.
Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
|
|
These casts used to be required when Pointer was char *, but now it's
void * (commit 1b2bb5077e9), so they are not needed anymore.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
|
|
In an upcoming patch more wait events will be added to the wait event
class (for buffer locking), making the current name too
specific. Alternatively we could introduce a dedicated wait event class for
those, but it seems somewhat confusing to have a BUFFERPIN and a BUFFER wait
event class.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
|
|
Standard practice in PostgreSQL is to use "foo(void)" instead of
"foo()", as the latter looks like an "old-style" function
declaration. Similar changes were made in commits cdf4b9aff2,
0e72b9d440, 7069dbcc31, f1283ed6cc, 7b66e2c086, e95126cf04, and
9f7c527af3.
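That is:

    extern int  foo();          /* old-style: parameter list left unspecified */
    extern int  foo(void);      /* preferred: explicitly takes no arguments */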
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/aTBObQPg%2Bps5I7vl%40ip-10-97-1-34.eu-west-3.compute.internal
|
|
This type never existed. SubscriptingRef was meant instead.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/2eaa45e3-efc5-4d75-b082-f8159f51445f%40eisentraut.org
|
|
The comment for the Pointer type says 'XXX Pointer arithmetic is done
with this, so it can't be void * under "true" ANSI compilers.'. This
fixes that. Change from Pointer to use char * explicitly where
pointer arithmetic is needed. This makes the meaning of the code
clearer locally and removes a dependency on the actual definition of
the Pointer type. (The definition of the Pointer type is not changed
in this commit.)
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
|
|
in arguments of memcpy() and memmove() calls
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
|
|
Both dsa_get_total_size() and dsa_get_total_size_from_handle() take
an exclusive lock just to read a variable. This commit reduces the
lock level to LW_SHARED in those functions.
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/aS8fMzWs9e8iHxk2%40nathan
|
|
Presently, this view reports NULL for the size of DSAs and dshash
tables because 1) the current backend might not be attached to them
and 2) the registry doesn't save the pointers to the dsa_area or
dshash_table in local memory. Also, the view doesn't show
partially-initialized entries to avoid ambiguity, since those
entries would report a NULL size as well.
This commit introduces a function that looks up the size of a DSA
given its handle (transiently attaching to the control segment if
needed) and teaches pg_dsm_registry_allocations to use it to show
the size of successfully-initialized DSA and dshash entries.
Furthermore, the view now reports partially-initialized entries
with a NULL size.
Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aSeEDeznAsHR1_YF%40nathan
|