summaryrefslogtreecommitdiff
path: root/src/backend/utils
AgeCommit message (Collapse)Author
2017-10-06Basic partition-wise join functionality.Robert Haas
Instead of joining two partitioned tables in their entirety we can, if it is an equi-join on the partition keys, join the matching partitions individually. This involves teaching the planner about "other join" rels, which are related to regular join rels in the same way that other member rels are related to baserels. This can use significantly more CPU time and memory than regular join planning, because there may now be a set of "other" rels not only for every base relation but also for every join relation. In most practical cases, this probably shouldn't be a problem, because (1) it's probably unusual to join many tables each with many partitions using the partition keys for all joins and (2) if you do that scenario then you probably have a big enough machine to handle the increased memory cost of planning and (3) the resulting plan is highly likely to be better, so what you spend in planning you'll make up on the execution side. All the same, for now, turn this feature off by default. Currently, we can only perform joins between two tables whose partitioning schemes are absolutely identical. It would be nice to cope with other scenarios, such as extra partitions on one side or the other with no match on the other side, but that will have to wait for a future patch. Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi, Amit Langote, Rafia Sabih, Thomas Munro, Dilip Kumar, Antonin Houska, Amit Khandekar, and by me. A few final adjustments by me. Discussion: http://postgr.es/m/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com Discussion: http://postgr.es/m/CAFjFpRcitjfrULr5jfuKWRPsGUX0LQ0k8-yG0Qw2+1LBGNpMdw@mail.gmail.com
2017-10-04Document and use SPI_result_code_string()Peter Eisentraut
A lot of semi-internal code just prints out numeric SPI error codes, which is not very helpful. We already have an API function to convert the codes to a string, so let's make more use of that. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-10-04Move SPI error reporting out of ri_ReportViolation()Peter Eisentraut
These are two completely unrelated code paths, so it doesn't make sense to pack them into one function. Add attribute noreturn to ri_ReportViolation(). Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-10-04Replace binary search in fmgr_isbuiltin with a lookup array.Andres Freund
Turns out we have enough functions that the binary search is quite noticeable in profiles. Thus have Gen_fmgrtab.pl build a new mapping from a builtin function's oid to an index in the existing fmgr_builtins array. That keeps the additional memory usage at a reasonable amount. Author: Andres Freund, with input from Tom Lane Discussion: https://postgr.es/m/20170914065128.a5sk7z4xde5uy3ei@alap3.anarazel.de
2017-09-30Support arrays over domains.Tom Lane
Allowing arrays with a domain type as their element type was left un-done in the original domain patch, but not for any very good reason. This omission leads to such surprising results as array_agg() not working on a domain column, because the parser can't identify a suitable output type for the polymorphic aggregate. In order to fix this, first clean up the APIs of coerce_to_domain() and some internal functions in parse_coerce.c so that we consistently pass around a CoercionContext along with CoercionForm. Previously, we sometimes passed an "isExplicit" boolean flag instead, which is strictly less information; and coerce_to_domain() didn't even get that, but instead had to reverse-engineer isExplicit from CoercionForm. That's contrary to the documentation in primnodes.h that says that CoercionForm only affects display and not semantics. I don't think this change fixes any live bugs, but it makes things more consistent. The main reason for doing it though is that now build_coercion_expression() receives ccontext, which it needs in order to be able to recursively invoke coerce_to_target_type(). Next, reimplement ArrayCoerceExpr so that the node does not directly know any details of what has to be done to the individual array elements while performing the array coercion. Instead, the per-element processing is represented by a sub-expression whose input is a source array element and whose output is a target array element. This simplifies life in parse_coerce.c, because it can build that sub-expression by a recursive invocation of coerce_to_target_type(). The executor now handles the per-element processing as a compiled expression instead of hard-wired code. The main advantage of this is that we can use a single ArrayCoerceExpr to handle as many as three successive steps per element: base type conversion, typmod coercion, and domain constraint checking. The old code used two stacked ArrayCoerceExprs to handle type + typmod coercion, which was pretty inefficient, and adding yet another array deconstruction to do domain constraint checking seemed very unappetizing. In the case where we just need a single, very simple coercion function, doing this straightforwardly leads to a noticeable increase in the per-array-element runtime cost. Hence, add an additional shortcut evalfunc in execExprInterp.c that skips unnecessary overhead for that specific form of expression. The runtime speed of simple cases is within 1% or so of where it was before, while cases that previously required two levels of array processing are significantly faster. Finally, create an implicit array type for every domain type, as we do for base types, enums, etc. Everything except the array-coercion case seems to just work without further effort. Tom Lane, reviewed by Andrew Dunstan Discussion: https://postgr.es/m/9852.1499791473@sss.pgh.pa.us
2017-09-29Add background worker typePeter Eisentraut
Add bgw_type field to background worker structure. It is intended to be set to the same value for all workers of the same type, so they can be grouped in pg_stat_activity, for example. The backend_type column in pg_stat_activity now shows bgw_type for a background worker. The ps listing also no longer calls out that a process is a background worker but just show the bgw_type. That way, being a background worker is more of an implementation detail now that is not shown to the user. However, most log messages still refer to 'background worker "%s"'; otherwise constructing sensible and translatable log messages would become tricky. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
2017-09-29Remove replacement selection sort.Robert Haas
At the time replacement_sort_tuples was introduced, there were still cases where replacement selection sort noticeably outperformed using quicksort even for the first run. However, those cases seem to have evaporated as a result of further improvements made since that time (and perhaps also advances in CPU technology). So remove replacement selection and the controlling GUC entirely. This makes tuplesort.c noticeably simpler and probably paves the way for further optimizations someone might want to do later. Peter Geoghegan, with review and testing by Tomas Vondra and me. Discussion: https://postgr.es/m/CAH2-WzmmNjG_K0R9nqYwMq3zjyJJK+hCbiZYNGhAy-Zyjs64GQ@mail.gmail.com
2017-09-27Fix behavior when converting a float infinity to numeric.Tom Lane
float8_numeric() and float4_numeric() failed to consider the possibility that the input is an IEEE infinity. The results depended on the platform-specific behavior of sprintf(): on most platforms you'd get something like ERROR: invalid input syntax for type numeric: "inf" but at least on Windows it's possible for the conversion to succeed and deliver a finite value (typically 1), due to a nonstandard output format from sprintf and lack of syntax error checking in these functions. Since our numeric type lacks the concept of infinity, a suitable conversion is impossible; the best thing to do is throw an explicit error before letting sprintf do its thing. While at it, let's use snprintf not sprintf. Overrunning the buffer should be impossible if sprintf does what it's supposed to, but this is cheap insurance against a stack smash if it doesn't. Problem reported by Taiki Kondo. Patch by me based on fix suggestion from KaiGai Kohei. Back-patch to all supported branches. Discussion: https://postgr.es/m/12A9442FBAE80D4E8953883E0B84E088C8C7A2@BPXM01GP.gisp.nec.co.jp
2017-09-27Revert to 9.6 treatment of ALTER TYPE enumtype ADD VALUE.Tom Lane
This reverts commit 15bc038f9, along with the followon commits 1635e80d3 and 984c92074 that tried to clean up the problems exposed by bug #14825. The result was incomplete because it failed to address parallel-query requirements. With 10.0 release so close upon us, now does not seem like the time to be adding more code to fix that. I hope we can un-revert this code and add the missing parallel query support during the v11 cycle. Back-patch to v10. Discussion: https://postgr.es/m/20170922185904.1448.16585@wrigleys.postgresql.org
2017-09-26Remove heuristic same-transaction test from check_safe_enum_use().Tom Lane
The blacklist mechanism added by the preceding commit directly fixes most of the practical cases that the same-transaction test was meant to cover. What remains is use-cases like begin; create type e as enum('x'); alter type e add value 'y'; -- use 'y' somehow commit; However, because the same-transaction test is heuristic, it fails on small variants of that, such as renaming the type or changing its owner. Rather than try to explain the behavior to users, let's remove it and just have a rule that the newly added value can't be used before being committed, full stop. Perhaps later it will be worth the implementation effort and overhead to have a more accurate test for type-was-created-in-this-transaction. We'll wait for some field experience with v10 before deciding to do that. Back-patch to v10. Discussion: https://postgr.es/m/20170922185904.1448.16585@wrigleys.postgresql.org
2017-09-26Use a blacklist to distinguish original from add-on enum values.Tom Lane
Commit 15bc038f9 allowed ALTER TYPE ADD VALUE to be executed inside transaction blocks, by disallowing the use of the added value later in the same transaction, except under limited circumstances. However, the test for "limited circumstances" was heuristic and could reject references to enum values that were created during CREATE TYPE AS ENUM, not just later. This breaks the use-case of restoring pg_dump scripts in a single transaction, as reported in bug #14825 from Balazs Szilfai. We can improve this by keeping a "blacklist" table of enum value OIDs created by ALTER TYPE ADD VALUE during the current transaction. Any visible-but-uncommitted value whose OID is not in the blacklist must have been created by CREATE TYPE AS ENUM, and can be used safely because it could not have a lifespan shorter than its parent enum type. This change also removes the restriction that a renamed enum value can't be used before being committed (unless it was on the blacklist). Andrew Dunstan, with cosmetic improvements by me. Back-patch to v10. Discussion: https://postgr.es/m/20170922185904.1448.16585@wrigleys.postgresql.org
2017-09-25Make construct_[md_]array return a valid empty array for zero-size input.Tom Lane
If construct_array() or construct_md_array() were given a dimension of zero, they'd produce an array that contains no elements but has positive dimension. This violates a general expectation that empty arrays should have ndims = 0; in particular, while arrays like this print as empty, they don't compare equal to other empty arrays. Up to now we've expected callers to avoid making such calls and instead be careful to call construct_empty_array() if there would be no elements. But this has always been an easily missed case, and we've repeatedly had to fix callers to do it right. In bug #14826, Erwin Brandstetter pointed out yet another such oversight, in ts_lexize(); and a bit of examination of other call sites found at least two more with similar issues. So let's fix the problem centrally and permanently by changing these two functions to construct a proper zero-D empty array whenever the array would be empty. This renders a few explicit calls of construct_empty_array() redundant, but the only such place I found that really seemed worth changing was in ExecEvalArrayExpr(). Although this fixes some very old bugs, no back-patch: the problem is pretty minor and the risk of changing behavior seems to outweigh the benefit in stable branches. Discussion: https://postgr.es/m/20170923125723.1448.39412@wrigleys.postgresql.org Discussion: https://postgr.es/m/20570.1506198383@sss.pgh.pa.us
2017-09-24Allow ICU to use SortSupport on Windows with UTF-8Peter Eisentraut
There is no reason to ever prevent the use of SortSupport on Windows when ICU locales are used. We previously avoided SortSupport on Windows with UTF-8 server encoding and a non C-locale due to restrictions in Windows' libc functionality. This is now considered to be a restriction in one platform's libc collation provider, and not a more general platform restriction. Reported-by: Peter Geoghegan <pg@bowt.ie>
2017-09-23Refactor new file permission handlingPeter Eisentraut
The file handling functions from fd.c were called with a diverse mix of notations for the file permissions when they were opening new files. Almost all files created by the server should have the same permissions set. So change the API so that e.g. OpenTransientFile() automatically uses the standard permissions set, and OpenTransientFilePerm() is a new function that takes an explicit permissions set for the few cases where it is needed. This also saves an unnecessary argument for call sites that are just opening an existing file. While we're reviewing these APIs, get rid of the FileName typedef and use the standard const char * for the file name and mode_t for the file mode. This makes these functions match other file handling functions and removes an unnecessary layer of mysteriousness. We can also get rid of a few casts that way. Author: David Steele <david@pgmasters.net>
2017-09-22Assume wcstombs(), towlower(), and sibling functions are always present.Tom Lane
These functions are required by SUS v2, which is our minimum baseline for Unix platforms, and are present on all interesting Windows versions as well. Even our oldest buildfarm members have them. Thus, we were not testing the "!USE_WIDE_UPPER_LOWER" code paths, which explains why the bug fixed in commit e6023ee7f escaped detection. Per discussion, there seems to be no more real-world value in maintaining this option. Hence, remove the configure-time tests for wcstombs() and towlower(), remove the USE_WIDE_UPPER_LOWER symbol, and remove all the !USE_WIDE_UPPER_LOWER code. There's not actually all that much of the latter, but simplifying the #if nests is a win in itself. Discussion: https://postgr.es/m/20170921052928.GA188913@rfd.leadboat.com
2017-09-22Fix build with !USE_WIDE_UPPER_LOWERPeter Eisentraut
The placement of the ifdef blocks in formatting.c was pretty bogus, so the code failed to compile if USE_WIDE_UPPER_LOWER was not defined. Reported-by: Peter Geoghegan <pg@bowt.ie> Reported-by: Noah Misch <noah@leadboat.com>
2017-09-20Improve dubious memory management in pg_newlocale_from_collation().Tom Lane
pg_newlocale_from_collation() used malloc() and strdup() directly, which is generally not per backend coding style, and it didn't bother to check for failure results, but would just SIGSEGV instead. Also, if one of the numerous error checks in the middle of the function failed, the already-allocated memory would be leaked permanently. Admittedly, it's not a lot of memory, but it could build up if this function were called repeatedly for a bad collation. The first two problems are easily cured by palloc'ing in TopMemoryContext instead of calling libc directly. We can fairly easily dodge the leakage problem for the struct pg_locale_struct by filling in a temporary variable and allocating permanent storage only once we reach the bottom of the function. It's harder to get rid of the potential leakage for ICU's copy of the collcollate string, but at least that's only allocated after most of the error checks; so live with that aspect. Back-patch to v10 where this code came in, with one or another of the ICU patches.
2017-09-19Make WAL segment size configurable at initdb time.Andres Freund
For performance reasons a larger segment size than the default 16MB can be useful. A larger segment size has two main benefits: Firstly, in setups using archiving, it makes it easier to write scripts that can keep up with higher amounts of WAL, secondly, the WAL has to be written and synced to disk less frequently. But at the same time large segment size are disadvantageous for smaller databases. So far the segment size had to be configured at compile time, often making it unrealistic to choose one fitting to a particularly load. Therefore change it to a initdb time setting. This includes a breaking changes to the xlogreader.h API, which now requires the current segment size to be configured. For that and similar reasons a number of binaries had to be taught how to recognize the current segment size. Author: Beena Emerson, editorialized by Andres Freund Reviewed-By: Andres Freund, David Steele, Kuntal Ghosh, Michael Paquier, Peter Eisentraut, Robert Hass, Tushar Ahuja Discussion: https://postgr.es/m/CAOG9ApEAcQ--1ieKbhFzXSQPw_YLmepaa4hNdnY5+ZULpt81Mw@mail.gmail.com
2017-09-19Remove no-op GiST support functions in the core GiST opclasses.Tom Lane
The preceding patch allowed us to remove useless GiST support functions. This patch actually does that for all the no-op cases in the core GiST code. This buys us whatever performance gain is to be had, and more importantly exercises the preceding patch. There remain no-op functions in the contrib GiST opclasses, but those will take more work to remove. Discussion: https://postgr.es/m/CAJEAwVELVx9gYscpE=Be6iJxvdW5unZ_LkcAaVNSeOwvdwtD=A@mail.gmail.com
2017-09-19Speedup pgstat_report_activity by moving mb-aware truncation to read side.Andres Freund
Previously multi-byte aware truncation was done on every pgstat_report_activity() call - proving to be a bottleneck for workloads with long query strings that execute quickly. Instead move the truncation to the read side, which commonly is executed far less frequently. That's possible because all server encodings allow to determine the length of a multi-byte string from the first byte. Rename PgBackendStatus.st_activity to st_activity_raw so existing extension users of the field break - their code has to be adjusted to use pgstat_clip_activity(). Author: Andres Freund Tested-By: Khuntal Ghosh Reviewed-By: Robert Haas, Tom Lane Discussion: https://postgr.es/m/20170912071948.pa7igbpkkkviecpz@alap3.anarazel.de
2017-09-19Cache datatype-output-function lookup info across calls of concat().Tom Lane
Testing indicates this can save a third to a half of the runtime of the function. Pavel Stehule, reviewed by Alexander Kuzmenkov Discussion: https://postgr.es/m/CAFj8pRAT62pRgjoHbgTfJUc2uLmeQ4saUj+yVJAEZUiMwNCmdg@mail.gmail.com
2017-09-18Make DatumGetFoo/PG_GETARG_FOO/PG_RETURN_FOO macro names more consistent.Tom Lane
By project convention, these names should include "P" when dealing with a pointer type; that is, if the result of a GETARG macro is of type FOO *, it should be called PG_GETARG_FOO_P not just PG_GETARG_FOO. Some newer types such as JSONB and ranges had not followed the convention, and a number of contrib modules hadn't gotten that memo either. Rename the offending macros to improve consistency. In passing, fix a few places that thought PG_DETOAST_DATUM() returns a Datum; it does not, it returns "struct varlena *". Applying DatumGetPointer to that happens not to cause any bad effects today, but it's formally wrong. Also, adjust an ltree macro that was designed without any thought for what pgindent would do with it. This is all cosmetic and shouldn't have any impact on generated code. Mark Dilger, some further tweaks by me Discussion: https://postgr.es/m/EA5676F4-766F-4F38-8348-ECC7DB427C6A@gmail.com
2017-09-17Fix bogus size calculation introduced by commit cc5f81366.Tom Lane
The elements of RecordCacheArray are TupleDesc, not TupleDesc *. Those are actually the same size, so that this error is harmless, but it's still wrong --- and it might bite us someday, if TupleDesc ever became a struct, say. Per Coverity.
2017-09-15Apply pg_get_serial_sequence() to identity column sequences as wellPeter Eisentraut
Bug: #14813
2017-09-15Get rid of shared_record_typmod_registry_worker_detach; it doesn't work.Tom Lane
This code is unsafe, as proven by buildfarm failures, because it tries to access shared memory that might already be gone. It's also unnecessary, because we're about to exit the process anyway and so the record type cache should never be accessed again. The idea was to lay some foundations for someday recycling workers --- which would require attaching to a different shared tupdesc registry --- but that will require considerably more thought. In the meantime let's save some bytes by just removing the nonfunctional code. Problem identification, and proposal to fix by removing functionality from the detach function, by Thomas Munro. I went a bit further by removing the function altogether. Discussion: https://postgr.es/m/E1dsguX-00056N-9x@gemulon.postgresql.org
2017-09-15Don't use anonymous unions.Tom Lane
Commit cc5f81366c36b3dd8f02bd9be1cf75b2cc8482bd introduced a language feature that is not acceptable to strict C89 compilers. Thomas Munro Per buildfarm.
2017-09-14Add support for coordinating record typmods among parallel workers.Andres Freund
Tuples can have type RECORDOID and a typmod number that identifies a blessed TupleDesc in a backend-private cache. To support the sharing of such tuples through shared memory and temporary files, provide a typmod registry in shared memory. To achieve that, introduce per-session DSM segments, created on demand when a backend first runs a parallel query. The per-session DSM segment has a table-of-contents just like the per-query DSM segment, and initially the contents are a shared record typmod registry and a DSA area to provide the space it needs to grow. State relating to the current session is accessed via a Session object reached through global variable CurrentSession that may require significant redesign further down the road as we figure out what else needs to be shared or remodelled. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-09-13Distinguish selectivity of < from <= and > from >=.Tom Lane
Historically, the selectivity functions have simply not distinguished < from <=, or > from >=, arguing that the fraction of the population that satisfies the "=" aspect can be considered to be vanishingly small, if the comparison value isn't any of the most-common-values for the variable. (If it is, the code path that executes the operator against each MCV will take care of things properly.) But that isn't really true unless we're dealing with a continuum of variable values, and in practice we seldom are. If "x = const" would estimate a nonzero number of rows for a given const value, then it follows that we ought to estimate different numbers of rows for "x < const" and "x <= const", even if the const is not one of the MCVs. Handling this more honestly makes a significant difference in edge cases, such as the estimate for a tight range (x BETWEEN y AND z where y and z are close together). Hence, split scalarltsel into scalarltsel/scalarlesel, and similarly split scalargtsel into scalargtsel/scalargesel. Adjust <= and >= operator definitions to reference the new selectivity functions. Improve the core ineq_histogram_selectivity() function to make a correction for equality. (Along the way, I learned quite a bit about exactly why that function gives good answers, which I tried to memorialize in improved comments.) The corresponding join selectivity functions were, and remain, just stubs. But I chose to split them similarly, to avoid confusion and to prevent the need for doing this exercise again if someone ever makes them less stubby. In passing, change ineq_histogram_selectivity's clamp for extreme probability estimates so that it varies depending on the histogram size, instead of being hardwired at 0.0001. With the default histogram size of 100 entries, you still get the old clamp value, but bigger histograms should allow us to put more faith in edge values. Tom Lane, reviewed by Aleksander Alekseev and Kuntal Ghosh Discussion: https://postgr.es/m/12232.1499140410@sss.pgh.pa.us
2017-09-12Introduce BYTES unit for GUCs.Andres Freund
This is already useful for track_activity_query_size, and will further be used in a later commit making the WAL segment size configurable. Author: Beena Emerson Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAOG9ApEu8bXVwBxkOO9J7ZpM76TASK_vFMEEiCEjwhMmSLiaqQ@mail.gmail.com
2017-09-11Constify numeric.c.Andres Freund
This allows the compiler/linker to move the static variables to a read-only segment. Not all the signature changes are necessary, but it seems better to apply const in a consistent manner. Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20170910232154.asgml44ji2b7lv3d@alap3.anarazel.de
2017-09-11Message style fixesPeter Eisentraut
2017-09-08Allow a partitioned table to have a default partition.Robert Haas
Any tuples that don't route to any other partition will route to the default partition. Jeevan Ladhe, Beena Emerson, Ashutosh Bapat, Rahila Syed, and Robert Haas, with review and testing at various stages by (at least) Rushabh Lathia, Keith Fiske, Amit Langote, Amul Sul, Rajkumar Raghuanshi, Sven Kunze, Kyotaro Horiguchi, Thom Brown, Rafia Sabih, and Dilip Kumar. Discussion: http://postgr.es/m/CAH2L28tbN4SYyhS7YV1YBWcitkqbhSWfQCy0G=apRcC_PEO-bg@mail.gmail.com Discussion: http://postgr.es/m/CAOG9ApEYj34fWMcvBMBQ-YtqR9fTdXhdN82QEKG0SVZ6zeL1xg@mail.gmail.com
2017-09-08Remove mention of password_encryption = plain in postgresql.conf.sample.Tom Lane
Evidently missed in commit eb61136dc. Spotted by Oleg Bartunov. Discussion: https://postgr.es/m/CAF4Au4wz_iK5r4fnTnnd8XqioAZQs-P7-VsEAfivW34zMVpAmw@mail.gmail.com
2017-09-07Improve performance of get_actual_variable_range with recently-dead tuples.Tom Lane
In commit fccebe421, we hacked get_actual_variable_range() to scan the index with SnapshotDirty, so that if there are many uncommitted tuples at the end of the index range, it wouldn't laboriously scan through all of them looking for a live value to return. However, that didn't fix it for the case of many recently-dead tuples at the end of the index; SnapshotDirty recognizes those as committed dead and so we're back to the same problem. To improve the situation, invent a "SnapshotNonVacuumable" snapshot type and use that instead. The reason this helps is that, if the snapshot rejects a given index entry, we know that the indexscan will mark that index entry as killed. This means the next get_actual_variable_range() scan will proceed past that entry without visiting the heap, making the scan a lot faster. We may end up accepting a recently-dead tuple as being the estimated extremal value, but that doesn't seem much worse than the compromise we made before to accept not-yet-committed extremal values. The cost of the scan is still proportional to the number of dead index entries at the end of the range, so in the interval after a mass delete but before VACUUM's cleaned up the mess, it's still possible for get_actual_variable_range() to take a noticeable amount of time, if you've got enough such dead entries. But the constant factor is much much better than before, since all we need to do with each index entry is test its "killed" bit. We chose to back-patch commit fccebe421 at the time, but I'm hesitant to do so here, because this form of the problem seems to affect many fewer people. Also, even when it happens, it's less bad than the case fixed by commit fccebe421 because we don't get the contention effects from expensive TransactionIdIsInProgress tests. Dmitriy Sarafannikov, reviewed by Andrey Borodin Discussion: https://postgr.es/m/05C72CF7-B5F6-4DB9-8A09-5AC897653113@yandex.ru
2017-09-07Reduce excessive dereferencing of function pointersPeter Eisentraut
It is equivalent in ANSI C to write (*funcptr) () and funcptr(). These two styles have been applied inconsistently. After discussion, we'll use the more verbose style for plain function pointer variables, to make it clear that it's a variable, and the shorter style when the function pointer is in a struct (s.func() or s->func()), because then it's clear that it's not a plain function name, and otherwise the excessive punctuation makes some of those invocations hard to read. Discussion: https://www.postgresql.org/message-id/f52c16db-14ed-757d-4b48-7ef360b1631d@2ndquadrant.com
2017-09-06Allow SET STATISTICS on expression indexesSimon Riggs
Index columns are referenced by ordinal number rather than name, e.g. CREATE INDEX coord_idx ON measured (x, y, (z + t)); ALTER INDEX coord_idx ALTER COLUMN 3 SET STATISTICS 1000; Incompatibility note for release notes: \d+ for indexes now also displays Stats Target Authors: Alexander Korotkov, with contribution by Adrien NAYRAT Review: Adrien NAYRAT, Simon Riggs Wordsmith: Simon Riggs
2017-09-06Clean up handling of dropped columns in NAMEDTUPLESTORE RTEs.Tom Lane
The NAMEDTUPLESTORE patch piggybacked on the infrastructure for TABLEFUNC/VALUES/CTE RTEs, none of which can ever have dropped columns, so the possibility was ignored most places. Fix that, including adding a specification to parsenodes.h about what it's supposed to look like. In passing, clean up assorted comments that hadn't been maintained properly by said patch. Per bug #14799 from Philippe Beaudoin. Back-patch to v10. Discussion: https://postgr.es/m/20170906120005.25630.84360@wrigleys.postgresql.org
2017-09-05Remove unnecessary parentheses in return statementsPeter Eisentraut
The parenthesized style has only been used in a few modules. Change that to use the style that is predominant across the whole tree. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Ryan Murphy <ryanfmurphy@gmail.com>
2017-09-01Fix assorted carelessness about Datum vs. int64 vs. uint64Robert Haas
Bugs introduced by commit 81c5e46c490e2426db243eada186995da5bb0ba7
2017-08-31Try to repair poorly-considered code in previous commit.Robert Haas
2017-08-31Introduce 64-bit hash functions with a 64-bit seed.Robert Haas
This will be useful for hash partitioning, which needs a way to seed the hash functions to avoid problems such as a hash index on a hash partitioned table clumping all values into a small portion of the bucket space; it's also useful for anything that wants a 64-bit hash value rather than a 32-bit hash value. Just in case somebody wants a 64-bit hash value that is compatible with the existing 32-bit hash values, make the low 32-bits of the 64-bit hash value match the 32-bit hash value when the seed is 0. Robert Haas and Amul Sul Discussion: http://postgr.es/m/CA+Tgmoafx2yoJuhCQQOL5CocEi-w_uG4S2xT0EtgiJnPGcHW3g@mail.gmail.com
2017-08-29Propagate sort instrumentation from workers back to leader.Robert Haas
Up until now, when parallel query was used, no details about the sort method or space used by the workers were available; details were shown only for any sorting done by the leader. Fix that. Commit 1177ab1dabf72bafee8f19d904cee3a299f25892 forced the test case added by commit 1f6d515a67ec98194c23a5db25660856c9aab944 to run without parallelism; now that we have this infrastructure, allow that again, with a little tweaking to make it pass with and without force_parallel_mode. Robert Haas and Tom Lane Discussion: http://postgr.es/m/CA+Tgmoa2VBZW6S8AAXfhpHczb=Rf6RqQ2br+zJvEgwJ0uoD_tQ@mail.gmail.com
2017-08-24Fix harmless thinko in dsa.c.Andres Freund
Commit 16be2fd100199bdf284becfcee02c5eb20d8a11d added DSA_ALLOC_HUGE, DSA_ALLOC_ZERO and DSA_ALLOC_NO_OOM which have the same numerical values and meanings as the similarly named MCXT_... macros. In one place we accidentally used MCXT_ALLOC_NO_OOM when DSA_ALLOC_NO_OOM is wanted, so tidy that up. Author: Thomas Munro Discussion: http://postgr.es/m/CAEepm=2AimHxVkkxnMfQvbZMkXy0uKbVa0-D38c5-qwrCm4CMQ@mail.gmail.com Backpatch: 10, where dsa was introduced.
2017-08-22Refactor typcache.c's record typmod hash table.Andres Freund
Previously, tuple descriptors were stored in chains keyed by a fixed size array of OIDs. That meant there were effectively two levels of collision chain -- one inside and one outside the hash table. Instead, let dynahash.c look after conflicts for us by supplying a proper hash and equal function pair. This is a nice cleanup on its own, but also simplifies followup changes allowing blessed TupleDescs to be shared between backends participating in parallel query. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm%3D34GVhOL%2BarUx56yx7OPk7%3DqpGsv3CpO54feqjAwQKm5g%40mail.gmail.com
2017-08-20Change tupledesc->attrs[n] to TupleDescAttr(tupledesc, n).Andres Freund
This is a mechanical change in preparation for a later commit that will change the layout of TupleDesc. Introducing a macro to abstract the details of where attributes are stored will allow us to change that in separate step and revise it in future. Author: Thomas Munro, editorialized by Andres Freund Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-08-16Make the planner assume that the entries in a VALUES list are distinct.Tom Lane
Previously, if we had to estimate the number of distinct values in a VALUES column, we fell back on the default behavior used whenever we lack statistics, which effectively is that there are Min(# of entries, 200) distinct values. This can be very badly off with a large VALUES list, as noted by Jeff Janes. We could consider actually running an ANALYZE-like scan on the VALUES, but that seems unduly expensive, and anyway it could not deliver reliable info if the entries are not all constants. What seems like a better choice is to assume that the values are all distinct. This will sometimes be just as wrong as the old code, but it seems more likely to be more nearly right in many common cases. Also, it is more consistent with what happens in some related cases, for example WHERE x = ANY(ARRAY[1,2,3,...,n]) and WHERE x = ANY(VALUES (1),(2),(3),...,(n)) now are estimated similarly. This was discussed some time ago, but consensus was it'd be better to slip it in at the start of a development cycle not near the end. (It should've gone into v10, really, but I forgot about it.) Discussion: https://postgr.es/m/CAMkU=1xHkyPa8VQgGcCNg3RMFFvVxUdOpus1gKcFuvVi0w6Acg@mail.gmail.com
2017-08-15Fix up some misusage of appendStringInfo() and friendsPeter Eisentraut
Change to appendStringInfoChar() or appendStringInfoString() where those can be used. Author: David Rowley <david.rowley@2ndquadrant.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
2017-08-15Avoid out-of-memory in a hash join with many duplicate inner keys.Tom Lane
The executor is capable of splitting buckets during a hash join if too much memory is being used by a small number of buckets. However, this only helps if a bucket's population is actually divisible; if all the hash keys are alike, the tuples still end up in the same new bucket. This can result in an OOM failure if there are enough inner keys with identical hash values. The planner's cost estimates will bias it against choosing a hash join in such situations, but not by so much that it will never do so. To mitigate the OOM hazard, explicitly estimate the hash bucket space needed by just the inner side's most common value, and if that would exceed work_mem then add disable_cost to the hash cost estimate. This approach doesn't account for the possibility that two or more common values would share the same hash value. On the other hand, work_mem is normally a fairly conservative bound, so that eating two or more times that much space is probably not going to kill us. If we have no stats about the inner side, ignore this consideration. There was some discussion of making a conservative assumption, but that would effectively result in disabling hash join whenever we lack stats, which seems like an overreaction given how seldom the problem manifests in the field. Per a complaint from David Hinkle. Although this could be viewed as a bug fix, the lack of similar complaints weighs against back- patching; indeed we waited for v11 because it seemed already rather late in the v10 cycle to be making plan choice changes like this one. Discussion: https://postgr.es/m/32013.1487271761@sss.pgh.pa.us
2017-08-15Assorted preparatory refactoring for partition-wise join.Robert Haas
Instead of duplicating the logic to search for a matching ParamPathInfo in multiple places, factor it out into a separate function. Pass only the relevant bits of the PartitionKey to partition_bounds_equal instead of the whole thing, because partition-wise join will want to call this without having a PartitionKey available. Adjust allow_star_schema_join and calc_nestloop_required_outer to take relevant Relids rather than the entire Path, because partition-wise join will want to call it with the top-parent relids to determine whether a child join is allowable. Ashutosh Bapat. Review and testing of the larger patch set of which this is a part by Amit Langote, Rajkumar Raghuwanshi, Rafia Sabih, Thomas Munro, Dilip Kumar, and me. Discussion: http://postgr.es/m/CA+TgmobQK80vtXjAsPZWWXd7c8u13G86gmuLupN+uUJjA+i4nA@mail.gmail.com
2017-08-14Final pgindent + perltidy run for v10.Tom Lane