summaryrefslogtreecommitdiff
path: root/src/backend
AgeCommit message (Collapse)Author
2021-07-12Remove dead assignment to local variable.Heikki Linnakangas
This should have been removed in commit 7e30c186da, which split the loop into two. Only the first loop uses the 'from' variable; updating it in the second loop is bogus. It was never read after the first loop, so this was harmless and surely optimized away by the compiler, but let's be tidy. Backpatch to all supported versions. Author: Ranier Vilela Discussion: https://www.postgresql.org/message-id/CAEudQAoWq%2BAL3BnELHu7gms2GN07k-np6yLbukGaxJ1vY-zeiQ%40mail.gmail.com
2021-07-11Lock the extension during ALTER EXTENSION ADD/DROP.Tom Lane
Although we were careful to lock the object being added or dropped, we failed to get any sort of lock on the extension itself. This allowed the ALTER to proceed in parallel with a DROP EXTENSION, which is problematic for a couple of reasons. If both commands succeeded we'd be left with a dangling link in pg_depend, which would cause problems later. Also, if the ALTER failed for some reason, it might try to print the extension's name, and that could result in a crash or (in older branches) a silly error message complaining about extension "(null)". Per bug #17098 from Alexander Lakhin. Back-patch to all supported branches. Discussion: https://postgr.es/m/17098-b960f3616c861f83@postgresql.org
2021-07-10Fix assign_record_type_typmod().Jeff Davis
If an error occurred in the wrong place, it was possible to leave an unintialized entry in the hash table, leading to a crash. Fixed. Also, be more careful about the order of operations so that an allocation error doesn't leak memory in CacheMemoryContext or unnecessarily advance NextRecordTypmod. Backpatch through version 11. Earlier versions (prior to 35ea75632a5) do not exhibit the problem, because an uninitialized hash entry contains a valid empty list. Author: Sait Talha Nisanci <Sait.Nisanci@microsoft.com> Reviewed-by: Andres Freund Discussion: https://postgr.es/m/HE1PR8303MB009069D476225B9A9E194B8891779@HE1PR8303MB0090.EURPRD83.prod.outlook.com Backpatch-through: 11
2021-07-10Add more sanity checks in SASL exchangesMichael Paquier
The following checks are added, to make the SASL infrastructure more aware of defects when implementing new mechanisms: - Detect that no output is generated by a mechanism if an exchange fails in the backend, failing if there is a message waiting to be sent. - Handle zero-length messages in the frontend. The backend handles that already, and SCRAM would complain if sending empty messages as this is not authorized for this mechanism, but other mechanisms may want this capability (the SASL specification allows that). - Make sure that a mechanism generates a message in the middle of the exchange in the frontend. SCRAM, as implemented, respects all these requirements already, and the recent refactoring of SASL done in 9fd8557 helps in documenting that in a cleaner way. Analyzed-by: Jacob Champion Author: Michael Paquier Reviewed-by: Jacob Champion Discussion: https://postgr.es/m/3d2a6f5d50e741117d6baf83eb67ebf1a8a35a11.camel@vmware.com
2021-07-10Fix numeric_mul() overflow due to too many digits after decimal point.Dean Rasheed
This fixes an overflow error when using the numeric * operator if the result has more than 16383 digits after the decimal point by rounding the result. Overflow errors should only occur if the result has too many digits *before* the decimal point. Discussion: https://postgr.es/m/CAEZATCUmeFWCrq2dNzZpRj5+6LfN85jYiDoqm+ucSXhb9U2TbA@mail.gmail.com
2021-07-09Eliminate replication protocol error related to IDENTIFY_SYSTEM.Jeff Davis
The requirement that IDENTIFY_SYSTEM be run before START_REPLICATION was both undocumented and unnecessary. Remove the error and ensure that ThisTimeLineID is initialized in START_REPLICATION. Elect not to backport because this requirement was expected behavior (even if inconsistently enforced), and is not likely to cause any major problem. Author: Jeff Davis Reviewed-by: Andres Freund Discussion: https://postgr.es/m/de4bbf05b7cd94227841c433ea6ff71d2130c713.camel%40j-davis.com
2021-07-09Avoid creating a RESULT RTE that's marked LATERAL.Tom Lane
Commit 7266d0997 added code to pull up simple constant function results, converting the RTE_FUNCTION RTE to a dummy RTE_RESULT RTE since it no longer need be scanned. But I forgot to clear the LATERAL flag if the RTE has it set. If the function reduced to a constant, it surely contains no lateral references so this simplification is logically OK. It's needed because various other places will Assert that RESULT RTEs aren't LATERAL. Per bug #17097 from Yaoguang Chen. Back-patch to v13 where the faulty code came in. Discussion: https://postgr.es/m/17097-3372ef9f798fc94f@postgresql.org
2021-07-09Reject cases where a query in WITH rewrites to just NOTIFY.Tom Lane
Since the executor can't cope with a utility statement appearing as a node of a plan tree, we can't support cases where a rewrite rule inserts a NOTIFY into an INSERT/UPDATE/DELETE command appearing in a WITH clause of a larger query. (One can imagine ways around that, but it'd be a new feature not a bug fix, and so far there's been no demand for it.) RewriteQuery checked for this, but it missed the case where the DML command rewrites to *only* a NOTIFY. That'd lead to crashes later on in planning. Add the missed check, and improve the level of testing of this area. Per bug #17094 from Yaoguang Chen. It's been busted since WITH was introduced, so back-patch to all supported branches. Discussion: https://postgr.es/m/17094-bf15dff55eaf2e28@postgresql.org
2021-07-09Teach pg_size_pretty and pg_size_bytes about petabytesDavid Rowley
There was talk about adding units all the way up to yottabytes but it seems quite far-fetched that anyone would need those. Since such large units are not exactly commonplace, it seems unlikely that having pg_size_pretty outputting unit any larger than petabytes would actually be helpful to anyone. Since petabytes are on the horizon, let's just add those only. Maybe one day we'll get to add additional units, but it will likely be a while before we'll need to think beyond petabytes in regards to the size of a database. Author: David Christensen Discussion: https://postgr.es/m/CAOxo6XKmHc_WZip-x5QwaOqFEiCq_SVD0B7sbTZQk+qqcn2qaw@mail.gmail.com
2021-07-09Add forgotten LSN_FORMAT_ARGS() in xlogreader.cMichael Paquier
These should have been part of 4035cd5, that introduced LZ4 support for wal_compression.
2021-07-09Remove more obsolete comments about semaphores.Thomas Munro
Commit 6753333f stopped using semaphores as the sleep/wake mechanism for heavyweight locks, but some obsolete references to that scheme remained in comments. As with similar commit 25b93a29, back-patch all the way. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CA%2BhUKGLafjB1uzXcy%3D%3D2L3cy7rjHkqOVn7qRYGBjk%3D%3DtMJE7Yg%40mail.gmail.com
2021-07-09Use a lookup table for units in pg_size_pretty and pg_size_bytesDavid Rowley
We've grown 2 versions of pg_size_pretty over the years, one for BIGINT and one for NUMERIC. Both should output the same, but keeping them in sync is harder than needed due to neither function sharing a source of truth about which units to use and how to transition to the next largest unit. Here we add a static array which defines the units that we recognize and have both pg_size_pretty and pg_size_pretty_numeric use it. This will make adding any units in the future a very simple task. The table contains all information required to allow us to also modify pg_size_bytes to use the lookup table, so adjust that too. There are no behavioral changes here. Author: David Rowley Reviewed-by: Dean Rasheed, Tom Lane, David Christensen Discussion: https://postgr.es/m/CAApHDvru1F7qsEVL-iOHeezJ+5WVxXnyD_Jo9nht+Eh85ekK-Q@mail.gmail.com
2021-07-09Fix incorrect return value in pg_size_pretty(bigint)David Rowley
Due to how pg_size_pretty(bigint) was implemented, it's possible that when given a negative number of bytes that the returning value would not match the equivalent positive return value when given the equivalent positive number of bytes. This was due to two separate issues. 1. The function used bit shifting to convert the number of bytes into larger units. The rounding performed by bit shifting is not the same as dividing. For example -3 >> 1 = -2, but -3 / 2 = -1. These two operations are only equivalent with positive numbers. 2. The half_rounded() macro rounded towards positive infinity. This meant that negative numbers rounded towards zero and positive numbers rounded away from zero. Here we fix #1 by dividing the values instead of bit shifting. We fix #2 by adjusting the half_rounded macro always to round away from zero. Additionally, adjust the pg_size_pretty(numeric) function to be more explicit that it's using division rather than bit shifting. A casual observer might have believed bit shifting was used due to a static function being named numeric_shift_right. However, that function was calculating the divisor from the number of bits and performed division. Here we make that more clear. This change is just cosmetic and does not affect the return value of the numeric version of the function. Here we also add a set of regression tests both versions of pg_size_pretty() which test the values directly before and after the function switches to the next unit. This bug was introduced in 8a1fab36a. Prior to that negative values were always displayed in bytes. Author: Dean Rasheed, David Rowley Discussion: https://postgr.es/m/CAEZATCXnNW4HsmZnxhfezR5FuiGgp+mkY4AzcL5eRGO4fuadWg@mail.gmail.com Backpatch-through: 9.6, where the bug was introduced.
2021-07-08Fix typos in pgstat.c, reorderbuffer.c and pathnodes.hDaniel Gustafsson
Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/50250765-5B87-4AD7-9770-7FCED42A6175@yesql.se
2021-07-08Improve error messages about mismatching relkindPeter Eisentraut
Most error messages about a relkind that was not supported or appropriate for the command was of the pattern "relation \"%s\" is not a table, foreign table, or materialized view" This style can become verbose and tedious to maintain. Moreover, it's not very helpful: If I'm trying to create a comment on a TOAST table, which is not supported, then the information that I could have created a comment on a materialized view is pointless. Instead, write the primary error message shorter and saying more directly that what was attempted is not possible. Then, in the detail message, explain that the operation is not supported for the relkind the object was. To simplify that, add a new function errdetail_relkind_not_supported() that does this. In passing, make use of RELKIND_HAS_STORAGE() where appropriate, instead of listing out the relkinds individually. Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/flat/dc35a398-37d0-75ce-07ea-1dd71d98f8ec@2ndquadrant.com
2021-07-07Use a hash table to speed up NOT IN(values)David Rowley
Similar to 50e17ad28, which allowed hash tables to be used for IN clauses with a set of constants, here we add the same feature for NOT IN clauses. NOT IN evaluates the same as: WHERE a <> v1 AND a <> v2 AND a <> v3. Obviously, if we're using a hash table we must be exactly equivalent to that and return the same result taking into account that either side of the condition could contain a NULL. This requires a little bit of special handling to make work with the hash table version. When processing NOT IN, the ScalarArrayOpExpr's operator will be the <> operator. To be able to build and lookup a hash table we must use the <>'s negator operator. The planner checks if that exists and is hashable and sets the relevant fields in ScalarArrayOpExpr to instruct the executor to use hashing. Author: David Rowley, James Coleman Reviewed-by: James Coleman, Zhihong Yu Discussion: https://postgr.es/m/CAApHDvoF1mum_FRk6D621edcB6KSHBi2+GAgWmioj5AhOu2vwQ@mail.gmail.com
2021-07-07Refactor SASL code with a generic interface for its mechanismsMichael Paquier
The code of SCRAM and SASL have been tightly linked together since SCRAM exists in the core code, making hard to apprehend the addition of new SASL mechanisms, but these are by design different facilities, with SCRAM being an option for SASL. This refactors the code related to both so as the backend and the frontend use a set of callbacks for SASL mechanisms, documenting while on it what is expected by anybody adding a new SASL mechanism. The separation between both layers is neat, using two sets of callbacks for the frontend and the backend to mark the frontier between both facilities. The shape of the callbacks is now directly inspired from the routines used by SCRAM, so the code change is straight-forward, and the SASL code is moved into its own set of files. These will likely change depending on how and if new SASL mechanisms get added in the future. Author: Jacob Champion Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/3d2a6f5d50e741117d6baf83eb67ebf1a8a35a11.camel@vmware.com
2021-07-06Allow CustomScan providers to say whether they support projections.Tom Lane
Previously, all CustomScan providers had to support projections, but there may be cases where this is inconvenient. Add a flag bit to say if it's supported. Important item for the release notes: this is non-backwards-compatible since the default is now to assume that CustomScan providers can't project, instead of assuming that they can. It's fail-soft, but could result in visible performance penalties due to adding unnecessary Result nodes. Sven Klemm, reviewed by Aleksander Alekseev; some cosmetic fiddling by me. Discussion: https://postgr.es/m/CAMCrgp1kyakOz6c8aKhNDJXjhQ1dEjEnp+6KNT3KxPrjNtsrDg@mail.gmail.com
2021-07-06Reduce the cost of planning deeply-nested views.Tom Lane
Joel Jacobson reported that deep nesting of trivial (flattenable) views results in O(N^3) growth of planning time for N-deep nesting. It turns out that a large chunk of this cost comes from copying around the "subquery" sub-tree of each view's RTE_SUBQUERY RTE. But once we have successfully flattened the subquery, we don't need that anymore, because the planner isn't going to do anything else interesting with that RTE. We already zap the subquery pointer during setrefs.c (cf. add_rte_to_flat_rtable), but it's useless baggage earlier than that too. Clearing the pointer as soon as pull_up_simple_subquery is done with the RTE reduces the cost from O(N^3) to O(N^2); which is still not great, but it's quite a lot better. Further improvements will require rethinking of the RTE data structure, which is being considered in another thread. Patch by me; thanks to Dean Rasheed for review. Discussion: https://postgr.es/m/797aff54-b49b-4914-9ff9-aa42564a4d7d@www.fastmail.com
2021-07-06Refactor function parse_subscription_options.Amit Kapila
Instead of using multiple parameters in parse_subscription_options function signature, use the struct SubOpts that encapsulate all the subscription options and their values. It will be useful for future work where we need to add other options in the subscription. Also, use bitmaps to pass the supported and retrieve the specified options much like the way it is done in the commit a3dc926009. Author: Bharath Rupireddy Reviewed-By: Peter Smith, Amit Kapila, Alvaro Herrera Discussion: https://postgr.es/m/CALj2ACXtoQczfNsDQWobypVvHbX2DtgEHn8DawS0eGFwuo72kw@mail.gmail.com
2021-07-06Fix typo in commentDavid Rowley
Author: James Coleman Discussion: https://postgr.es/m/CAAaqYe8f8ENA0i1PdBtUNWDd2sxHSMgscNYbjhaXMuAdfBrZcg@mail.gmail.com
2021-07-06Reduce the number of pallocs when building partition boundsDavid Rowley
In each of the create_*_bound() functions for LIST, RANGE and HASH partitioning, there were a large number of palloc calls which could be reduced down to a much smaller number. In each of these functions, an array was built so that we could qsort it before making the PartitionBoundInfo. For LIST and HASH partitioning, an array of pointers was allocated then each element was allocated within that array. Since the number of items of each dimension is known beforehand, we can just allocate a single chunk of memory for this. Similarly, with all partition strategies, we're able to reduce the number of allocations to build the ->datums field. This is an array of Datum pointers, but there's no need for the Datums that each element points to to be singly allocated. One big chunk will do. For RANGE partitioning, the PartitionBoundInfo->kind field can get the same treatment. We can apply the same optimizations to partition_bounds_copy(). Doing this might have a small effect on cache performance when searching for the correct partition during partition pruning or DML on a partitioned table. However, that's likely to be small and this is mostly about reducing palloc overhead. Author: Nitin Jadhav, Justin Pryzby, David Rowley Reviewed-by: Justin Pryzby, Zhihong Yu Discussion: https://postgr.es/m/flat/CAMm1aWYFTqEio3bURzZh47jveiHRwgQTiSDvBORczNEz2duZ1Q@mail.gmail.com
2021-07-06Use WaitLatch() instead of pg_usleep() at the end of backupsMichael Paquier
This concerns pg_stop_backup() and BASE_BACKUP, when waiting for the WAL segments required for a backup to be archived. This simplifies a bit the handling of the wait event used in this code path. Author: Bharath Rupireddy Reviewed-by: Michael Paquier, Stephen Frost Discussion: https://postgr.es/m/CALj2ACU4AdPCq6NLfcA-ZGwX7pPCK5FgEj-CAU0xCKzkASSy_A@mail.gmail.com
2021-07-05Reduce overhead of cache-clobber testing in LookupOpclassInfo().Tom Lane
Commit 03ffc4d6d added logic to bypass all caching behavior in LookupOpclassInfo when CLOBBER_CACHE_ALWAYS is enabled. It doesn't look like I stopped to think much about what that would cost, but recent investigation shows that the cost is enormous: it roughly doubles the time needed for cache-clobber test runs. There does seem to be value in this behavior when trying to test the opclass-cache loading logic itself, but for other purposes the cost is excessive. Hence, let's back off to doing this only when debug_invalidate_system_caches_always is at least 3; or in older branches, when CLOBBER_CACHE_RECURSIVELY is defined. While here, clean up some other minor issues in LookupOpclassInfo. Re-order the code so we aren't left with broken cache entries (leading to later core dumps) in the unlikely case that we suffer OOM while trying to allocate space for a new entry. (That seems to be my oversight in 03ffc4d6d.) Also, in >= v13, stop allocating one array entry too many. That's evidently left over from sloppy reversion in 851b14b0c. Back-patch to all supported branches, mainly to reduce the runtime of cache-clobbering buildfarm animals. Discussion: https://postgr.es/m/1370856.1625428625@sss.pgh.pa.us
2021-07-05Prevent numeric overflows in parallel numeric aggregates.Dean Rasheed
Formerly various numeric aggregate functions supported parallel aggregation by having each worker convert partial aggregate values to Numeric and use numeric_send() as part of serializing their state. That's problematic, since the range of Numeric is smaller than that of NumericVar, so it's possible for it to overflow (on either side of the decimal point) in cases that would succeed in non-parallel mode. Fix by serializing NumericVars instead, to avoid the overflow risk and ensure that parallel and non-parallel modes work the same. A side benefit is that this improves the efficiency of the serialization/deserialization code, which can make a noticeable difference to performance with large numbers of parallel workers. No back-patch due to risk from changing the binary format of the aggregate serialization states, as well as lack of prior field complaints and low probability of such overflows in practice. Patch by me. Thanks to David Rowley for review and performance testing, and Ranier Vilela for an additional suggestion. Discussion: https://postgr.es/m/CAEZATCUmeFWCrq2dNzZpRj5+6LfN85jYiDoqm+ucSXhb9U2TbA@mail.gmail.com
2021-07-04Cleanup some aggregate code in the executorDavid Rowley
Here we alter the code that calls build_pertrans_for_aggref() so that the function no longer needs to special-case whether it's dealing with an aggtransfn or an aggcombinefn. This allows us to reuse the build_aggregate_transfn_expr() function and just get rid of the build_aggregate_combinefn_expr() completely. All of the special case code that was in build_pertrans_for_aggref() has been moved up to the calling functions. This saves about a dozen lines of code in nodeAgg.c and a few dozen more in parse_agg.c Also, rename a few variables in nodeAgg.c to try to make it more clear that we're working with either a aggtransfn or an aggcombinefn. Some of the old names would have you believe that we were always working with an aggtransfn. Discussion: https://postgr.es/m/CAApHDvptMQ9FmF0D67zC_w88yVnoNVR2+kkOQGUrCmdxWxLULQ@mail.gmail.com
2021-07-02Don't try to print data type names in slot_store_error_callback().Tom Lane
The existing code tried to do syscache lookups in an already-failed transaction, which is problematic to say the least. After some consideration of alternatives, the best fix seems to be to just drop type names from the error message altogether. The table and column names seem like sufficient localization. If the user is unsure what types are involved, she can check the local and remote table definitions. Having done that, we can also discard the LogicalRepTypMap hash table, which had no other use. Arguably, LOGICAL_REP_MSG_TYPE replication messages are now obsolete as well; but we should probably keep them in case some other use emerges. (The complexity of removing something from the replication protocol would likely outweigh any savings anyhow.) Masahiko Sawada and Bharath Rupireddy, per complaint from Andres Freund. Back-patch to v10 where this code originated. Discussion: https://postgr.es/m/20210106020229.ne5xnuu6wlondjpe@alap3.anarazel.de
2021-07-02Use InvalidBucket instead of -1 where appropriatePeter Eisentraut
Reported-by: Ranier Vilela <ranier.vf@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAEudQAp%3DZwKjrP4L%2BCzqV7SmWiaQidPPRqj4tqdjDG4KBx5yrg%40mail.gmail.com
2021-07-02Use WaitLatch() instead of pg_usleep() at end-of-vacuum truncationMichael Paquier
This has the advantage to make a process more responsive when the postmaster dies, even if the wait time was rather limited as there was only a 50ms timeout here. Another advantage of this change is for monitoring, as we gain a new wait event for the end-of-vacuum truncation. Author: Bharath Rupireddy Reviewed-by: Aleksander Alekseev, Thomas Munro, Michael Paquier Discussion: https://postgr.es/m/CALj2ACU4AdPCq6NLfcA-ZGwX7pPCK5FgEj-CAU0xCKzkASSy_A@mail.gmail.com
2021-07-02Remove some dead stores.Thomas Munro
Remove redundant local variable assignments left behind by commit 2fc7af5e966. Author: Quan Zongliang <quanzongliang@yeah.net> Reviewed-by: Jacob Champion <pchampion@vmware.com> Discussion: https://postgr.es/m/de141d14-4fd6-3148-99bf-856b71aa948a%40yeah.net
2021-07-01Don't reset relhasindex for partitioned tables on ANALYZEAlvaro Herrera
Commit 0e69f705cc1a introduced code to analyze partitioned table; however, that code fails to preserve pg_class.relhasindex correctly. Fix by observing whether any indexes exist rather than accidentally falling through to assuming none do. Backpatch to 14. Author: Alexander Pyhalov <a.pyhalov@postgrespro.ru> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Discussion: https://postgr.es/m/CALNJ-vS1R3Qoe5t4tbzxrkpBtzRbPq1dDcW4RmA_a+oqweF30w@mail.gmail.com
2021-07-01Improve various places that double the size of a bufferDavid Rowley
Several places were performing a tight loop to determine the first power of 2 number that's > or >= the required memory. Instead of using a loop for that, we can use pg_nextpower2_32 or pg_nextpower2_64. When we need a power of 2 number equal to or greater than a given amount, we just pass the amount to the nextpower2 function. When we need a power of 2 greater than the amount, we just pass the amount + 1. Additionally, in tsearch there were a couple of locations that were performing a while loop when a simple "if" would have done. In both of these locations only 1 item is being added, so the loop could only have ever iterated once. Changing the loop into an if statement makes the code very slightly more optimal as the condition is checked once rather than twice. There are quite a few remaining locations that increase the size of the buffer in the following form: while (reqsize >= buflen) { buflen *= 2; buf = repalloc(buf, buflen); } These are not touched in this commit. repalloc will error out for sizes larger than MaxAllocSize. Changing these to use pg_nextpower2_32 would remove the chance of that error being raised. It's unclear from the code if the sizes could ever become that large, so err on the side of caution. Discussion: https://postgr.es/m/CAApHDvp=tns7RL4PH0ZR0M+M-YFLquK7218x=0B_zO+DbOma+w@mail.gmail.com Reviewed-by: Zhihong Yu
2021-06-30genbki stricter error handlingPeter Eisentraut
Instead of just writing warnings for invalid cross-catalog lookups, count the errors and error out at the end. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/ca8ee41d-241b-1bf3-71f0-aaf1add6d3c5%40enterprisedb.com
2021-06-30Replace magic constants used in pg_stat_get_replication_slot().Amit Kapila
A few variables have been using 10 as a magic constant while PG_STAT_GET_REPLICATION_SLOT_COLS can be used instead. Author: Masahiko Sawada Reviewed-By: Amit Kapila Backpatch-through: 14, where it was introduced Discussion: https://postgr.es/m/CAD21AoBvqODDfmD17DkEuPCvV2KbruukXQ2Vwrv5Xi-TsAsTJA@mail.gmail.com
2021-06-30Allow streaming the changes after speculative aborts.Amit Kapila
Until now, we didn't allow to stream the changes in logical replication till we receive speculative confirm or the next DML change record after speculative inserts. The reason was that we never use to process speculative aborts but after commit 4daa140a2f it is possible to process them so we can allow streaming once we receive speculative abort after speculative insertion. We decided to backpatch to 14 where the feature for streaming in progress transactions have been introduced as this is a minor change and makes that functionality better. Author: Amit Kapila Reviewed-By: Dilip Kumar Backpatch-through: 14 Discussion: https://postgr.es/m/CAA4eK1KdqmTCtrBR6oFfGELrLLbDLDedL6zACcsUOQuTJBj1vw@mail.gmail.com
2021-06-30Allow enabling two-phase option via replication protocol.Amit Kapila
Extend the replication command CREATE_REPLICATION_SLOT to support the TWO_PHASE option. This will allow decoding commands like PREPARE TRANSACTION, COMMIT PREPARED and ROLLBACK PREPARED for slots created with this option. The decoding of the transaction happens at prepare command. This patch also adds support of two-phase in pg_recvlogical via a new option --two-phase. This option will also be used by future patches that allow streaming of transactions at prepare time for built-in logical replication. With this, the out-of-core logical replication solutions can enable replication of two-phase transactions via replication protocol. Author: Ajin Cherian Reviewed-By: Jeff Davis, Vignesh C, Amit Kapila Discussion: https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru https://postgr.es/m/64b9f783c6e125f18f88fbc0c0234e34e71d8639.camel@j-davis.com
2021-06-30Fix incorrect PITR message for transaction ROLLBACK PREPAREDMichael Paquier
Reaching PITR on such a transaction would cause the generation of a LOG message mentioning a transaction committed, not aborted. Oversight in 4f1b890. Author: Simon Riggs Discussion: https://postgr.es/m/CANbhV-GJ6KijeCgdOrxqMCQ+C8QiK657EMhCy4csjrPcEUFv_Q@mail.gmail.com Backpatch-through: 9.6
2021-06-29Fixes for multirange selectivity estimationAlexander Korotkov
* Fix enumeration of the multirange operators in calc_multirangesel() and calc_multirangesel() switches. * Add more regression tests for matching to empty ranges/multiranges. Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/c5269c65-f967-77c5-ff7c-15e621c47f6a%40gmail.com Author: Alexander Korotkov Backpatch-through: 14, where multiranges were introduced
2021-06-29Fix bogus logic for reporting which hash partition conflicts.Tom Lane
Commit efbfb6424 added logic for reporting exactly which existing partition conflicts when complaining that a new hash partition's modulus isn't compatible with the existing ones. However, it misunderstood the partitioning data structure, and would select the wrong partition in some cases, or crash outright due to fetching a bogus table OID in other cases. Per bug #17076 from Alexander Lakhin. Fix by Amit Langote; some further work on the code comments by me. Discussion: https://postgr.es/m/17076-89a16ae835d329b9@postgresql.org
2021-06-29Add index OID macro argument to DECLARE_INDEXPeter Eisentraut
Instead of defining symbols such as AmOidIndexId explicitly, include them as an argument of DECLARE_INDEX() and have genbki.pl generate the way as the table OID symbols from the CATALOG() declaration. Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/ccef1e46-a404-25b1-9b4c-85f2c08e1f28%40enterprisedb.com
2021-06-29Fix compilation warning in xloginsert.cMichael Paquier
This is reproducible with gcc using at least -O0. The last checks validating the compression of a block could not be reached with this variable not set, but let's be clean. Oversight in 4035cd5, per buildfarm member lapwing.
2021-06-29Add support for LZ4 with compression of full-page writes in WALMichael Paquier
The logic is implemented so as there can be a choice in the compression used when building a WAL record, and an extra per-record bit is used to track down if a block is compressed with PGLZ, LZ4 or nothing. wal_compression, the existing parameter, is changed to an enum with support for the following backward-compatible values: - "off", the default, to not use compression. - "pglz" or "on", to compress FPWs with PGLZ. - "lz4", the new mode, to compress FPWs with LZ4. Benchmarking has showed that LZ4 outclasses easily PGLZ. ZSTD would be also an interesting choice, but going just with LZ4 for now makes the patch minimalistic as toast compression is already able to use LZ4, so there is no need to worry about any build-related needs for this implementation. Author: Andrey Borodin, Justin Pryzby Reviewed-by: Dilip Kumar, Michael Paquier Discussion: https://postgr.es/m/3037310D-ECB7-4BF1-AF20-01C10BB33A33@yandex-team.ru
2021-06-28Skip WAL recycling and preallocation during archive recovery.Noah Misch
The previous commit addressed the chief consequences of a race condition between InstallXLogFileSegment() and KeepFileRestoredFromArchive(). Fix three lesser consequences. A spurious durable_rename_excl() LOG message remained possible. KeepFileRestoredFromArchive() wasted the proceeds of WAL recycling and preallocation. Finally, XLogFileInitInternal() could return a descriptor for a file that KeepFileRestoredFromArchive() had already unlinked. That felt like a recipe for future bugs. Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
2021-06-28Don't ERROR on PreallocXlogFiles() race condition.Noah Misch
Before a restartpoint finishes PreallocXlogFiles(), a startup process KeepFileRestoredFromArchive() call can unlink the preallocated segment. If a CHECKPOINT sql command had elicited the restartpoint experiencing the race condition, that sql command failed. Moreover, the restartpoint omitted its log_checkpoints message and some inessential resource reclamation. Prevent the ERROR by skipping open() of the segment. Since these consequences are so minor, no back-patch. Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
2021-06-28Remove XLogFileInit() ability to unlink a pre-existing file.Noah Misch
Only initdb used it. initdb refuses to operate on a non-empty directory and generally does not cope with pre-existing files of other kinds. Hence, use the opportunity to simplify. Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
2021-06-28In XLogFileInit(), fix *use_existent postcondition to suit callers.Noah Misch
Infrequently, the mismatch caused log_checkpoints messages and TRACE_POSTGRESQL_CHECKPOINT_DONE() to witness an "added" count too high by one. Since that consequence is so minor, no back-patch. Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
2021-06-28Remove XLogFileInit() ability to skip ControlFileLock.Noah Misch
Cold paths, initdb and end-of-recovery, used it. Don't optimize them. Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
2021-06-28Pre branch pgindent / pgperltidy runAndrew Dunstan
Along the way make a slight adjustment to src/include/utils/queryjumble.h to avoid an unused typedef.
2021-06-28Message style improvementsPeter Eisentraut
2021-06-28Improve RelationGetIdentityKeyBitmap().Amit Kapila
We were using RelationGetIndexList() to update the relation's replica identity index but instead, we can directly use RelationGetReplicaIndex() which uses the same functionality. This is a minor code readability improvement. Author: Japin Li Reviewed-By: Takamichi Osumi, Amit Kapila Discussion: https://postgr.es/m/4C99A862-69C8-431F-960A-81B1151F1B89@enterprisedb.com