summaryrefslogtreecommitdiff
path: root/src/backend/replication
AgeCommit message (Collapse)Author
2016-08-17Properly re-initialize replication slot shared memory upon creation.Andres Freund
Slot creation did not clear all fields upon creation. After start the memory is zeroed, but when a physical replication slot was created in the shared memory of a previously existing logical slot, catalog_xmin would not be cleared. That in turn would prevent vacuum from doing its duties. To fix initialize all the fields. To make similar future bugs less likely, zero all of ReplicationSlotPersistentData, and re-order the rest of the initialization to be in struct member order. Analysis: Andrew Gierth Reported-By: md@chewy.com Author: Michael Paquier Discussion: <20160705173502.1398.70934@wrigleys.postgresql.org> Backpatch: 9.4, where replication slots were introduced
2016-08-12Code cleanup in SyncRepWaitForLSN()Simon Riggs
Commit 14e8803f1 removed LWLocks when accessing MyProc->syncRepState but didn't clean up the surrounding code and comments. Cleanup and backpatch to 9.5, to keep code similar. Julien Rouhaud, improved by suggestion from Michael Paquier, implemented trivially by myself.
2016-08-07Don't propagate a null subtransaction snapshot up to parent transaction.Tom Lane
This oversight could cause logical decoding to fail to decode an outer transaction containing changes, if a subtransaction had an XID but no actual changes. Per bug #14279 from Marko Tiikkaja. Patch by Marko based on analysis by Andrew Gierth. Discussion: <20160804191757.1430.39011@wrigleys.postgresql.org>
2016-06-30Fix typo in ReorderBufferIterTXNInit().Tom Lane
This looks like it would cause changes from subtransactions to be missed by the iterator being constructed, if those changes had been spilled to disk previously. This implies that large subtransactions might be lost (in whole or in part) by logical replication. Found and fixed by Petru-Florin Mihancea, per bug #14208. Report: <20160622144830.5791.22512@wrigleys.postgresql.org>
2016-04-28Remember asking for feedback during walsender shutdown.Andres Freund
Since 5a991ef8 we're explicitly asking for feedback from the receiving side when shutting down walsender, if there's not yet replicated data. Unfortunately we didn't remember (i.e. set waiting_for_ping_response to true) having asked for feedback, leading to scenarios in which replies were requested at a high frequency. I can't reproduce this problem on my laptop, I think that's because the problem requires a significant TCP window to manifest due to the !pq_is_send_pending() condition. But since this clearly is a bug, let's fix it. There's quite possibly more wrong than just this though. While fiddling with WalSndDone(), I rewrote a hard to understand comment about looking at the flush vs. the write position. Reported-By: Nick Cleaton, Magnus Hagander Author: Nick Cleaton Discussion: CAFgz3kus=rC_avEgBV=+hRK5HYJ8vXskJRh8yEAbahJGTzF2VQ@mail.gmail.com CABUevExsjROqDcD0A2rnJ6HK6FuKGyewJr3PL12pw85BHFGS2Q@mail.gmail.com Backpatch: 9.4, were 5a991ef8 introduced the use of feedback messages during shutdown.
2016-04-14Fix core dump in ReorderBufferRestoreChange on alignment-picky platforms.Tom Lane
When re-reading an update involving both an old tuple and a new tuple from disk, reorderbuffer.c was careless about whether the new tuple is suitably aligned for direct access --- in general, it isn't. We'd missed seeing this in the buildfarm because the contrib/test_decoding tests exercise this code path only a few times, and by chance all of those cases have old tuples with length a multiple of 4, which is usually enough to make the access to the new tuple's t_len safe. For some still-not-entirely-clear reason, however, Debian's sparc build gets a bus error, as reported by Christoph Berg; perhaps it's assuming 8-byte alignment of the pointer? The lack of previous field reports is probably because you need all of these conditions to trigger a crash: an alignment-picky platform (not Intel), a transaction large enough to spill to disk, an update within that xact that changes a primary-key field and has an odd-length old tuple, and of course logical decoding tracing the transaction. Avoid the alignment assumption by using memcpy instead of fetching t_len directly, and add a test case that exposes the crash on picky platforms. Back-patch to 9.4 where the bug was introduced. Discussion: <20160413094117.GC21485@msg.credativ.de>
2016-04-14Adjust datatype of ReplicationState.acquired_by.Tom Lane
It was declared as "pid_t", which would be fine except that none of the places that printed it in error messages took any thought for the possibility that it's not equivalent to "int". This leads to warnings on some buildfarm members, and could possibly lead to actually wrong error messages on those platforms. There doesn't seem to be any very good reason not to just make it "int"; it's only ever assigned from MyProcPid, which is int. If we want to cope with PIDs that are wider than int, this is not the place to start. Also, fix the comment, which seems to perhaps be a leftover from a time when the field was only a bool? Per buildfarm. Back-patch to 9.5 which has same issue.
2016-03-30Fix broken variable declarationAlvaro Herrera
Author: Konstantin Knizhnik
2016-03-09Avoid unlikely data-loss scenarios due to rename() without fsync.Andres Freund
Renaming a file using rename(2) is not guaranteed to be durable in face of crashes. Use the previously added durable_rename()/durable_link_or_rename() in various places where we previously just renamed files. Most of the changed call sites are arguably not critical, but it seems better to err on the side of too much durability. The most prominent known case where the previously missing fsyncs could cause data loss is crashes at the end of a checkpoint. After the actual checkpoint has been performed, old WAL files are recycled. When they're filled, their contents are fdatasynced, but we did not fsync the containing directory. An OS/hardware crash in an unfortunate moment could then end up leaving that file with its old name, but new content; WAL replay would thus not replay it. Reported-By: Tomas Vondra Author: Michael Paquier, Tomas Vondra, Andres Freund Discussion: 56583BDD.9060302@2ndquadrant.com Backpatch: All supported branches
2016-03-09Introduce durable_rename() and durable_link_or_rename().Andres Freund
Renaming a file using rename(2) is not guaranteed to be durable in face of crashes; especially on filesystems like xfs and ext4 when mounted with data=writeback. To be certain that a rename() atomically replaces the previous file contents in the face of crashes and different filesystems, one has to fsync the old filename, rename the file, fsync the new filename, fsync the containing directory. This sequence is not generally adhered to currently; which exposes us to data loss risks. To avoid having to repeat this arduous sequence, introduce durable_rename(), which wraps all that. Also add durable_link_or_rename(). Several places use link() (with a fallback to rename()) to rename a file, trying to avoid replacing the target file out of paranoia. Some of those rename sequences need to be durable as well. There seems little reason extend several copies of the same logic, so centralize the link() callers. This commit does not yet make use of the new functions; they're used in a followup commit. Author: Michael Paquier, Andres Freund Discussion: 56583BDD.9060302@2ndquadrant.com Backpatch: All supported branches
2016-03-07Further improvements to c8f621c43.Andres Freund
Coverity and inspection for the issue addressed in fd45d16f found some questionable code. Specifically coverity noticed that the wrong length was added in ReorderBufferSerializeChange() - without immediate negative consequences as the variable isn't used afterwards. During code-review and testing I noticed that a bit of space was wasted when allocating tuple bufs in several places. Thirdly, the debug memset()s in ReorderBufferGetTupleBuf() reduce the error checking valgrind can do. Backpatch: 9.4, like c8f621c43.
2016-03-06Fix wrong allocation size in c8f621c43.Andres Freund
In c8f621c43 I forgot to account for MAXALIGN when allocating a new tuplebuf in ReorderBufferGetTupleBuf(). That happens to currently not cause active problems on a number of platforms because the affected pointer is already aligned, but others, like ppc and hppa, trigger this in the regression test, due to a debug memset clearing memory. Fix that. Backpatch: 9.4, like the previous commit.
2016-03-05logical decoding: Fix handling of large old tuples with replica identity full.Andres Freund
When decoding the old version of an UPDATE or DELETE change, and if that tuple was bigger than MaxHeapTupleSize, we either Assert'ed out, or failed in more subtle ways in non-assert builds. Normally individual tuples aren't bigger than MaxHeapTupleSize, with big datums toasted. But that's not the case for the old version of a tuple for logical decoding; the replica identity is logged as one piece. With the default replica identity btree limits that to small tuples, but that's not the case for FULL. Change the tuple buffer infrastructure to separate allocate over-large tuples, instead of always going through the slab cache. This unfortunately requires changing the ReorderBufferTupleBuf definition, we need to store the allocated size someplace. To avoid requiring output plugins to recompile, don't store HeapTupleHeaderData directly after HeapTupleData, but point to it via t_data; that leaves rooms for the allocated size. As there's no reason for an output plugin to look at ReorderBufferTupleBuf->t_data.header, remove the field. It was just a minor convenience having it directly accessible. Reported-By: Adam DratwiƄski Discussion: CAKg6ypLd7773AOX4DiOGRwQk1TVOQKhNwjYiVjJnpq8Wo+i62Q@mail.gmail.com
2016-03-05logical decoding: old/newtuple in spooled UPDATE changes was switched around.Andres Freund
Somehow I managed to flip the order of restoring old & new tuples when de-spooling a change in a large transaction from disk. This happens to only take effect when a change is spooled to disk which has old/new versions of the tuple. That only is the case for UPDATEs where he primary key changed or where replica identity is changed to FULL. The tests didn't catch this because either spooled updates, or updates that changed primary keys, were tested; not both at the same time. Found while adding tests for the following commit. Backpatch: 9.4, where logical decoding was added
2016-03-05logical decoding: Tell reorderbuffer about all xids.Andres Freund
Logical decoding's reorderbuffer keeps transactions in an LSN ordered list for efficiency. To make that's efficiently possible upper-level xids are forced to be logged before nested subtransaction xids. That only works though if these records are all looked at: Unfortunately we didn't do so for e.g. row level locks, which are otherwise uninteresting for logical decoding. This could lead to errors like: "ERROR: subxact logged without previous toplevel record". It's not sufficient to just look at row locking records, the xid could appear first due to a lot of other types of records (which will trigger the transaction to be marked logged with MarkCurrentTransactionIdLoggedIfAny). So invent infrastructure to tell reorderbuffer about xids seen, when they'd otherwise not pass through reorderbuffer.c. Reported-By: Jarred Ward Bug: #13844 Discussion: 20160105033249.1087.66040@wrigleys.postgresql.org Backpatch: 9.4, where logical decoding was added
2016-03-02logical decoding: fix decoding of a commit's commit time.Andres Freund
When adding replication origins in 5aa235042, I somehow managed to set the timestamp of decoded transactions to InvalidXLogRecptr when decoding one made without a replication origin. Fix that, and the wrong type of the new commit_time variable. This didn't trigger a regression test failure because we explicitly don't show commit timestamps in the regression tests, as they obviously are variable. Add a test that checks that a decoded commit's timestamp is within minutes of NOW() from before the commit. Reported-By: Weiping Qu Diagnosed-By: Artur Zakirov Discussion: 56D4197E.9050706@informatik.uni-kl.de, 56D42918.1010108@postgrespro.ru Backpatch: 9.5, where 5aa235042 originates.
2016-02-29Fix typosAlvaro Herrera
Author: Amit Langote
2016-01-09Clean up some lack-of-STRICT issues in the core code, too.Tom Lane
A scan for missed proisstrict markings in the core code turned up these functions: brin_summarize_new_values pg_stat_reset_single_table_counters pg_stat_reset_single_function_counters pg_create_logical_replication_slot pg_create_physical_replication_slot pg_drop_replication_slot The first three of these take OID, so a null argument will normally look like a zero to them, resulting in "ERROR: could not open relation with OID 0" for brin_summarize_new_values, and no action for the pg_stat_reset_XXX functions. The other three will dump core on a null argument, though this is mitigated by the fact that they won't do so until after checking that the caller is superuser or has rolreplication privilege. In addition, the pg_logical_slot_get/peek[_binary]_changes family was intentionally marked nonstrict, but failed to make nullness checks on all the arguments; so again a null-pointer-dereference crash is possible but only for superusers and rolreplication users. Add the missing ARGISNULL checks to the latter functions, and mark the former functions as strict in pg_proc. Make that change in the back branches too, even though we can't force initdb there, just so that installations initdb'd in future won't have the issue. Since none of these bugs rise to the level of security issues (and indeed the pg_stat_reset_XXX functions hardly misbehave at all), it seems sufficient to do this. In addition, fix some order-of-operations oddities in the slot_get_changes family, mostly cosmetic, but not the part that moves the function's last few operations into the PG_TRY block. As it stood, there was significant risk for an error to exit without clearing historical information from the system caches. The slot_get_changes bugs go back to 9.4 where that code was introduced. Back-patch appropriate subsets of the pg_proc changes into all active branches, as well.
2015-12-18Fix copy-and-paste error in logical decoding callback.Robert Haas
This could result in the error context misidentifying where the error actually occurred. Craig Ringer
2015-12-13Consistently set all fields in pg_stat_replication to null instead of 0Magnus Hagander
Previously the "sent" field would be set to 0 and all other xlog pointers be set to NULL if there were no valid values (such as when in a backup sending walsender).
2015-12-13Properly initialize write, flush and replay locations in walsender slotsMagnus Hagander
These would leak random xlog positions if a walsender used for backup would a walsender slot previously used by a replication walsender. In passing also fix a couple of cases where the xlog pointer is directly compared to zero instead of using XLogRecPtrIsInvalid, noted by Michael Paquier.
2015-11-21Adopt the GNU convention for handling tar-archive members exceeding 8GB.Tom Lane
The POSIX standard for tar headers requires archive member sizes to be printed in octal with at most 11 digits, limiting the representable file size to 8GB. However, GNU tar and apparently most other modern tars support a convention in which oversized values can be stored in base-256, allowing any practical file to be a tar member. Adopt this convention to remove two limitations: * pg_dump with -Ft output format failed if the contents of any one table exceeded 8GB. * pg_basebackup failed if the data directory contained any file exceeding 8GB. (This would be a fatal problem for installations configured with a table segment size of 8GB or more, and it has also been seen to fail when large core dump files exist in the data directory.) File sizes under 8GB are still printed in octal, so that no compatibility issues are created except in cases that would have failed entirely before. In addition, this patch fixes several bugs in the same area: * In 9.3 and later, we'd defined tarCreateHeader's file-size argument as size_t, which meant that on 32-bit machines it would write a corrupt tar header for file sizes between 4GB and 8GB, even though no error was raised. This broke both "pg_dump -Ft" and pg_basebackup for such cases. * pg_restore from a tar archive would fail on tables of size between 4GB and 8GB, on machines where either "size_t" or "unsigned long" is 32 bits. This happened even with an archive file not affected by the previous bug. * pg_basebackup would fail if there were files of size between 4GB and 8GB, even on 64-bit machines. * In 9.3 and later, "pg_basebackup -Ft" failed entirely, for any file size, on 64-bit big-endian machines. In view of these potential data-loss bugs, back-patch to all supported branches, even though removal of the documented 8GB limit might otherwise be considered a new feature rather than a bug fix.
2015-11-09Set replication origin when decoding commit records.Andres Freund
By accident the replication origin was not set properly in DecodeCommit(). That's bad because the origin is passed to the output plugins origin filter, and accessible from the output plugin via ReorderBufferTXN->origin_id. Accessing the origin of individual changes worked before the fix, which is why this wasn't notices earlier. Reported-By: Craig Ringer Author: Craig Ringer Discussion: CAMsr+YFhBJLp=qfSz3-J+0P1zLkE8zNXM2otycn20QRMx380gw@mail.gmail.com Backpatch: 9.5, where replication origins where introduced
2015-10-28Message style improvementsPeter Eisentraut
Message style, plurals, quoting, spelling, consistency with similar messages
2015-10-27Measure string lengths only onceAlvaro Herrera
Bernd Helmle complained that CreateReplicationSlot() was assigning the same value to the same variable twice, so we could remove one of them. Code inspection reveals that we can actually remove both assignments: according to the author the assignment was there for beauty of the strlen line only, and another possible fix to that is to put the strlen in its own line, so do that. To be consistent within the file, refactor all duplicated strlen() calls, which is what we do elsewhere in the backend anyway. In basebackup.c, snprintf already returns the right length; no need for strlen afterwards. Backpatch to 9.4, where replication slots were introduced, to keep code identical. Some of this is older, but the patch doesn't apply cleanly and it's only of cosmetic value anyway. Discussion: http://www.postgresql.org/message-id/BE2FD71DEA35A2287EA5F018@eje.credativ.lan
2015-10-03Improve errhint() about replication slot naming restrictions.Andres Freund
The existing hint talked about "may only contain letters", but the actual requirement is more strict: only lower case letters are allowed. Reported-By: Rushabh Lathia Author: Rushabh Lathia Discussion: AGPqQf2x50qcwbYOBKzb4x75sO_V3g81ZsA8+Ji9iN5t_khFhQ@mail.gmail.com Backpatch: 9.4-, where replication slots were added
2015-09-28Fix "sesssion" typoAlvaro Herrera
It was introduced alongside replication origins, by commit 5aa2350426c, so backpatch to 9.5. Pointed out by Fujii Masao
2015-09-05Fix misc typos.Heikki Linnakangas
Oskari Saarenmaa. Backpatch to stable branches where applicable.
2015-08-11Minor cleanups in slot related code.Andres Freund
Fix a bunch of typos, and remove two superflous includes. Author: Gurjeet Singh Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com Backpatch: 9.4
2015-08-10Fix copy & paste mistake in pg_get_replication_slots().Andres Freund
XLogRecPtr was compared with InvalidTransactionId instead of InvalidXLogRecPtr. As both are defined to the same value this doesn't cause any actual problems, but it's still wrong. Backpatch: 9.4-master, bug was introduced in 9.4
2015-08-07Address points made in post-commit review of replication origins.Andres Freund
Amit reviewed the replication origins patch and made some good points. Address them. This fixes typos in error messages, docs and comments and adds a missing error check (although in a should-never-happen scenario). Discussion: CAA4eK1JqUBVeWWKwUmBPryFaje4190ug0y-OAUHWQ6tD83V4xg@mail.gmail.com Backpatch: 9.5, where replication origins were introduced.
2015-08-05Fix debug message output when connecting to a logical slot.Andres Freund
Previously the message erroneously printed the same LSN twice as the assignment to the start_lsn variable was before the message. Correct that. Reported-By: Marko Tiikkaja Author: Marko Tiikkaja Backpatch: 9.5, where logical decoding was introduced
2015-07-07Fix logical decoding bug leading to inefficient reopening of files.Andres Freund
When spilling transaction data to disk a simple typo caused the output file to be closed and reopened for every serialized change. That happens to not have a huge impact on linux, which is why it probably wasn't noticed so far, but on windows that appears to trigger actual disk writes after every change. Not fun. The bug fortunately does not have any impact besides speed. A change could end up being in the wrong segment (last instead of next), but since we read all files to the end, that's just ugly, not really problematic. It's not a problem to upgrade, since transaction spill files do not persist across restarts. Bug: #13484 Reported-By: Olivier Gosseaume Discussion: 20150703090217.1190.63940@wrigleys.postgresql.org Backpatch to 9.4, where logical decoding was added.
2015-06-10Fix typoPeter Eisentraut
2015-05-28Fix assorted inconsistencies in our calls of readlink().Tom Lane
Ensure that we null-terminate the result string (one place in pg_rewind). Be paranoid about out-of-range results from readlink() (should not happen, but there is no good reason for some call sites to be careful about it and others not). Consistently use the whole buffer, not sometimes one byte less. Ensure we emit an appropriate errcode() in all cases. Spell the error messages the same way. The only serious bug here is the missing null-termination in pg_rewind, which is new code, so no need for a back-patch. Abhijit Menon-Sen and Tom Lane
2015-05-23pgindent run for 9.5Bruce Momjian
2015-05-20Fix more typos in comments.Heikki Linnakangas
Patch by CharSyam, plus a few more I spotted with grep.
2015-05-20Collection of typo fixes.Heikki Linnakangas
Use "a" and "an" correctly, mostly in comments. Two error messages were also fixed (they were just elogs, so no translation work required). Two function comments in pg_proc.h were also fixed. Etsuro Fujita reported one of these, but I found a lot more with grep. Also fix a few other typos spotted while grepping for the a/an typos. For example, "consists out of ..." -> "consists of ...". Plus a "though"/ "through" mixup reported by Euler Taveira. Many of these typos were in old code, which would be nice to backpatch to make future backpatching easier. But much of the code was new, and I didn't feel like crafting separate patches for each branch. So no backpatching.
2015-05-17Fix typos in commentsMagnus Hagander
Dmitriy Olshevskiy
2015-05-15Add archive_mode='always' option.Heikki Linnakangas
In 'always' mode, the standby independently archives all files it receives from the primary. Original patch by Fujii Masao, docs and review by me.
2015-05-12Map basebackup tablespaces using a tablespace_map fileAndrew Dunstan
Windows can't reliably restore symbolic links from a tar format, so instead during backup start we create a tablespace_map file, which is used by the restoring postgres to create the correct links in pg_tblspc. The backup protocol also now has an option to request this file to be included in the backup stream, and this is used by pg_basebackup when operating in tar mode. This is done on all platforms, not just Windows. This means that pg_basebackup will not not work in tar mode against 9.4 and older servers, as this protocol option isn't implemented there. Amit Kapila, reviewed by Dilip Kumar, with a little editing from me.
2015-05-08Add macros to check if a filename is a WAL segment or other such file.Heikki Linnakangas
We had many instances of the strlen + strspn combination to check for that. This makes the code a bit easier to read.
2015-05-08Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.Andres Freund
The newly added ON CONFLICT clause allows to specify an alternative to raising a unique or exclusion constraint violation error when inserting. ON CONFLICT refers to constraints that can either be specified using a inference clause (by specifying the columns of a unique constraint) or by naming a unique or exclusion constraint. DO NOTHING avoids the constraint violation, without touching the pre-existing row. DO UPDATE SET ... [WHERE ...] updates the pre-existing tuple, and has access to both the tuple proposed for insertion and the existing tuple; the optional WHERE clause can be used to prevent an update from being executed. The UPDATE SET and WHERE clauses have access to the tuple proposed for insertion using the "magic" EXCLUDED alias, and to the pre-existing tuple using the table name or its alias. This feature is often referred to as upsert. This is implemented using a new infrastructure called "speculative insertion". It is an optimistic variant of regular insertion that first does a pre-check for existing tuples and then attempts an insert. If a violating tuple was inserted concurrently, the speculatively inserted tuple is deleted and a new attempt is made. If the pre-check finds a matching tuple the alternative DO NOTHING or DO UPDATE action is taken. If the insertion succeeds without detecting a conflict, the tuple is deemed inserted. To handle the possible ambiguity between the excluded alias and a table named excluded, and for convenience with long relation names, INSERT INTO now can alias its target table. Bumps catversion as stored rules change. Author: Peter Geoghegan, with significant contributions from Heikki Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes. Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs, Dean Rasheed, Stephen Frost and many others.
2015-05-04Fix typosPeter Eisentraut
Author: Erik Rijkers <er@xs4all.nl>
2015-05-01Copy editing of the replication origins patch.Andres Freund
Michael Paquier and myself.
2015-04-30Correct replication origin's use of UINT16_MAX to PG_UINT16_MAX.Andres Freund
We can't rely on UINT16_MAX being present, which is why we introduced PG_UINT16_MAX... Buildfarm animal bowerbird via Andrew Gierth.
2015-04-29Introduce replication progress tracking infrastructure.Andres Freund
When implementing a replication solution ontop of logical decoding, two related problems exist: * How to safely keep track of replication progress * How to change replication behavior, based on the origin of a row; e.g. to avoid loops in bi-directional replication setups The solution to these problems, as implemented here, consist out of three parts: 1) 'replication origins', which identify nodes in a replication setup. 2) 'replication progress tracking', which remembers, for each replication origin, how far replay has progressed in a efficient and crash safe manner. 3) The ability to filter out changes performed on the behest of a replication origin during logical decoding; this allows complex replication topologies. E.g. by filtering all replayed changes out. Most of this could also be implemented in "userspace", e.g. by inserting additional rows contain origin information, but that ends up being much less efficient and more complicated. We don't want to require various replication solutions to reimplement logic for this independently. The infrastructure is intended to be generic enough to be reusable. This infrastructure also replaces the 'nodeid' infrastructure of commit timestamps. It is intended to provide all the former capabilities, except that there's only 2^16 different origins; but now they integrate with logical decoding. Additionally more functionality is accessible via SQL. Since the commit timestamp infrastructure has also been introduced in 9.5 (commit 73c986add) changing the API is not a problem. For now the number of origins for which the replication progress can be tracked simultaneously is determined by the max_replication_slots GUC. That GUC is not a perfect match to configure this, but there doesn't seem to be sufficient reason to introduce a separate new one. Bumps both catversion and wal page magic. Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer Discussion: 20150216002155.GI15326@awork2.anarazel.de, 20140923182422.GA15776@alap3.anarazel.de, 20131114172632.GE7522@alap2.anarazel.de
2015-04-28Use a fd opened for read/write when syncing slots during startup.Andres Freund
Some operating systems, including the reporter's windows, return EBADFD or similar when fsync() is invoked on a O_RDONLY file descriptor. Unfortunately RestoreSlotFromDisk() does exactly that; which causes failures after restarts in at least some scenarios. If you hit the bug the error message will be something like ERROR: could not fsync file "pg_replslot/$name/state": Bad file descriptor Simply use O_RDWR instead of O_RDONLY when opening the relevant file descriptor to fix the bug. Unfortunately I have no way of verifying the fix, but we've seen similar problems in the past. This bug goes back to 9.4 where slots were introduced. Backpatch accordingly. Reported-By: Patrice Drolet Bug: #13143: Discussion: 20150424101006.2556.60897@wrigleys.postgresql.org
2015-04-26Fix various typos and grammar errors in comments.Andres Freund
Author: Dmitriy Olshevskiy Discussion: 553D00A6.4090205@bk.ru
2015-04-21Add 'active_in' column to pg_replication_slots.Andres Freund
Right now it is visible whether a replication slot is active in any session, but not in which. Adding the active_in column, containing the pid of the backend having acquired the slot, makes it much easier to associate pg_replication_slots entries with the corresponding pg_stat_replication/pg_stat_activity row. This should have been done from the start, but I (Andres) dropped the ball there somehow. Author: Craig Ringer, revised by me Discussion: CAMsr+YFKgZca5_7_ouaMWxA5PneJC9LNViPzpDHusaPhU9pA7g@mail.gmail.com