summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorTom Lane <tgl@sss.pgh.pa.us>2016-11-15 15:55:35 -0500
committerTom Lane <tgl@sss.pgh.pa.us>2016-11-15 15:55:35 -0500
commitffaa44cb559db332baeee7d25dedd74a61974203 (patch)
treea088351217df2781110c3f201d58afa67aa5b007
parentfc19c1801bd2dbee1043b0c0b62e07747d30ea1c (diff)
Account for catalog snapshot in PGXACT->xmin updates.
The CatalogSnapshot was not plugged into SnapshotResetXmin()'s accounting for whether MyPgXact->xmin could be cleared or advanced. In normal transactions this was masked by the fact that the transaction snapshot would be older, but during backend startup and certain utility commands it was possible to re-use the CatalogSnapshot after MyPgXact->xmin had been cleared, meaning that recently-deleted rows could be pruned even though this snapshot could still see them, causing unexpected catalog lookup failures. This effect appears to be the explanation for a recent failure on buildfarm member piculet. To fix, add the CatalogSnapshot to the RegisteredSnapshots heap whenever it is valid. In the previous logic, it was possible for the CatalogSnapshot to remain valid across waits for client input, but with this change that would mean it delays advance of global xmin in cases where it did not before. To avoid possibly causing new table-bloat problems with clients that sit idle for long intervals, add code to invalidate the CatalogSnapshot before waiting for client input. (When the backend is busy, it's unlikely that the CatalogSnapshot would be the oldest snap for very long, so we don't worry about forcing early invalidation of it otherwise.) In passing, remove the CatalogSnapshotStale flag in favor of using "CatalogSnapshot != NULL" to represent validity, as we do for the other special snapshots in snapmgr.c. And improve some obsolete comments. No regression test because I don't know a deterministic way to cause this failure. But the stress test shown in the original discussion provokes "cache lookup failed for relation 1255" within a few dozen seconds for me. Back-patch to 9.4 where MVCC catalog scans were introduced. (Note: it's quite easy to produce similar failures with the same test case in branches before 9.4. But MVCC catalog scans were supposed to fix that.) Discussion: <16447.1478818294@sss.pgh.pa.us>
-rw-r--r--src/backend/tcop/postgres.c8
-rw-r--r--src/backend/utils/time/snapmgr.c122
-rw-r--r--src/include/utils/snapmgr.h1
3 files changed, 95 insertions, 36 deletions
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 599874e7435..3d9319d151e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3947,6 +3947,12 @@ PostgresMain(int argc, char *argv[],
initStringInfo(&input_message);
/*
+ * Also consider releasing our catalog snapshot if any, so that it's
+ * not preventing advance of global xmin while we wait for the client.
+ */
+ InvalidateCatalogSnapshotConditionally();
+
+ /*
* (1) If we've reached idle state, tell the frontend we're ready for
* a new query.
*
@@ -4422,7 +4428,7 @@ ShowUsage(const char *title)
appendStringInfoString(&str, "! system usage stats:\n");
appendStringInfo(&str,
- "!\t%ld.%06ld s user, %ld.%06ld s system, %ld.%06ld s elapsed\n",
+ "!\t%ld.%06ld s user, %ld.%06ld s system, %ld.%06ld s elapsed\n",
(long) (r.ru_utime.tv_sec - Save_r.ru_utime.tv_sec),
(long) (r.ru_utime.tv_usec - Save_r.ru_utime.tv_usec),
(long) (r.ru_stime.tv_sec - Save_r.ru_stime.tv_sec),
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 1ec9f70f0ee..6cf3829e22a 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1,4 +1,5 @@
/*-------------------------------------------------------------------------
+ *
* snapmgr.c
* PostgreSQL snapshot manager
*
@@ -8,7 +9,7 @@
* (tracked by separate refcounts on each snapshot), its memory can be freed.
*
* The FirstXactSnapshot, if any, is treated a bit specially: we increment its
- * regd_count and count it in RegisteredSnapshots, but this reference is not
+ * regd_count and list it in RegisteredSnapshots, but this reference is not
* tracked by a resource owner. We used to use the TopTransactionResourceOwner
* to track this snapshot reference, but that introduces logical circularity
* and thus makes it impossible to clean up in a sane fashion. It's better to
@@ -16,19 +17,22 @@
* module is entirely lower-level than ResourceOwners.
*
* Likewise, any snapshots that have been exported by pg_export_snapshot
- * have regd_count = 1 and are counted in RegisteredSnapshots, but are not
+ * have regd_count = 1 and are listed in RegisteredSnapshots, but are not
* tracked by any resource owner.
*
+ * Likewise, the CatalogSnapshot is listed in RegisteredSnapshots when it
+ * is valid, but is not tracked by any resource owner.
+ *
* The same is true for historic snapshots used during logical decoding,
- * their lifetime is managed separately (as they live longer as one xact.c
+ * their lifetime is managed separately (as they live longer than one xact.c
* transaction).
*
* These arrangements let us reset MyPgXact->xmin when there are no snapshots
- * referenced by this transaction. (One possible improvement would be to be
- * able to advance Xmin when the snapshot with the earliest Xmin is no longer
- * referenced. That's a bit harder though, it requires more locking, and
- * anyway it should be rather uncommon to keep temporary snapshots referenced
- * for too long.)
+ * referenced by this transaction, and advance it when the one with oldest
+ * Xmin is no longer referenced. For simplicity however, only registered
+ * snapshots not active snapshots participate in tracking which one is oldest;
+ * we don't try to change MyPgXact->xmin except when the active-snapshot
+ * stack is empty.
*
*
* Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
@@ -131,7 +135,7 @@ static volatile OldSnapshotControlData *oldSnapshotControl;
* SecondarySnapshot is a snapshot that's always up-to-date as of the current
* instant, even in transaction-snapshot mode. It should only be used for
* special-purpose code (say, RI checking.) CatalogSnapshot points to an
- * MVCC snapshot intended to be used for catalog scans; we must refresh it
+ * MVCC snapshot intended to be used for catalog scans; we must invalidate it
* whenever a system catalog change occurs.
*
* These SnapshotData structs are static to simplify memory allocation
@@ -148,11 +152,6 @@ static Snapshot CatalogSnapshot = NULL;
static Snapshot HistoricSnapshot = NULL;
/*
- * Staleness detection for CatalogSnapshot.
- */
-static bool CatalogSnapshotStale = true;
-
-/*
* These are updated by GetSnapshotData. We initialize them this way
* for the convenience of TransactionIdIsInProgress: even in bootstrap
* mode, we don't want it to say that BootstrapTransactionId is in progress.
@@ -193,8 +192,7 @@ static ActiveSnapshotElt *OldestActiveSnapshot = NULL;
/*
* Currently registered Snapshots. Ordered in a heap by xmin, so that we can
- * quickly find the one with lowest xmin, to advance our MyPgXat->xmin.
- * resowner.c also tracks these.
+ * quickly find the one with lowest xmin, to advance our MyPgXact->xmin.
*/
static int xmin_cmp(const pairingheap_node *a, const pairingheap_node *b,
void *arg);
@@ -316,6 +314,12 @@ GetTransactionSnapshot(void)
/* First call in transaction? */
if (!FirstSnapshotSet)
{
+ /*
+ * Don't allow catalog snapshot to be older than xact snapshot. Must
+ * do this first to allow the empty-heap Assert to succeed.
+ */
+ InvalidateCatalogSnapshot();
+
Assert(pairingheap_is_empty(&RegisteredSnapshots));
Assert(FirstXactSnapshot == NULL);
@@ -347,9 +351,6 @@ GetTransactionSnapshot(void)
else
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
- /* Don't allow catalog snapshot to be older than xact snapshot. */
- CatalogSnapshotStale = true;
-
FirstSnapshotSet = true;
return CurrentSnapshot;
}
@@ -358,7 +359,7 @@ GetTransactionSnapshot(void)
return CurrentSnapshot;
/* Don't allow catalog snapshot to be older than xact snapshot. */
- CatalogSnapshotStale = true;
+ InvalidateCatalogSnapshot();
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
@@ -463,36 +464,72 @@ GetNonHistoricCatalogSnapshot(Oid relid)
* scan a relation for which neither catcache nor snapshot invalidations
* are sent, we must refresh the snapshot every time.
*/
- if (!CatalogSnapshotStale && !RelationInvalidatesSnapshotsOnly(relid) &&
+ if (CatalogSnapshot &&
+ !RelationInvalidatesSnapshotsOnly(relid) &&
!RelationHasSysCache(relid))
- CatalogSnapshotStale = true;
+ InvalidateCatalogSnapshot();
- if (CatalogSnapshotStale)
+ if (CatalogSnapshot == NULL)
{
/* Get new snapshot. */
CatalogSnapshot = GetSnapshotData(&CatalogSnapshotData);
/*
- * Mark new snapshost as valid. We must do this last, in case an
- * ERROR occurs inside GetSnapshotData().
+ * Make sure the catalog snapshot will be accounted for in decisions
+ * about advancing PGXACT->xmin. We could apply RegisterSnapshot, but
+ * that would result in making a physical copy, which is overkill; and
+ * it would also create a dependency on some resource owner, which we
+ * do not want for reasons explained at the head of this file. Instead
+ * just shove the CatalogSnapshot into the pairing heap manually. This
+ * has to be reversed in InvalidateCatalogSnapshot, of course.
+ *
+ * NB: it had better be impossible for this to throw error, since the
+ * CatalogSnapshot pointer is already valid.
*/
- CatalogSnapshotStale = false;
+ pairingheap_add(&RegisteredSnapshots, &CatalogSnapshot->ph_node);
}
return CatalogSnapshot;
}
/*
- * Mark the current catalog snapshot as invalid. We could change this API
- * to allow the caller to provide more fine-grained invalidation details, so
- * that a change to relation A wouldn't prevent us from using our cached
- * snapshot to scan relation B, but so far there's no evidence that the CPU
- * cycles we spent tracking such fine details would be well-spent.
+ * InvalidateCatalogSnapshot
+ * Mark the current catalog snapshot, if any, as invalid
+ *
+ * We could change this API to allow the caller to provide more fine-grained
+ * invalidation details, so that a change to relation A wouldn't prevent us
+ * from using our cached snapshot to scan relation B, but so far there's no
+ * evidence that the CPU cycles we spent tracking such fine details would be
+ * well-spent.
*/
void
InvalidateCatalogSnapshot(void)
{
- CatalogSnapshotStale = true;
+ if (CatalogSnapshot)
+ {
+ pairingheap_remove(&RegisteredSnapshots, &CatalogSnapshot->ph_node);
+ CatalogSnapshot = NULL;
+ SnapshotResetXmin();
+ }
+}
+
+/*
+ * InvalidateCatalogSnapshotConditionally
+ * Drop catalog snapshot if it's the only one we have
+ *
+ * This is called when we are about to wait for client input, so we don't
+ * want to continue holding the catalog snapshot if it might mean that the
+ * global xmin horizon can't advance. However, if there are other snapshots
+ * still active or registered, the catalog snapshot isn't likely to be the
+ * oldest one, so we might as well keep it.
+ */
+void
+InvalidateCatalogSnapshotConditionally(void)
+{
+ if (CatalogSnapshot &&
+ ActiveSnapshot == NULL &&
+ pairingheap_is_singular(&RegisteredSnapshots))
+ InvalidateCatalogSnapshot();
}
/*
@@ -509,6 +546,7 @@ SnapshotSetCommandId(CommandId curcid)
CurrentSnapshot->curcid = curcid;
if (SecondarySnapshot)
SecondarySnapshot->curcid = curcid;
+ /* Should we do the same with CatalogSnapshot? */
}
/*
@@ -526,6 +564,9 @@ SetTransactionSnapshot(Snapshot sourcesnap, TransactionId sourcexid,
/* Caller should have checked this already */
Assert(!FirstSnapshotSet);
+ /* Better do this to ensure following Assert succeeds. */
+ InvalidateCatalogSnapshot();
+
Assert(pairingheap_is_empty(&RegisteredSnapshots));
Assert(FirstXactSnapshot == NULL);
Assert(!HistoricSnapshotActive());
@@ -918,7 +959,15 @@ xmin_cmp(const pairingheap_node *a, const pairingheap_node *b, void *arg)
* Even if there are some remaining snapshots, we may be able to advance our
* PGXACT->xmin to some degree. This typically happens when a portal is
* dropped. For efficiency, we only consider recomputing PGXACT->xmin when
- * the active snapshot stack is empty.
+ * the active snapshot stack is empty; this allows us not to need to track
+ * which active snapshot is oldest.
+ *
+ * Note: it's tempting to use GetOldestSnapshot() here so that we can include
+ * active snapshots in the calculation. However, that compares by LSN not
+ * xmin so it's not entirely clear that it's the same thing. Also, we'd be
+ * critically dependent on the assumption that the bottommost active snapshot
+ * stack entry has the oldest xmin. (Current uses of GetOldestSnapshot() are
+ * not actually critical, but this would be.)
*/
static void
SnapshotResetXmin(void)
@@ -1006,7 +1055,7 @@ AtEOXact_Snapshot(bool isCommit)
{
/*
* In transaction-snapshot mode we must release our privately-managed
- * reference to the transaction snapshot. We must decrement
+ * reference to the transaction snapshot. We must remove it from
* RegisteredSnapshots to keep the check below happy. But we don't bother
* to do FreeSnapshot, for two reasons: the memory will go away with
* TopTransactionContext anyway, and if someone has left the snapshot
@@ -1046,7 +1095,7 @@ AtEOXact_Snapshot(bool isCommit)
/*
* As with the FirstXactSnapshot, we needn't spend any effort on
* cleaning up the per-snapshot data structures, but we do need to
- * unlink them from RegisteredSnapshots to prevent a warning below.
+ * remove them from RegisteredSnapshots to prevent a warning below.
*/
foreach(lc, exportedSnapshots)
{
@@ -1058,6 +1107,9 @@ AtEOXact_Snapshot(bool isCommit)
exportedSnapshots = NIL;
}
+ /* Drop catalog snapshot if any */
+ InvalidateCatalogSnapshot();
+
/* On commit, complain about leftover snapshots */
if (isCommit)
{
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 9e3827249e2..512e2b6361d 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -69,6 +69,7 @@ extern Snapshot GetOldestSnapshot(void);
extern Snapshot GetCatalogSnapshot(Oid relid);
extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
extern void InvalidateCatalogSnapshot(void);
+extern void InvalidateCatalogSnapshotConditionally(void);
extern void PushActiveSnapshot(Snapshot snapshot);
extern void PushCopiedSnapshot(Snapshot snapshot);