diff options
author | Tom Lane <tgl@sss.pgh.pa.us> | 2019-05-02 19:11:28 -0400 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2019-05-02 19:11:28 -0400 |
commit | f912d7dec29341d55315fccef8dc3fdfd068c6e3 (patch) | |
tree | 10e81ced959944de81fbebbe0bd8a3b83d29146d /src/backend/utils/cache | |
parent | 2bf372a4ae26337ee61cf7a82081cf4bed05182d (diff) |
Fix reindexing of pg_class indexes some more.
Commits 3dbb317d3 et al failed under CLOBBER_CACHE_ALWAYS testing.
Investigation showed that to reindex pg_class_oid_index, we must
suppress accesses to the index (via SetReindexProcessing) before we call
RelationSetNewRelfilenode, or at least before we do CommandCounterIncrement
therein; otherwise, relcache reloads happening within the CCI may try to
fetch pg_class rows using the index's new relfilenode value, which is as
yet an empty file.
Of course, the point of 3dbb317d3 was that that ordering didn't work
either, because then RelationSetNewRelfilenode's own update of the index's
pg_class row cannot access the index, should it need to.
There are various ways we might have got around that, but Andres Freund
came up with a brilliant solution: for a mapped index, we can really just
skip the pg_class update altogether. The only fields it was actually
changing were relpages etc, but it was just setting them to zeroes which
is useless make-work. (Correct new values will be installed at the end
of index build.) All pg_class indexes are mapped and probably always will
be, so this eliminates the problem by removing work rather than adding it,
always a pleasant outcome. Having taught RelationSetNewRelfilenode to do
it that way, we can revert the code reordering in reindex_index. (But
I left the moved setup code where it was; there seems no reason why it
has to run without use of the old index. If you're trying to fix a
busted pg_class index, you'll have had to disable system index use
altogether to get this far.)
Moreover, this means we don't need RelationSetIndexList at all, because
reindex_relation's hacking to make "REINDEX TABLE pg_class" work is
likewise now unnecessary. We'll leave that code in place in the back
branches, but a follow-on patch will remove it in HEAD.
In passing, do some minor cleanup for commit 5c1560606 (in HEAD only),
notably removing a duplicate newrnode assignment.
Patch by me, using a core idea due to Andres Freund. Back-patch to all
supported branches, as 3dbb317d3 was.
Discussion: https://postgr.es/m/28926.1556664156@sss.pgh.pa.us
Diffstat (limited to 'src/backend/utils/cache')
-rw-r--r-- | src/backend/utils/cache/relcache.c | 89 |
1 files changed, 57 insertions, 32 deletions
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index 90ff8ccf54f..f2ca6493425 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -3421,7 +3421,8 @@ RelationBuildLocalRelation(const char *relname, /* * RelationSetNewRelfilenode * - * Assign a new relfilenode (physical file name) to the relation. + * Assign a new relfilenode (physical file name), and possibly a new + * persistence setting, to the relation. * * This allows a full rewrite of the relation to be done with transactional * safety (since the filenode assignment can be rolled back). Note however @@ -3463,13 +3464,10 @@ RelationSetNewRelfilenode(Relation relation, char persistence) */ RelationDropStorage(relation); - /* initialize new relfilenode from old relfilenode */ - newrnode = relation->rd_node; - /* - * Create storage for the main fork of the new relfilenode. If it's - * table-like object, call into table AM to do so, which'll also create - * the table's init fork. + * Create storage for the main fork of the new relfilenode. If it's a + * table-like object, call into the table AM to do so, which'll also + * create the table's init fork if needed. * * NOTE: If relevant for the AM, any conflict in relfilenode value will be * caught here, if GetNewRelFileNode messes up for any reason. @@ -3479,18 +3477,10 @@ RelationSetNewRelfilenode(Relation relation, char persistence) switch (relation->rd_rel->relkind) { - /* shouldn't be called for these */ - case RELKIND_VIEW: - case RELKIND_COMPOSITE_TYPE: - case RELKIND_FOREIGN_TABLE: - case RELKIND_PARTITIONED_TABLE: - case RELKIND_PARTITIONED_INDEX: - elog(ERROR, "should not have storage"); - break; - case RELKIND_INDEX: case RELKIND_SEQUENCE: { + /* handle these directly, at least for now */ SMgrRelation srel; srel = RelationCreateStorage(newrnode, persistence); @@ -3505,41 +3495,76 @@ RelationSetNewRelfilenode(Relation relation, char persistence) persistence, &freezeXid, &minmulti); break; + + default: + /* we shouldn't be called for anything else */ + elog(ERROR, "relation \"%s\" does not have storage", + RelationGetRelationName(relation)); + break; } /* - * However, if we're dealing with a mapped index, pg_class.relfilenode - * doesn't change; instead we have to send the update to the relation - * mapper. + * If we're dealing with a mapped index, pg_class.relfilenode doesn't + * change; instead we have to send the update to the relation mapper. + * + * For mapped indexes, we don't actually change the pg_class entry at all; + * this is essential when reindexing pg_class itself. That leaves us with + * possibly-inaccurate values of relpages etc, but those will be fixed up + * later. */ if (RelationIsMapped(relation)) + { + /* This case is only supported for indexes */ + Assert(relation->rd_rel->relkind == RELKIND_INDEX); + + /* Since we're not updating pg_class, these had better not change */ + Assert(classform->relfrozenxid == freezeXid); + Assert(classform->relminmxid == minmulti); + Assert(classform->relpersistence == persistence); + + /* + * In some code paths it's possible that the tuple update we'd + * otherwise do here is the only thing that would assign an XID for + * the current transaction. However, we must have an XID to delete + * files, so make sure one is assigned. + */ + (void) GetCurrentTransactionId(); + + /* Do the deed */ RelationMapUpdateMap(RelationGetRelid(relation), newrelfilenode, relation->rd_rel->relisshared, false); + + /* Since we're not updating pg_class, must trigger inval manually */ + CacheInvalidateRelcache(relation); + } else + { + /* Normal case, update the pg_class entry */ classform->relfilenode = newrelfilenode; - /* These changes are safe even for a mapped relation */ - if (relation->rd_rel->relkind != RELKIND_SEQUENCE) - { - classform->relpages = 0; /* it's empty until further notice */ - classform->reltuples = 0; - classform->relallvisible = 0; - } - classform->relfrozenxid = freezeXid; - classform->relminmxid = minmulti; - classform->relpersistence = persistence; + /* relpages etc. never change for sequences */ + if (relation->rd_rel->relkind != RELKIND_SEQUENCE) + { + classform->relpages = 0; /* it's empty until further notice */ + classform->reltuples = 0; + classform->relallvisible = 0; + } + classform->relfrozenxid = freezeXid; + classform->relminmxid = minmulti; + classform->relpersistence = persistence; - CatalogTupleUpdate(pg_class, &tuple->t_self, tuple); + CatalogTupleUpdate(pg_class, &tuple->t_self, tuple); + } heap_freetuple(tuple); table_close(pg_class, RowExclusiveLock); /* - * Make the pg_class row change visible, as well as the relation map - * change if any. This will cause the relcache entry to get updated, too. + * Make the pg_class row change or relation map change visible. This will + * cause the relcache entry to get updated, too. */ CommandCounterIncrement(); |