| field | value | date |
|---|---|---|
| author | Tom Lane <tgl@sss.pgh.pa.us> | 2010-12-09 13:03:11 -0500 |
| committer | Tom Lane <tgl@sss.pgh.pa.us> | 2010-12-09 13:03:42 -0500 |
| commit | 2ffcb0cb6a5bf97de22f0ce58f55537ce1c87653 | |
| tree | 5ab04d596c0056a5a45f92e89a1c7480ec3767d6 /src/backend/access/heap/heapam.c | |
| parent | 87eadd7e3d6f5581d5b4cb8083212a323050e388 | |
Eliminate O(N^2) behavior in parallel restore with many blobs.

With hundreds of thousands of TOC entries, the repeated searches in
reduce_dependencies() become the dominant cost.  Get rid of that searching
by constructing reverse-dependency lists, which we can do in O(N) time
during the fix_dependencies() preprocessing.  I chose to store the reverse
dependencies as DumpId arrays for consistency with the forward-dependency
representation, and keep the previously-transient tocsByDumpId[] array
around to locate actual TOC entry structs quickly from dump IDs.
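
The reverse-dependency idea described above can be sketched in C roughly as follows. This is a simplified illustration, not the code from this commit: the `Entry` struct, its field names, and the functions `build_reverse_deps` and `reduce_deps` are hypothetical stand-ins, and `byId` plays the role that the message assigns to tocsByDumpId[]. Each forward-dependency array is walked a fixed number of times to build the reverse arrays, so finishing an entry only touches the entries that actually depend on it.

```c
#include <assert.h>
#include <stdlib.h>

typedef int DumpId;

/* Hypothetical, simplified stand-in for a TOC entry (not the real struct). */
typedef struct Entry
{
    DumpId  dumpId;     /* 1-based dump ID */
    DumpId *deps;       /* forward dependencies: IDs this entry needs */
    int     nDeps;
    int     depCount;   /* dependencies not yet satisfied */
    DumpId *revDeps;    /* reverse dependencies: IDs that need this entry */
    int     nRevDeps;
} Entry;

/*
 * Build reverse-dependency arrays in time linear in entries plus edges.
 * byId[id] maps a dump ID to its entry, or NULL if the ID is unused.
 * Dependencies on unknown IDs are ignored.
 * Returns 0 on success, -1 if an allocation fails.
 */
int
build_reverse_deps(Entry **byId, DumpId maxId)
{
    DumpId  id;
    int     i;

    /* Reset all derived fields. */
    for (id = 1; id <= maxId; id++)
    {
        if (byId[id] == NULL)
            continue;
        byId[id]->depCount = 0;
        byId[id]->revDeps = NULL;
        byId[id]->nRevDeps = 0;
    }

    /* Pass 1: count incoming edges per entry and live outgoing edges. */
    for (id = 1; id <= maxId; id++)
    {
        Entry  *e = byId[id];

        if (e == NULL)
            continue;
        for (i = 0; i < e->nDeps; i++)
        {
            DumpId  d = e->deps[i];

            if (d >= 1 && d <= maxId && byId[d] != NULL)
            {
                byId[d]->nRevDeps++;
                e->depCount++;
            }
        }
    }

    /* Pass 2: allocate exact-size arrays; reuse nRevDeps as a fill index. */
    for (id = 1; id <= maxId; id++)
    {
        Entry  *e = byId[id];

        if (e == NULL || e->nRevDeps == 0)
            continue;
        e->revDeps = malloc(sizeof(DumpId) * (size_t) e->nRevDeps);
        if (e->revDeps == NULL)
            return -1;
        e->nRevDeps = 0;
    }

    /* Pass 3: record each edge in the array of the entry it points at. */
    for (id = 1; id <= maxId; id++)
    {
        Entry  *e = byId[id];

        if (e == NULL)
            continue;
        for (i = 0; i < e->nDeps; i++)
        {
            DumpId  d = e->deps[i];

            if (d >= 1 && d <= maxId && byId[d] != NULL)
                byId[d]->revDeps[byId[d]->nRevDeps++] = e->dumpId;
        }
    }
    return 0;
}

/*
 * Mark 'done' as finished: decrement the pending count of every entry that
 * depends on it.  Returns how many entries became ready as a result.
 */
int
reduce_deps(const Entry *done, Entry **byId)
{
    int     ready = 0;
    int     i;

    for (i = 0; i < done->nRevDeps; i++)
    {
        Entry  *other = byId[done->revDeps[i]];

        assert(other != NULL && other->depCount > 0);
        if (--other->depCount == 0)
            ready++;
    }
    return ready;
}
```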

While this fixes the slow case reported by Vlad Arkhipov, there is still
a potential for O(N^2) behavior with sufficiently many tables:
fix_dependencies itself, as well as mark_create_done and
inhibit_data_for_failed_table, are doing repeated searches to deal with
table-to-table-data dependencies.  Possibly this work could be extended
to deal with that, although the latter two functions are also used in
non-parallel restore where we currently don't run fix_dependencies.

Another TODO is that we fail to parallelize restore of multiple blobs
at all.  This appears to require changes in the archive format to fix.

Back-patch to 9.0 where the problem was reported.  8.4 has potential issues
as well; but since it doesn't create a separate TOC entry for each blob,
it's at much less risk of having enough TOC entries to cause real problems.
