From a8dd7e05b1c033917b900ea3d930e79eea3ff9a4 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 12 Apr 2023 18:20:33 -0400 Subject: config: enable `pack.writeReverseIndex` by default MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Back in e37d0b8730 (builtin/index-pack.c: write reverse indexes, 2021-01-25), Git learned how to read and write a pack's reverse index from a file instead of in-memory. A pack's reverse index is a mapping from pack position (that is, the order that objects appear together in a ".pack") to their position in lexical order (that is, the order that objects are listed in an ".idx" file). Reverse indexes are consulted often during pack-objects, as well as during auxiliary operations that require mapping between pack offsets, pack order, and index index. They are useful in GitHub's infrastructure, where we have seen a dramatic increase in performance when writing ".rev" files[1]. In particular: - an ~80% reduction in the time it takes to serve fetches on a popular repository, Homebrew/homebrew-core. - a ~60% reduction in the peak memory usage to serve fetches on that same repository. - a collective savings of ~35% in CPU time across all pack-objects invocations serving fetches across all repositories in a single datacenter. Reverse indexes are also beneficial to end-users as well as forges. For example, the time it takes to generate a pack containing the objects for the 10 most recent commits in linux.git (representing a typical push) is significantly faster when on-disk reverse indexes are available: $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~10 } >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout /dev/null' Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout /dev/null Time (mean ± σ): 543.0 ms ± 20.3 ms [User: 616.2 ms, System: 58.8 ms] Range (min … max): 521.0 ms … 577.9 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout /dev/null Time (mean ± σ): 245.0 ms ± 11.4 ms [User: 335.6 ms, System: 31.3 ms] Range (min … max): 226.0 ms … 259.6 ms 13 runs Summary 'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout /dev/null' ran 2.22 ± 0.13 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout /dev/null' The same is true of writing a pack containing the objects for the 30 most-recent commits: $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~30 } >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout /dev/null' Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout /dev/null Time (mean ± σ): 866.5 ms ± 16.2 ms [User: 1414.5 ms, System: 97.0 ms] Range (min … max): 839.3 ms … 886.9 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout /dev/null Time (mean ± σ): 581.6 ms ± 10.2 ms [User: 1181.7 ms, System: 62.6 ms] Range (min … max): 567.5 ms … 599.3 ms 10 runs Summary 'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout /dev/null' ran 1.49 ± 0.04 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout /dev/null' ...and savings on trivial operations like computing the on-disk size of a single (packed) object are even more dramatic: $ git rev-parse HEAD >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" Acked-by: Derrick Stolee Signed-off-by: Junio C Hamano --- builtin/index-pack.c | 1 + 1 file changed, 1 insertion(+) (limited to 'builtin/index-pack.c') diff --git a/builtin/index-pack.c b/builtin/index-pack.c index b17e79cd40..323c063f9d 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -1753,6 +1753,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix) fsck_options.walk = mark_link; reset_pack_idx_option(&opts); + opts.flags |= WRITE_REV; git_config(git_index_pack_config, &opts); if (prefix && chdir(prefix)) die(_("Cannot come back to cwd")); -- cgit v1.2.3 From 9f7f10a282d8adeb9da0990aa0eb2adf93a47ca7 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Wed, 12 Apr 2023 18:20:36 -0400 Subject: t: invert `GIT_TEST_WRITE_REV_INDEX` Back in e8c58f894b (t: support GIT_TEST_WRITE_REV_INDEX, 2021-01-25), we added a test knob to conditionally enable writing a ".rev" file when indexing a pack. At the time, this was used to ensure that the test suite worked even when ".rev" files were written, which served as a stress-test for the on-disk reverse index implementation. Now that reading from on-disk ".rev" files is enabled by default, the test knob `GIT_TEST_WRITE_REV_INDEX` no longer has any meaning. We could get rid of the option entirely, but there would be no convenient way to test Git when ".rev" files *aren't* in place. Instead of getting rid of the option, invert its meaning to instead disable writing ".rev" files, thereby running the test suite in a mode where the reverse index is generated from scratch. This ensures that, when GIT_TEST_NO_WRITE_REV_INDEX is set to some spelling of "true", we are still running and exercising Git's behavior when forced to generate reverse indexes from scratch. Do so by setting it in the linux-TEST-vars CI run to ensure that we are maintaining good coverage of this now-legacy code. Signed-off-by: Taylor Blau Acked-by: Derrick Stolee Signed-off-by: Junio C Hamano --- builtin/index-pack.c | 4 ++-- builtin/pack-objects.c | 4 ++-- ci/run-build-and-tests.sh | 2 +- pack-revindex.h | 2 +- t/README | 2 +- t/t5325-reverse-index.sh | 2 +- 6 files changed, 8 insertions(+), 8 deletions(-) (limited to 'builtin/index-pack.c') diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 323c063f9d..9e36c985cf 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -1758,8 +1758,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix) if (prefix && chdir(prefix)) die(_("Cannot come back to cwd")); - if (git_env_bool(GIT_TEST_WRITE_REV_INDEX, 0)) - rev_index = 1; + if (git_env_bool(GIT_TEST_NO_WRITE_REV_INDEX, 0)) + rev_index = 0; else rev_index = !!(opts.flags & (WRITE_REV_VERIFY | WRITE_REV)); diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index dbaa04482f..1797871ce9 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4295,8 +4295,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) reset_pack_idx_option(&pack_idx_opts); pack_idx_opts.flags |= WRITE_REV; git_config(git_pack_config, NULL); - if (git_env_bool(GIT_TEST_WRITE_REV_INDEX, 0)) - pack_idx_opts.flags |= WRITE_REV; + if (git_env_bool(GIT_TEST_NO_WRITE_REV_INDEX, 0)) + pack_idx_opts.flags &= ~WRITE_REV; progress = isatty(2); argc = parse_options(argc, argv, prefix, pack_objects_options, diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh index b098e10f52..a18b13a41d 100755 --- a/ci/run-build-and-tests.sh +++ b/ci/run-build-and-tests.sh @@ -27,7 +27,7 @@ linux-TEST-vars) export GIT_TEST_MULTI_PACK_INDEX=1 export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master - export GIT_TEST_WRITE_REV_INDEX=1 + export GIT_TEST_NO_WRITE_REV_INDEX=1 export GIT_TEST_CHECKOUT_WORKERS=2 ;; linux-clang) diff --git a/pack-revindex.h b/pack-revindex.h index ef8afee88b..46e834064e 100644 --- a/pack-revindex.h +++ b/pack-revindex.h @@ -34,7 +34,7 @@ #define RIDX_SIGNATURE 0x52494458 /* "RIDX" */ #define RIDX_VERSION 1 -#define GIT_TEST_WRITE_REV_INDEX "GIT_TEST_WRITE_REV_INDEX" +#define GIT_TEST_NO_WRITE_REV_INDEX "GIT_TEST_NO_WRITE_REV_INDEX" #define GIT_TEST_REV_INDEX_DIE_IN_MEMORY "GIT_TEST_REV_INDEX_DIE_IN_MEMORY" #define GIT_TEST_REV_INDEX_DIE_ON_DISK "GIT_TEST_REV_INDEX_DIE_ON_DISK" diff --git a/t/README b/t/README index 29576c3748..bdfac4cceb 100644 --- a/t/README +++ b/t/README @@ -475,7 +475,7 @@ GIT_TEST_DEFAULT_HASH= specifies which hash algorithm to use in the test scripts. Recognized values for are "sha1" and "sha256". -GIT_TEST_WRITE_REV_INDEX=, when true enables the +GIT_TEST_NO_WRITE_REV_INDEX=, when true disables the 'pack.writeReverseIndex' setting. GIT_TEST_SPARSE_INDEX=, when true enables index writes to use the diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh index 149dcf5193..0548fce1aa 100755 --- a/t/t5325-reverse-index.sh +++ b/t/t5325-reverse-index.sh @@ -7,7 +7,7 @@ TEST_PASSES_SANITIZE_LEAK=true # The below tests want control over the 'pack.writeReverseIndex' setting # themselves to assert various combinations of it with other options. -sane_unset GIT_TEST_WRITE_REV_INDEX +sane_unset GIT_TEST_NO_WRITE_REV_INDEX packdir=.git/objects/pack -- cgit v1.2.3