From 3079026fc17275d734757558c16fe8ae236fee04 Mon Sep 17 00:00:00 2001 From: Xing Xin Date: Wed, 19 Jun 2024 04:07:31 +0000 Subject: bundle-uri: verify oid before writing refs When using the bundle-uri mechanism with a bundle list containing multiple interrelated bundles, we encountered a bug where tips from downloaded bundles were not discovered, thus resulting in rather slow clones. This was particularly problematic when employing the "creationTokens" heuristic. To reproduce this issue, consider a repository with a single branch "main" pointing to commit "A". Firstly, create a base bundle with: git bundle create base.bundle main Then, add a new commit "B" on top of "A", and create an incremental bundle for "main": git bundle create incr.bundle A..main Now, generate a bundle list with the following content: [bundle] version = 1 mode = all heuristic = creationToken [bundle "base"] uri = base.bundle creationToken = 1 [bundle "incr"] uri = incr.bundle creationToken = 2 A fresh clone with the bundle list above should result in a reference "refs/bundles/main" pointing to "B" in the new repository. However, git would still download everything from the server, as if it had fetched nothing locally. So why the "refs/bundles/main" is not discovered? After some digging I found that: 1. Bundles in bundle list are downloaded to local files via `bundle-uri.c:download_bundle_list` or via `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which is called by `bundle-uri.c:unbundle_all_bundles` or called within `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 3. To get all prerequisites of the bundle, the bundle header is read inside `bundle-uri.c:unbundle_from_file` to by calling `bundle.c:read_bundle_header`. 4. Then it calls `bundle.c:unbundle`, which calls `bundle.c:verify_bundle` to ensure the repository contains all the prerequisites. 5. `bundle.c:verify_bundle` calls `parse_object`, which eventually invokes `packfile.c:prepare_packed_git` or `packfile.c:reprepare_packed_git`, filling `raw_object_store->packed_git` and setting `packed_git_initialized`. 6. If `bundle.c:unbundle` succeeds, it writes refs via `refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here bundle refs which can target arbitrary objects are written to the repository. 7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions `fetch-pack.c:mark_complete_and_common_ref` and `fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to find local tips for negotiation. The `OBJECT_INFO_QUICK` flag prevents `packfile.c:reprepare_packed_git` from being called, resulting in failures to parse OIDs that reside only in the latest bundle. In the example above, when unbunding "incr.bundle", "base.pack" is added to `packed_git` due to prerequisites verification. However, "B" cannot be found for negotiation because it exists in "incr.pack", which is not included in `packed_git`. Fix the bug by removing `REF_SKIP_OID_VERIFICATION` flag when writing bundle refs. When `refs.c:refs_update_ref` is called to write the corresponding bundle refs, it triggers `refs.c:ref_transaction_commit`. This, in turn, invokes `refs.c:ref_transaction_prepare`, which calls `transaction_prepare` of the refs storage backend. For files backend, it is `files-backend.c:files_transaction_prepare`, and for reftable backend, it is `reftable-backend.c:reftable_be_transaction_prepare`. Both functions eventually call `object.c:parse_object`, which can invoke `packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures that bundle refs point to valid objects and that all tips from bundle refs are correctly parsed during subsequent negotiations. A set of negotiation-related tests for cloning with bundle-uri has been included to demonstrate that downloaded bundles are utilized to accelerate fetching. Additionally, another test has been added to show that bundles with incorrect headers, where refs point to non-existent objects, do not result in any bundle refs being created in the repository. Reviewed-by: Karthik Nayak Reviewed-by: Patrick Steinhardt Signed-off-by: Xing Xin Signed-off-by: Junio C Hamano --- bundle-uri.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'bundle-uri.c') diff --git a/bundle-uri.c b/bundle-uri.c index 91b3319a5c..65666a11d9 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -400,8 +400,7 @@ static int unbundle_from_file(struct repository *r, const char *file) refs_update_ref(get_main_ref_store(the_repository), "fetched bundle", bundle_ref.buf, oid, has_old ? &old_oid : NULL, - REF_SKIP_OID_VERIFICATION, - UPDATE_REFS_MSG_ON_ERR); + 0, UPDATE_REFS_MSG_ON_ERR); } bundle_header_release(&header); -- cgit v1.2.3 From 63d903ff52594eb52289abb89db1a4bca7b0f946 Mon Sep 17 00:00:00 2001 From: Xing Xin Date: Wed, 19 Jun 2024 04:07:33 +0000 Subject: unbundle: extend object verification for fetches The existing fetch.fsckObjects and transfer.fsckObjects configurations were not fully applied to bundle-involved fetches, including direct bundle fetches and bundle-uri enabled fetches. Furthermore, there was no object verification support for unbundle. This commit extends object verification support in `bundle.c:unbundle` by adding the `VERIFY_BUNDLE_FSCK` option to `verify_bundle_flags`. When this option is enabled, we append the `--fsck-objects` flag to `git-index-pack`. The `VERIFY_BUNDLE_FSCK` option is now used by bundle-involved fetches, where we use `fetch-pack.c:fetch_pack_fsck_objects` to determine whether to enable this option for `bundle.c:unbundle`, specifically in: - `transport.c:fetch_refs_from_bundle` for direct bundle fetches. - `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches. This addition ensures a consistent logic for object verification during fetches. Tests have been added to confirm functionality in the scenarios mentioned above. Reviewed-by: Patrick Steinhardt Signed-off-by: Xing Xin Signed-off-by: Junio C Hamano --- bundle-uri.c | 3 ++- bundle.c | 3 +++ bundle.h | 1 + t/t5558-clone-bundle-uri.sh | 34 +++++++++++++++++++++++++++++++++- t/t5607-clone-bundle.sh | 35 +++++++++++++++++++++++++++++++++++ transport.c | 3 ++- 6 files changed, 76 insertions(+), 3 deletions(-) (limited to 'bundle-uri.c') diff --git a/bundle-uri.c b/bundle-uri.c index 65666a11d9..ed9b49fdbc 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -9,6 +9,7 @@ #include "hashmap.h" #include "pkt-line.h" #include "config.h" +#include "fetch-pack.h" #include "remote.h" static struct { @@ -373,7 +374,7 @@ static int unbundle_from_file(struct repository *r, const char *file) * the prerequisite commits. */ if ((result = unbundle(r, &header, bundle_fd, NULL, - VERIFY_BUNDLE_QUIET))) + VERIFY_BUNDLE_QUIET | (fetch_pack_fsck_objects() ? VERIFY_BUNDLE_FSCK : 0)))) return 1; /* diff --git a/bundle.c b/bundle.c index 95367c2d0a..f124a2a562 100644 --- a/bundle.c +++ b/bundle.c @@ -625,6 +625,9 @@ int unbundle(struct repository *r, struct bundle_header *header, if (header->filter.choice) strvec_push(&ip.args, "--promisor=from-bundle"); + if (flags & VERIFY_BUNDLE_FSCK) + strvec_push(&ip.args, "--fsck-objects"); + if (extra_index_pack_args) { strvec_pushv(&ip.args, extra_index_pack_args->v); strvec_clear(extra_index_pack_args); diff --git a/bundle.h b/bundle.h index 021adbdcbb..5ccc9a061a 100644 --- a/bundle.h +++ b/bundle.h @@ -33,6 +33,7 @@ int create_bundle(struct repository *r, const char *path, enum verify_bundle_flags { VERIFY_BUNDLE_VERBOSE = (1 << 0), VERIFY_BUNDLE_QUIET = (1 << 1), + VERIFY_BUNDLE_FSCK = (1 << 2), }; int verify_bundle(struct repository *r, struct bundle_header *header, diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh index a0895913fe..cd05321e17 100755 --- a/t/t5558-clone-bundle-uri.sh +++ b/t/t5558-clone-bundle-uri.sh @@ -36,7 +36,22 @@ test_expect_success 'create bundle' ' sed -e "/^$/q" -e "s/$commit_a /$commit_b /" \ bad-header.bundle && convert_bundle_to_pack \ - >bad-header.bundle + >bad-header.bundle && + + tree_b=$(git rev-parse B^{tree}) && + cat >data <<-EOF && + tree $tree_b + parent $commit_b + author A U Thor + committer A U Thor + + commit: this is a commit with bad emails + + EOF + bad_commit=$(git hash-object --literally -t commit -w --stdin refs && + grep "refs/bundles/" refs >actual && + test_write_lines refs/bundles/bad >expect && + test_cmp expect actual && + + # Unbundle fails with fsckObjects set true, but clone can still proceed. + git -c fetch.fsckObjects=true clone --bundle-uri="clone-from/bad-object.bundle" \ + clone-from clone-bad-object-fsck 2>err && + test_grep "missingEmail" err && + git -C clone-bad-object-fsck for-each-ref --format="%(refname)" >refs && + test_grep ! "refs/bundles/" refs +' + test_expect_success 'clone with path bundle and non-default hash' ' test_when_finished "rm -rf clone-path-non-default-hash" && GIT_DEFAULT_HASH=sha256 git clone --bundle-uri="clone-from/B.bundle" \ diff --git a/t/t5607-clone-bundle.sh b/t/t5607-clone-bundle.sh index 0d1e92d996..489c6570da 100755 --- a/t/t5607-clone-bundle.sh +++ b/t/t5607-clone-bundle.sh @@ -138,6 +138,41 @@ test_expect_success 'fetch SHA-1 from bundle' ' git fetch --no-tags foo/tip.bundle "$(cat hash)" ' +test_expect_success 'clone bundle with different fsckObjects configurations' ' + test_create_repo bundle-fsck && + ( + cd bundle-fsck && + test_commit A && + commit_a=$(git rev-parse A) && + tree_a=$(git rev-parse A^{tree}) && + cat >data <<-EOF && + tree $tree_a + parent $commit_a + author A U Thor + committer A U Thor + + commit: this is a commit with bad emails + + EOF + bad_commit=$(git hash-object --literally -t commit -w --stdin err && + test_grep "missingEmail" err && + + test_must_fail git -c transfer.fsckObjects=true \ + clone bundle-fsck/bad.bundle bundle-transfer-fsck 2>err && + test_grep "missingEmail" err +' + test_expect_success 'git bundle uses expected default format' ' git bundle create bundle HEAD^.. && cat >expect <<-EOF && diff --git a/transport.c b/transport.c index 83ddea8fbc..9e84784a7d 100644 --- a/transport.c +++ b/transport.c @@ -184,7 +184,8 @@ static int fetch_refs_from_bundle(struct transport *transport, if (!data->get_refs_from_bundle_called) get_refs_from_bundle_inner(transport); ret = unbundle(the_repository, &data->header, data->fd, - &extra_index_pack_args, 0); + &extra_index_pack_args, + fetch_pack_fsck_objects() ? VERIFY_BUNDLE_FSCK : 0); transport->hash_algo = data->header.hash_algo; return ret; } -- cgit v1.2.3