summaryrefslogtreecommitdiff
path: root/refs.c
AgeCommit message (Collapse)Author
31 hoursMerge branch 'ps/config-wo-the-repository'Junio C Hamano
The config API had a set of convenience wrapper functions that implicitly use the_repository instance; they have been removed and inlined at the calling sites. * ps/config-wo-the-repository: (21 commits) config: fix sign comparison warnings config: move Git config parsing into "environment.c" config: remove unused `the_repository` wrappers config: drop `git_config_set_multivar()` wrapper config: drop `git_config_get_multivar_gently()` wrapper config: drop `git_config_set_multivar_in_file_gently()` wrapper config: drop `git_config_set_in_file_gently()` wrapper config: drop `git_config_set()` wrapper config: drop `git_config_set_gently()` wrapper config: drop `git_config_set_in_file()` wrapper config: drop `git_config_get_bool()` wrapper config: drop `git_config_get_ulong()` wrapper config: drop `git_config_get_int()` wrapper config: drop `git_config_get_string()` wrapper config: drop `git_config_get_string()` wrapper config: drop `git_config_get_string_multi()` wrapper config: drop `git_config_get_value()` wrapper config: drop `git_config_get_value()` wrapper config: drop `git_config_get()` wrapper config: drop `git_config_clear()` wrapper ...
45 hoursMerge branch 'kn/for-each-ref-skip'Junio C Hamano
"git for-each-ref" learns "--start-after" option to help applications that want to page its output. * kn/for-each-ref-skip: ref-cache: set prefix_state when seeking for-each-ref: introduce a '--start-after' option ref-filter: remove unnecessary else clause refs: selectively set prefix in the seek functions ref-cache: remove unused function 'find_ref_entry()' refs: expose `ref_iterator` via 'refs.h'
13 daysconfig: drop `git_config_get_int()` wrapperPatrick Steinhardt
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get_int()`. All callsites are adjusted so that they use `repo_config_get_int(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16Merge branch 'ph/fetch-prune-optim'Junio C Hamano
"git fetch --prune" used to be O(n^2) expensive when there are many refs, which has been corrected. * ph/fetch-prune-optim: clean up interface for refs_warn_dangling_symrefs refs: remove old refs_warn_dangling_symref fetch-prune: optimize dangling-ref reporting
2025-07-15Merge branch 'ps/object-store'Junio C Hamano
Code clean-up around object access API. * ps/object-store: odb: rename `read_object_with_reference()` odb: rename `pretend_object_file()` odb: rename `has_object()` odb: rename `repo_read_object_file()` odb: rename `oid_object_info()` odb: trivial refactorings to get rid of `the_repository` odb: get rid of `the_repository` when handling submodule sources odb: get rid of `the_repository` when handling the primary source odb: get rid of `the_repository` in `for_each()` functions odb: get rid of `the_repository` when handling alternates odb: get rid of `the_repository` in `odb_mkstemp()` odb: get rid of `the_repository` in `assert_oid_type()` odb: get rid of `the_repository` in `find_odb()` odb: introduce parent pointers object-store: rename files to "odb.{c,h}" object-store: rename `object_directory` to `odb_source` object-store: rename `raw_object_store` to `object_database`
2025-07-15refs: selectively set prefix in the seek functionsKarthik Nayak
The ref iterator exposes a `ref_iterator_seek()` function. The name suggests that this would seek the iterator to a specific reference in some ways similar to how `fseek()` works for the filesystem. However, the function actually sets the prefix for refs iteration. So further iteration would only yield references which match the particular prefix. This is a bit confusing. Let's add a 'flags' field to the function, which when set with the 'REF_ITERATOR_SEEK_SET_PREFIX' flag, will set the prefix for the iteration in-line with the existing behavior. Otherwise, the reference backends will simply seek to the specified reference and clears any previously set prefix. This allows users to start iteration from a specific reference. In the packed and reftable backend, since references are available in a sorted list, the changes are simply setting the prefix if needed. The changes on the files-backend are a little more involved, since the files backend uses the 'ref-cache' mechanism. We move out the existing logic within `cache_ref_iterator_seek()` to `cache_ref_iterator_set_prefix()` which is called when the 'REF_ITERATOR_SEEK_SET_PREFIX' flag is set. We then parse the provided seek string and set the required levels and their indexes to ensure that seeking is possible. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01clean up interface for refs_warn_dangling_symrefsPhil Hord
The refs_warn_dangling_symrefs interface is a bit fragile as it passes in printf-formatting strings with expectations about the number of arguments. This patch series made it worse by adding a 2nd positional argument. But there are only two call sites, and they both use almost identical display options. Make this safer by moving the format strings into the function that uses them to make it easier to see when the arguments don't match. Pass a prefix string and a dry_run flag so the decision logic can be handled where needed. Signed-off-by: Phil Hord <phil.hord@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01refs: remove old refs_warn_dangling_symrefPhil Hord
The dangling warning function that takes a single ref to search for is no longer used. Remove it. Signed-off-by: Phil Hord <phil.hord@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01fetch-prune: optimize dangling-ref reportingPhil Hord
When pruning during `git fetch` we check each pruned ref against the ref_store one at a time to decide whether to report it as dangling. This causes every local ref to be scanned for each ref being pruned. If there are N refs in the repo and M refs being pruned, this code is O(M*N). However, `git remote prune` uses a very similar function that is only O(N*log(M)). Remove the wasteful ref scanning for each pruned ref and use the faster version already available in refs_warn_dangling_symrefs. Change the message to include the original refname since the message is no longer printed immediately after the line that did just print the refname. In a repo with 126,000 refs, where I was pruning 28,000 refs, this code made about 3.6 billion calls to strcmp and consumed 410 seconds of CPU. (Invariably in that time, my remote would timeout and the fetch would fail anyway.) After this change, the same operation completes in under a second. Signed-off-by: Phil Hord <phil.hord@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: rename `has_object()`Patrick Steinhardt
Rename `has_object()` to `odb_has_object()` to match other functions related to the object database and our modern coding guidelines. Introduce a compatibility wrapper so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01object-store: rename files to "odb.{c,h}"Patrick Steinhardt
In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01object-store: rename `object_directory` to `odb_source`Patrick Steinhardt
The `object_directory` structure is used as an access point for a single object directory like ".git/objects". While the structure isn't yet fully self-contained, the intent is for it to eventually contain all information required to access objects in one specific location. While the name "object directory" is a good fit for now, this will change over time as we continue with the agenda to make pluggable object databases a thing. Eventually, objects may not be accessed via any kind of directory at all anymore, but they could instead be backed by any kind of durable storage mechanism. While it seems quite far-fetched for now, it is thinkable that eventually this might even be some form of a database, for example. As such, the current name of this structure will become worse over time as we evolve into the direction of pluggable ODBs. Immediate next steps will start to carve out proper self-contained object directories, which requires us to pass in these object directories as parameters. Based on our modern naming schema this means that those functions should then be named after their subsystem, which means that we would start to bake the current name into the codebase more and more. Let's preempt this by renaming the structure. There have been a couple alternatives that were discussed: - `odb_backend` was discarded because it led to the association that one object database has a single backend, but the model is that one alternate has one backend. Furthermore, "backend" is more about the actual backing implementation and less about the high-level concept. - `odb_alternate` was discarded because it is a bit of a stretch to also call the main object directory an "alternate". Instead, pick `odb_source` as the new name. It makes it sufficiently clear that there can be multiple sources and does not cause confusion when mixed with the already-existing "alternate" terminology. In the future, this change allows us to easily introduce for example a `odb_files_source` and other format-specific implementations. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-19refs: add function to translate errors to stringsKarthik Nayak
The commit 76e760b999 (refs: introduce enum-based transaction error types, 2025-04-08) introduced enum-based transaction error types. The refs transaction logic was also modified to propagate these errors. For clients of the ref transaction system, it would be beneficial to provide human readable messages for these errors. There is already an existing mapping in 'builtin/update-ref.c', move it to 'refs.c' as `ref_transaction_error_msg()` and use the same within the 'builtin/update-ref.c'. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-29treewide: convert users of `repo_has_object_file()` to `has_object()`Patrick Steinhardt
As the comment of `repo_has_object_file()` and its `_with_flags()` variant tells us, these functions are considered to be deprecated in favor of `has_object()`. There are a couple of slight benefits in favor of the replacement: - The new function has a short-and-sweet name. - More explicit defaults: `has_object()` doesn't fetch missing objects via promisor remotes, and neither does it reload packfiles if an object wasn't found by default. This ensures that it becomes immediately obvious when a simple object existence check may result in expensive actions. Most importantly though, it is confusing that we have two sets of functions that ultimately do the same thing, but with different defaults. Start sunsetting `repo_has_object_file()` and its `_with_flags()` sibling by replacing all callsites with `has_object()`: - `repo_has_object_file(...)` is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED | HAS_OBJECT_FETCH_PROMISOR)`. - `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT)` is equivalent to `has_object(..., 0)`. - `repo_has_object_file_with_flags(..., OBJECT_INFO_SKIP_FETCH_OBJECT)` is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED)`. - `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK)` is equivalent to `has_object(..., HAS_OBJECT_FETCH_PROMISOR)`. The replacements should be functionally equivalent. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-24Merge branch 'ps/object-file-cleanup' into ps/object-store-cleanupJunio C Hamano
* ps/object-file-cleanup: object-store: merge "object-store-ll.h" and "object-store.h" object-store: remove global array of cached objects object: split out functions relating to object store subsystem object-file: drop `index_blob_stream()` object-file: split up concerns of `HASH_*` flags object-file: split out functions relating to object store subsystem object-file: move `xmmap()` into "wrapper.c" object-file: move `git_open_cloexec()` to "compat/open.c" object-file: move `safe_create_leading_directories()` into "path.c" object-file: move `mkdir_in_gitdir()` into "path.c"
2025-04-17Merge branch 'cj/refname-avail-check-optim-typofix'Junio C Hamano
Comment fix. * cj/refname-avail-check-optim-typofix: refs: fix duplicated word in comment
2025-04-16Merge branch 'kn/non-transactional-batch-updates'Junio C Hamano
Updating multiple references have only been possible in all-or-none fashion with transactions, but it can be more efficient to batch multiple updates even when some of them are allowed to fail in a best-effort manner. A new "best effort batches of updates" mode has been introduced. * kn/non-transactional-batch-updates: update-ref: add --batch-updates flag for stdin mode refs: support rejection in batch updates during F/D checks refs: implement batch reference update support refs: introduce enum-based transaction error types refs/reftable: extract code from the transaction preparation refs/files: remove duplicate duplicates check refs: move duplicate refname update check to generic layer refs/files: remove redundant check in split_symref_update()
2025-04-15Merge branch 'jt/clone-guess-remote-head-fix'Junio C Hamano
"git clone" still gave the message about the default branch name; this message has been turned into an advice message that can be turned off. * jt/clone-guess-remote-head-fix: advice: allow disabling default branch name advice builtin/clone: suppress unexpected default branch advice remote: allow `guess_remote_head()` to suppress advice
2025-04-15Merge branch 'ps/object-wo-the-repository'Junio C Hamano
The object layer has been updated to take an explicit repository instance as a parameter in more code paths. * ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`
2025-04-15object-store: merge "object-store-ll.h" and "object-store.h"Patrick Steinhardt
The "object-store-ll.h" header has been introduced to keep transitive header dependendcies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-09refs: fix duplicated word in commentChristian Fredrik Johnsen
Fix a typo in a comment in refs.c: "checking checking" → "checking". Signed-off-by: Christian Fredrik Johnsen <christian@johnsen.no> Acked-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08Merge branch 'ps/object-wo-the-repository' into ps/object-file-cleanupJunio C Hamano
* ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`
2025-04-08refs: support rejection in batch updates during F/D checksKarthik Nayak
The `refs_verify_refnames_available()` is used to batch check refnames for F/D conflicts. While this is the more performant alternative than its individual version, it does not provide rejection capabilities on a single update level. For batched updates, this would mean a rejection of the entire transaction whenever one reference has a F/D conflict. Modify the function to call `ref_transaction_maybe_set_rejected()` to check if a single update can be rejected. Since this function is only internally used within 'refs/' and we want to pass in a `struct ref_transaction *` as a variable. We also move and mark `refs_verify_refnames_available()` to 'refs-internal.h' to be an internal function. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08refs: implement batch reference update supportKarthik Nayak
Git supports making reference updates with or without transactions. Updates with transactions are generally better optimized. But transactions are all or nothing. This means, if a user wants to batch updates to take advantage of the optimizations without the hard requirement that all updates must succeed, there is no way currently to do so. Particularly with the reftable backend where batching multiple reference updates is more efficient than performing them sequentially. Introduce batched update support with a new flag, 'REF_TRANSACTION_ALLOW_FAILURE'. Batched updates while different from transactions, use the transaction infrastructure under the hood. When enabled, this flag allows individual reference updates that would typically cause the entire transaction to fail due to non-system-related errors to be marked as rejected while permitting other updates to proceed. System errors referred by 'REF_TRANSACTION_ERROR_GENERIC' continue to result in the entire transaction failing. This approach enhances flexibility while preserving transactional integrity where necessary. The implementation introduces several key components: - Add 'rejection_err' field to struct `ref_update` to track failed updates with failure reason. - Add a new struct `ref_transaction_rejections` and a field within `ref_transaction` to this struct to allow quick iteration over rejected updates. - Modify reference backends (files, packed, reftable) to handle partial transactions by using `ref_transaction_set_rejected()` instead of failing the entire transaction when `REF_TRANSACTION_ALLOW_FAILURE` is set. - Add `ref_transaction_for_each_rejected_update()` to let callers examine which updates were rejected and why. This foundational change enables batched update support throughout the reference subsystem. A following commit will expose this capability to users by adding a `--batch-updates` flag to 'git-update-ref(1)', providing both a user-facing feature and a testable implementation. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08refs: introduce enum-based transaction error typesKarthik Nayak
Replace preprocessor-defined transaction errors with a strongly-typed enum `ref_transaction_error`. This change: - Improves type safety and function signature clarity. - Makes error handling more explicit and discoverable. - Maintains existing error cases, while adding new error cases for common scenarios. This refactoring paves the way for more comprehensive error handling which we will utilize in the upcoming commits to add batch reference update support. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08refs/files: remove duplicate duplicates checkKarthik Nayak
Within the files reference backend's transaction's 'finish' phase, a verification step is currently performed wherein the refnames list is sorted and examined for multiple updates targeting the same refname. It has been observed that this verification is redundant, as an identical check is already executed during the transaction's 'prepare' stage. Since the refnames list remains unmodified following the 'prepare' stage, this secondary verification can be safely eliminated. The duplicate check has been removed accordingly, and the `ref_update_reject_duplicates()` function has been marked as static, as its usage is now confined to 'refs.c'. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08refs: move duplicate refname update check to generic layerKarthik Nayak
Move the tracking of refnames in `affected_refnames` from individual backends into the generic layer in 'refs.c'. This centralizes the duplicate refname detection that was previously handled separately by each backend. Make some changes to accommodate this move: - Add a `string_list` field `refnames` to `ref_transaction` to contain all the references in a transaction. This field is updated whenever a new update is added via `ref_transaction_add_update`, so manual additions in reference backends are dropped. - Modify the backends to use this field internally as needed. The backends need to check if an update for refname already exists when splitting symrefs or adding an update for 'HEAD'. - In the reftable backend, within `reftable_be_transaction_prepare()`, move the `string_list_has_string()` check above `ref_transaction_add_update()`. Since `ref_transaction_add_update()` automatically adds the refname to `transaction->refnames`, performing the check after will always return true, so we perform the check before adding the update. This helps reduce duplication of functionality between the backends and makes it easier to make changes in a more centralized manner. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-29Merge branch 'ps/refname-avail-check-optim'Junio C Hamano
The code paths to check whether a refname X is available (by seeing if another ref X/Y exists, etc.) have been optimized. * ps/refname-avail-check-optim: refs: reuse iterators when determining refname availability refs/iterator: implement seeking for files iterators refs/iterator: implement seeking for packed-ref iterators refs/iterator: implement seeking for ref-cache iterators refs/iterator: implement seeking for reftable iterators refs/iterator: implement seeking for merged iterators refs/iterator: provide infrastructure to re-seek iterators refs/iterator: separate lifecycle from iteration refs: stop re-verifying common prefixes for availability refs/files: batch refname availability checks for initial transactions refs/files: batch refname availability checks for normal transactions refs/reftable: batch refname availability checks refs: introduce function to batch refname availability checks builtin/update-ref: skip ambiguity checks when parsing object IDs object-name: allow skipping ambiguity checks in `get_oid()` family object-name: introduce `repo_get_oid_with_flags()`
2025-03-26Merge branch 'tb/refs-exclude-fixes'Junio C Hamano
The refname exclusion logic in the packed-ref backend has been broken for some time, which confused upload-pack to advertise different set of refs. This has been corrected. * tb/refs-exclude-fixes: refs.c: stop matching non-directory prefixes in exclude patterns refs.c: remove empty '--exclude' patterns
2025-03-25advice: allow disabling default branch name adviceJustin Tobler
The default branch name advice message is displayed when `repo_default_branch_name()` is invoked and the `init.defaultBranch` config is not set. In this scenario, the advice message is always shown even if the `--no-advice` option is used. Adapt `repo_default_branch_name()` to allow the default branch name advice message to be disabled with the `--no-advice` option and corresponding configuration. Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-12refs: reuse iterators when determining refname availabilityPatrick Steinhardt
When verifying whether refnames are available we have to verify whether any reference exists that is nested under the current reference. E.g. given a reference "refs/heads/foo", we must make sure that there is no other reference "refs/heads/foo/*". This check is performed using a ref iterator with the prefix set to the nested reference namespace. Until now it used to not be possible to reseek iterators, so we always had to reallocate the iterator for every single reference we're about to check. This keeps us from reusing state that the iterator may have and that may make it work more efficiently. Refactor the logic to reseek iterators. This leads to a sizeable speedup with the "reftable" backend: Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~) Time (mean ± σ): 39.8 ms ± 0.9 ms [User: 29.7 ms, System: 9.8 ms] Range (min … max): 38.4 ms … 42.0 ms 62 runs Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) Time (mean ± σ): 31.9 ms ± 1.1 ms [User: 27.0 ms, System: 4.5 ms] Range (min … max): 29.8 ms … 34.3 ms 74 runs Summary update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran 1.25 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~) The "files" backend doesn't really show a huge impact: Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~) Time (mean ± σ): 392.3 ms ± 7.1 ms [User: 59.7 ms, System: 328.8 ms] Range (min … max): 384.6 ms … 404.5 ms 10 runs Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) Time (mean ± σ): 387.7 ms ± 7.4 ms [User: 54.6 ms, System: 329.6 ms] Range (min … max): 377.0 ms … 397.7 ms 10 runs Summary update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran 1.01 ± 0.03 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~) This is mostly because it is way slower to begin with because it has to create a separate file for each new reference, so the milliseconds we shave off by reseeking the iterator doesn't really translate into a significant relative improvement. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-12refs/iterator: separate lifecycle from iterationPatrick Steinhardt
The ref and reflog iterators have their lifecycle attached to iteration: once the iterator reaches its end, it is automatically released and the caller doesn't have to care about that anymore. When the iterator should be released before it has been exhausted, callers must explicitly abort the iterator via `ref_iterator_abort()`. This lifecycle is somewhat unusual in the Git codebase and creates two problems: - Callsites need to be very careful about when exactly they call `ref_iterator_abort()`, as calling the function is only valid when the iterator itself still is. This leads to somewhat awkward calling patterns in some situations. - It is impossible to reuse iterators and re-seek them to a different prefix. This feature isn't supported by any iterator implementation except for the reftable iterators anyway, but if it was implemented it would allow us to optimize cases where we need to search for specific references repeatedly by reusing internal state. Detangle the lifecycle from iteration so that we don't deallocate the iterator anymore once it is exhausted. Instead, callers are now expected to always call a newly introduce `ref_iterator_free()` function that deallocates the iterator and its internal state. Note that the `dir_iterator` is somewhat special because it does not implement the `ref_iterator` interface, but is only used to implement other iterators. Consequently, we have to provide `dir_iterator_free()` instead of `dir_iterator_release()` as the allocated structure itself is managed by the `dir_iterator` interfaces, as well, and not freed by `ref_iterator_free()` like in all the other cases. While at it, drop the return value of `ref_iterator_abort()`, which wasn't really required by any of the iterator implementations anyway. Furthermore, stop calling `base_ref_iterator_free()` in any of the backends, but instead call it in `ref_iterator_free()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-12refs: stop re-verifying common prefixes for availabilityPatrick Steinhardt
One of the checks done by `refs_verify_refnames_available()` is whether any of the prefixes of a reference already exists. For example, given a reference "refs/heads/main", we'd check whether "refs/heads" or "refs" already exist, and if so we'd abort the transaction. When updating multiple references at once, this check is performed for each of the references individually. Consequently, because references tend to have common prefixes like "refs/heads/" or refs/tags/", we evaluate the availability of these prefixes repeatedly. Naturally this is a waste of compute, as the availability of those prefixes should in general not change in the middle of a transaction. And if it would, backends would notice at a later point in time. Optimize this pattern by storing prefixes in a `strset` so that we can trivially track those prefixes that we have already checked. This leads to a significant speedup with the "reftable" backend when creating many references that all share a common prefix: Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~) Time (mean ± σ): 63.1 ms ± 1.8 ms [User: 41.0 ms, System: 21.6 ms] Range (min … max): 60.6 ms … 69.5 ms 38 runs Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) Time (mean ± σ): 40.0 ms ± 1.3 ms [User: 29.3 ms, System: 10.3 ms] Range (min … max): 38.1 ms … 47.3 ms 61 runs Summary update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran 1.58 ± 0.07 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~) For the "files" backend we see an improvement, but a much smaller one: Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~) Time (mean ± σ): 395.8 ms ± 5.3 ms [User: 63.6 ms, System: 330.5 ms] Range (min … max): 387.0 ms … 404.6 ms 10 runs Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) Time (mean ± σ): 386.0 ms ± 4.0 ms [User: 51.5 ms, System: 332.8 ms] Range (min … max): 380.8 ms … 392.6 ms 10 runs Summary update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran 1.03 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~) This change also leads to a modest improvement when writing references with "initial" semantics, for example when migrating references. The following benchmarks are migrating 1m references from the "reftable" to the "files" backend: Benchmark 1: migrate reftable:files (refcount = 1000000, revision = HEAD~) Time (mean ± σ): 836.6 ms ± 5.6 ms [User: 645.2 ms, System: 185.2 ms] Range (min … max): 829.6 ms … 845.9 ms 10 runs Benchmark 2: migrate reftable:files (refcount = 1000000, revision = HEAD) Time (mean ± σ): 759.8 ms ± 5.1 ms [User: 574.9 ms, System: 178.9 ms] Range (min … max): 753.1 ms … 768.8 ms 10 runs Summary migrate reftable:files (refcount = 1000000, revision = HEAD) ran 1.10 ± 0.01 times faster than migrate reftable:files (refcount = 1000000, revision = HEAD~) And vice versa: Benchmark 1: migrate files:reftable (refcount = 1000000, revision = HEAD~) Time (mean ± σ): 870.7 ms ± 5.7 ms [User: 735.2 ms, System: 127.4 ms] Range (min … max): 861.6 ms … 883.2 ms 10 runs Benchmark 2: migrate files:reftable (refcount = 1000000, revision = HEAD) Time (mean ± σ): 799.1 ms ± 8.5 ms [User: 661.1 ms, System: 130.2 ms] Range (min … max): 787.5 ms … 812.6 ms 10 runs Summary migrate files:reftable (refcount = 1000000, revision = HEAD) ran 1.09 ± 0.01 times faster than migrate files:reftable (refcount = 1000000, revision = HEAD~) The impact here is significantly smaller given that we don't perform any reference reads with "initial" semantics, so the speedup only comes from us doing less string list lookups. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-12refs: introduce function to batch refname availability checksPatrick Steinhardt
The `refs_verify_refname_available()` functions checks whether a reference update can be committed or whether it would conflict with either a prefix or suffix thereof. This function needs to be called once per reference that one wants to check, which requires us to redo a couple of checks every time the function is called. Introduce a new function `refs_verify_refnames_available()` that does the same, but for a list of references. For now, the new function uses the exact same implementation, except that we loop through all refnames provided by the caller. This will be tuned in subsequent commits. The existing `refs_verify_refname_available()` function is reimplemented on top of the new function. As such, the diff is best viewed with the `--ignore-space-change option`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10hash: stop depending on `the_repository` in `null_oid()`Patrick Steinhardt
The `null_oid()` function returns the object ID that only consists of zeroes. Naturally, this ID also depends on the hash algorithm used, as the number of zeroes is different between SHA1 and SHA256. Consequently, the function returns the hash-algorithm-specific null object ID. This is currently done by depending on `the_hash_algo`, which implicitly makes us depend on `the_repository`. Refactor the function to instead pass in the hash algorithm for which we want to retrieve the null object ID. Adapt callsites accordingly by passing in `the_repository`, thus bubbling up the dependency on that global variable by one layer. There are a couple of trivial exceptions for subsystems that already got rid of `the_repository`. These subsystems instead use the repository that is available via the calling context: - "builtin/grep.c" - "grep.c" - "refs/debug.c" There are also two non-trivial exceptions: - "diff-no-index.c": Here we know that we may not have a repository initialized at all, so we cannot rely on `the_repository`. Instead, we adapt `diff_no_index()` to get a `struct git_hash_algo` as parameter. The only caller is located in "builtin/diff.c", where we know to call `repo_set_hash_algo()` in case we're running outside of a Git repository. Consequently, it is fine to continue passing `the_repository->hash_algo` even in this case. - "builtin/ls-files.c": There is an in-flight patch series that drops `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic conflict because we use `null_oid()` in `show_submodule()`. The value is passed to `repo_submodule_init()`, which may use the object ID to resolve a tree-ish in the superproject from which we want to read the submodule config. As such, the object ID should refer to an object in the superproject, and consequently we need to use its hash algorithm. This means that we could in theory just not bother about this edge case at all and just use `the_repository` in "diff-no-index.c". But doing so would feel misdesigned. Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in "hash.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-06refs.c: stop matching non-directory prefixes in exclude patternsTaylor Blau
In the packed-refs backend, our implementation of '--exclude' (dating back to 59c35fac54 (refs/packed-backend.c: implement jump lists to avoid excluded pattern(s), 2023-07-10)) considers, for example: $ git for-each-ref --exclude=refs/heads/ba to exclude "refs/heads/bar", "refs/heads/baz", and so on. The files backend, which does not implement '--exclude' (and relies on the caller to cull out results that don't match) naturally will enumerate "refs/heads/bar" and so on. So in the above example, 'for-each-ref' will try and see if "refs/heads/ba" matches "refs/heads/bar" (since the files backend simply enumerated every loose reference), and, realizing that it does not match, output the reference as expected. (A caller that did want to exclude "refs/heads/bar" and "refs/heads/baz" might instead run "git for-each-ref --exclude='refs/heads/ba*'"). This can lead to strange behavior, like seeing a different set of references advertised via 'upload-pack' depending on what set of references were loose versus packed. So there is a subtle bug with '--exclude' which is that in the packed-refs backend we will consider "refs/heads/bar" to be a pattern match against "refs/heads/ba" when we shouldn't. Likewise, the reftable backend (which in this case is bug-compatible with the packed backend) exhibits the same broken behavior. There are a few ways to fix this. One is to tighten the rules in cmp_record_to_refname(), which is used to determine the start/end-points of the jump list used by the packed backend. In this new "strict" mode, the comparison function would handle the case where we've reached the end of the pattern by introducing a new check like so: while (1) { if (*r1 == '\n') return *r2 ? -1 : 0; if (!*r2) if (strict && *r1 != '/') /* <- here */ return 1; return start ? 1 : -1; if (*r1 != *r2) return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1; r1++; r2++; } (eliding out the rest of cmp_record_to_refname()). Equivalently, we could teach refs/packed-backend::populate_excluded_jump_list() to append a trailing '/' if one does not already exist, forcing an exclude pattern like "refs/heads/ba" to only match "refs/heads/ba/abc" and so forth. But since the same problem exists in reftable, we can fix both at once by performing this pre-processing step one layer up in refs.c at the common entrypoint for the two, which is 'refs_ref_iterator_begin()'. Since that solution is both the simplest and only requires modification in one spot, let's normalize exclude patterns so that they end with a trailing slash. This causes us to unify the behavior between all three backends. There is some minor test fallout in the "overlapping excluded regions" test, which happens to use 'refs/ba' as an exclude pattern, and expects references under the "refs/heads/bar/*" and "refs/heads/baz/*" hierarchies to be excluded from the results. But that test fallout is expected, because the test was codifying the buggy behavior to begin with, and should have never been written that way. Split that into its own test (since the range is no longer overlapping under the stricter interpretation of --exclude patterns presented here). Create a new test which does have overlapping regions by using a refs/heads/bar/4/... hierarchy and excluding both "refs/heads/bar" and "refs/heads/bar/4". Reported-by: SURA <surak8806@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-06refs.c: remove empty '--exclude' patternsTaylor Blau
In 59c35fac54 (refs/packed-backend.c: implement jump lists to avoid excluded pattern(s), 2023-07-10), the packed-refs backend learned how to construct "jump lists" to avoid enumerating sections of the packed-refs file that we know the caller is going to throw out anyway. This process works by finding the start- and end-points (that is, where in the packed-refs file corresponds to the range we're going to ignore) for each exclude pattern, then constructing a jump list based on that. At enumeration time we'll consult the jump list to skip past everything in the range(s) found in the previous step, saving time when excluding a large portion of references. But when there is a --exclude pattern which is just the empty string, the behavior is a little funky. When we try and exclude the empty string, the matched range covers the entire packed-refs file, meaning that we won't output any packed references. But the empty pattern doesn't actually match any references to begin with! For example, on my copy of git.git I can do: $ git for-each-ref '' | wc -l 0 So "git for-each-ref --exclude=''" shouldn't actually remove anything from the output, and ought to be equivalent to "git for-each-ref". But it's not, and in fact: $ git for-each-ref | wc -l 2229 $ git for-each-ref --exclude='' | wc -l 480 But why does the '--exclude' version output only some of the references in the repository? Here's a hint: $ find .git/refs -type f | wc -l 480 Indeed, because the files backend doesn't implement[^1] the same jump list concept as the packed backend we get the correct result for the loose references, but none of the packed references. Since the empty string exclude pattern doesn't match anything, we can discard them before the packed-refs backend has a chance to even see it (and likewise for reftable, which also implements a similar concept since 1869525066 (refs/reftable: wire up support for exclude patterns, 2024-09-16)). This approach (copying only some of the patterns into a strvec at the refs.c layer) may seem heavy-handed, but it's setting us up to fix another bug in the following commit where the fix will involve modifying the incoming patterns. [^1]: As noted in 59c35fac54. We technically could avoid opening and enumerating the contents of, for e.g., "$GIT_DIR/refs/heads/foo/" if we knew that we were excluding anything under the 'refs/heads/foo' hierarchy. But the --exclude stuff is all best-effort anyway, since the caller is expected to cull out any results that they don't want. Noticed-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-05Merge branch 'ps/path-sans-the-repository'Junio C Hamano
The path.[ch] API takes an explicit repository parameter passed throughout the callchain, instead of relying on the_repository singleton instance. * ps/path-sans-the-repository: path: adjust last remaining users of `the_repository` environment: move access to "core.sharedRepository" into repo settings environment: move access to "core.hooksPath" into repo settings repo-settings: introduce function to clear struct path: drop `git_path()` in favor of `repo_git_path()` rerere: let `rerere_path()` write paths into a caller-provided buffer path: drop `git_common_path()` in favor of `repo_common_path()` worktree: return allocated string from `get_worktree_git_dir()` path: drop `git_path_buf()` in favor of `repo_git_path_replace()` path: drop `git_pathdup()` in favor of `repo_git_path()` path: drop unused `strbuf_git_path()` function path: refactor `repo_submodule_path()` family of functions submodule: refactor `submodule_to_gitdir()` to accept a repo path: refactor `repo_worktree_path()` family of functions path: refactor `repo_git_path()` family of functions path: refactor `repo_common_path()` family of functions
2025-02-27Merge branch 'kn/ref-migrate-skip-reflog'Junio C Hamano
"git refs migrate" can optionally be told not to migrate the reflog. * kn/ref-migrate-skip-reflog: builtin/refs: add '--no-reflog' flag to drop reflogs
2025-02-21builtin/refs: add '--no-reflog' flag to drop reflogsKarthik Nayak
The "git refs migrate" subcommand converts the backend used for ref storage. It always migrates reflog data as well as refs. Introduce an option to exclude reflogs from migration, allowing them to be discarded when they are unnecessary. This is particularly useful in server-side repositories, where reflogs are typically not expected. However, some repositories may still have them due to historical reasons, such as bugs, misconfigurations, or administrative decisions to enable reflogs for debugging. In such repositories, it would be optimal to drop reflogs during the migration. To address this, introduce the '--no-reflog' flag, which prevents reflog migration. When this flag is used, reflogs from the original reference backend are migrated. Since only the new reference backend remains in the repository, all previous reflogs are permanently discarded. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-14Merge branch 'kn/reflog-migration-fix-followup'Junio C Hamano
Code clean-up. * kn/reflog-migration-fix-followup: reftable: prevent 'update_index' changes after adding records refs: use 'uint64_t' for 'ref_update.index' refs: mark `ref_transaction_update_reflog()` as static
2025-02-07submodule: refactor `submodule_to_gitdir()` to accept a repoPatrick Steinhardt
The `submodule_to_gitdir()` function implicitly uses `the_repository` to resolve submodule paths. Refactor the function to instead accept a repo as parameter to remove the dependency on global state. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07path: refactor `repo_common_path()` family of functionsPatrick Steinhardt
The functions provided by the "path" subsystem to derive repository paths for the commondir, gitdir, worktrees and submodules are quite inconsistent. Some functions have a `strbuf_` prefix, others have different return values, some don't provide a variant working on top of `strbuf`s. We're thus about to refactor all of these family of functions so that they follow a common pattern: - `repo_*_path()` returns an allocated string. - `repo_*_path_append()` appends the path to the caller-provided buffer while returning a constant pointer to the buffer. This clarifies whether the buffer is being appended to or rewritten, which otherwise wasn't immediately obvious. - `repo_*_path_replace()` replaces contents of the buffer with the computed path, again returning a pointer to the buffer contents. The returned constant pointer isn't being used anywhere yet, but it will be used in subsequent commits. Its intent is to allow calling patterns like the following somewhat contrived example: if (!stat(&st, repo_common_path_replace(repo, &buf, ...)) && !unlink(repo_common_path_replace(repo, &buf, ...))) ... Refactor the commondir family of functions accordingly and adapt all callers. Note that `repo_common_pathv()` is converted into an internal implementation detail. It is only used to implement `the_repository` compatibility shims and will eventually be removed from the public interface. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03Merge branch 'kn/reflog-migration-fix-fix'Junio C Hamano
Fix bugs in an earlier attempt to fix "git refs migration". * kn/reflog-migration-fix-fix: refs/reftable: fix uninitialized memory access of `max_index` reftable: write correct max_update_index to header
2025-01-29Merge branch 'ps/reflog-migration-with-logall-fix'Junio C Hamano
The "git refs migrate" command did not migrate the reflog for refs/stash, which is the contents of the stashes, which has been corrected. * ps/reflog-migration-with-logall-fix: refs: fix migration of reflogs respecting "core.logAllRefUpdates"
2025-01-22refs: fix migration of reflogs respecting "core.logAllRefUpdates"Patrick Steinhardt
In 246cebe320 (refs: add support for migrating reflogs, 2024-12-16) we have added support to git-refs(1) to migrate reflogs between reference backends. It was reported [1] though that not we don't migrate reflogs for a subset of references, most importantly "refs/stash". This issue is caused by us still honoring "core.logAllRefUpdates" when trying to migrate reflogs: we do queue the updates, but depending on the value of that config we may decide to just skip writing the reflog entry altogether. And given that: - The default for "core.logAllRefUpdates" is to only create reflogs for branches, remotes, note refs and "HEAD" - "refs/stash" is neither of these ref types. We end up skipping the reflog creation for that particular reference. Fix the bug by setting `REF_FORCE_CREATE_REFLOG`, which instructs the ref backends to create the reflog entry regardless of the config or any preexisting state. [1]: <Z5BTQRlsOj1sygun@tapette.crustytoothpaste.net> Reported-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-22refs: use 'uint64_t' for 'ref_update.index'Karthik Nayak
The 'ref_update.index' variable is used to store an index for a given reference update. This index is used to order the updates in a predetermined order, while the default ordering is alphabetical as per the refname. For large repositories with millions of references, it should be safer to use 'uint64_t'. Let's do that. This also is applied for all other code sections where we store 'index' and pass it around. Reported-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-22refs: mark `ref_transaction_update_reflog()` as staticKarthik Nayak
The `ref_transaction_update_reflog()` function is only used within 'refs.c', so mark it as static. Reported-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-17Merge branch 'kn/reflog-migration-fix' into kn/reflog-migration-fix-followupJunio C Hamano
* kn/reflog-migration-fix: reftable: write correct max_update_index to header
2025-01-15reftable: write correct max_update_index to headerKarthik Nayak
In 297c09eabb (refs: allow multiple reflog entries for the same refname, 2024-12-16), the reftable backend learned to handle multiple reflog entries within the same transaction. This was done modifying the `update_index` for reflogs with multiple indices. During writing the logs, the `max_update_index` of the writer was modified to ensure the limits were raised to the modified `update_index`s. However, since ref entries are written before the modification to the `max_update_index`, if there are multiple blocks to be written, the reftable backend writes the header with the old `max_update_index`. When all logs are finally written, the footer will be written with the new `min_update_index`. This causes a mismatch between the header and the footer and causes the reftable file to be corrupted. The existing tests only spawn a single block and since headers are lazily written with the first block, the tests didn't capture this bug. To fix the issue, the appropriate `max_update_index` limit must be set even before the first block is written. Add a `max_index` field to the transaction which holds the `max_index` within all its updates, then propagate this value to the reftable backend, wherein this is used to the set the `max_update_index` correctly. Add a test which creates a few thousand reference updates with multiple reflog entries, which should trigger the bug. Reported-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>