summaryrefslogtreecommitdiff
path: root/setup.c
AgeCommit message (Collapse)Author
2024-06-06refs: convert ref storage format to an enumPatrick Steinhardt
The ref storage format is tracked as a simple unsigned integer, which makes it harder than necessary to discover what that integer actually is or where its values are defined. Convert the ref storage format to instead be an enum. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06setup: unset ref storage when reinitializing repository versionPatrick Steinhardt
When reinitializing a repository's version we may end up unsetting the hash algorithm when it matches the default hash algorithm. If we didn't do that then the previously configured value might remain intact. While the same issue exists for the ref storage extension, we don't do this here. This has been fine for most of the part because it is not supported to re-initialize a repository with a different ref storage format anyway. We're about to introduce a new command to migrate ref storages though, so this is about to become an issue there. Prepare for this and unset the ref storage format when reinitializing a repository with the "files" format. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-03Merge branch 'ps/fix-reinit-includeif-onbranch'Junio C Hamano
"git init" in an already created directory, when the user configuration has includeif.onbranch, started to fail recently, which has been corrected. * ps/fix-reinit-includeif-onbranch: setup: fix bug with "includeIf.onbranch" when initializing dir
2024-05-31macOS: ls-files path fails if path of workdir is NFDTorsten Bögershausen
Under macOS, `git ls-files path` does not work (gives an error) if the absolute 'path' contains characters in NFD (decomposed). This happens when core.precomposeunicode is true, which is the most common case. The bug report says: $ cd somewhere # some safe place, /tmp or ~/tmp etc. $ mkdir $'u\xcc\x88' # ü in NFD $ cd ü # or cd $'u\xcc\x88' or cd $'\xc3\xbc' $ git init $ git ls-files $'/somewhere/u\xcc\x88' # NFD fatal: /somewhere/ü: '/somewhere/ü' is outside repository at '/somewhere/ü' $ git ls-files $'/somewhere/\xc3\xbc' # NFC (the same error as above) In the 'fatal:' error message, there are three ü; the 1st and 2nd are in NFC, the 3rd is in NFD. Add test cases that follows the bug report, with the simplification that the 'ü' is replaced by an 'ä', which is already used as NFD and NFC in t3910. The solution is to add a call to precompose_string_if_needed() to this code in setup.c : `work_tree = precompose_string_if_needed(get_git_work_tree());` There is, however, a limitation with this very usage of Git: The (repo) local .gitconfig file is not used, only the global "core.precomposeunicode" is taken into account, if it is set (or not). To set it to true is a good recommendation anyway, and here is the analyzes from Jun T : The problem is the_repository->config->hash_initialized is set to 1 before the_repository->commondir is set to ".git". Due to this, .git/config is never read, and precomposed_unicode is never set to 1 (remains -1). run_builtin() { setup_git_directory() { strbuf_getcwd() { # setup.c:1542 precompose_{strbuf,string}_if_needed() { # precomposed_unicode is still -1 git_congig_get_bool("core.precomposeunicode") { git_config_check_init() { repo_read_config() { git_config_init() { # !!! the_repository->config->hash_initialized=1 # !!! } # does not read .git/config since # the_repository->commondir is still NULL } } } returns without converting to NFC } returns cwd in NFD } setup_discovered_git_dir() { set_git_work_tree(".") { repo_set_worktree() { # this function indirectly calls strbuf_getcwd() # --> precompose_{strbuf,string}_if_needed() --> # {git,repo}_config_get_bool("core.precomposeunicode"), # but does not try to read .git/config since # the_repository->config->hash_initialized # is already set to 1 above. And it will not read # .git/config even if hash_initialized is 0 # since the_repository->commondir is still NULL. the_repository->worktree = NFD } } } setup_git_env() { repo_setup_gitdir() { repo_set_commondir() { # finally commondir is set here the_repository->commondir = ".git" } } } } // END setup_git_directory Reported-by: Jun T <takimoto-j@kba.biglobe.ne.jp> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-30Merge branch 'ps/refs-without-the-repository-updates'Junio C Hamano
Further clean-up the refs subsystem to stop relying on the_repository, and instead use the repository associated to the ref_store object. * ps/refs-without-the-repository-updates: refs/packed: remove references to `the_hash_algo` refs/files: remove references to `the_hash_algo` refs/files: use correct repository refs: remove `dwim_log()` refs: drop `git_default_branch_name()` refs: pass repo when peeling objects refs: move object peeling into "object.c" refs: pass ref store when detecting dangling symrefs refs: convert iteration over replace refs to accept ref store refs: retrieve worktree ref stores via associated repository refs: refactor `resolve_gitlink_ref()` to accept a repository refs: pass repo when retrieving submodule ref store refs: track ref stores via strmap refs: implement releasing ref storages refs: rename `init_db` callback to avoid confusion refs: adjust names for `init` and `init_db` callbacks
2024-05-30Merge branch 'ps/undecided-is-not-necessarily-sha1'Junio C Hamano
Before discovering the repository details, We used to assume SHA-1 as the "default" hash function, which has been corrected. Hopefully this will smoke out codepaths that rely on such an unwarranted assumptions. * ps/undecided-is-not-necessarily-sha1: repository: stop setting SHA1 as the default object hash oss-fuzz/commit-graph: set up hash algorithm builtin/shortlog: don't set up revisions without repo builtin/diff: explicitly set hash algo when there is no repo builtin/bundle: abort "verify" early when there is no repository builtin/blame: don't access potentially unitialized `the_hash_algo` builtin/rev-parse: allow shortening to more than 40 hex characters remote-curl: fix parsing of detached SHA256 heads attr: fix BUG() when parsing attrs outside of repo attr: don't recompute default attribute source parse-options-cb: only abbreviate hashes when hash algo is known path: move `validate_headref()` to its only user path: harden validation of HEAD with non-standard hashes
2024-05-29safe.directory: allow "lead/ing/path/*" matchJunio C Hamano
When safe.directory was introduced in v2.30.3 timeframe, 8959555c (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02), it only allowed specific opt-out directories. Immediately after an embargoed release that included the change, 0f85c4a3 (setup: opt-out of check with safe.directory=*, 2022-04-13) was done as a response to loosen the check so that a single '*' can be used to say "I trust all repositories" for folks who host too many repositories to list individually. Let's further loosen the check to allow people to say "everything under this hierarchy is deemed safe" by specifying such a leading directory with "/*" appended to it. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-27config: clarify memory ownership in `git_config_pathname()`Patrick Steinhardt
The out parameter of `git_config_pathname()` is a `const char **` even though we transfer ownership of memory to the caller. This is quite misleading and has led to many memory leaks all over the place. Adapt the parameter to instead be `char **`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-23Merge branch 'ps/refs-without-the-repository-updates' into ↵Junio C Hamano
ps/ref-storage-migration * ps/refs-without-the-repository-updates: refs/packed: remove references to `the_hash_algo` refs/files: remove references to `the_hash_algo` refs/files: use correct repository refs: remove `dwim_log()` refs: drop `git_default_branch_name()` refs: pass repo when peeling objects refs: move object peeling into "object.c" refs: pass ref store when detecting dangling symrefs refs: convert iteration over replace refs to accept ref store refs: retrieve worktree ref stores via associated repository refs: refactor `resolve_gitlink_ref()` to accept a repository refs: pass repo when retrieving submodule ref store refs: track ref stores via strmap refs: implement releasing ref storages refs: rename `init_db` callback to avoid confusion refs: adjust names for `init` and `init_db` callbacks
2024-05-22setup: fix bug with "includeIf.onbranch" when initializing dirPatrick Steinhardt
It was reported that git-init(1) can fail when initializing an existing directory in case the config contains an "includeIf.onbranch:" condition: $ mkdir repo $ git -c includeIf.onbranch:main.path=nonexistent init repo BUG: refs.c:2056: reference backend is unknown The same error can also be triggered when re-initializing an already existing repository. The bug has been introduced in 173761e21b (setup: start tracking ref storage format, 2023-12-29), which wired up the ref storage format. The root cause is in `init_db()`, which tries to read the config before we have initialized `the_repository` and most importantly its ref storage format. We eventually end up calling `include_by_branch()` and execute `refs_resolve_ref_unsafe()`, but because we have not initialized the ref storage format yet this will trigger the above bug. Interestingly, `include_by_branch()` has a mechanism that will only cause us to resolve the ref when `the_repository->gitdir` is set. This is also the reason why this only happens when we initialize an already existing directory or repository: `gitdir` is set in those cases, but not when creating a new directory. Now there are two ways to address the issue: - We can adapt `include_by_branch()` to also make the code conditional on whether `the_repository->ref_storage_format` is set. - We can shift around code such that we initialize the repository format before we read the config. While the first approach would be safe, it may also cause us to paper over issues where a ref store should have been set up. In our case for example, it may be reasonable to expect that re-initializing the repo will cause the "onbranch:" condition to trigger, but we would not do that if the ref storage format was not set up yet. This also used to work before the above commit that introduced this bug. Rearrange the code such that we set up the repository format before reading the config. This fixes the bug and ensures that "onbranch:" conditions can trigger. Reported-by: Heghedus Razvan <heghedus.razvan@protonmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Tested-by: Heghedus Razvan <heghedus.razvan@protonmail.com> [jc: fixed a test and backported to v2.44.0 codebase] Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-20Merge branch 'kn/ref-transaction-symref'Junio C Hamano
Updates to symbolic refs can now be made as a part of ref transaction. * kn/ref-transaction-symref: refs: remove `create_symref` and associated dead code refs: rename `refs_create_symref()` to `refs_update_symref()` refs: use transaction in `refs_create_symref()` refs: add support for transactional symref updates refs: move `original_update_refname` to 'refs.c' refs: support symrefs in 'reference-transaction' hook files-backend: extract out `create_symref_lock()` refs: accept symref values in `ref_transaction_update()`
2024-05-17refs: drop `git_default_branch_name()`Patrick Steinhardt
The `git_default_branch_name()` function is a thin wrapper around `repo_default_branch_name()` with two differences: - We implicitly rely on `the_repository`. - We cache the default branch name. None of the callsites of `git_default_branch_name()` are hot code paths though, so the caching of the branch name is not really required. Refactor the callsites to use `repo_default_branch_name()` instead and drop `git_default_branch_name()`, thus getting rid of one more case where we rely on `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17refs: rename `init_db` callback to avoid confusionPatrick Steinhardt
Reference backends have two callbacks `init` and `init_db`. The similarity of these two callbacks has repeatedly confused me whenever I was looking at them, where I always had to look up which of them does what. Rename the `init_db` callback to `create_on_disk`, which should hopefully be clearer. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-16Merge branch 'ps/refs-without-the-repository'Junio C Hamano
The refs API lost functions that implicitly assumes to work on the primary ref_store by forcing the callers to pass a ref_store as an argument. * ps/refs-without-the-repository: refs: remove functions without ref store cocci: apply rules to rewrite callers of "refs" interfaces cocci: introduce rules to transform "refs" to pass ref store refs: add `exclude_patterns` parameter to `for_each_fullref_in()` refs: introduce missing functions that accept a `struct ref_store`
2024-05-16Merge branch 'ps/refs-without-the-repository' into ↵Junio C Hamano
ps/refs-without-the-repository-updates * ps/refs-without-the-repository: refs: remove functions without ref store cocci: apply rules to rewrite callers of "refs" interfaces cocci: introduce rules to transform "refs" to pass ref store refs: add `exclude_patterns` parameter to `for_each_fullref_in()` refs: introduce missing functions that accept a `struct ref_store`
2024-05-07cocci: apply rules to rewrite callers of "refs" interfacesPatrick Steinhardt
Apply the rules that rewrite callers of "refs" interfaces to explicitly pass `struct ref_store`. The resulting patch has been applied with the `--whitespace=fix` option. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-06path: move `validate_headref()` to its only userPatrick Steinhardt
While `validate_headref()` is only called from `is_git_directory()` in "setup.c", it is currently implemented in "path.c". Move it over such that it becomes clear that it is only really used during setup in order to discover repositories. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-29Sync with 2.44.1Johannes Schindelin
* maint-2.44: (41 commits) Git 2.44.1 Git 2.43.4 Git 2.42.2 Git 2.41.1 Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel ...
2024-04-19Sync with 2.43.4Johannes Schindelin
* maint-2.43: (40 commits) Git 2.43.4 Git 2.42.2 Git 2.41.1 Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories ...
2024-04-19Sync with 2.42.2Johannes Schindelin
* maint-2.42: (39 commits) Git 2.42.2 Git 2.41.1 Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' ...
2024-04-19Sync with 2.41.1Johannes Schindelin
* maint-2.41: (38 commits) Git 2.41.1 Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' docs: document security issues around untrusted .git dirs ...
2024-04-19Sync with 2.40.2Johannes Schindelin
* maint-2.40: (39 commits) Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' docs: document security issues around untrusted .git dirs upload-pack: disable lazy-fetching by default ...
2024-04-19init.templateDir: consider this config setting protectedJohannes Schindelin
The ability to configuring the template directory is a delicate feature: It allows defining hooks that will be run e.g. during a `git clone` operation, such as the `post-checkout` hook. As such, it is of utmost importance that Git would not allow that config setting to be changed during a `git clone` by mistake, allowing an attacker a chance for a Remote Code Execution, allowing attackers to run arbitrary code on unsuspecting users' machines. As a defense-in-depth measure, to prevent minor vulnerabilities in the `git clone` code from ballooning into higher-serverity attack vectors, let's make this a protected setting just like `safe.directory` and friends, i.e. ignore any `init.templateDir` entries from any local config. Note: This does not change the behavior of any recursive clone (modulo bugs), as the local repository config is not even supposed to be written while cloning the superproject, except in one scenario: If a config template is configured that sets the template directory. This might be done because `git clone --recurse-submodules --template=<directory>` does not pass that template directory on to the submodules' initialization. Another scenario where this commit changes behavior is where repositories are _not_ cloned recursively, and then some (intentional, benign) automation configures the template directory to be used before initializing the submodules. So the caveat is that this could theoretically break existing processes. In both scenarios, there is a way out, though: configuring the template directory via the environment variable `GIT_TEMPLATE_DIR`. This change in behavior is a trade-off between security and backwards-compatibility that is struck in favor of security. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2024-04-17init: refactor the template directory discovery into its own functionJohannes Schindelin
We will need to call this function from `hook.c` to be able to prevent hooks from running that were written as part of a `clone` but did not originate from the template directory. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2024-04-17fetch/clone: detect dubious ownership of local repositoriesJohannes Schindelin
When cloning from somebody else's repositories, it is possible that, say, the `upload-pack` command is overridden in the repository that is about to be cloned, which would then be run in the user's context who started the clone. To remind the user that this is a potentially unsafe operation, let's extend the ownership checks we have already established for regular gitdir discovery to extend also to local repositories that are about to be cloned. This protection extends also to file:// URLs. The fixes in this commit address CVE-2024-32004. Note: This commit does not touch the `fetch`/`clone` code directly, but instead the function used implicitly by both: `enter_repo()`. This function is also used by `git receive-pack` (i.e. pushes), by `git upload-archive`, by `git daemon` and by `git http-backend`. In setups that want to serve repositories owned by different users than the account running the service, this will require `safe.*` settings to be configured accordingly. Also note: there are tiny time windows where a time-of-check-time-of-use ("TOCTOU") race is possible. The real solution to those would be to work with `fstat()` and `openat()`. However, the latter function is not available on Windows (and would have to be emulated with rather expensive low-level `NtCreateFile()` calls), and the changes would be quite extensive, for my taste too extensive for the little gain given that embargoed releases need to pay extra attention to avoid introducing inadvertent bugs. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2024-03-28Merge branch 'eb/hash-transition'Junio C Hamano
Work to support a repository that work with both SHA-1 and SHA-256 hash algorithms has started. * eb/hash-transition: (30 commits) t1016-compatObjectFormat: add tests to verify the conversion between objects t1006: test oid compatibility with cat-file t1006: rename sha1 to oid test-lib: compute the compatibility hash so tests may use it builtin/ls-tree: let the oid determine the output algorithm object-file: handle compat objects in check_object_signature tree-walk: init_tree_desc take an oid to get the hash algorithm builtin/cat-file: let the oid determine the output algorithm rev-parse: add an --output-object-format parameter repository: implement extensions.compatObjectFormat object-file: update object_info_extended to reencode objects object-file-convert: convert commits that embed signed tags object-file-convert: convert commit objects when writing object-file-convert: don't leak when converting tag objects object-file-convert: convert tag objects when writing object-file-convert: add a function to convert trees between algorithms object: factor out parse_mode out of fast-import and tree-walk into in object.h cache: add a function to read an OID of a specific algorithm tag: sign both hashes commit: export add_header_signature to support handling signatures on tags ...
2024-03-21Merge branch 'jc/safe-implicit-bare'Junio C Hamano
Users with safe.bareRepository=explicit can still work from within $GIT_DIR of a seconary worktree (which resides at .git/worktrees/$name/) of the primary worktree without explicitly specifying the $GIT_DIR environment variable or the --git-dir=<path> option. * jc/safe-implicit-bare: setup: notice more types of implicit bare repositories
2024-03-14Merge branch 'gt/core-bare-in-templates'Junio C Hamano
Code simplification. * gt/core-bare-in-templates: setup: remove unnecessary variable
2024-03-11setup: notice more types of implicit bare repositoriesJunio C Hamano
Setting the safe.bareRepository configuration variable to explicit stops git from using a bare repository, unless the repository is explicitly specified, either by the "--git-dir=<path>" command line option, or by exporting $GIT_DIR environment variable. This may be a reasonable measure to safeguard users from accidentally straying into a bare repository in unexpected places, but often gets in the way of users who need valid accesses to the repository. Earlier, 45bb9162 (setup: allow cwd=.git w/ bareRepository=explicit, 2024-01-20) loosened the rule such that being inside the ".git" directory of a non-bare repository does not really count as accessing a "bare" repository. The reason why such a loosening is needed is because often hooks and third-party tools run from within $GIT_DIR while working with a non-bare repository. More importantly, the reason why this is safe is because a directory whose contents look like that of a "bare" repository cannot be a bare repository that came embedded within a checkout of a malicious project, as long as its directory name is ".git", because ".git" is not a name allowed for a directory in payload. There are at least two other cases where tools have to work in a bare-repository looking directory that is not an embedded bare repository, and accesses to them are still not allowed by the recent change. - A secondary worktree (whose name is $name) has its $GIT_DIR inside "worktrees/$name/" subdirectory of the $GIT_DIR of the primary worktree of the same repository. - A submodule worktree (whose name is $name) has its $GIT_DIR inside "modules/$name/" subdirectory of the $GIT_DIR of its superproject. As long as the primary worktree or the superproject in these cases are not bare, the pathname of these "looks like bare but not really" directories will have "/.git/worktrees/" and "/.git/modules/" as a substring in its leading part, and we can take advantage of the same security guarantee allow git to work from these places. Extend the earlier "in a directory called '.git' we are OK" logic used for the primary worktree to also cover the secondary worktree's and non-embedded submodule's $GIT_DIR, by moving the logic to a helper function "is_implicit_bare_repo()". We deliberately exclude secondary worktrees and submodules of a bare repository, as these are exactly what safe.bareRepository=explicit setting is designed to forbid accesses to without an explicit GIT_DIR/--git-dir=<path> Helped-by: Kyle Lippincott <spectral@google.com> Helped-by: Kyle Meyer <kyle@kyleam.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-04setup: remove unnecessary variableGhanshyam Thakkar
The TODO comment suggested to heed core.bare from template config file if no command line override given. And the prev_bare_repository variable seems to have been placed for this sole purpose as it is not used anywhere else. However, it was clarified by Junio [1] that such values (including core.bare) are ignored intentionally and does not make sense to propagate them from template config to repository config. Also, the directories for the worktree and repository are already created, and therefore the bare/non-bare decision has already been made, by the point we reach the codepath where the TODO comment is placed. Therefore, prev_bare_repository does not have a usecase with/without supporting core.bare from template. And the removal of prev_bare_repository is safe as proved by the later part of the comment: "Unfortunately, the line above is equivalent to is_bare_repository_cfg = !work_tree; which ignores the config entirely even if no `--[no-]bare` command line option was present. To see why, note that before this function, there was this call: prev_bare_repository = is_bare_repository() expanding the right hand side: = is_bare_repository_cfg && !get_git_work_tree() = is_bare_repository_cfg && !work_tree note that the last simplification above is valid because nothing calls repo_init() or set_git_work_tree() between any of the relevant calls in the code, and thus the !get_git_work_tree() calls will return the same result each time. So, what we are interested in computing is the right hand side of the line of code just above this comment: prev_bare_repository || !work_tree = is_bare_repository_cfg && !work_tree || !work_tree = !work_tree because "A && !B || !B == !B" for all boolean values of A & B." Therefore, remove the TODO comment and remove prev_bare_repository variable. Also, update relevant testcases and remove one redundant testcase. [1]: https://lore.kernel.org/git/xmqqjzonpy9l.fsf@gitster.g/ Helped-by: Elijah Newren <newren@gmail.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-27builtin/clone: allow remote helpers to detect repoPatrick Steinhardt
In 18c9cb7524 (builtin/clone: create the refdb with the correct object format, 2023-12-12), we have changed git-clone(1) so that it delays creation of the refdb until after it has learned about the remote's object format. This change was required for the reftable backend, which encodes the object format into the tables. So if we pre-initialized the refdb with the default object format, but the remote uses a different object format than that, then the resulting tables would have encoded the wrong object format. This change unfortunately breaks remote helpers which try to access the repository that is about to be created. Because the refdb has not yet been initialized at the point where we spawn the remote helper, we also don't yet have "HEAD" or "refs/". Consequently, any Git commands ran by the remote helper which try to access the repository would fail because it cannot be discovered. This is essentially a chicken-and-egg problem: we cannot initialize the refdb because we don't know about the object format. But we cannot learn about the object format because the remote helper may be unable to access the partially-initialized repository. Ideally, we would address this issue via capabilities. But the remote helper protocol is not structured in a way that guarantees that the capability announcement happens before the remote helper tries to access the repository. Instead, fix this issue by partially initializing the refdb up to the point where it becomes discoverable by Git commands. Reported-by: Mike Hommey <mh@glandium.org> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-08Merge branch 'en/header-cleanup' into maint-2.43Junio C Hamano
Remove unused header "#include". * en/header-cleanup: treewide: remove unnecessary includes in source files treewide: add direct includes currently only pulled in transitively trace2/tr2_tls.h: remove unnecessary include submodule-config.h: remove unnecessary include pkt-line.h: remove unnecessary include line-log.h: remove unnecessary include http.h: remove unnecessary include fsmonitor--daemon.h: remove unnecessary includes blame.h: remove unnecessary includes archive.h: remove unnecessary include treewide: remove unnecessary includes in source files treewide: remove unnecessary includes from header files
2024-01-30Merge branch 'kl/allow-working-in-dot-git-in-non-bare-repository'Junio C Hamano
The "disable repository discovery of a bare repository" check, triggered by setting safe.bareRepository configuration variable to 'explicit', has been loosened to exclude the ".git/" directory inside a non-bare repository from the check. So you can do "cd .git && git cmd" to run a Git command that works on a bare repository without explicitly specifying $GIT_DIR now. * kl/allow-working-in-dot-git-in-non-bare-repository: setup: allow cwd=.git w/ bareRepository=explicit
2024-01-26Merge branch 'ps/worktree-refdb-initialization'Junio C Hamano
Instead of manually creating refs/ hierarchy on disk upon a creation of a secondary worktree, which is only usable via the files backend, use the refs API to populate it. * ps/worktree-refdb-initialization: builtin/worktree: create refdb via ref backend worktree: expose interface to look up worktree by name builtin/worktree: move setup of commondir file earlier refs/files: skip creation of "refs/{heads,tags}" for worktrees setup: move creation of "refs/" into the files backend refs: prepare `refs_init_db()` for initializing worktree refs
2024-01-20setup: allow cwd=.git w/ bareRepository=explicitKyle Lippincott
The safe.bareRepository setting can be set to 'explicit' to disallow implicit uses of bare repositories, preventing an attack [1] where an artificial and malicious bare repository is embedded in another git repository. Unfortunately, some tooling uses myrepo/.git/ as the cwd when executing commands, and this is blocked when safe.bareRepository=explicit. Blocking is unnecessary, as git already prevents nested .git directories. Teach git to not reject uses of git inside of the .git directory: check if cwd is .git (or a subdirectory of it) and allow it even if safe.bareRepository=explicit. [1] https://github.com/justinsteven/advisories/blob/main/2022_git_buried_bare_repos_and_fsmonitor_various_abuses.md Signed-off-by: Kyle Lippincott <spectral@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-16Merge branch 'ps/refstorage-extension'Junio C Hamano
Introduce a new extension "refstorage" so that we can mark a repository that uses a non-default ref backend, like reftable. * ps/refstorage-extension: t9500: write "extensions.refstorage" into config builtin/clone: introduce `--ref-format=` value flag builtin/init: introduce `--ref-format=` value flag builtin/rev-parse: introduce `--show-ref-format` flag t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar setup: introduce GIT_DEFAULT_REF_FORMAT envvar setup: introduce "extensions.refStorage" extension setup: set repository's formats on init setup: start tracking ref storage format refs: refactor logic to look up storage backends worktree: skip reading HEAD when repairing worktrees t: introduce DEFAULT_REPO_FORMAT prereq
2024-01-08Merge branch 'en/header-cleanup'Junio C Hamano
Remove unused header "#include". * en/header-cleanup: treewide: remove unnecessary includes in source files treewide: add direct includes currently only pulled in transitively trace2/tr2_tls.h: remove unnecessary include submodule-config.h: remove unnecessary include pkt-line.h: remove unnecessary include line-log.h: remove unnecessary include http.h: remove unnecessary include fsmonitor--daemon.h: remove unnecessary includes blame.h: remove unnecessary includes archive.h: remove unnecessary include treewide: remove unnecessary includes in source files treewide: remove unnecessary includes from header files
2024-01-08setup: move creation of "refs/" into the files backendPatrick Steinhardt
When creating the ref database we unconditionally create the "refs/" directory in "setup.c". This is a mandatory prerequisite for all Git repositories regardless of the ref backend in use, because Git will be unable to detect the directory as a repository if "refs/" doesn't exist. We are about to add another new caller that will want to create a ref database when creating worktrees. We would require the same logic to create the "refs/" directory even though the caller really should not care about such low-level details. Ideally, the ref database should be fully initialized after calling `refs_init_db()`. Move the code to create the directory into the files backend itself to make it so. This means that future ref backends will also need to have equivalent logic around to ensure that the directory exists, but it seems a lot more sensible to have it this way round than to require callers to create the directory themselves. An alternative to this would be to create "refs/" in `refs_init_db()` directly. This feels conceptually unclean though as the creation of the refdb is now cluttered across different callsites. Furthermore, both the "files" and the upcoming "reftable" backend write backend-specific data into the "refs/" directory anyway, so splitting up this logic would only make it harder to reason about. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-08refs: prepare `refs_init_db()` for initializing worktree refsPatrick Steinhardt
The purpose of `refs_init_db()` is to initialize the on-disk files of a new ref database. The function is quite inflexible right now though, as callers can neither specify the `struct ref_store` nor can they pass any flags. Refactor the interface to accept both of these. This will be required so that we can start initializing per-worktree ref databases via the ref backend instead of open-coding the initialization in "worktree.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-02setup: introduce GIT_DEFAULT_REF_FORMAT envvarPatrick Steinhardt
Introduce a new GIT_DEFAULT_REF_FORMAT environment variable that lets users control the default ref format used by both git-init(1) and git-clone(1). This is modeled after GIT_DEFAULT_OBJECT_FORMAT, which does the same thing for the repository's object format. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-02setup: introduce "extensions.refStorage" extensionPatrick Steinhardt
Introduce a new "extensions.refStorage" extension that allows us to specify the ref storage format used by a repository. For now, the only supported format is the "files" format, but this list will likely soon be extended to also support the upcoming "reftable" format. There have been discussions on the Git mailing list in the past around how exactly this extension should look like. One alternative [1] that was discussed was whether it would make sense to model the extension in such a way that backends are arbitrarily stackable. This would allow for a combined value of e.g. "loose,packed-refs" or "loose,reftable", which indicates that new refs would be written via "loose" files backend and compressed into "packed-refs" or "reftable" backends, respectively. It is arguable though whether this flexibility and the complexity that it brings with it is really required for now. It is not foreseeable that there will be a proliferation of backends in the near-term future, and the current set of existing formats and formats which are on the horizon can easily be configured with the much simpler proposal where we have a single value, only. Furthermore, if we ever see that we indeed want to gain the ability to arbitrarily stack the ref formats, then we can adapt the current extension rather easily. Given that Git clients will refuse any unknown value for the "extensions.refStorage" extension they would also know to ignore a stacked "loose,packed-refs" in the future. So let's stick with the easy proposal for the time being and wire up the extension. [1]: <pull.1408.git.1667846164.gitgitgadget@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-02setup: set repository's formats on initPatrick Steinhardt
The proper hash algorithm and ref storage format that will be used for a newly initialized repository will be figured out in `init_db()` via `validate_hash_algorithm()` and `validate_ref_storage_format()`. Until now though, we never set up the hash algorithm or ref storage format of `the_repository` accordingly. There are only two callsites of `init_db()`, one in git-init(1) and one in git-clone(1). The former function doesn't care for the formats to be set up properly because it never access the repository after calling the function in the first place. For git-clone(1) it's a different story though, as we call `init_db()` before listing remote refs. While we do indeed have the wrong hash function in `the_repository` when `init_db()` sets up a non-default object format for the repository, it never mattered because we adjust the hash after learning about the remote's hash function via the listed refs. So the current state is correct for the hash algo, but it's not for the ref storage format because git-clone(1) wouldn't know to set it up properly. But instead of adjusting only the `ref_storage_format`, set both the hash algo and the ref storage format so that `the_repository` is in the correct state when `init_db()` exits. This is fine as we will adjust the hash later on anyway and makes it easier to reason about the end state of `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-02setup: start tracking ref storage formatPatrick Steinhardt
In order to discern which ref storage format a repository is supposed to use we need to start setting up and/or discovering the format. This needs to happen in two separate code paths. - The first path is when we create a repository via `init_db()`. When we are re-initializing a preexisting repository we need to retain the previously used ref storage format -- if the user asked for a different format then this indicates an error and we error out. Otherwise we either initialize the repository with the format asked for by the user or the default format, which currently is the "files" backend. - The second path is when discovering repositories, where we need to read the config of that repository. There is not yet any way to configure something other than the "files" backend, so we can just blindly set the ref storage format to this backend. Wire up this logic so that we have the ref storage format always readily available when needed. As there is only a single backend and because it is not configurable we cannot yet verify that this tracking works as expected via tests, but tests will be added in subsequent commits. To countermand this ommission now though, raise a BUG() in case the ref storage format is not set up properly in `ref_store_init()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-27Merge branch 'ps/clone-into-reftable-repository'Junio C Hamano
"git clone" has been prepared to allow cloning a repository with non-default hash function into a repository that uses the reftable backend. * ps/clone-into-reftable-repository: builtin/clone: create the refdb with the correct object format builtin/clone: skip reading HEAD when retrieving remote builtin/clone: set up sparse checkout later builtin/clone: fix bundle URIs with mismatching object formats remote-curl: rediscover repository when fetching refs setup: allow skipping creation of the refdb setup: extract function to create the refdb
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren
Each of these were checked with gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-20Merge branch 'ps/clone-into-reftable-repository' into ps/refstorage-extensionJunio C Hamano
* ps/clone-into-reftable-repository: builtin/clone: create the refdb with the correct object format builtin/clone: skip reading HEAD when retrieving remote builtin/clone: set up sparse checkout later builtin/clone: fix bundle URIs with mismatching object formats remote-curl: rediscover repository when fetching refs setup: allow skipping creation of the refdb setup: extract function to create the refdb
2023-12-12builtin/clone: create the refdb with the correct object formatPatrick Steinhardt
We're currently creating the reference database with a potentially incorrect object format when the remote repository's object format is different from the local default object format. This works just fine for now because the files backend never records the object format anywhere. But this logic will fail with any new reference backend that encodes this information in some form either on-disk or in-memory. The preceding commits have reshuffled code in git-clone(1) so that there is no code path that will access the reference database before we have detected the remote's object format. With these refactorings we can now defer initialization of the reference database until after we have learned the remote's object format and thus initialize it with the correct format from the get-go. These refactorings are required to make git-clone(1) work with the upcoming reftable backend when cloning repositories with the SHA256 object format. This change breaks a test in "t5550-http-fetch-dumb.sh" when cloning an empty repository with `GIT_TEST_DEFAULT_HASH=sha256`. The test expects the resulting hash format of the empty cloned repository to match the default hash, but now we always end up with a sha1 repository. The problem is that for dumb HTTP fetches, we have no easy way to figure out the remote's hash function except for deriving it based on the hash length of refs in `info/refs`. But as the remote repository is empty we cannot rely on this detection mechanism. Before the change in this commit we already initialized the repository with the default hash function and then left it as-is. With this patch we always use the hash function detected via the remote, where we fall back to "sha1" in case we cannot detect it. Neither the old nor the new behaviour are correct as we second-guess the remote hash function in both cases. But given that this is a rather unlikely edge case (we use the dumb HTTP protocol, the remote repository uses SHA256 and the remote repository is empty), let's simply adapt the test to assert the new behaviour. If we want to properly address this edge case in the future we will have to extend the dumb HTTP protocol so that we can properly detect the hash function for empty repositories. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-12setup: allow skipping creation of the refdbPatrick Steinhardt
Allow callers to skip creation of the reference database via a new flag `INIT_DB_SKIP_REFDB`, which is required for git-clone(1) so that we can create it at a later point once the object format has been discovered from the remote repository. Note that we also uplift the call to `create_reference_database()` into `init_db()`, which makes it easier to handle the new flag for us. This changes the order in which we do initialization so that we now set up the Git configuration before we create the reference database. In practice this move should not result in any change in behaviour. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-12setup: extract function to create the refdbPatrick Steinhardt
We're about to let callers skip creation of the reference database when calling `init_db()`. Extract the logic into a standalone function so that it becomes easier to do this refactoring. While at it, expand the comment that explains why we always create the "refs/" directory. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-09setup: handle NULL value when parsing extensionsJeff King
The "partialclone" extension config records a string, and hence it is an error to have an implicit bool like: [extensions] partialclone in your config. We should recognize and reject this, rather than segfaulting (which is the current behavior). Note that it's OK to use config_error_nonbool() here, even though the return value is an enum. We explicitly document EXTENSION_ERROR as -1 for compatibility with error(), etc. This is the only extension value that has this problem. Most of the others are bools that interpret this value naturally. The exception is extensions.objectformat, which does correctly check for NULL. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>