summaryrefslogtreecommitdiff
path: root/diff-lib.c
AgeCommit message (Collapse)Author
2025-03-10hash: stop depending on `the_repository` in `null_oid()`Patrick Steinhardt
The `null_oid()` function returns the object ID that only consists of zeroes. Naturally, this ID also depends on the hash algorithm used, as the number of zeroes is different between SHA1 and SHA256. Consequently, the function returns the hash-algorithm-specific null object ID. This is currently done by depending on `the_hash_algo`, which implicitly makes us depend on `the_repository`. Refactor the function to instead pass in the hash algorithm for which we want to retrieve the null object ID. Adapt callsites accordingly by passing in `the_repository`, thus bubbling up the dependency on that global variable by one layer. There are a couple of trivial exceptions for subsystems that already got rid of `the_repository`. These subsystems instead use the repository that is available via the calling context: - "builtin/grep.c" - "grep.c" - "refs/debug.c" There are also two non-trivial exceptions: - "diff-no-index.c": Here we know that we may not have a repository initialized at all, so we cannot rely on `the_repository`. Instead, we adapt `diff_no_index()` to get a `struct git_hash_algo` as parameter. The only caller is located in "builtin/diff.c", where we know to call `repo_set_hash_algo()` in case we're running outside of a Git repository. Consequently, it is fine to continue passing `the_repository->hash_algo` even in this case. - "builtin/ls-files.c": There is an in-flight patch series that drops `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic conflict because we use `null_oid()` in `show_submodule()`. The value is passed to `repo_submodule_init()`, which may use the object ID to resolve a tree-ish in the superproject from which we want to read the submodule config. As such, the object ID should refer to an object in the superproject, and consequently we need to use its hash algorithm. This means that we could in theory just not bother about this edge case at all and just use `the_repository` in "diff-no-index.c". But doing so would feel misdesigned. Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in "hash.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-09run_diff_files(): de-mystify the size of combine_diff_path structJeff King
We allocate a combine_diff_path struct with space for 5 parents. Why 5? The history is not particularly enlightening. The allocation comes from b4b1550315 (Don't instantiate structures with FAMs., 2006-06-18), which just switched to xmalloc from a stack struct with 5 elements. That struct changed to 5 from 4 in 2454c962fb (combine-diff: show mode changes as well., 2006-02-06), when we also moved from storing raw sha1 bytes to the combine_diff_parent struct. But no explanation is given. That 4 comes from the earliest code in ea726d02e9 (diff-files: -c and --cc options., 2006-01-28). One might guess it is for the 4 stages we can store in the index. But this code path only ever diffs the current state against stages 2 and 3. So we only need two slots. And it's easy to see this is still the case. We fill the parent slots by subtracting 2 from the ce_stage() values, ignoring values below 2. And since ce_stage() is only 2 bits, there are 4 values, and thus we need 2 slots. Let's use the correct value (saving a tiny bit of memory) and add a comment explaining what's going on (saving a tiny bit of programmer brain power). Arguably we could use: 1 + (STAGEMASK >> STAGESHIFT) - 2 which lets the compiler enforce that we will not go out-of-bounds if we see an unexpected value from ce_stage(). But that is more confusing to explain, and the constant "2" is baked into other parts of the function. It is a fundamental constant, not something where somebody might bump a macro and forget to update this code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-09combine-diff: add combine_diff_path_new()Jeff King
The combine_diff_path struct has variable size, since it embeds both the memory allocation for the path field as well as a variable-sized parent array. This makes allocating one a bit tricky. We have a helper to compute the required size, but it's up to individual sites to actually initialize all of the fields. Let's provide a constructor function to make that a little nicer. Besides being shorter, it also hides away tricky bits like the computation of the "path" pointer (which is right after the "parent" flex array). As a bonus, using the same constructor everywhere means that we'll consistently initialize all parts of the struct. A few code paths left the parent array unitialized. This didn't cause any bugs, but we'll be able to simplify some code in the next few patches knowing that the parent fields have all been zero'd. This also gets rid of some questionable uses of "int" to store buffer lengths. Though we do use them to allocate, I don't think there are any integer overflow vulnerabilities here (the allocation helper promotes them to size_t and checks arithmetic for overflow, and the actual memcpy of the bytes is done using the possibly-truncated "int" value). Sadly we can't use the FLEX_* macros to simplify the allocation here, because there are two variable-sized parts to the struct (and those macros only handle one). Nor can we get stop publicly declaring combine_diff_path_size(). This patch does not touch the code in path_appendnew() at all, which is not ready to be moved to our new constructor for a few reasons: - path_appendnew() has a memory-reuse optimization where it tries to reuse combine_diff_path structs rather than freeing and reallocating. - path_appendnew() does not create the struct from a single path string, but rather allocates and copies into the buffer from multiple sources. These can be addressed by some refactoring, but let's leave it as-is for now. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-09run_diff_files(): delay allocation of combine_diff_pathJeff King
While looping over the index entries, when we see a higher level stage the first thing we do is allocate a combine_diff_path struct for it. But this can leak; if check_removed() returns an error, we'll continue to the next iteration of the loop without cleaning up. We can fix this by just delaying the allocation by a few lines. I don't think this leak is triggered in the test suite, but it's pretty easy to see by inspection. My ulterior motive here is that the delayed allocation means we have all of the data needed to initialize "dpath" at the time of malloc, making it easier to factor out a constructor function. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06global: mark code units that generate warnings with `-Wsign-compare`Patrick Steinhardt
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04diff-lib: fix leaking diffopts in `do_diff_cache()`Patrick Steinhardt
In `do_diff_cache()` we initialize a new `rev_info` and then overwrite its `diffopt` with a user-provided set of options. This can leak memory because `repo_init_revisions()` may end up allocating memory for the `diffopt` itself depending on the configuration. And since that field is overwritten we won't ever free it. Plug the memory leak by releasing the diffopts before we overwrite them. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-10Merge branch 'jk/output-prefix-cleanup'Junio C Hamano
Code clean-up. * jk/output-prefix-cleanup: diff: store graph prefix buf in git_graph struct diff: return line_prefix directly when possible diff: return const char from output_prefix callback diff: drop line_prefix_length field line-log: use diff_line_prefix() instead of custom helper
2024-10-03diff: return const char from output_prefix callbackJeff King
The diff_options structure has an output_prefix callback for returning a prefix string, but it does so by returning a pointer to a strbuf. This makes the interface awkward. There's no reason the callback should need to use a strbuf, and it creates questions about whether the ownership of the resulting buffer should be transferred to the caller (it should not be, but a recent attempt to clean up this code led to a double-free in some cases). The one advantage we get is that the strbuf contains a ptr/len pair, so we could in theory have a prefix with embedded NULs. But we can observe that none of the existing callbacks would ever produce such a NUL (they are usually just indentation or graph symbols, and even the "--line-prefix" option takes a NUL-terminated string). And anyway, only one caller (the one in log_tree_diff_flush) actually looks at the strbuf length. In every other case we use a helper function which discards the length and just returns the NUL-terminated string. So let's just have the callback return a "const char *" pointer. It's up to the callbacks themselves if they want to use a strbuf under the hood. And now the caller in log_tree_diff_flush() can just use the helper function along with everybody else. That lets us even simplify out the function pointer check, since the helper returns an empty string (technically this does mean we'll sometimes issue an empty fputs() call, but I don't think this code path is hot enough to care about that). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-17diff-lib: drop unused index argument from get_stat_data()Jeff King
The "struct index_state" parameter passed to get_stat_data() has been unused since we stopped passing it to check_removed() in 6a044a2048 (diff-lib: fix check_removed when fsmonitor is on, 2023-09-11). We can just drop it, which in turns lets us simplify our callers a bit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-08Merge branch 'ps/leakfixes-more'Junio C Hamano
More memory leaks have been plugged. * ps/leakfixes-more: (29 commits) builtin/blame: fix leaking ignore revs files builtin/blame: fix leaking prefixed paths blame: fix leaking data for blame scoreboards line-range: plug leaking find functions merge: fix leaking merge bases builtin/merge: fix leaking `struct cmdnames` in `get_strategy()` sequencer: fix memory leaks in `make_script_with_merges()` builtin/clone: plug leaking HEAD ref in `wanted_peer_refs()` apply: fix leaking string in `match_fragment()` sequencer: fix leaking string buffer in `commit_staged_changes()` commit: fix leaking parents when calling `commit_tree_extended()` config: fix leaking "core.notesref" variable rerere: fix various trivial leaks builtin/stash: fix leak in `show_stash()` revision: free diff options builtin/log: fix leaking commit list in git-cherry(1) merge-recursive: fix memory leak when finalizing merge builtin/merge-recursive: fix leaking object ID bases builtin/difftool: plug memory leaks in `run_dir_diff()` object-name: free leaking object contexts ...
2024-06-14global: introduce `USE_THE_REPOSITORY_VARIABLE` macroPatrick Steinhardt
Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14hash: require hash algorithm in `oidread()` and `oidclr()`Patrick Steinhardt
Both `oidread()` and `oidclr()` use `the_repository` to derive the hash function that shall be used. Require callers to pass in the hash algorithm to get rid of this implicit dependency. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11revision: free diff optionsPatrick Steinhardt
There is a todo comment in `release_revisions()` that mentions that we need to free the diff options, which was added via 54c8a7c379 (revisions API: add a TODO for diff_free(&revs->diffopt), 2022-04-14). Releasing the diff options wasn't quite feasible at that time because some call sites rely on its contents to remain even after the revisions have been released. In fact, there really only are a couple of callsites that misbehave here: - `cmd_shortlog()` releases the revisions, but continues to access its file pointer. - `do_diff_cache()` creates a shallow copy of `struct diff_options`, but does not set the `no_free` member. Consequently, we end up releasing resources of the caller-provided diff options. - `diff_free()` and friends do not play nice when being called multiple times as they don't unset data structures that they have just released. Fix all of those cases and enable the call to `diff_free()`, which plugs a bunch of memory leaks. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17refs: refactor `resolve_gitlink_ref()` to accept a repositoryPatrick Steinhardt
In `resolve_gitlink_ref()` we implicitly rely on `the_repository` to look up the submodule ref store. Now that we can look up submodule ref stores for arbitrary repositories we can improve this function to instead accept a repository as parameter for which we want to resolve the gitlink. Do so and adjust callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-08Merge branch 'rs/diff-parseopts-cleanup'Junio C Hamano
Code clean-up to remove code that is now a noop. * rs/diff-parseopts-cleanup: diff-lib: stop calling diff_setup_done() in do_diff_cache()
2024-05-01diff-lib: stop calling diff_setup_done() in do_diff_cache()René Scharfe
d44e5267ea (diff-lib: plug minor memory leaks in do_diff_cache(), 2020-11-14) added the call to diff_setup_done() to release the memory of the parseopt member of struct diff_options that repo_init_revisions() had allocated via repo_diff_setup() and prep_parse_options(). 189e97bc4b (diff: remove parseopts member from struct diff_options, 2022-12-01) did away with that allocation; diff_setup_done() doesn't release any memory anymore. So stop calling this function on the blank diffopt member before it is overwritten, as this is no longer necessary. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-03revision: optionally record matches with pathspec elementsJunio C Hamano
Unlike "git add" and other end-user facing commands, where it is diagnosed as an error to give a pathspec with an element that does not match any path, the diff machinery does not care if some elements of the pathspec do not match. Given that the diff machinery is heavily used in pathspec-limited "git log" machinery, and it is common for a path to come and go while traversing the project history, this is usually a good thing. However, in some cases we would want to know if all the pathspec elements matched. For example, "git add -u <pathspec>" internally uses the machinery used by "git diff-files" to decide contents from what paths to add to the index, and as an end-user facing command, "git add -u" would want to report an unmatched pathspec element. Add a new .ps_matched member next to the .prune_data member in "struct rev_info" so that we can optionally keep track of the use of .prune_data pathspec elements that can be inspected by the caller. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-28Merge branch 'eb/hash-transition'Junio C Hamano
Work to support a repository that work with both SHA-1 and SHA-256 hash algorithms has started. * eb/hash-transition: (30 commits) t1016-compatObjectFormat: add tests to verify the conversion between objects t1006: test oid compatibility with cat-file t1006: rename sha1 to oid test-lib: compute the compatibility hash so tests may use it builtin/ls-tree: let the oid determine the output algorithm object-file: handle compat objects in check_object_signature tree-walk: init_tree_desc take an oid to get the hash algorithm builtin/cat-file: let the oid determine the output algorithm rev-parse: add an --output-object-format parameter repository: implement extensions.compatObjectFormat object-file: update object_info_extended to reencode objects object-file-convert: convert commits that embed signed tags object-file-convert: convert commit objects when writing object-file-convert: don't leak when converting tag objects object-file-convert: convert tag objects when writing object-file-convert: add a function to convert trees between algorithms object: factor out parse_mode out of fast-import and tree-walk into in object.h cache: add a function to read an OID of a specific algorithm tag: sign both hashes commit: export add_header_signature to support handling signatures on tags ...
2024-02-29commit-reach(repo_get_merge_bases): pass on "missing commits" errorsJohannes Schindelin
The `merge_bases_many()` function was just taught to indicate parsing errors, and now the `repo_get_merge_bases()` function (which is also surfaced via the `repo_get_merge_bases()` macro) is aware of that, too. Naturally, there are a lot of callers that need to be adjusted now, too. Next step: adjust the callers of `get_octopus_merge_bases()`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-08Merge branch 'en/header-cleanup'Junio C Hamano
Remove unused header "#include". * en/header-cleanup: treewide: remove unnecessary includes in source files treewide: add direct includes currently only pulled in transitively trace2/tr2_tls.h: remove unnecessary include submodule-config.h: remove unnecessary include pkt-line.h: remove unnecessary include line-log.h: remove unnecessary include http.h: remove unnecessary include fsmonitor--daemon.h: remove unnecessary includes blame.h: remove unnecessary includes archive.h: remove unnecessary include treewide: remove unnecessary includes in source files treewide: remove unnecessary includes from header files
2023-12-27Merge branch 'jc/diff-cached-fsmonitor-fix'Junio C Hamano
The optimization based on fsmonitor in the "diff --cached" codepath is resurrected with the "fake-lstat" introduced earlier. * jc/diff-cached-fsmonitor-fix: diff-lib: fix check_removed() when fsmonitor is active
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren
Each of these were checked with gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-13Merge branch 'ar/diff-index-merge-base-fix'Junio C Hamano
"git diff --merge-base X other args..." insisted that X must be a commit and errored out when given an annotated tag that peels to a commit, but we only need it to be a committish. This has been corrected. * ar/diff-index-merge-base-fix: diff: fix --merge-base with annotated tags
2023-10-02tree-walk: init_tree_desc take an oid to get the hash algorithmEric W. Biederman
To make it possible for git ls-tree to display the tree encoded in the hash algorithm of the oid specified to git ls-tree, update init_tree_desc to take as a parameter the oid of the tree object. Update all callers of init_tree_desc and init_tree_desc_gently to pass the oid of the tree object. Use the oid of the tree object to discover the hash algorithm of the oid and store that hash algorithm in struct tree_desc. Use the hash algorithm in decode_tree_entry and update_tree_entry_internal to handle reading a tree object encoded in a hash algorithm that differs from the repositories hash algorithm. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02diff: fix --merge-base with annotated tagsAlyssa Ross
Checking early for OBJ_COMMIT excludes other objects that can be resolved to commits, like annotated tags. If we remove it, annotated tags will be resolved and handled just fine by lookup_commit_reference(), and if we are given something that can't be resolved to a commit, we'll still get a useful error message, e.g.: > error: object 21ab162211ac3ef13c37603ca88b27e9c7e0d40b is a tree, not a commit > fatal: no merge base found Signed-off-by: Alyssa Ross <hi@alyssa.is> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-20Merge branch 'js/diff-cached-fsmonitor-fix'Junio C Hamano
"git diff --cached" codepath did not fill the necessary stat information for a file when fsmonitor knows it is clean and ended up behaving as if it is not clean, which has been corrected. * js/diff-cached-fsmonitor-fix: diff-lib: fix check_removed when fsmonitor is on
2023-09-15diff-lib: fix check_removed() when fsmonitor is activeJunio C Hamano
`git diff-index` may return incorrect deleted entries when fsmonitor is used in a repository with git submodules. This can be observed on Mac machines, but it can affect all other supported platforms too. If fsmonitor is used, `stat *st` is left uninitialied if cache_entry has CE_FSMONITOR_VALID bit set. But, there are three call sites that rely on stat afterwards, which can result in incorrect results. We can fill members of "struct stat" that matters well enough using the information we have in "struct cache_entry" that fsmonitor told us is up-to-date to solve this. Helped-by: Josip Sokcevic <sokcevic@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-15Merge branch 'js/diff-cached-fsmonitor-fix' into jc/diff-cached-fsmonitor-fixJunio C Hamano
* js/diff-cached-fsmonitor-fix: diff-lib: fix check_removed when fsmonitor is on
2023-09-11diff-lib: fix check_removed when fsmonitor is onJosip Sokcevic
`git diff-index` may return incorrect deleted entries when fsmonitor is used in a repository with git submodules. This can be observed on Mac machines, but it can affect all other supported platforms too. If fsmonitor is used, `stat *st` is not initialized if cache_entry has CE_FSMONITOR_VALID set. But, there are three call sites that rely on stat afterwards, which can result in incorrect results. This change partially reverts commit 4f3d6d02 (fsmonitor: skip lstat deletion check during git diff-index, 2021-03-17). Signed-off-by: Josip Sokcevic <sokcevic@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-21diff: drop useless return from run_diff_{files,index} functionsJeff King
Neither of these functions ever returns a value other than zero. Instead, they expect unrecoverable errors to exit immediately, and things like "--exit-code" are stored inside the diff_options struct to be handled later via diff_result_code(). Some callers do check the return values, but many don't bother. Let's drop the useless return values, which are misleading callers about how the functions work. This could be seen as a step in the wrong direction, as we might want to eventually "lib-ify" these to more cleanly return errors up the stack, in which case we'd have to add the return values back in. But there are some benefits to doing this now: 1. In the current code, somebody could accidentally add a "return -1" to one of the functions, which would be erroneously ignored by many callers. By removing the return code, the compiler can notice the mismatch and force the developer to decide what to do. Obviously the other option here is that we could start consistently checking the error code in every caller. But it would be dead code, and we wouldn't get any compile-time help in catching new cases. 2. It communicates the situation to callers, who may want to choose a different function. These functions are really thin wrappers for doing git-diff-files and git-diff-index within the process. But callers who care about recovering from an error here are probably better off using the underlying library functions, many of which do return errors. If somebody eventually wants to teach these functions to propagate errors, they'll have to switch back to returning a value, effectively reverting this patch. But at least then they will be starting with a level playing field: they know that they will need to inspect each caller to see how it should handle the error. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-21diff: spell DIFF_INDEX_CACHED out when calling run_diff_index()Junio C Hamano
Many callers of run_diff_index() passed literal "1" for the option flag word, which should better be spelled out as DIFF_INDEX_CACHED for readablity. Everybody else passes "0" that can stay as-is. The other bit in the option flag word is DIFF_INDEX_MERGE_BASE, but curiously there is only one caller that can pass it, which is "git diff-index --merge-base" itself---no internal callers uses the feature. A bit tricky call to the function is in builtin/submodule--helper.c where the .cached member in a private struct is set/reset as a plain Boolean flag, which happens to be "1" and happens to match the value of DIFF_INDEX_CACHED. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-29Merge branch 'en/header-split-cache-h-part-3'Junio C Hamano
Header files cleanup. * en/header-split-cache-h-part-3: (28 commits) fsmonitor-ll.h: split this header out of fsmonitor.h hash-ll, hashmap: move oidhash() to hash-ll object-store-ll.h: split this header out of object-store.h khash: name the structs that khash declares merge-ll: rename from ll-merge git-compat-util.h: remove unneccessary include of wildmatch.h builtin.h: remove unneccessary includes list-objects-filter-options.h: remove unneccessary include diff.h: remove unnecessary include of oidset.h repository: remove unnecessary include of path.h log-tree: replace include of revision.h with simple forward declaration cache.h: remove this no-longer-used header read-cache*.h: move declarations for read-cache.c functions from cache.h repository.h: move declaration of the_index from cache.h merge.h: move declarations for merge.c from cache.h diff.h: move declaration for global in diff.c from cache.h preload-index.h: move declarations for preload-index.c from elsewhere sparse-index.h: move declarations for sparse-index.c from cache.h name-hash.h: move declarations for name-hash.c from cache.h run-command.h: move declarations for run-command.c from cache.h ...
2023-06-21diff.h: remove unnecessary include of oidset.hElijah Newren
This also made it clear that several .c files depended upon various things that oidset included, but had omitted the direct #include for those headers. Add those now. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21cache.h: remove this no-longer-used headerElijah Newren
Since this header showed up in some places besides just #include statements, update/clean-up/remove those other places as well. Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got away with violating the rule that all files must start with an include of git-compat-util.h (or a short-list of alternate headers that happen to include it first). This change exposed the violation and caused it to stop building correctly; fix it by having it include git-compat-util.h first, as per policy. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21read-cache*.h: move declarations for read-cache.c functions from cache.hElijah Newren
For the functions defined in read-cache.c, move their declarations from cache.h to a new header, read-cache-ll.h. Also move some related inline functions from cache.h to read-cache.h. The purpose of the read-cache-ll.h/read-cache.h split is that about 70% of the sites don't need the inline functions and the extra headers they include. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-14diff-lib: honor override_submodule_config flag bitJosip Sokcevic
When `diff.ignoreSubmodules = all` is set and submodule commits are manually staged (e.g. via `git-update-index`), `git-commit` should record the commit with updated submodules. `index_differs_from` is called from `prepare_to_commit` with flags set to `override_submodule_config = 1`. `index_differs_from` then merges the default diff flags and passed flags. When `diff.ignoreSubmodules` is set to "all", `flags` ends up having both `override_submodule_config` and `ignore_submodules` set to 1. This results in `git-commit` ignoring staged commits. This patch restores original `flags.ignore_submodule` if `flags.override_submodule_config` is set. Signed-off-by: Josip Sokcevic <sokcevic@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24symlinks.h: move declarations for symlinks.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11object-name.h: move declarations for object-name.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on trace.h & trace2.hElijah Newren
Dozens of files made use of trace and trace2 functions, without explicitly including trace.h or trace2.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include trace.h or trace2.h if they are using them. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-04Merge branch 'ab/remove-implicit-use-of-the-repository' into ↵Junio C Hamano
en/header-split-cache-h * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-03-28cocci: apply the "cache.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason
Apply the part of "the_repository.pending.cocci" pertaining to "cache.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-13diff: mark unused parameters in callbacksJeff King
The diff code provides a format_callback interface, but not every callback needs each parameter (e.g., the "opt" and "data" parameters are frequently left unused). Likewise for the output_prefix callback, the low-level change/add_remove interfaces, the callbacks used by xdi_diff(), etc. Mark unused arguments in the callback implementations to quiet -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-08oneway_diff: handle removed sparse directoriesVictoria Dye
Update 'do_oneway_diff()' to perform a 'diff_tree_oid()' on removed sparse directories, as it does for added or modified sparse directories (see 9eb00af562 (diff-lib: handle index diffs with sparse dirs, 2021-07-14)). At the moment, this update is unreachable code because 'unpack_trees()' (currently the only way 'oneway_diff()' can be called, via 'diff_cache()') will always traverse trees down to the individual removed files of a deleted sparse directory. A subsequent patch will change this to better preserve a sparse index in other uses of 'unpack_tree()', e.g. 'git reset --hard'. However, making that change without this patch would result in (among other issues) 'git status' printing only the name of a deleted sparse directory, not its contents. To avoid introducing that bug, 'do_oneway_diff()' is updated before modifying 'unpack_trees()'. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revisions API: add a TODO for diff_free(&revs->diffopt)Ævar Arnfjörð Bjarmason
Add a TODO comment indicating that we should release "diffopt" in release_revisions(). In a preceding commit we started releasing the "pruning" member of the same type, but handling "diffopt" will require us to untangle the "no_free" conditions I added in e900d494dcf (diff: add an API for deferred freeing, 2021-02-11). Let's leave a TODO comment to that effect, and so that we don't forget refactor code that was changed to use release_revisions() in earlier commits to stop using the "diffopt" member after a call to release_revisions(). This works currently, but would become a logic error as soon as we started freeing "diffopt". Doing that change now doesn't harm anything, and future-proofs us against a later change to release_revisions(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revisions API: have release_revisions() release "prune_data"Ævar Arnfjörð Bjarmason
Extend the the release_revisions() function so that it frees the "prune_data" in the "struct rev_info". This means that any code that calls "release_revisions()" already can get rid of adjacent calls to clear_pathspec(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revisions API users: use release_revisions() for "prune_data" usersÆvar Arnfjörð Bjarmason
Use release_revisions() for users of "struct rev_list" that reach into the "struct rev_info" and clear the "prune_data" already. In a subsequent commit we'll teach release_revisions() to clear this itself, but in the meantime let's invoke release_revisions() here to clear anything else we may have missed, and for reasons of having consistent boilerplate. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revision.[ch]: provide and start using a release_revisions()Ævar Arnfjörð Bjarmason
The users of the revision.[ch] API's "struct rev_info" are a major source of memory leaks in the test suite under SANITIZE=leak, which in turn adds a lot of noise when trying to mark up tests with "TEST_PASSES_SANITIZE_LEAK=true". The users of that API are largely one-shot, e.g. "git rev-list" or "git log", or the "git checkout" and "git stash" being modified here For these callers freeing the memory is arguably a waste of time, but in many cases they've actually been trying to free the memory, and just doing that in a buggy manner. Let's provide a release_revisions() function for these users, and start migrating them over per the plan outlined in [1]. Right now this only handles the "pending" member of the struct, but more will be added in subsequent commits. Even though we only clear the "pending" member now, let's not leave a trap in code like the pre-image of index_differs_from(), where we'd start doing the wrong thing as soon as the release_revisions() learned to clear its "diffopt". I.e. we need to call release_revisions() after we've inspected any state in "struct rev_info". This leaves in place e.g. clear_pathspec(&rev.prune_data) in stash_working_tree() in builtin/stash.c, subsequent commits will teach release_revisions() to free "prune_data" and other members that in some cases are individually cleared by users of "struct rev_info" by reaching into its members. Those subsequent commits will remove the relevant calls to e.g. clear_pathspec(). We avoid amending code in index_differs_from() in diff-lib.c as well as wt_status_collect_changes_index(), has_unstaged_changes() and has_uncommitted_changes() in wt-status.c in a way that assumes that we are already clearing the "diffopt" member. That will be handled in a subsequent commit. 1. https://lore.kernel.org/git/87a6k8daeu.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08Merge branch 'dd/diff-files-unmerged-fix'Junio C Hamano
"git diff --relative" segfaulted and/or produced incorrect result when there are unmerged paths. * dd/diff-files-unmerged-fix: diff-lib: ignore paths that are outside $cwd if --relative asked