Age | Commit message (Collapse) | Author |
|
Redefine where the multi-pack-index sits in the object subsystem,
which recently was restructured to allow multiple backends that
support a single object source that belongs to one repository. A
midx does span mulitple "object sources".
* ps/object-store-midx:
midx: remove now-unused linked list of multi-pack indices
packfile: stop using linked MIDX list in `get_all_packs()`
packfile: stop using linked MIDX list in `find_pack_entry()`
packfile: refactor `get_multi_pack_index()` to work on sources
midx: stop using linked list when closing MIDX
packfile: refactor `prepare_packed_git_one()` to work on sources
midx: start tracking per object database source
|
|
The pop_most_recent_commit() function can have quite expensive
worst case performance characteristics, which has been optimized by
using prio-queue data structure.
* rs/pop-recent-commit-with-prio-queue:
commit: use prio_queue_replace() in pop_most_recent_commit()
prio-queue: add prio_queue_replace()
commit: convert pop_most_recent_commit() to prio_queue
|
|
pop_most_recent_commit() calls commit_list_insert_by_date() for parent
commits, which is itself called in a loop. This can lead to quadratic
complexity if there are many merges. Replace the commit_list with a
prio_queue to ensure logarithmic worst case complexity and convert all
three users.
Add a performance test that exercises one of them using a pathological
history that consists of 50% merges and 50% root commits to demonstrate
the speedup:
Test v2.50.1 HEAD
----------------------------------------------------------------------
1501.2: rev-parse ':/65535' 2.48(2.47+0.00) 0.20(0.19+0.00) -91.9%
Alas, sane histories don't benefit from the conversion much, and
traversing Git's own history takes a 1% performance hit on my machine:
$ hyperfine -w3 -L git ./git_2.50.1,./git '{git} rev-parse :/^Initial.revision'
Benchmark 1: ./git_2.50.1 rev-parse :/^Initial.revision
Time (mean ± σ): 1.071 s ± 0.004 s [User: 1.052 s, System: 0.017 s]
Range (min … max): 1.067 s … 1.078 s 10 runs
Benchmark 2: ./git rev-parse :/^Initial.revision
Time (mean ± σ): 1.079 s ± 0.003 s [User: 1.060 s, System: 0.017 s]
Range (min … max): 1.074 s … 1.083 s 10 runs
Summary
./git_2.50.1 rev-parse :/^Initial.revision ran
1.01 ± 0.00 times faster than ./git rev-parse :/^Initial.revision
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up around object access API.
* ps/object-store:
odb: rename `read_object_with_reference()`
odb: rename `pretend_object_file()`
odb: rename `has_object()`
odb: rename `repo_read_object_file()`
odb: rename `oid_object_info()`
odb: trivial refactorings to get rid of `the_repository`
odb: get rid of `the_repository` when handling submodule sources
odb: get rid of `the_repository` when handling the primary source
odb: get rid of `the_repository` in `for_each()` functions
odb: get rid of `the_repository` when handling alternates
odb: get rid of `the_repository` in `odb_mkstemp()`
odb: get rid of `the_repository` in `assert_oid_type()`
odb: get rid of `the_repository` in `find_odb()`
odb: introduce parent pointers
object-store: rename files to "odb.{c,h}"
object-store: rename `object_directory` to `odb_source`
object-store: rename `raw_object_store` to `object_database`
|
|
The function `get_multi_pack_index()` loads multi-pack indices via
`prepare_packed_git()` and then returns the linked list of multi-pack
indices that is stored in `struct object_database`. That list is in the
process of being removed though in favor of storing the MIDX as part of
the object database source it belongs to.
Refactor `get_multi_pack_index()` so that it returns the multi-pack
index for a single object source. Callers are now expected to call this
function for each source they are interested in. This requires them to
iterate through alternates, so we have to prepare alternate object
sources before doing so.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* ps/object-store:
odb: rename `read_object_with_reference()`
odb: rename `pretend_object_file()`
odb: rename `has_object()`
odb: rename `repo_read_object_file()`
odb: rename `oid_object_info()`
odb: trivial refactorings to get rid of `the_repository`
odb: get rid of `the_repository` when handling submodule sources
odb: get rid of `the_repository` when handling the primary source
odb: get rid of `the_repository` in `for_each()` functions
odb: get rid of `the_repository` when handling alternates
odb: get rid of `the_repository` in `odb_mkstemp()`
odb: get rid of `the_repository` in `assert_oid_type()`
odb: get rid of `the_repository` in `find_odb()`
odb: introduce parent pointers
object-store: rename files to "odb.{c,h}"
object-store: rename `object_directory` to `odb_source`
object-store: rename `raw_object_store` to `object_database`
|
|
Rename `oid_object_info()` to `odb_read_object_info()` as well as their
`_extended()` variant to match other functions related to the object
database and our modern coding guidelines.
Introduce compatibility wrappers so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The functions to manage alternates all depend on `the_repository`.
Refactor them to accept an object database as a parameter and adjust all
callers. The functions are renamed accordingly.
Note that right now the situation is still somewhat weird because we end
up using the object store path provided by the object store's repository
anyway. Consequently, we could have instead passed in a pointer to the
repository instead of passing in the pointer to the object store. This
will be addressed in subsequent commits though, where we will start to
use the path owned by the object store itself.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `object_directory` structure is used as an access point for a single
object directory like ".git/objects". While the structure isn't yet
fully self-contained, the intent is for it to eventually contain all
information required to access objects in one specific location.
While the name "object directory" is a good fit for now, this will
change over time as we continue with the agenda to make pluggable object
databases a thing. Eventually, objects may not be accessed via any kind
of directory at all anymore, but they could instead be backed by any
kind of durable storage mechanism. While it seems quite far-fetched for
now, it is thinkable that eventually this might even be some form of a
database, for example.
As such, the current name of this structure will become worse over time
as we evolve into the direction of pluggable ODBs. Immediate next steps
will start to carve out proper self-contained object directories, which
requires us to pass in these object directories as parameters. Based on
our modern naming schema this means that those functions should then be
named after their subsystem, which means that we would start to bake the
current name into the codebase more and more.
Let's preempt this by renaming the structure. There have been a couple
alternatives that were discussed:
- `odb_backend` was discarded because it led to the association that
one object database has a single backend, but the model is that one
alternate has one backend. Furthermore, "backend" is more about the
actual backing implementation and less about the high-level concept.
- `odb_alternate` was discarded because it is a bit of a stretch to
also call the main object directory an "alternate".
Instead, pick `odb_source` as the new name. It makes it sufficiently
clear that there can be multiple sources and does not cause confusion
when mixed with the already-existing "alternate" terminology.
In the future, this change allows us to easily introduce for example a
`odb_files_source` and other format-specific implementations.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
An interchange format for stash entries is defined, and subcommand
of "git stash" to import/export has been added.
* bc/stash-export-import:
builtin/stash: provide a way to import stashes from a ref
builtin/stash: provide a way to export stashes to a ref
builtin/stash: factor out revision parsing into a function
object-name: make get_oid quietly return an error
|
|
A reasonable person looking at the signature and usage of get_oid and
friends might conclude that in the event of an error, it always returns
-1. However, this is not the case. Instead, get_oid_basic dies if we
go too far back into the history of a reflog (or, when quiet, simply
exits).
This is not especially useful, since in many cases, we might want to
handle this error differently. Let's add a flag here to make it just
return -1 like elsewhere in these code paths.
Note that we cannot make this behavior the default, since we have many
other codepaths that rely on the existing behavior, including in tests.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We carry declarations for a couple of functions in "object-store.h" that
are not defined in "object-store.c", but in a different subsystem. Move
these declarations to the respective headers whose matching code files
carry the corresponding definition.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.
Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The code paths to check whether a refname X is available (by seeing
if another ref X/Y exists, etc.) have been optimized.
* ps/refname-avail-check-optim:
refs: reuse iterators when determining refname availability
refs/iterator: implement seeking for files iterators
refs/iterator: implement seeking for packed-ref iterators
refs/iterator: implement seeking for ref-cache iterators
refs/iterator: implement seeking for reftable iterators
refs/iterator: implement seeking for merged iterators
refs/iterator: provide infrastructure to re-seek iterators
refs/iterator: separate lifecycle from iteration
refs: stop re-verifying common prefixes for availability
refs/files: batch refname availability checks for initial transactions
refs/files: batch refname availability checks for normal transactions
refs/reftable: batch refname availability checks
refs: introduce function to batch refname availability checks
builtin/update-ref: skip ambiguity checks when parsing object IDs
object-name: allow skipping ambiguity checks in `get_oid()` family
object-name: introduce `repo_get_oid_with_flags()`
|
|
When reading an object ID via `get_oid_basic()` or any of its related
functions we perform a check whether the object ID is ambiguous, which
can be the case when a reference with the same name exists. While the
check is generally helpful, there are cases where it only adds to the
runtime overhead without providing much of a benefit.
Add a new flag that allows us to disable the check. The flag will be
used in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce a new function `repo_get_oid_with_flags()`. This function
behaves the same as `repo_get_oid()`, except that it takes an extra
`flags` parameter that it ends up passing to `get_oid_with_context()`.
This function will be used in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
From Documentation/revisions.txt:
'<describeOutput>', e.g. 'v1.7.4.2-679-g3bee7fb'::
Output from `git describe`; i.e. a closest tag, optionally
followed by a dash and a number of commits, followed by a dash, a
'g', and an abbreviated object name.
which means that output of the format
${REFNAME}-${INTEGER}-g${HASH}
should parse to fully expanded ${HASH}. This is fine. However, we
currently don't validate any of ${REFNAME}-${INTEGER}, we only parse
-g${HASH} and assume the rest is valid. That is problematic, since it
breaks things like
git cat-file -p branchname:path/to/file/named/i-gaffed
which, when commit (or tree or blob) affed exists, will not return us
information about the file we are looking for but will instead
erroneously tell us about object affed.
A few additional notes:
- This is a slight backward incompatibility break, because we used
to allow ${GARBAGE}-g${HASH} as a way to spell ${HASH}. However,
a backward incompatible break is necessary, because there is no
other way for someone to be more specific and disambiguate that they
want the blob master:path/to/who-gabbed instead of the object abbed.
- There is a possibility that check_refname_format() rules change in
the future. However, we can only realistically loosen the rules
for what that function accepts rather than tighten. If we were to
tighten the rules, some real world repositories may already have
refnames that suddenly become unacceptable and we break those
repositories. As such, any describe-like syntax of the form
${VALID_FOR_A_REFNAME}-${INTEGER}-g${HASH} that is valid with the
changes in this commit will remain valid in the future.
- The fact that check_refname_format() rules could loosen in the
future is probably also an important reason to make this change. If
the rules loosen, there might be additional cases within
${GARBAGE}-g${HASH} that become ambiguous in the future. While
abbreviated hashes can be disambiguated by abbreviating less, it may
well be that these alternative object names have no way of being
disambiguated (much like pathnames cannot be). Accepting all random
${GARBAGE} thus makes it difficult for us to allow future
extensions to object naming.
So, tighten up the parsing to make sure ${REFNAME} and ${INTEGER} are
present in the string, and would be considered a valid ref and
non-negative integer.
Also, add a few tests for git describe using object names of the form
${REVISION_NAME}${MODIFIERS}
since an early version of this patch failed on constructs like
git describe v2.48.0-rc2-161-g6c2274cdbc^0
Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Given a branch name of 'foo{bar', commands like
git cat-file -p foo{bar:README.md
should succeed (assuming that branch had a README.md file, of course).
However, the change in cce91a2caef9 (Change 'master@noon' syntax to
'master@{noon}'., 2006-05-19) presumed that curly braces would always
come after an '@' or '^' and be paired, causing e.g. 'foo{bar:README.md'
to entirely miss the ':' and assume there's no object being referenced.
In short, git would report:
fatal: Not a valid object name foo{bar:README.md
Change the parsing to only make the assumption of paired curly braces
immediately after either a '@' or '^' character appears.
Add tests for this, as well as for a few other test cases that initial
versions of this patch broke:
* 'foo@@{...}'
* 'foo^{/${SEARCH_TEXT_WITH_COLON}}:${PATH}'
Note that we'd prefer not duplicating the special logic for "@^" characters
here, because if get_oid_basic() or interpret_nth_prior_checkout() or
get_oid_basic() or similar gain extra methods of using curly braces,
then the logic in get_oid_with_context_1() would need to be updated as
well. But it's not clear how to refactor all of these to have a simple
common callpoint with the specialized logic.
Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
Helped-by: Michael Haggerty <mhagger@github.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Start working to make the codebase buildable with -Wsign-compare.
* ps/build-sign-compare:
t/helper: don't depend on implicit wraparound
scalar: address -Wsign-compare warnings
builtin/patch-id: fix type of `get_one_patchid()`
builtin/blame: fix type of `length` variable when emitting object ID
gpg-interface: address -Wsign-comparison warnings
daemon: fix type of `max_connections`
daemon: fix loops that have mismatching integer types
global: trivial conversions to fix `-Wsign-compare` warnings
pkt-line: fix -Wsign-compare warning on 32 bit platform
csum-file: fix -Wsign-compare warning on 32-bit platform
diff.h: fix index used to loop through unsigned integer
config.mak.dev: drop `-Wno-sign-compare`
global: mark code units that generate warnings with `-Wsign-compare`
compat/win32: fix -Wsign-compare warning in "wWinMain()"
compat/regex: explicitly ignore "-Wsign-compare" warnings
git-compat-util: introduce macros to disable "-Wsign-compare" warnings
|
|
The syntax ":/<text>" to name the latest commit with the matching
text was broken with a recent change, which has been corrected.
* ps/commit-with-message-syntax-fix:
object-name: fix reversed ordering with ":/<text>" revisions
|
|
The advice messages now tell the newer 'git config set' command to
set the advice.token configuration variable to squelch a message.
* bf/explicit-config-set-in-advice-messages:
advice: suggest using subcommand "git config set"
|
|
Recently it was reported [1] that "look for the youngest commit
reachable from any ref with log message that match the given
pattern" syntax (i.e. ':/<text>') started to return results in
reverse recency order. This regression was introduced in Git v2.47.0
and is caused by a memory leak fix done in 57fb139b5e (object-name:
fix leaking commit list items, 2024-08-01).
The intent of the identified commit is to stop modifying the commit list
provided by the caller such that the caller can properly free all commit
list items, including those that the called function might potentially
remove from the list. This was done by creating a copy of the passed-in
commit list and modifying this copy instead of the caller-provided list.
We already knew to create such a copy beforehand with the `backup` list,
which was used to clear the `ONELINE_SEEN` commit mark after we were
done. So the refactoring simply renamed that list to `copy` and started
to operate on that list instead. There is a gotcha though: the backup
list, and thus now also the copied list, is always being prepended to,
so the resulting list is in reverse order! The end result is that we
pop commits from the wrong end of the commit list, returning commits in
reverse recency order.
Fix the bug by appending to the list instead.
[1]: <CAKOEJdcPYn3O01p29rVa+xv=Qr504FQyKJeSB-Moze04ViCGGg@mail.gmail.com>
Reported-by: Aarni Koskela <aarni@valohai.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The advice message currently suggests using "git config advice..." to
disable advice messages, but since
00bbdde141 (builtin/config: introduce "set" subcommand, 2024-05-06)
we have the "set" subcommand for config. Since using the subcommand is
more in-line with the modern interface, any advice should be promoting
its usage. Change the disable advice message to use the subcommand
instead. Change all uses of "git config advice" in the tests to use the
subcommand.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
strbuf_branchname(), strbuf_check_{branch,tag}_ref() are helper
functions to deal with branch and tag names, and the fact that they
happen to use strbuf to hold the name of a branch or a tag is not
essential. These functions fit better in the refs API than strbuf
API, the latter of which is about string manipulations.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Same as the preceding commits, storing the "core.warnAmbiguousRefs"
value globally is misdesigned as this setting may be set per repository.
Move the logic into the repo-settings subsystem. The usual pattern here
is that users are expected to call `prepare_repo_settings()` before they
access the settings themselves. This seems somewhat fragile though, as
it is easy to miss and leads to somewhat ugly code patterns at the call
sites.
Instead, introduce a new function that encapsulates this logic for us.
This also allows us to change how exactly the lazy initialization works
in the future, e.g. by only partially initializing values as requested
by the caller.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
More leak fixes.
* ps/leakfixes-part-4: (22 commits)
builtin/diff: free symmetric diff members
diff: free state populated via options
builtin/log: fix leak when showing converted blob contents
userdiff: fix leaking memory for configured diff drivers
builtin/format-patch: fix various trivial memory leaks
diff: fix leak when parsing invalid ignore regex option
unpack-trees: clear index when not propagating it
sequencer: release todo list on error paths
merge-ort: unconditionally release attributes index
builtin/fast-export: plug leaking tag names
builtin/fast-export: fix leaking diff options
builtin/fast-import: plug trivial memory leaks
builtin/notes: fix leaking `struct notes_tree` when merging notes
builtin/rebase: fix leaking `commit.gpgsign` value
config: fix leaking comment character config
submodule-config: fix leaking name entry when traversing submodules
read-cache: fix leaking hashfile when writing index fails
bulk-checkin: fix leaking state TODO
object-name: fix leaking symlink paths in object context
object-file: fix memory leak when reading corrupted headers
...
|
|
Incremental updates of multi-pack index files.
* tb/incremental-midx-part-1:
midx: implement support for writing incremental MIDX chains
t/t5313-pack-bounds-checks.sh: prepare for sub-directories
t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
midx: implement verification support for incremental MIDXs
midx: support reading incremental MIDX chains
midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
midx: teach `midx_preferred_pack()` about incremental MIDXs
midx: teach `midx_contains_pack()` about incremental MIDXs
midx: remove unused `midx_locate_pack()`
midx: teach `fill_midx_entry()` about incremental MIDXs
midx: teach `nth_midxed_offset()` about incremental MIDXs
midx: teach `bsearch_midx()` about incremental MIDXs
midx: introduce `bsearch_one_midx()`
midx: teach `nth_bitmapped_pack()` about incremental MIDXs
midx: teach `nth_midxed_object_oid()` about incremental MIDXs
midx: teach `prepare_midx_pack()` about incremental MIDXs
midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
midx: add new fields for incremental MIDX chains
Documentation: describe incremental MIDX format
|
|
The refs API has been taught to give symref target information to
the users of ref iterators, allowing for-each-ref and friends to
avoid an extra ref_resolve_* API call per a symbolic ref.
* jc/refs-symref-referent:
ref-filter: populate symref from iterator
refs: add referent to each_ref_fn
refs: keep track of unresolved reference value in iterators
|
|
The object context may be populated with symlink contents when reading a
symlink, but the associated strbuf doesn't ever get released when
releasing the object context, causing a memory leak. Plug it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a parameter to each_ref_fn so that callers to the ref APIs
that use this function as a callback can have acess to the
unresolved value of a symbolic ref.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `bsearch_midx()` function will be extended in a following commit to
search for the location of a given object ID across all MIDXs in a chain
(or the single non-chain MIDX if no chain is available).
While most callers will naturally want to use the updated
`bsearch_midx()` function, there are a handful of special cases that
will want finer control and will only want to search through a single
MIDX.
For instance, the object abbreviation code, which cares about object IDs
near to where we'd expect to find a match in a MIDX. In that case, we
want to look at the nearby matches in each layer of the MIDX chain, not
just a single one).
Split the more fine-grained control out into a separate function called
`bsearch_one_midx()` which searches only a single MIDX.
At present both `bsearch_midx()` and `bsearch_one_midx()` have identical
behavior, but the following commit will rewrite the former to be aware
of incremental MIDXs for the remaining non-special case callers.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When calling `get_oid_oneline()`, we pass in a `struct commit_list` that
gets modified by the function. This creates a weird situation where the
commit list may sometimes be empty after returning, but sometimes it
will continue to carry additional commits. In those cases the remainder
of the list leaks.
Ultimately, the design where we only pass partial ownership to
`get_oid_oneline()` feels shoddy. Refactor the code such that we only
pass a constant pointer to the list, creating a local copy as needed.
Callers are thus always responsible for freeing the commit list, which
then allows us to plug a bunch of memory leaks.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
More memory leaks have been plugged.
* ps/leakfixes-more: (29 commits)
builtin/blame: fix leaking ignore revs files
builtin/blame: fix leaking prefixed paths
blame: fix leaking data for blame scoreboards
line-range: plug leaking find functions
merge: fix leaking merge bases
builtin/merge: fix leaking `struct cmdnames` in `get_strategy()`
sequencer: fix memory leaks in `make_script_with_merges()`
builtin/clone: plug leaking HEAD ref in `wanted_peer_refs()`
apply: fix leaking string in `match_fragment()`
sequencer: fix leaking string buffer in `commit_staged_changes()`
commit: fix leaking parents when calling `commit_tree_extended()`
config: fix leaking "core.notesref" variable
rerere: fix various trivial leaks
builtin/stash: fix leak in `show_stash()`
revision: free diff options
builtin/log: fix leaking commit list in git-cherry(1)
merge-recursive: fix memory leak when finalizing merge
builtin/merge-recursive: fix leaking object ID bases
builtin/difftool: plug memory leaks in `run_dir_diff()`
object-name: free leaking object contexts
...
|
|
A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help
transition the codebase to rely less on the availability of the
singleton the_repository instance.
* ps/use-the-repository:
hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE`
t/helper: remove dependency on `the_repository` in "proc-receive"
t/helper: fix segfault in "oid-array" command without repository
t/helper: use correct object hash in partial-clone helper
compat/fsmonitor: fix socket path in networked SHA256 repos
replace-object: use hash algorithm from passed-in repository
protocol-caps: use hash algorithm from passed-in repository
oidset: pass hash algorithm when parsing file
http-fetch: don't crash when parsing packfile without a repo
hash-ll: merge with "hash.h"
refs: avoid include cycle with "repository.h"
global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
hash: require hash algorithm in `empty_tree_oid_hex()`
hash: require hash algorithm in `is_empty_{blob,tree}_oid()`
hash: make `is_null_oid()` independent of `the_repository`
hash: convert `oidcmp()` and `oideq()` to compare whole hash
global: ensure that object IDs are always padded
hash: require hash algorithm in `oidread()` and `oidclr()`
hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`
hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
|
|
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.
It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.
Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit
For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).
Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When given a length that equals the current hash algorithm's hex size,
then `repo_find_unique_abbrev_r()` exits early without trying to find an
abbreviation. This is only sensible because there is nothing to
abbreviate in the first place, so searching through objects to find a
unique prefix would be a waste of compute.
What we don't handle though is the case where the user passes a length
greater than the hash length. This is fine in practice as we still
compute the correct result. But at the very least, this is a waste of
resources as we try to abbreviate a value that cannot be abbreviated,
which causes us to hit the object database.
Start to explicitly handle values larger than hexsz to avoid this
performance penalty, which leads to a measureable speedup. The following
benchmark has been executed in linux.git:
Benchmark 1: git -c core.abbrev=9000 log --abbrev-commit (revision = HEAD~)
Time (mean ± σ): 12.812 s ± 0.040 s [User: 12.225 s, System: 0.554 s]
Range (min … max): 12.723 s … 12.857 s 10 runs
Benchmark 2: git -c core.abbrev=9000 log --abbrev-commit (revision = HEAD)
Time (mean ± σ): 11.095 s ± 0.029 s [User: 10.546 s, System: 0.521 s]
Range (min … max): 11.037 s … 11.122 s 10 runs
Summary
git -c core.abbrev=9000 log --abbrev-commit HEAD (revision = HEAD) ran
1.15 ± 0.00 times faster than git -c core.abbrev=9000 log --abbrev-commit HEAD (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
While it is documented in `struct object_context::path` that this
variable needs to be released by the caller, this fact is rather easy to
miss given that we do not ever provide a function to release the object
context. And of course, while some callers dutifully release the path,
many others don't.
Introduce a new `object_context_release()` function that releases the
path. Convert callsites that used to free the path to use that new
function and add missing calls to callsites that were leaking memory.
Refactor those callsites as required to have a single return path, only.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Work to support a repository that work with both SHA-1 and SHA-256
hash algorithms has started.
* eb/hash-transition: (30 commits)
t1016-compatObjectFormat: add tests to verify the conversion between objects
t1006: test oid compatibility with cat-file
t1006: rename sha1 to oid
test-lib: compute the compatibility hash so tests may use it
builtin/ls-tree: let the oid determine the output algorithm
object-file: handle compat objects in check_object_signature
tree-walk: init_tree_desc take an oid to get the hash algorithm
builtin/cat-file: let the oid determine the output algorithm
rev-parse: add an --output-object-format parameter
repository: implement extensions.compatObjectFormat
object-file: update object_info_extended to reencode objects
object-file-convert: convert commits that embed signed tags
object-file-convert: convert commit objects when writing
object-file-convert: don't leak when converting tag objects
object-file-convert: convert tag objects when writing
object-file-convert: add a function to convert trees between algorithms
object: factor out parse_mode out of fast-import and tree-walk into in object.h
cache: add a function to read an OID of a specific algorithm
tag: sign both hashes
commit: export add_header_signature to support handling signatures on tags
...
|
|
Make sure failure return from merge_bases_many() is properly caught.
* js/merge-base-with-missing-commit:
merge-ort/merge-recursive: do report errors in `merge_submodule()`
merge-recursive: prepare for `merge_submodule()` to report errors
commit-reach(repo_get_merge_bases_many_dirty): pass on errors
commit-reach(repo_get_merge_bases_many): pass on "missing commits" errors
commit-reach(get_octopus_merge_bases): pass on "missing commits" errors
commit-reach(repo_get_merge_bases): pass on "missing commits" errors
commit-reach(get_merge_bases_many_0): pass on "missing commits" errors
commit-reach(merge_bases_many): pass on "missing commits" errors
commit-reach(paint_down_to_common): start reporting errors
commit-reach(paint_down_to_common): prepare for handling shallow commits
commit-reach(repo_in_merge_bases_many): report missing commits
commit-reach(repo_in_merge_bases_many): optionally expect missing commits
commit-reach(paint_down_to_common): plug two memory leaks
|
|
The `merge_bases_many()` function was just taught to indicate parsing
errors, and now the `repo_get_merge_bases()` function (which is also
surfaced via the `repo_get_merge_bases()` macro) is aware of that, too.
Naturally, there are a lot of callers that need to be adjusted now, too.
Next step: adjust the callers of `get_octopus_merge_bases()`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The goal of 6436a20284 (refs: allow @{n} to work with n-sized reflog,
2021-01-07) was that if we have "n" entries in a reflog, we should still
be able to resolve ref@{n} by looking at the "old" value of the oldest
entry.
Commit 6436a20284 tried to put the logic into read_ref_at() by shifting
its idea of "n" by one. But we reverted that in the previous commit,
since it led to bugs in other callers which cared about the details of
the reflog entry we found. Instead, let's put the special case into the
caller that resolves @{n}, as it cares only about the oid.
read_ref_at() is even kind enough to return the "old" value from the
final reflog; it just returns "1" to signal to us that we ran off the
end of the reflog. But we can notice in the caller that we read just
enough records for that "old" value to be the one we're looking for, and
use it.
Note that read_ref_at() could notice this case, too, and just return 0.
But we don't want to do that, because the caller must be made aware that
we only found the oid, not an actual reflog entry (and the call sites in
show-branch do care about this).
There is one complication, though. When read_ref_at() hits a truncated
reflog, it will return the "old" value of the oldest entry only if it is
not the null oid. Otherwise, it actually returns the "new" value from
that entry! This bit of fudging is due to d1a4489a56 (avoid null SHA1 in
oldest reflog, 2008-07-08), where asking for "ref@{20.years.ago}" for a
ref created recently will produce the initial value as a convenience
(even though technically it did not exist 20 years ago).
But this convenience is only useful for time-based cutoffs. For
count-based cutoffs, get_oid_basic() has always simply complained about
going too far back:
$ git rev-parse HEAD@{20}
fatal: log for 'HEAD' only has 16 entries
and we should continue to do so, rather than returning a nonsense value
(there's even a test in t1508 already which covers this). So let's have
the d1a4489a56 code kick in only when doing timestamp-based cutoffs.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Each of these were checked with
gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).
...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file. These cases were:
* builtin/credential-cache.c
* builtin/pull.c
* builtin/send-pack.c
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Support short oids encoded in any algorithm, while ensuring enough of
the oid is specified to disambiguate between all of the oids in the
repository encoded in any algorithm.
By default have the code continue to only accept oids specified in the
storage hash algorithm of the repository, but when something is
ambiguous display all of the possible oids from any accepted oid
encoding.
A new flag is added GET_OID_HASH_ANY that when supplied causes the
code to accept oids specified in any hash algorithm, and to return the
oids that were resolved.
This implements the functionality that allows both SHA-1 and SHA-256
object names, from the "Object names on the command line" section of
the hash function transition document.
Care is taken in get_short_oid so that when the result is ambiguous
the output remains the same if GIT_OID_HASH_ANY was not supplied. If
GET_OID_HASH_ANY was supplied objects of any hash algorithm that match
the prefix are displayed.
This required updating repo_for_each_abbrev to give it a parameter so
that it knows to look at all hash algorithms.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move functions that are not about pure string manipulation out of
strbuf.[ch]
* cw/strbuf-cleanup:
strbuf: remove global variable
path: move related function to path
object-name: move related functions to object-name
credential-store: move related functions to credential-store file
abspath: move related functions to abspath
strbuf: clarify dependency
strbuf: clarify API boundary
|
|
The vast majority of files including object-store.h did not need dir.h
nor khash.h. Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.
After this patch:
$ git grep -h include..object-store | sort | uniq -c
2 #include "object-store.h"
129 #include "object-store-ll.h"
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since this header showed up in some places besides just #include
statements, update/clean-up/remove those other places as well.
Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got
away with violating the rule that all files must start with an include
of git-compat-util.h (or a short-list of alternate headers that happen
to include it first). This change exposed the violation and caused it
to stop building correctly; fix it by having it include
git-compat-util.h first, as per policy.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
For the functions defined in read-cache.c, move their declarations from
cache.h to a new header, read-cache-ll.h. Also move some related inline
functions from cache.h to read-cache.h. The purpose of the
read-cache-ll.h/read-cache.h split is that about 70% of the sites don't
need the inline functions and the extra headers they include.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move object-name-related functions from strbuf.[ch] to object-name.[ch]
so that strbuf is focused on string manipulation routines with minimal
dependencies.
dir.h relied on the forward declration of the repository struct in
strbuf.h. Since that is removed in this patch, add the forward
declaration to dir.h.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|