summaryrefslogtreecommitdiff
path: root/Documentation/technical
AgeCommit message (Collapse)Author
2025-06-25Merge branch 'jc/you-still-use-whatchanged'Junio C Hamano
"git whatchanged" that is longer to type than "git log --raw" which is its modern rough equivalent has outlived its usefulness more than 10 years ago. Plan to deprecate and remove it. * jc/you-still-use-whatchanged: whatschanged: list it in BreakingChanges document whatchanged: remove when built with WITH_BREAKING_CHANGES whatchanged: require --i-still-use-this tests: prepare for a world without whatchanged doc: prepare for a world without whatchanged you-still-use-that??: help deprecating commands for removal
2025-06-18Merge branch 'jw/doc-txt-to-adoc-refs'Junio C Hamano
Some leftover references to documentation source files that no longer exist, due to recent ".txt" -> ".adoc" renaming, have been corrected. * jw/doc-txt-to-adoc-refs: doc: update references to renamed AsciiDoc files
2025-06-17Merge branch 'ds/path-walk-2'Junio C Hamano
"git pack-objects" learns to find delta bases from blobs at the same path, using the --path-walk API. * ds/path-walk-2: pack-objects: allow --shallow and --path-walk path-walk: add new 'edge_aggressive' option pack-objects: thread the path-based compression pack-objects: refactor path-walk delta phase scalar: enable path-walk during push via config pack-objects: enable --path-walk via config repack: add --path-walk option t5538: add tests to confirm deltas in shallow pushes pack-objects: introduce GIT_TEST_PACK_PATH_WALK p5313: add performance tests for --path-walk pack-objects: update usage to match docs pack-objects: add --path-walk option pack-objects: extract should_attempt_deltas()
2025-06-06doc: update references to renamed AsciiDoc filesJouke Witteveen
The .txt extensions were changed to .adoc in 1f010d6 (doc: use .adoc extension for AsciiDoc files, 2025-01-20). References to the renamed files were not updated yet. Signed-off-by: Jouke Witteveen <j.witteveen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-02Merge branch 'wk/sparse-checkout-doc-fix'Junio C Hamano
Doc update. * wk/sparse-checkout-doc-fix: doc: sparse-checkout: use consistent inline list style
2025-05-30doc: sparse-checkout: use consistent inline list styleWonuk Kim
Fix this inline list to use a single style, namely numeric, instead of `(1)` followed by `(b)`. Signed-off-by: Wonuk Kim <kimww0306@gmail.com> Acked-by: Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-19Merge branch 'sc/bundle-uri-use-all-refs-in-bundle'Junio C Hamano
Bundle-URI feature did not use refs recorded in the bundle other than normal branches as anchoring points to optimize the follow-up fetch during "git clone"; now it is told to utilize all. * sc/bundle-uri-use-all-refs-in-bundle: bundle-uri: add test for bundle-uri clones with tags bundle-uri: copy all bundle references ino the refs/bundle space
2025-05-16path-walk: add new 'edge_aggressive' optionDerrick Stolee
In preparation for allowing both the --shallow and --path-walk options in the 'git pack-objects' builtin, create a new 'edge_aggressive' option in the path-walk API. This option will help walk the boundary more thoroughly and help avoid sending extra objects during fetches and pushes. The only use of the 'edge_hint_aggressive' option in the revision API is within mark_edges_uninteresting(), which is usually called before between prepare_revision_walk() and before visiting commits with get_revision(). In prepare_revision_walk(), the UNINTERESTING commits are walked until a boundary is found. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16pack-objects: add --path-walk optionDerrick Stolee
In order to more easily compute delta bases among objects that appear at the exact same path, add a --path-walk option to 'git pack-objects'. This option will use the path-walk API instead of the object walk given by the revision machinery. Since objects will be provided in batches representing a common path, those objects can be tested for delta bases immediately instead of waiting for a sort of the full object list by name-hash. This has multiple benefits, including avoiding collisions by name-hash. The objects marked as UNINTERESTING are included in these batches, so we are guaranteeing some locality to find good delta bases. After the individual passes are done on a per-path basis, the default name-hash is used to find other opportunistic delta bases that did not match exactly by the full path name. The current implementation performs delta calculations while walking objects, which is not ideal for a few reasons. First, this will cause the "Enumerating objects" phase to be much longer than usual. Second, it does not take advantage of threading during the path-scoped delta calculations. Even with this lack of threading, the path-walk option is sometimes faster than the usual approach. Future changes will refactor this code to allow for threading, but that complexity is deferred until later to keep this patch as simple as possible. This new walk is incompatible with some features and is ignored by others: * Object filters are not currently integrated with the path-walk API, such as sparse-checkout or tree depth. A blobless packfile could be integrated easily, but that is deferred for later. * Server-focused features such as delta islands, shallow packs, and using a bitmap index are incompatible with the path-walk API. * The path walk API is only compatible with the --revs option, not taking object lists or pack lists over stdin. These alternative ways to specify the objects currently ignores the --path-walk option without even a warning. Future changes will create performance tests that demonstrate the power of this approach. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-12whatchanged: remove when built with WITH_BREAKING_CHANGESJunio C Hamano
As we made "git whatchanged" require "--i-still-use-this" and asked the users to report if they still want to use it, the logical next step is to allow us build Git without "whatchanged" to prepare for its eventual removal. If we were to follow the pattern established in 8ccc75c2 (remote: announce removal of "branches/" and "remotes/", 2025-01-22), we can do this together with the documentation update to officially list that the command will be removed in the BreakingChanges document, but let's just keep the changes separate just in case we want to proceed a bit slower. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-25bundle-uri: copy all bundle references ino the refs/bundle spaceScott Chacon
When downloading bundles via the bundle-uri functionality, we only copy the references from refs/heads into the refs/bundle space. I'm not sure why this refspec is hardcoded to be so limited, but it makes the ref negotiation on the subsequent fetch suboptimal, since it won't use objects that are referenced outside of the current heads of the bundled repository. This change to copy everything in refs/ in the bundle to refs/bundles/ significantly helps the subsequent fetch, since nearly all the references are now included in the negotiation. The update to the bundle-uri unbundling refspec puts all the heads from a bundle file into refs/bundle/heads instead of directly into refs/bundle/ so the tests also need to be updated to look in the new heirarchy. Signed-off-by: Scott Chacon <schacon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-24Merge branch 'ps/parse-options-integers'Junio C Hamano
Update parse-options API to catch mistakes to pass address of an integral variable of a wrong type/size. * ps/parse-options-integers: parse-options: detect mismatches in integer signedness parse-options: introduce precision handling for `OPTION_UNSIGNED` parse-options: introduce precision handling for `OPTION_INTEGER` parse-options: rename `OPT_MAGNITUDE()` to `OPT_UNSIGNED()` parse-options: support unit factors in `OPT_INTEGER()` global: use designated initializers for options parse: fix off-by-one for minimum signed values
2025-04-17Merge branch 'en/merge-recursive-debug'Junio C Hamano
Remove remnants of the recursive merge strategy backend, which was superseded by the ort merge strategy. * en/merge-recursive-debug: builtin/{merge,rebase,revert}: remove GIT_TEST_MERGE_ALGORITHM tests: remove GIT_TEST_MERGE_ALGORITHM and test_expect_merge_algorithm merge-recursive.[ch]: thoroughly debug these merge, sequencer: switch recursive merges over to ort sequencer: switch non-recursive merges over to ort merge-ort: enable diff-algorithms other than histogram builtin/merge-recursive: switch to using merge_ort_generic() checkout: replace merge_trees() with merge_ort_nonrecursive()
2025-04-17parse-options: rename `OPT_MAGNITUDE()` to `OPT_UNSIGNED()`Patrick Steinhardt
With the preceding commit, `OPT_INTEGER()` has learned to support unit factors. Consequently, the major differencen between `OPT_INTEGER()` and `OPT_MAGNITUDE()` isn't the support of unit factors anymore, as both of them do support them now. Instead, the difference is that one handles signed and the other handles unsigned integers. Adapt the name of `OPT_MAGNITUDE()` accordingly by renaming it to `OPT_UNSIGNED()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-17parse-options: support unit factors in `OPT_INTEGER()`Patrick Steinhardt
There are two main differences between `OPT_INTEGER()` and `OPT_MAGNITUDE()`: - The former parses signed integers whereas the latter parses unsigned integers. - The latter parses unit factors like 'k', 'm' or 'g'. While the first difference makes obvious sense, there isn't really a good reason why signed integers shouldn't support unit factors, too. This inconsistency will also become a bit of a problem with subsequent commits, where we will fix a couple of callsites that pass an unsigned integer to `OPT_INTEGER()`. There are three options: - We could adapt those users to instead pass a signed integer, but this would needlessly extend the range of accepted integer values. - We could convert them to use `OPT_MAGNITUDE()`, as it only accepts unsigned integers. But now we have the inconsistency that we also start to accept unit factors. - We could introduce `OPT_UNSIGNED()` as equivalent to `OPT_INTEGER()` so that it knows to only accept unsigned integers without unit suffix. Introducing a whole new option type feels a bit excessive. There also isn't really a good reason why `OPT_INTEGER()` cannot be extended to also accept unit factors: all valid values passed to such options cannot have a unit factors right now, so there wouldn't be any ambiguity. Refactor `OPT_INTEGER()` to use `git_parse_int()`, which knows to interpret unit factors. This removes the inconsistency between the signed and unsigned options so that we can easily fix up callsites that pass the wrong integer type right now. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08merge-recursive.[ch]: thoroughly debug theseElijah Newren
As a wise man once told me, "Deleted code is debugged code!" So, move the functions that are shared between merge-recursive and merge-ort from the former to the latter, and then debug the remainder of merge-recursive.[ch]. Joking aside, merge-ort was always intended to replace merge-recursive. It has numerous advantages over merge-recursive (operates much faster, can operate without a worktree or index, and fixes a number of known bugs and suboptimal merges). Since we have now replaced all callers of merge-recursive with equivalent functions from merge-ort, move the shared functions from the former to the latter, and delete the remainder of merge-recursive.[ch]. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21Documentation: describe incremental MIDX bitmapsTaylor Blau
Prepare to implement support for reachability bitmaps for the new incremental multi-pack index (MIDX) feature over the following commits. This commit begins by first describing the relevant format and usage details for incremental MIDX bitmaps. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21Documentation: remove a "future work" item from the MIDX docsTaylor Blau
One of the items listed as "future work" in the MIDX's technical documentation is to extend the format to allow MIDXs to be written incrementally across multiple layers. This was suggested all the way back in ceab693d1f (multi-pack-index: add design document, 2018-07-12), and implemented in b9497848df (Merge branch 'tb/incremental-midx-part-1', 2024-08-19). Let's remove it accordingly. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-06Merge branch 'tz/doc-txt-to-adoc-fixes'Junio C Hamano
Fallouts from recent renaming of documentation files from .txt suffix to the new .adoc suffix have been corrected. * tz/doc-txt-to-adoc-fixes: (38 commits) xdiff: *.txt -> *.adoc fixes unpack-trees.c: *.txt -> *.adoc fixes transport.h: *.txt -> *.adoc fixes trace2/tr2_sysenv.c: *.txt -> *.adoc fixes trace2.h: *.txt -> *.adoc fixes t6434: *.txt -> *.adoc fixes t6012: *.txt -> *.adoc fixes t/helper/test-rot13-filter.c: *.txt -> *.adoc fixes simple-ipc.h: *.txt -> *.adoc fixes setup.c: *.txt -> *.adoc fixes refs.h: *.txt -> *.adoc fixes pseudo-merge.h: *.txt -> *.adoc fixes parse-options.h: *.txt -> *.adoc fixes object-name.c: *.txt -> *.adoc fixes list-objects-filter-options.h: *.txt -> *.adoc fixes fsck.h: *.txt -> *.adoc fixes diffcore.h: *.txt -> *.adoc fixes diff.h: *.txt -> *.adoc fixes contrib/long-running-filter: *.txt -> *.adoc fixes config.c: *.txt -> *.adoc fixes ...
2025-03-05Merge branch 'pw/build-meson-technical-and-howto-docs'Junio C Hamano
Meson-based build procedure forgot to build some docs, which has been corrected. * pw/build-meson-technical-and-howto-docs: meson: fix building technical and howto docs
2025-03-05Merge branch 'cc/lop-remote'Junio C Hamano
Large-object promisor protocol extension. * cc/lop-remote: doc: add technical design doc for large object promisors promisor-remote: check advertised name or URL Add 'promisor-remote' capability to protocol v2
2025-03-03doc: *.txt -> *.adoc fixesTodd Zullinger
Update a few more instances of Documentation/*.txt files which have been renamed to *.adoc. Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-03technical/partial-clone: update reference to rev-list-options.adocTodd Zullinger
Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-03doc: add technical design doc for large object promisorsChristian Couder
Let's add a design doc about how we could improve handling liarge blobs using "Large Object Promisors" (LOPs). It's a set of features with the goal of using special dedicated promisor remotes to store large blobs, and having them accessed directly by main remotes and clients. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-03meson: fix building technical and howto docsPhillip Wood
When our asciidoc files were renamed from "*.txt" to "*.adoc" in 1f010d6bdf7 (doc: use .adoc extension for AsciiDoc files, 2025-01-20) the "meson.build" file in "Documentation" was updated but the "meson.build" files in the "technical" and "howto" subdirectories were not. This causes the meson build to fail when configured with -Ddocs=html. Fix this by updating the relevant "meson.build" files. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18Merge branch 'ds/backfill'Junio C Hamano
Lazy-loading missing files in a blobless clone on demand is costly as it tends to be one-blob-at-a-time. "git backfill" is introduced to help bulk-download necessary files beforehand. * ds/backfill: backfill: assume --sparse when sparse-checkout is enabled backfill: add --sparse option backfill: add --min-batch-size=<n> option backfill: basic functionality and tests backfill: add builtin boilerplate
2025-02-14Merge branch 'bc/doc-adoc-not-txt'Junio C Hamano
All the documentation .txt files have been renamed to .adoc to help content aware editors. * bc/doc-adoc-not-txt: Remove obsolete ".txt" extensions for AsciiDoc files doc: use .adoc extension for AsciiDoc files gitattributes: mark AsciiDoc files as LF-only editorconfig: add .adoc extension doc: update gitignore for .adoc extension
2025-02-03backfill: add --sparse optionDerrick Stolee
One way to significantly reduce the cost of a Git clone and later fetches is to use a blobless partial clone and combine that with a sparse-checkout that reduces the paths that need to be populated in the working directory. Not only does this reduce the cost of clones and fetches, the sparse-checkout reduces the number of objects needed to download from a promisor remote. However, history investigations can be expensive as computing blob diffs will trigger promisor remote requests for one object at a time. This can be avoided by downloading the blobs needed for the given sparse-checkout using 'git backfill' and its new '--sparse' mode, at a time that the user is willing to pay that extra cost. Note that this is distinctly different from the '--filter=sparse:<oid>' option, as this assumes that the partial clone has all reachable trees and we are using client-side logic to avoid downloading blobs outside of the sparse-checkout cone. This avoids the server-side cost of walking trees while also achieving a similar goal. It also downloads in batches based on similar path names, presenting a resumable download if things are interrupted. This augments the path-walk API to have a possibly-NULL 'pl' member that may point to a 'struct pattern_list'. This could be more general than the sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently the only consumer. Be sure to test this in both cone mode and not cone mode. Cone mode has the benefit that the path-walk can skip certain paths once they would expand beyond the sparse-checkout. Non-cone mode can describe the included files using both positive and negative patterns, which changes the possible return values of path_matches_pattern_list(). Test both kinds of matches for increased coverage. To test this, we can create a blobless sparse clone, expand the sparse-checkout slightly, and then run 'git backfill --sparse' to see how much data is downloaded. The general steps are 1. git clone --filter=blob:none --sparse <url> 2. git sparse-checkout set <dir1> ... <dirN> 3. git backfill --sparse For the Git repository with the 'builtin' directory in the sparse-checkout, we get these results for various batch sizes: | Batch Size | Pack Count | Pack Size | Time | |-----------------|------------|-----------|-------| | (Initial clone) | 3 | 110 MB | | | 10K | 12 | 192 MB | 17.2s | | 15K | 9 | 192 MB | 15.5s | | 20K | 8 | 192 MB | 15.5s | | 25K | 7 | 192 MB | 14.7s | This case matters less because a full clone of the Git repository from GitHub is currently at 277 MB. Using a copy of the Linux repository with the 'kernel/' directory in the sparse-checkout, we get these results: | Batch Size | Pack Count | Pack Size | Time | |-----------------|------------|-----------|------| | (Initial clone) | 2 | 1,876 MB | | | 10K | 11 | 2,187 MB | 46s | | 25K | 7 | 2,188 MB | 43s | | 50K | 5 | 2,194 MB | 44s | | 100K | 4 | 2,194 MB | 48s | This case is more meaningful because a full clone of the Linux repository is currently over 6 GB, so this is a valuable way to download a fraction of the repository and no longer need network access for all reachable objects within the sparse-checkout. Choosing a batch size will depend on a lot of factors, including the user's network speed or reliability, the repository's file structure, and how many versions there are of the file within the sparse-checkout scope. There will not be a one-size-fits-all solution. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03backfill: basic functionality and testsDerrick Stolee
The default behavior of 'git backfill' is to fetch all missing blobs that are reachable from HEAD. Document and test this behavior. The implementation is a very simple use of the path-walk API, initializing the revision walk at HEAD to start the path-walk from all commits reachable from HEAD. Ignore the object arrays that correspond to tree entries, assuming that they are all present already. The path-walk API provides lists of objects in batches according to a common path, but that list could be very small. We want to balance the number of requests to the server with the ability to have the process interrupted with minimal repeated work to catch up in the next run. Based on some experiments (detailed in the next change) a minimum batch size of 50,000 is selected for the default. This batch size is a _minimum_. As the path-walk API emits lists of blob IDs, they are collected into a list of objects for a request to the server. When that list is at least the minimum batch size, then the request is sent to the server for the new objects. However, the list of blob IDs from the path-walk API could be much longer than the batch size. At this moment, it is unclear if there is a benefit to split the list when there are too many objects at the same path. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-29Merge branch 'ds/path-walk-1'Junio C Hamano
Introduce a new API to visit objects in batches based on a common path, or by type. * ds/path-walk-1: path-walk: drop redundant parse_tree() call path-walk: reorder object visits path-walk: mark trees and blobs as UNINTERESTING path-walk: visit tags and cached objects path-walk: allow consumer to specify object types t6601: add helper for testing path-walk API test-lib-functions: add test_cmp_sorted path-walk: introduce an object walk by path
2025-01-21doc: use .adoc extension for AsciiDoc filesbrian m. carlson
We presently use the ".txt" extension for our AsciiDoc files. While not wrong, most editors do not associate this extension with AsciiDoc, meaning that contributors don't get automatic editor functionality that could be useful, such as syntax highlighting and prose linting. It is much more common to use the ".adoc" extension for AsciiDoc files, since this helps editors automatically detect files and also allows various forges to provide rich (HTML-like) rendering. Let's do that here, renaming all of the files and updating the includes where relevant. Adjust the various build scripts and makefiles to use the new extension as well. Note that this should not result in any user-visible changes to the documentation. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21doc: update gitignore for .adoc extensionbrian m. carlson
We presently use the ".txt" extension for our AsciiDoc files. While not wrong, most editors do not associate this extension with AsciiDoc, meaning that contributors don't get automatic editor functionality that could be useful, such as syntax highlighting and prose linting. Instead, in a future commit, we're going to move to using the more common ".adoc" extension for these files, which many editors intrinsically recognize as an AsciiDoc file. To avoid contributors accidentally checking in generated files, ignore the new extension for generated files in the documentation .gitignore files. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-14meson: fix missing deps for technical articlesSam James
We need an explicit `depends: documentation_deps` so that all of our Documentation targets know they require asciidoc.conf. This shows up as parallel build failures with it not yet being available. Other targets look OK already. Signed-off-by: Sam James <sam@gentoo.org> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-27meson: generate articlesPatrick Steinhardt
While the Meson build system already knows to generate man pages and our user manual, it does not yet generate the random assortment of articles that we have. Plug this gap. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-27Documentation: refactor "api-index.sh" for out-of-tree buildsPatrick Steinhardt
The "api-index.sh" script generates an index of API-related documentation. The script does not handle out-of-tree builds and thus cannot be used easily by Meson. Refactor it to be independent of locations by both accepting a source directory where the API docs live as well as a path to an output file. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20path-walk: mark trees and blobs as UNINTERESTINGDerrick Stolee
When the input rev_info has UNINTERESTING starting points, we want to be sure that the UNINTERESTING flag is passed appropriately through the objects. To match how this is done in places such as 'git pack-objects', we use the mark_edges_uninteresting() method. This method has an option for using the "sparse" walk, which is similar in spirit to the path-walk API's walk. To be sure to keep it independent, add a new 'prune_all_uninteresting' option to the path_walk_info struct. To check how the UNINTERSTING flag is spread through our objects, extend the 'test-tool path-walk' command to output whether or not an object has that flag. This changes our tests significantly, including the removal of some objects that were previously visited due to the incomplete implementation. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20path-walk: visit tags and cached objectsDerrick Stolee
The rev_info that is specified for a path-walk traversal may specify visiting tag refs (both lightweight and annotated) and also may specify indexed objects (blobs and trees). Update the path-walk API to walk these objects as well. When walking tags, we need to peel the annotated objects until reaching a non-tag object. If we reach a commit, then we can add it to the pending objects to make sure we visit in the commit walk portion. If we reach a tree, then we will assume that it is a root tree. If we reach a blob, then we have no good path name and so add it to a new list of "tagged blobs". When the rev_info includes the "--indexed-objects" flag, then the pending set includes blobs and trees found in the cache entries and cache-tree. The cache entries are usually blobs, though they could be trees in the case of a sparse index. The cache-tree stores previously-hashed tree objects but these are cleared out when staging objects below those paths. We add tests that demonstrate this. The indexed objects come with a non-NULL 'path' value in the pending item. This allows us to prepopulate the 'path_to_lists' strmap with lists for these paths. The tricky thing about this walk is that we will want to combine the indexed objects walk with the commit walk, especially in the future case of walking objects during a command like 'git repack'. Whenever possible, we want the objects from the index to be grouped with similar objects in history. We don't want to miss any paths that appear only in the index and not in the commit history. Thus, we need to be careful to let the path stack be populated initially with only the root tree path (and possibly tags and tagged blobs) and go through the normal depth-first search. Afterwards, if there are other paths that are remaining in the paths_to_lists strmap, we should then iterate through the stack and visit those objects recursively. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20path-walk: allow consumer to specify object typesDerrick Stolee
We add the ability to filter the object types in the path-walk API so the callback function is called fewer times. This adds the ability to ask for the commits in a list, as well. We re-use the empty string for this set of objects because these are passed directly to the callback function instead of being part of the 'path_stack'. Future changes will add the ability to visit annotated tags. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20t6601: add helper for testing path-walk APIDerrick Stolee
Add some tests based on the current behavior, doing interesting checks for different sets of branches, ranges, and the --boundary option. This sets a baseline for the behavior and we can extend it as new options are introduced. Store and output a 'batch_nr' value so we can demonstrate that the paths are grouped together in a batch and not following some other ordering. This allows us to test the depth-first behavior of the path-walk API. However, we purposefully do not test the order of the objects in the batch, so the output is compared to the expected output through a sort. It is important to mention that the behavior of the API will change soon as we start to handle UNINTERESTING objects differently, but these tests will demonstrate the change in behavior. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20path-walk: introduce an object walk by pathDerrick Stolee
In anticipation of a few planned applications, introduce the most basic form of a path-walk API. It currently assumes that there are no UNINTERESTING objects, and does not include any complicated filters. It calls a function pointer on groups of tree and blob objects as grouped by path. This only includes objects the first time they are discovered, so an object that appears at multiple paths will not be included in two batches. These batches are collected in 'struct type_and_oid_list' objects, which store an object type and an oid_array of objects. The data structures are documented in 'struct path_walk_context', but in summary the most important are: * 'paths_to_lists' is a strmap that connects a path to a type_and_oid_list for that path. To avoid conflicts in path names, we make sure that tree paths end in "/" (except the root path with is an empty string) and blob paths do not end in "/". * 'path_stack' is a string list that is added to in an append-only way. This stores the stack of our depth-first search on the heap instead of using recursion. * 'path_stack_pushed' is a strmap that stores path names that were already added to 'path_stack', to avoid repeating paths in the stack. Mostly, this saves us from quadratic lookups from doing unsorted checks into the string_list. The coupling of 'path_stack' and 'path_stack_pushed' is protected by the push_to_stack() method. Call this instead of inserting into these structures directly. The walk_objects_by_path() method initializes these structures and starts walking commits from the given rev_info struct. The commits are used to find the list of root trees which populate the start of our depth-first search. The core of our depth-first search is in a while loop that continues while we have not indicated an early exit and our 'path_stack' still has entries in it. The loop body pops a path off of the stack and "visits" the path via the walk_path() method. The walk_path() method gets the list of OIDs from the 'path_to_lists' strmap and executes the callback method on that list with the given path and type. If the OIDs correspond to tree objects, then iterate over all trees in the list and run add_children() to add the child objects to their own lists, adding new entries to the stack if necessary. In testing, this depth-first search approach was the one that used the least memory while iterating over the object lists. There is still a chance that repositories with too-wide path patterns could cause memory pressure issues. Limiting the stack size could be done in the future by limiting how many objects are being considered in-progress, or by visiting blob paths earlier than trees. There are many future adaptations that could be made, but they are left for future updates when consumers are ready to take advantage of those features. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-07Documentation: add comparison of build systemsPatrick Steinhardt
We're contemplating whether to eventually replace our build systems with a build system that is easier to use. Add a comparison of build systems to our technical documentation as a baseline for discussion. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-22doc: consolidate extensions in git-config documentationCaleb White
The `technical/repository-version.txt` document originally served as the master list for extensions, requiring that any new extensions be defined there. However, the `config/extensions.txt` file was introduced later and has since become the de facto location for describing extensions, with several extensions listed there but missing from `repository-version.txt`. This consolidates all extension definitions into `config/extensions.txt`, making it the authoritative source for extensions. The references in `repository-version.txt` are updated to point to `config/extensions.txt`, and cross-references to related documentation such as `gitrepository-layout[5]` and `git-config[1]` are added. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Caleb White <cdwhite3@pm.me> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-07docs: fix the `maintain-git` links in `technical/platform-support`Johannes Schindelin
These links should point to `.html` files, not to `.txt` ones. Compare also to 4945f046c7f (api docs: link to html version of api-trace2, 2022-09-16). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-23Documentation/technical: fix a typoAndrew Kreimer
Fix a typo in documentation. Signed-off-by: Andrew Kreimer <algonell@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-18Merge branch 'ps/clar-unit-test'Junio C Hamano
Import clar unit tests framework libgit2 folks invented for our use. * ps/clar-unit-test: Makefile: rename clar-related variables to avoid confusion clar: add CMake support t/unit-tests: convert ctype tests to use clar t/unit-tests: convert strvec tests to use clar t/unit-tests: implement test driver Makefile: wire up the clar unit testing framework Makefile: do not use sparse on third-party sources Makefile: make hdr-check depend on generated headers Makefile: fix sparse dependency on GENERATED_H clar: stop including `shellapi.h` unnecessarily clar(win32): avoid compile error due to unused `fs_copy()` clar: avoid compile error with mingw-w64 t/clar: fix compatibility with NonStop t: import the clar unit testing framework t: do not pass GIT_TEST_OPTS to unit tests with prove
2024-09-04t: import the clar unit testing frameworkPatrick Steinhardt
Our unit testing framework is a homegrown solution. While it supports most of our needs, it is likely that the volume of unit tests will grow quite a bit in the future such that we can exercise low-level subsystems directly. This surfaces several shortcomings that the current solution has: - There is no way to run only one specific tests. While some of our unit tests wire this up manually, others don't. In general, it requires quite a bit of boilerplate to get this set up correctly. - Failures do not cause a test to stop execution directly. Instead, the test author needs to return manually whenever an assertion fails. This is rather verbose and is not done correctly in most of our unit tests. - Wiring up a new testcase requires both implementing the test function and calling it in the respective test suite's main function, which is creating code duplication. We can of course fix all of these issues ourselves, but that feels rather pointless when there are already so many unit testing frameworks out there that have those features. We line out some requirements for any unit testing framework in "Documentation/technical/unit-tests.txt". The "clar" unit testing framework, which isn't listed in that table yet, ticks many of the boxes: - It is licensed under ISC, which is compatible. - It is easily vendorable because it is rather tiny at around 1200 lines of code. - It is easily hackable due to the same reason. - It has TAP support. - It has skippable tests. - It preprocesses test files in order to extract test functions, which then get wired up automatically. While it's not perfect, the fact that clar originates from the libgit2 project means that it should be rather easy for us to collaborate with upstream to plug any gaps. Import the clar unit testing framework at commit 1516124 (Merge pull request #97 from pks-t/pks-whitespace-fixes, 2024-08-15). The framework will be wired up in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22trace2: implement trace2_printf() for event targetJosh Steadmon
The trace2 event target does not have an implementation for trace2_printf(). While the event target is for structured events, and trace2_printf() is for unstructured, human-readable messages, it may still be useful to wrap these unstructured messages in a structured JSON object. Among other things, it may reduce confusion when manually debugging using event trace data. Add a simple implementation for the event target that wraps trace2_printf() messages in a minimal JSON object. Document this in Documentation/technical/api-trace2.txt, and bump the event format version since we're adding a new event type. Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-19Merge branch 'tb/incremental-midx-part-1'Junio C Hamano
Incremental updates of multi-pack index files. * tb/incremental-midx-part-1: midx: implement support for writing incremental MIDX chains t/t5313-pack-bounds-checks.sh: prepare for sub-directories t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' midx: implement verification support for incremental MIDXs midx: support reading incremental MIDX chains midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs midx: teach `midx_preferred_pack()` about incremental MIDXs midx: teach `midx_contains_pack()` about incremental MIDXs midx: remove unused `midx_locate_pack()` midx: teach `fill_midx_entry()` about incremental MIDXs midx: teach `nth_midxed_offset()` about incremental MIDXs midx: teach `bsearch_midx()` about incremental MIDXs midx: introduce `bsearch_one_midx()` midx: teach `nth_bitmapped_pack()` about incremental MIDXs midx: teach `nth_midxed_object_oid()` about incremental MIDXs midx: teach `prepare_midx_pack()` about incremental MIDXs midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs midx: add new fields for incremental MIDX chains Documentation: describe incremental MIDX format
2024-08-06Documentation: describe incremental MIDX formatTaylor Blau
Prepare to implement incremental multi-pack indexes (MIDXs) over the next several commits by first describing the relevant prerequisites (like a new chunk in the MIDX format, the directory structure for incremental MIDXs, etc.) The format is described in detail in the patch contents below, but the high-level description is as follows. Incremental MIDXs live in $GIT_DIR/objects/pack/multi-pack-index.d, and each `*.midx` within that directory has a single "parent" MIDX, which is the MIDX layer immediately before it in the MIDX chain. The chain order resides in a file 'multi-pack-index-chain' in the same directory. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-02Documentation: add platform support policyEmily Shaffer
Supporting many platforms is only possible when we have the right tools to ensure that support. Teach platform maintainers how they can help us to help them, by explaining what kind of tooling support we would like to have, and what level of support becomes available as a result. Provide examples so that platform maintainers can see what we're asking for in practice. With this policy in place, we can make changes with stronger assurance that we are not breaking anybody we promised not to. Instead, we can feel confident that our existing testing and integration practices protect those who care from breakage. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>