<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/git.git/dir.h, branch v2.26.0-rc2</title>
<subtitle>Git
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v2.26.0-rc2</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v2.26.0-rc2'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/'/>
<updated>2019-12-25T19:21:58Z</updated>
<entry>
<title>Merge branch 'ds/sparse-cone'</title>
<updated>2019-12-25T19:21:58Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-12-25T19:21:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=bd72a08d6ce451e7d4725d6b3b411d482333e5cb'/>
<id>urn:sha1:bd72a08d6ce451e7d4725d6b3b411d482333e5cb</id>
<content type='text'>
Management of sparsely checked-out working tree has gained a
dedicated "sparse-checkout" command.

* ds/sparse-cone: (21 commits)
  sparse-checkout: improve OS ls compatibility
  sparse-checkout: respect core.ignoreCase in cone mode
  sparse-checkout: check for dirty status
  sparse-checkout: update working directory in-process for 'init'
  sparse-checkout: cone mode should not interact with .gitignore
  sparse-checkout: write using lockfile
  sparse-checkout: use in-process update for disable subcommand
  sparse-checkout: update working directory in-process
  sparse-checkout: sanitize for nested folders
  unpack-trees: add progress to clear_ce_flags()
  unpack-trees: hash less in cone mode
  sparse-checkout: init and set in cone mode
  sparse-checkout: use hashmaps for cone patterns
  sparse-checkout: add 'cone' mode
  trace2: add region in clear_ce_flags
  sparse-checkout: create 'disable' subcommand
  sparse-checkout: add '--stdin' option to set subcommand
  sparse-checkout: 'set' subcommand
  clone: add --sparse mode
  sparse-checkout: create 'init' subcommand
  ...
</content>
</entry>
<entry>
<title>unpack-trees: hash less in cone mode</title>
<updated>2019-11-22T07:11:44Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-11-21T22:04:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=eb42feca974a333e58c2ca0f3cfa8bf0dd421402'/>
<id>urn:sha1:eb42feca974a333e58c2ca0f3cfa8bf0dd421402</id>
<content type='text'>
The sparse-checkout feature in "cone mode" can use the fact that
the recursive patterns are "connected" to the root via parent
patterns to decide if a directory is entirely contained in the
sparse-checkout or entirely removed.

In these cases, we can skip hashing the paths within those
directories and simply set the skipworktree bit to the correct
value.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>sparse-checkout: init and set in cone mode</title>
<updated>2019-11-22T07:11:44Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-11-21T22:04:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=af09ce24a9c79f6efc12d1d8f1052e1d1dbe5016'/>
<id>urn:sha1:af09ce24a9c79f6efc12d1d8f1052e1d1dbe5016</id>
<content type='text'>
To make the cone pattern set easy to use, update the behavior of
'git sparse-checkout (init|set)'.

Add '--cone' flag to 'git sparse-checkout init' to set the config
option 'core.sparseCheckoutCone=true'.

When running 'git sparse-checkout set' in cone mode, a user only
needs to supply a list of recursive folder matches. Git will
automatically add the necessary parent matches for the leading
directories.

When testing 'git sparse-checkout set' in cone mode, check the
error stream to ensure we do not see any errors. Specifically,
we want to avoid the warning that the patterns do not match
the cone-mode patterns.

Helped-by: Eric Wong &lt;e@80x24.org&gt;
Helped-by: Johannes Schindelin &lt;Johannes.Schindelin@gmx.de&gt;
Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>sparse-checkout: use hashmaps for cone patterns</title>
<updated>2019-11-22T07:11:44Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-11-21T22:04:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=96cc8ab5318cd57c8bc203b8f064b35883b2386f'/>
<id>urn:sha1:96cc8ab5318cd57c8bc203b8f064b35883b2386f</id>
<content type='text'>
The parent and recursive patterns allowed by the "cone mode"
option in sparse-checkout are restrictive enough that we
can avoid using the regex parsing. Everything is based on
prefix matches, so we can use hashsets to store the prefixes
from the sparse-checkout file. When checking a path, we can
strip path entries from the path and check the hashset for
an exact match.

As a test, I created a cone-mode sparse-checkout file for the
Linux repository that actually includes every file. This was
constructed by taking every folder in the Linux repo and creating
the pattern pairs here:

	/$folder/
	!/$folder/*/

This resulted in a sparse-checkout file sith 8,296 patterns.
Running 'git read-tree -mu HEAD' on this file had the following
performance:

    core.sparseCheckout=false: 0.21 s (0.00 s)
     core.sparseCheckout=true: 3.75 s (3.50 s)
 core.sparseCheckoutCone=true: 0.23 s (0.01 s)

The times in parentheses above correspond to the time spent
in the first clear_ce_flags() call, according to the trace2
performance traces.

While this example is contrived, it demonstrates how these
patterns can slow the sparse-checkout feature.

Helped-by: Eric Wong &lt;e@80x24.org&gt;
Helped-by: Johannes Schindelin &lt;Johannes.Schindelin@gmx.de&gt;
Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>dir: move doc to dir.h</title>
<updated>2019-11-18T06:21:28Z</updated>
<author>
<name>Heba Waly</name>
<email>heba.waly@gmail.com</email>
</author>
<published>2019-11-17T21:04:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=266f03eccd36b84c88206aa6d18b163223294229'/>
<id>urn:sha1:266f03eccd36b84c88206aa6d18b163223294229</id>
<content type='text'>
Move the documentation from Documentation/technical/api-directory-listing.txt
to dir.h as it's easier for the developers to find the usage information
beside the code instead of looking for it in another doc file.

Also documentation/technical/api-directory-listing.txt is removed because
the information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header files.

Signed-off-by: Heba Waly &lt;heba.waly@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'en/clean-nested-with-ignored'</title>
<updated>2019-10-11T05:24:46Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-10-11T05:24:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=aafb75452b2e9b3f17db3a07e9ed1cf77fdce693'/>
<id>urn:sha1:aafb75452b2e9b3f17db3a07e9ed1cf77fdce693</id>
<content type='text'>
"git clean" fixes.

* en/clean-nested-with-ignored:
  dir: special case check for the possibility that pathspec is NULL
  clean: fix theoretical path corruption
  clean: rewrap overly long line
  clean: avoid removing untracked files in a nested git repository
  clean: disambiguate the definition of -d
  git-clean.txt: do not claim we will delete files with -n/--dry-run
  dir: add commentary explaining match_pathspec_item's return value
  dir: if our pathspec might match files under a dir, recurse into it
  dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case
  dir: also check directories for matching pathspecs
  dir: fix off-by-one error in match_pathspec_item
  dir: fix typo in comment
  t7300: add testcases showing failure to clean specified pathspecs
</content>
</entry>
<entry>
<title>clean: avoid removing untracked files in a nested git repository</title>
<updated>2019-09-17T19:20:35Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2019-09-17T16:35:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=09487f2cbad36d5da42e204c9e0d201e8607acb8'/>
<id>urn:sha1:09487f2cbad36d5da42e204c9e0d201e8607acb8</id>
<content type='text'>
Users expect files in a nested git repository to be left alone unless
sufficiently forced (with two -f's).  Unfortunately, in certain
circumstances, git would delete both tracked (and possibly dirty) files
and untracked files within a nested repository.  To explain how this
happens, let's contrast a couple cases.  First, take the following
example setup (which assumes we are already within a git repo):

   git init nested
   cd nested
   &gt;tracked
   git add tracked
   git commit -m init
   &gt;untracked
   cd ..

In this setup, everything works as expected; running 'git clean -fd'
will result in fill_directory() returning the following paths:
   nested/
   nested/tracked
   nested/untracked
and then correct_untracked_entries() would notice this can be compressed
to
   nested/
and then since "nested/" is a directory, we would call
remove_dirs("nested/", ...), which would
check is_nonbare_repository_dir() and then decide to skip it.

However, if someone also creates an ignored file:
   &gt;nested/ignored
then running 'git clean -fd' would result in fill_directory() returning
the same paths:
   nested/
   nested/tracked
   nested/untracked
but correct_untracked_entries() will notice that we had ignored entries
under nested/ and thus simplify this list to
   nested/tracked
   nested/untracked
Since these are not directories, we do not call remove_dirs() which was
the only place that had the is_nonbare_repository_dir() safety check --
resulting in us deleting both the untracked file and the tracked (and
possibly dirty) file.

One possible fix for this issue would be walking the parent directories
of each path and checking if they represent nonbare repositories, but
that would be wasteful.  Even if we added caching of some sort, it's
still a waste because we should have been able to check that "nested/"
represented a nonbare repository before even descending into it in the
first place.  Add a DIR_SKIP_NESTED_GIT flag to dir_struct.flags and use
it to prevent fill_directory() and friends from descending into nested
git repos.

With this change, we also modify two regression tests added in commit
91479b9c72f1 ("t7300: add tests to document behavior of clean and nested
git", 2015-06-15).  That commit, nor its series, nor the six previous
iterations of that series on the mailing list discussed why those tests
coded the expectation they did.  In fact, it appears their purpose was
simply to test _existing_ behavior to make sure that the performance
changes didn't change the behavior.  However, these two tests directly
contradicted the manpage's claims that two -f's were required to delete
files/directories under a nested git repository.  While one could argue
that the user gave an explicit path which matched files/directories that
were within a nested repository, there's a slippery slope that becomes
very difficult for users to understand once you go down that route (e.g.
what if they specified "git clean -f -d '*.c'"?)  It would also be hard
to explain what the exact behavior was; avoid such problems by making it
really simple.

Also, clean up some grammar errors describing this functionality in the
git-clean manpage.

Finally, there are still a couple bugs with -ffd not cleaning out enough
(e.g.  missing the nested .git) and with -ffdX possibly cleaning out the
wrong files (paying attention to outer .gitignore instead of inner).
This patch does not address these cases at all (and does not change the
behavior relative to those flags), it only fixes the handling when given
a single -f.  See
https://public-inbox.org/git/20190905212043.GC32087@szeder.dev/ for more
discussion of the -ffd[X?] bugs.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>dir: if our pathspec might match files under a dir, recurse into it</title>
<updated>2019-09-17T19:20:35Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2019-09-17T16:34:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=89a1f4aaf7650288b976c6022b2c5854950d52c6'/>
<id>urn:sha1:89a1f4aaf7650288b976c6022b2c5854950d52c6</id>
<content type='text'>
For git clean, if a directory is entirely untracked and the user did not
specify -d (corresponding to DIR_SHOW_IGNORED_TOO), then we usually do
not want to remove that directory and thus do not recurse into it.
However, if the user manually specified specific (or even globbed) paths
somewhere under that directory to remove, then we need to recurse into
the directory to make sure we remove the relevant paths under that
directory as the user requested.

Note that this does not mean that the recursed-into directory will be
added to dir-&gt;entries for later removal; as of a few commits earlier in
this series, there is another more strict match check that is run after
returning from a recursed-into directory before deciding to add it to the
list of entries.  Therefore, this will only result in files underneath
the given directory which match one of the pathspecs being added to the
entries list.

Two notes of potential interest to future readers:

  * If we wanted to only recurse into a directory when it is specifically
    matched rather than matched-via-glob (e.g. '*.c'), then we could do
    so via making the final non-zero return in match_pathspec_item be
    MATCHED_RECURSIVELY instead of MATCHED_RECURSIVELY_LEADING_PATHSPEC.
    (Note that the relative order of MATCHED_RECURSIVELY_LEADING_PATHSPEC
    and MATCHED_RECURSIVELY are important for such a change.)  I was
    leaving open that possibility while writing an RFC asking for the
    behavior we want, but even though we don't want it, that knowledge
    might help you understand the code flow better.

  * There is a growing amount of logic in read_directory_recursive() for
    deciding whether to recurse into a subdirectory.  However, there is a
    comment immediately preceding this logic that says to recurse if
    instructed by treat_path().   It may be better for the logic in
    read_directory_recursive() to ultimately be moved to treat_path() (or
    another function it calls, such as treat_directory()), but I have
    left that for someone else to tackle in the future.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>unpack-trees: rename 'is_excluded_from_list()'</title>
<updated>2019-09-05T21:05:12Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-09-03T18:04:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=468ce99b77a0efaf1ace4c31a7b0a7d036fd9ca1'/>
<id>urn:sha1:468ce99b77a0efaf1ace4c31a7b0a7d036fd9ca1</id>
<content type='text'>
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list'  makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!

Now that this library is renamed to use 'struct pattern_list'
and 'struct pattern', we can now rename the method used by
the sparse-checkout feature to determine which paths should
appear in the working directory.

The method is_excluded_from_list() is only used by the
sparse-checkout logic in unpack-trees and list-objects-filter.
The confusing part is that it returned 1 for "excluded" (i.e.
it matches the list of exclusions) but that really manes that
the path matched the list of patterns for _inclusion_ in the
working directory.

Rename the method to be path_matches_pattern_list() and have
it return an explicit 'enum pattern_match_result'. Here, the
values MATCHED = 1, UNMATCHED = 0, and UNDECIDED = -1 agree
with the previous integer values. This shift allows future
consumers to better understand what the retur values mean,
and provides more type checking for handling those values.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>treewide: rename 'exclude' methods to 'pattern'</title>
<updated>2019-09-05T21:05:12Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-09-03T18:04:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=65edd96aecdee2cd4d16a7c17ae9f723c3fe61a4'/>
<id>urn:sha1:65edd96aecdee2cd4d16a7c17ae9f723c3fe61a4</id>
<content type='text'>
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list'  makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!

It would be clearer to rename this entire library as a "pattern
matching" library, and the callers apply exclusion/inclusion
logic accordingly based on their needs.

This commit renames several methods defined in dir.h to make
more sense with the renamed 'struct exclude_list' to 'struct
pattern_list' and 'struct exclude' to 'struct path_pattern':

 * last_exclude_matching() -&gt; last_matching_pattern()
 * parse_exclude() -&gt; parse_path_pattern()

In addition, the word 'exclude' was replaced with 'pattern'
in the methods below:

 * add_exclude_list()
 * add_excludes_from_file_to_list()
 * add_excludes_from_file()
 * add_excludes_from_blob_to_list()
 * add_exclude()
 * clear_exclude_list()

A few methods with the word "exclude" remain. These will
be handled seperately. In particular, the method
"is_excluded()" is concretely about the .gitignore file
relative to a specific directory. This is the important
boundary between library and consumer: is_excluded() cares
about .gitignore, but is_excluded() calls
last_matching_pattern() to make that decision.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
