<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/git.git/dir.c, branch v1.5.6.4</title>
<subtitle>Git
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v1.5.6.4</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v1.5.6.4'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/'/>
<updated>2008-05-11T01:14:28Z</updated>
<entry>
<title>Merge branch 'lt/case-insensitive'</title>
<updated>2008-05-11T01:14:28Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2008-05-11T01:14:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=380a7426794dcad369dd48519cd01d6e0246cde5'/>
<id>urn:sha1:380a7426794dcad369dd48519cd01d6e0246cde5</id>
<content type='text'>
* lt/case-insensitive:
  Make git-add behave more sensibly in a case-insensitive environment
  When adding files to the index, add support for case-independent matches
  Make unpack-tree update removed files before any updated files
  Make branch merging aware of underlying case-insensitive filsystems
  Add 'core.ignorecase' option
  Make hash_name_lookup able to do case-independent lookups
  Make "index_name_exists()" return the cache_entry it found
  Move name hashing functions into a file of its own
  Make unpack_trees_options bit flags actual bitfields
</content>
</entry>
<entry>
<title>Optimize match_pathspec() to avoid fnmatch()</title>
<updated>2008-04-27T00:48:17Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2008-04-19T21:22:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=88ea8112b4abca416993e2a3145616cd732771c8'/>
<id>urn:sha1:88ea8112b4abca416993e2a3145616cd732771c8</id>
<content type='text'>
"git add *" is actually fundamentally different from "git add .", and
yeah, you should generally use the latter.

The reason? The argument list is actually something different from what
you think it is. For git, it's a "pathspec", so what actualy happens is
that in *both* cases, it will really traverse the whole tree, and then
match every file it finds against the pathspec.

So think of the arguments not as a file list, but as a random bunch of
patterns to match against the files you have!

Which is why the cost is actually approximately O(n*m), where "n" is the
size of the working tree, and "m" is the number of pathspecs.

So the reason "git add ." is fast is actually that "m" in that case is
just 1 (just one trivial pattern), and then "git add *" is slow because
"m" is large (lots of complicated patterns). In both cases, 'n' is the
same (== the whole set of files in your working tree).

Anyway, here's a trivial patch that doesn't change this fundamental fact,
but that avoids doing anything *expensive* until we've done some cheap
initial tests. It may or may not help your test-case, but it's pretty
simple and it matches the other git optimizations in this area (ie
"conceptually handle the general case, but optimize the simple cases where
we can exit early")

Notice how this patch doesn' actually change the fundamental O(n^2)
behaviour, but it makes it much cheaper by generally avoiding the
expensive 'fnmatch' and 'strlen/strncmp' when they are obviously not
needed.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>git clean: Don't automatically remove directories when run within subdirectory</title>
<updated>2008-04-15T06:14:58Z</updated>
<author>
<name>Shawn Bohrer</name>
<email>shawn.bohrer@gmail.com</email>
</author>
<published>2008-04-15T03:14:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=f2d0df7148a1b9ee69bd69a365ffcd1965872451'/>
<id>urn:sha1:f2d0df7148a1b9ee69bd69a365ffcd1965872451</id>
<content type='text'>
When git clean is run from a subdirectory it should follow the normal
policy and only remove directories if they are passed in as a pathspec,
or -d is specified.

The fix is to send len which could be shorter than ent-&gt;len because we
have stripped the trailing '/' that read_directory adds. Additionaly
match_one() was modified to allow a name[] that is not NUL terminated.
This allows us to check if the name matched the pathspec exactly
instead of recursively.

Signed-off-by: Shawn Bohrer &lt;shawn.bohrer@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Add 'core.ignorecase' option</title>
<updated>2008-04-09T08:22:25Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@woody.linux-foundation.org</email>
</author>
<published>2008-03-21T23:52:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=0a9b88b7dee70bd36d35b7857640a18ee3adeef1'/>
<id>urn:sha1:0a9b88b7dee70bd36d35b7857640a18ee3adeef1</id>
<content type='text'>
..and start using it for directory entry traversal (ie "git status" will
not consider entries that match an existing entry case-insensitively to
be a new file)

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Make hash_name_lookup able to do case-independent lookups</title>
<updated>2008-04-09T08:22:25Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@woody.linux-foundation.org</email>
</author>
<published>2008-03-21T22:55:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=cd2fef59edf72a1d9792d9cb72aae1e6f6c7b1d4'/>
<id>urn:sha1:cd2fef59edf72a1d9792d9cb72aae1e6f6c7b1d4</id>
<content type='text'>
Right now nobody uses it, but "index_name_exists()" gets a flag so
you can enable it on a case-by-case basis.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Avoid unnecessary "if-before-free" tests.</title>
<updated>2008-02-22T22:14:40Z</updated>
<author>
<name>Jim Meyering</name>
<email>meyering@redhat.com</email>
</author>
<published>2008-01-31T17:26:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=8e0f70033b2bd1679a6e5971978fdc3ee09bdb72'/>
<id>urn:sha1:8e0f70033b2bd1679a6e5971978fdc3ee09bdb72</id>
<content type='text'>
This change removes all obvious useless if-before-free tests.
E.g., it replaces code like this:

        if (some_expression)
                free (some_expression);

with the now-equivalent:

        free (some_expression);

It is equivalent not just because POSIX has required free(NULL)
to work for a long time, but simply because it has worked for
so long that no reasonable porting target fails the test.
Here's some evidence from nearly 1.5 years ago:

    http://www.winehq.org/pipermail/wine-patches/2006-October/031544.html

FYI, the change below was prepared by running the following:

  git ls-files -z | xargs -0 \
  perl -0x3b -pi -e \
    's/\bif\s*\(\s*(\S+?)(?:\s*!=\s*NULL)?\s*\)\s+(free\s*\(\s*\1\s*\))/$2/s'

Note however, that it doesn't handle brace-enclosed blocks like
"if (x) { free (x); }".  But that's ok, since there were none like
that in git sources.

Beware: if you do use the above snippet, note that it can
produce syntactically invalid C code.  That happens when the
affected "if"-statement has a matching "else".
E.g., it would transform this

  if (x)
    free (x);
  else
    foo ();

into this:

  free (x);
  else
    foo ();

There were none of those here, either.

If you're interested in automating detection of the useless
tests, you might like the useless-if-before-free script in gnulib:
[it *does* detect brace-enclosed free statements, and has a --name=S
 option to make it detect free-like functions with different names]

  http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=build-aux/useless-if-before-free

Addendum:
  Remove one more (in imap-send.c), spotted by Jean-Luc Herren &lt;jlh@gmx.ch&gt;.

Signed-off-by: Jim Meyering &lt;meyering@redhat.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jc/gitignore-ends-with-slash'</title>
<updated>2008-02-17T01:57:06Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2008-02-17T01:57:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=987e315a6b5a5dd224602f09b9dc7c0fe9c7d024'/>
<id>urn:sha1:987e315a6b5a5dd224602f09b9dc7c0fe9c7d024</id>
<content type='text'>
* jc/gitignore-ends-with-slash:
  gitignore: lazily find dtype
  gitignore(5): Allow "foo/" in ignore list to match directory "foo"
</content>
</entry>
<entry>
<title>gitignore: lazily find dtype</title>
<updated>2008-02-05T08:46:49Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2008-02-01T04:23:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=6831a88ac03759a8133f10ffd52ad235a081a8a3'/>
<id>urn:sha1:6831a88ac03759a8133f10ffd52ad235a081a8a3</id>
<content type='text'>
When we process "foo/" entries in gitignore files on a system
that does not have d_type member in "struct dirent", the earlier
implementation ran lstat(2) separately when matching with
entries that came from the command line, in-tree .gitignore
files, and $GIT_DIR/info/excludes file.

This optimizes it by delaying the lstat(2) call until it becomes
absolutely necessary.

The initial idea for this change was by Jeff King, but I
optimized it further to pass pointers to around.

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>gitignore(5): Allow "foo/" in ignore list to match directory "foo"</title>
<updated>2008-02-05T08:46:49Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2008-01-31T09:17:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=d6b8fc303b389b026f2bf9918f6f83041488989b'/>
<id>urn:sha1:d6b8fc303b389b026f2bf9918f6f83041488989b</id>
<content type='text'>
A pattern "foo/" in the exclude list did not match directory
"foo", but a pattern "foo" did.  This attempts to extend the
exclude mechanism so that it would while not matching a regular
file or a symbolic link "foo".  In order to differentiate a
directory and non directory, this passes down the type of path
being checked to excluded() function.

A downside is that the recursive directory walk may need to run
lstat(2) more often on systems whose "struct dirent" do not give
the type of the entry; earlier it did not have to do so for an
excluded path, but we now need to figure out if a path is a
directory before deciding to exclude it.  This is especially bad
because an idea similar to the earlier CE_UPTODATE optimization
to reduce number of lstat(2) calls would by definition not apply
to the codepaths involved, as (1) directories will not be
registered in the index, and (2) excluded paths will not be in
the index anyway.

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Create pathname-based hash-table lookup into index</title>
<updated>2008-01-23T05:46:30Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2008-01-23T02:41:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=cf558704fb68514a820e3666968967c900e0fd29'/>
<id>urn:sha1:cf558704fb68514a820e3666968967c900e0fd29</id>
<content type='text'>
This creates a hash index of every single file added to the index.
Right now that hash index isn't actually used for much: I implemented a
"cache_name_exists()" function that uses it to efficiently look up a
filename in the index without having to do the O(logn) binary search,
but quite frankly, that's not why this patch is interesting.

No, the whole and only reason to create the hash of the filenames in the
index is that by modifying the hash function, you can fairly easily do
things like making it always hash equivalent names into the same bucket.

That, in turn, means that suddenly questions like "does this name exist
in the index under an _equivalent_ name?" becomes much much cheaper.

Guiding principles behind this patch:

 - it shouldn't be too costly. In fact, my primary goal here was to
   actually speed up "git commit" with a fully populated kernel tree, by
   being faster at checking whether a file already existed in the index. I
   did succeed, but only barely:

	Best before:
		[torvalds@woody linux]$ time git commit &gt; /dev/null
		real    0m0.255s
		user    0m0.168s
		sys     0m0.088s

	Best after:

		[torvalds@woody linux]$ time ~/git/git commit &gt; /dev/null
		real    0m0.233s
		user    0m0.144s
		sys     0m0.088s

   so some things are actually faster (~8%).

   Caveat: that's really the best case. Other things are invariably going
   to be slightly slower, since we populate that index cache, and quite
   frankly, few things really use it to look things up.

   That said, the cost is really quite small. The worst case is probably
   doing a "git ls-files", which will do very little except puopulate the
   index, and never actually looks anything up in it, just lists it.

	Before:
		[torvalds@woody linux]$ time git ls-files &gt; /dev/null
		real    0m0.016s
		user    0m0.016s
		sys     0m0.000s

	After:
		[torvalds@woody linux]$ time ~/git/git ls-files &gt; /dev/null
		real    0m0.021s
		user    0m0.012s
		sys     0m0.008s

   and while the thing has really gotten relatively much slower, we're
   still talking about something almost unmeasurable (eg 5ms). And that
   really should be pretty much the worst case.

   So we lose 5ms on one "benchmark", but win 22ms on another. Pick your
   poison - this patch has the advantage that it will _likely_ speed up
   the cases that are complex and expensive more than it slows down the
   cases that are already so fast that nobody cares. But if you look at
   relative speedups/slowdowns, it doesn't look so good.

 - It should be simple and clean

   The code may be a bit subtle (the reasons I do hash removal the way I
   do etc), but it re-uses the existing hash.c files, so it really is
   fairly small and straightforward apart from a few odd details.

Now, this patch on its own doesn't really do much, but I think it's worth
looking at, if only because if done correctly, the name hashing really can
make an improvement to the whole issue of "do we have a filename that
looks like this in the index already". And at least it gets real testing
by being used even by default (ie there is a real use-case for it even
without any insane filesystems).

NOTE NOTE NOTE! The current hash is a joke. I'm ashamed of it, I'm just
not ashamed of it enough to really care. I took all the numbers out of my
nether regions - I'm sure it's good enough that it works in practice, but
the whole point was that you can make a really much fancier hash that
hashes characters not directly, but by their upper-case value or something
like that, and thus you get a case-insensitive hash, while still keeping
the name and the index itself totally case sensitive.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
