<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/git.git/diffcore-rename.c, branch v1.8.0.2</title>
<subtitle>Git
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v1.8.0.2</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v1.8.0.2'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/'/>
<updated>2012-08-27T18:54:28Z</updated>
<entry>
<title>Merge branch 'jk/maint-null-in-trees'</title>
<updated>2012-08-27T18:54:28Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2012-08-27T18:54:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=3b753148b636be9dc821feebf85cd7f1739b07a1'/>
<id>urn:sha1:3b753148b636be9dc821feebf85cd7f1739b07a1</id>
<content type='text'>
We do not want a link to 0{40} object stored anywhere in our objects.

* jk/maint-null-in-trees:
  fsck: detect null sha1 in tree entries
  do not write null sha1s to on-disk index
  diff: do not use null sha1 as a sentinel value
</content>
</entry>
<entry>
<title>diff: do not use null sha1 as a sentinel value</title>
<updated>2012-07-29T22:04:32Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2012-07-28T15:03:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=e54501004abbd20fa8d813c1e5b82c3b50bb9361'/>
<id>urn:sha1:e54501004abbd20fa8d813c1e5b82c3b50bb9361</id>
<content type='text'>
The diff code represents paths using the diff_filespec
struct. This struct has a sha1 to represent the sha1 of the
content at that path, as well as a sha1_valid member which
indicates whether its sha1 field is actually useful. If
sha1_valid is not true, then the filespec represents a
working tree file (e.g., for the no-index case, or for when
the index is not up-to-date).

The diff_filespec is only used internally, though. At the
interfaces to the diff subsystem, callers feed the sha1
directly, and we create a diff_filespec from it. It's at
that point that we look at the sha1 and decide whether it is
valid or not; callers may pass the null sha1 as a sentinel
value to indicate that it is not.

We should not typically see the null sha1 coming from any
other source (e.g., in the index itself, or from a tree).
However, a corrupt tree might have a null sha1, which would
cause "diff --patch" to accidentally diff the working tree
version of a file instead of treating it as a blob.

This patch extends the edges of the diff interface to accept
a "sha1_valid" flag whenever we accept a sha1, and to use
that flag when creating a filespec. In some cases, this
means passing the flag through several layers, making the
code change larger than would be desirable.

One alternative would be to simply die() upon seeing
corrupted trees with null sha1s. However, this fix more
directly addresses the problem (while bogus sha1s in a tree
are probably a bad thing, it is really the sentinel
confusion sending us down the wrong code path that is what
makes it devastating). And it means that git is more capable
of examining and debugging these corrupted trees. For
example, you can still "diff --raw" such a tree to find out
when the bogus entry was introduced; you just cannot do a
"--patch" diff (just as you could not with any other
corrupted tree, as we do not have any content to diff).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>teach diffcore-rename to optionally ignore empty content</title>
<updated>2012-03-23T20:52:49Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2012-03-22T22:52:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=90d43b07687fdc51d1f2fc14948df538dc45584b'/>
<id>urn:sha1:90d43b07687fdc51d1f2fc14948df538dc45584b</id>
<content type='text'>
Our rename detection is a heuristic, matching pairs of
removed and added files with similar or identical content.
It's unlikely to be wrong when there is actual content to
compare, and we already take care not to do inexact rename
detection when there is not enough content to produce good
results.

However, we always do exact rename detection, even when the
blob is tiny or empty. It's easy to get false positives with
an empty blob, simply because it is an obvious content to
use as a boilerplate (e.g., when telling git that an empty
directory is worth tracking via an empty .gitignore).

This patch lets callers specify whether or not they are
interested in using empty files as rename sources and
destinations. The default is "yes", keeping the original
behavior. It works by detecting the empty-blob sha1 for
rename sources and destinations.

One more flexible alternative would be to allow the caller
to specify a minimum size for a blob to be "interesting" for
rename detection. But that would catch small boilerplate
files, not large ones (e.g., if you had the GPL COPYING file
in many directories).

A better alternative would be to allow a "-rename"
gitattribute to allow boilerplate files to be marked as
such. I'll leave the complexity of that solution until such
time as somebody actually wants it. The complaints we've
seen so far revolve around empty files, so let's start with
the simple thing.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'mz/maint-rename-unmerged'</title>
<updated>2011-05-02T22:58:27Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2011-05-02T22:58:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=2db8926236406a4e4cb17d1b9c4b791706fb0512'/>
<id>urn:sha1:2db8926236406a4e4cb17d1b9c4b791706fb0512</id>
<content type='text'>
* mz/maint-rename-unmerged:
  diffcore-rename: don't consider unmerged path as source
</content>
</entry>
<entry>
<title>diffcore-rename.c: avoid set-but-not-used warning</title>
<updated>2011-04-29T15:20:10Z</updated>
<author>
<name>Jim Meyering</name>
<email>jim@meyering.net</email>
</author>
<published>2011-04-29T09:42:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=dabdbee10bd6c7d92204ce4bfc1f3ffd60a4046a'/>
<id>urn:sha1:dabdbee10bd6c7d92204ce4bfc1f3ffd60a4046a</id>
<content type='text'>
Since 9d8a5a5 (diffcore-rename: refactor "too many candidates" logic,
2011-01-06), diffcore_rename() initializes num_src but does not use it
anymore.  "-Wunused-but-set-variable" in gcc-4.6 complains about this.

Signed-off-by: Jim Meyering &lt;meyering@redhat.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diffcore-rename: don't consider unmerged path as source</title>
<updated>2011-03-24T05:44:22Z</updated>
<author>
<name>Martin von Zweigbergk</name>
<email>martin.von.zweigbergk@gmail.com</email>
</author>
<published>2011-03-24T02:41:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=d7c9bf22351e39756f93f09b4251a6b5861d9cc0'/>
<id>urn:sha1:d7c9bf22351e39756f93f09b4251a6b5861d9cc0</id>
<content type='text'>
Since e9c8409 (diff-index --cached --raw: show tree entry on the LHS for
unmerged entries., 2007-01-05), an unmerged entry should be detected by
using DIFF_PAIR_UNMERGED(p), not by noticing both one and two sides of
the filepair records mode=0 entries. However, it forgot to update some
parts of the rename detection logic.

This only makes difference in the "diff --cached" codepath where an
unmerged filepair carries information on the entries that came from the
tree.  It probably hasn't been noticed for a long time because nobody
would run "diff -M" during a conflict resolution, but "git status" uses
rename detection when it internally runs "diff-index" and "diff-files"
and gives nonsense results.

In an unmerged pair, "one" side can have a valid filespec to record the
tree entry (e.g. what's in HEAD) when running "diff --cached". This can
be used as a rename source to other paths in the index that are not
unmerged. The path that is unmerged by definition does not have the
final content yet (i.e. "two" side cannot have a valid filespec), so it
can never be a rename destination.

Use the DIFF_PAIR_UNMERGED() to detect unmerged filepair correctly, and
allow the valid "one" side of an unmerged filepair to be considered a
potential rename source, but never to be considered a rename destination.

Commit message and first two test cases by Junio, the rest by Martin.

Signed-off-by: Martin von Zweigbergk &lt;martin.von.zweigbergk@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diffcore-rename: fall back to -C when -C -C busts the rename limit</title>
<updated>2011-03-22T21:29:07Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2011-01-06T21:50:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=f31027c99cb2ec4eb7ad8d1ebc7f0e20fef4bd1d'/>
<id>urn:sha1:f31027c99cb2ec4eb7ad8d1ebc7f0e20fef4bd1d</id>
<content type='text'>
When there are too many paths in the project, the number of rename source
candidates "git diff -C -C" finds will exceed the rename detection limit,
and no inexact rename detection is performed.  We however could fall back
to "git diff -C" if the number of modified paths is sufficiently small.

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diffcore-rename: record filepair for rename src</title>
<updated>2011-03-22T21:29:07Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2011-01-06T21:50:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=e88d6bc6f90059f6e87f6d51eba83ec15e4eecc9'/>
<id>urn:sha1:e88d6bc6f90059f6e87f6d51eba83ec15e4eecc9</id>
<content type='text'>
This will allow us to later skip unmodified entries added due to "-C -C".
We might also want to do something similar to rename_dst side, but that
would only be for the sake of symmetry and not necessary for this series.

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diffcore-rename: refactor "too many candidates" logic</title>
<updated>2011-03-22T21:29:07Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2011-01-06T21:50:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=9d8a5a50b7e30d85aff3deac2a6994bf0178983d'/>
<id>urn:sha1:9d8a5a50b7e30d85aff3deac2a6994bf0178983d</id>
<content type='text'>
Move the logic to a separate function, to be enhanced by later patches in
the series.

While at it, swap the condition used in the if statement from "if it is
too big then do this" to "if it would fit then do this".

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;

---

 Rebased to 'master' as the logic to use the result of this logic was
 updated recently, together with the addition of eye-candy.
</content>
</entry>
<entry>
<title>Merge branch 'jk/merge-rename-ux'</title>
<updated>2011-03-20T06:23:56Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2011-03-20T06:23:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=0ce6a51b43815a34d0dbebaa799b32a83d1ea4f9'/>
<id>urn:sha1:0ce6a51b43815a34d0dbebaa799b32a83d1ea4f9</id>
<content type='text'>
* jk/merge-rename-ux:
  pull: propagate --progress to merge
  merge: enable progress reporting for rename detection
  add inexact rename detection progress infrastructure
  commit: stop setting rename limit
  bump rename limit defaults (again)
  merge: improve inexact rename limit warning
</content>
</entry>
</feed>
