<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/git.git/diffcore-rename.c, branch v1.7.3.5</title>
<subtitle>Git
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v1.7.3.5</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v1.7.3.5'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/'/>
<updated>2010-05-07T16:34:27Z</updated>
<entry>
<title>Add a macro DIFF_QUEUE_CLEAR.</title>
<updated>2010-05-07T16:34:27Z</updated>
<author>
<name>Bo Yang</name>
<email>struggleyb.nku@gmail.com</email>
</author>
<published>2010-05-07T04:52:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=9ca5df90615aa3c6b60e1bc8f03db6cae98e816c'/>
<id>urn:sha1:9ca5df90615aa3c6b60e1bc8f03db6cae98e816c</id>
<content type='text'>
Refactor the diff_queue_struct code; this macro helps
to reset the structure.

Signed-off-by: Bo Yang &lt;struggleyb.nku@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diffcore-rename: reduce memory footprint by freeing blob data early</title>
<updated>2009-11-21T06:13:47Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2009-11-21T06:13:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=809809bb75e8a65ef543ab706aab4791459be95c'/>
<id>urn:sha1:809809bb75e8a65ef543ab706aab4791459be95c</id>
<content type='text'>
After running one round of estimate_similarity(), filespecs on either
side will have populated their cnt_data fields, and we do not need
the blob text anymore.  We used to retain the blob data to optimize
for smaller projects (not freeing the blob data here would mean that
the final output phase would not have to re-read it), but we are
efficient enough without such optimization for smaller projects anyway,
and freeing memory early will help larger projects.

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Fix typos / spelling in comments</title>
<updated>2009-04-23T02:02:12Z</updated>
<author>
<name>Mike Ralphson</name>
<email>mike@abacus.co.uk</email>
</author>
<published>2009-04-17T18:13:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=3ea3c215c02dc4a4e7d0881c25b2223540960797'/>
<id>urn:sha1:3ea3c215c02dc4a4e7d0881c25b2223540960797</id>
<content type='text'>
Signed-off-by: Mike Ralphson &lt;mike@abacus.co.uk&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Rename detection: Avoid repeated filespec population</title>
<updated>2009-01-21T08:14:12Z</updated>
<author>
<name>Björn Steinbrink</name>
<email>B.Steinbrink@gmx.de</email>
</author>
<published>2009-01-20T15:59:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=885c716f0f039cfe100f5d761e1011085b43fbb8'/>
<id>urn:sha1:885c716f0f039cfe100f5d761e1011085b43fbb8</id>
<content type='text'>
In diffcore_rename, we assume that the blob contents in the filespec
aren't required anymore after estimate_similarity has been called and thus
we free it. But estimate_similarity might return early when the file sizes
differ too much. In that case, cnt_data is never set and the next call to
estimate_similarity will populate the filespec again, eventually rereading
the same blob over and over again.

To fix that, we first get only the blob sizes. The full filespec is
populated, once, only when the blob contents are actually required and
cnt_data will be set.
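
The idea can be sketched as follows. This is a minimal Python illustration
under stated assumptions, not the diffcore-rename.c code: the FileSpec class,
its populate_size and populate_counts methods, the blobs dictionary, the
line-based scoring and the max_ratio cutoff are all hypothetical names and
choices made for the example.

```python
from collections import Counter
from operator import gt


class FileSpec:
    """Hypothetical stand-in for a filespec; not git's actual structure."""

    def __init__(self, path, blobs):
        self.path = path
        self._blobs = blobs    # hypothetical blob store: path to bytes
        self.size = None       # cheap to obtain, fetched first
        self.cnt_data = None   # derived summary, computed at most once
        self.reads = 0         # counts full blob reads, for illustration

    def populate_size(self):
        if self.size is None:
            self.size = len(self._blobs[self.path])

    def populate_counts(self):
        # Read the blob only if the summary has not been built yet.
        if self.cnt_data is None:
            self.reads += 1
            data = self._blobs[self.path]
            self.cnt_data = Counter(data.split(b"\n"))


def estimate_similarity(src, dst, max_ratio=2):
    """Return a 0..100 score; max_ratio is an assumed size cutoff."""
    src.populate_size()
    dst.populate_size()
    big = max(src.size, dst.size)
    small = min(src.size, dst.size)
    # Early return when the sizes differ too much: no blob is read.
    if gt(big, small * max_ratio):
        return 0
    src.populate_counts()
    dst.populate_counts()
    shared = sum(min(n, dst.cnt_data[line])
                 for line, n in src.cnt_data.items())
    total = max(sum(src.cnt_data.values()), sum(dst.cnt_data.values()))
    return shared * 100 // total
```

Scoring the same pair repeatedly leaves reads at 0 on both sides when the
sizes differ too much, and at 1 on both sides when the contents are actually
compared.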

Signed-off-by: Björn Steinbrink &lt;B.Steinbrink@gmx.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Add file delete/create info when we overflow rename_limit</title>
<updated>2008-10-28T15:58:42Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2008-10-27T20:06:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=6e381d3aff89e09d13bd855ed6e18b0aa6f1e441'/>
<id>urn:sha1:6e381d3aff89e09d13bd855ed6e18b0aa6f1e441</id>
<content type='text'>
When we refuse to do rename detection due to having too many files
created or deleted, let the user know the numbers.  That way there is a
reasonable starting point for setting the diff.renamelimit option.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff: make "too many files" rename warning optional</title>
<updated>2008-05-03T20:40:43Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2008-04-30T17:25:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=b8960bbe7bdfc0b232462f916ee8151c83afd16f'/>
<id>urn:sha1:b8960bbe7bdfc0b232462f916ee8151c83afd16f</id>
<content type='text'>
In many cases, the warning ends up as clutter, because the
diff is being done "behind the scenes" from the user (e.g.,
when generating a commit diffstat), and whether we show
renames or not is not particularly interesting to the user.

However, in the case of a merge (which is what motivated the
warning in the first place), it is a useful hint as to why a
merge with renames might have failed.

This patch makes the warning optional based on the code
calling into diffcore. We default to not showing the
warning, but turn it on for merges.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jc/rename'</title>
<updated>2008-04-09T08:09:12Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2008-04-09T07:46:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=2a5fe2545882721d6841bad11dae0f15b454bf0d'/>
<id>urn:sha1:2a5fe2545882721d6841bad11dae0f15b454bf0d</id>
<content type='text'>
* 'jc/rename' (early part):
  Optimize rename detection for a huge diff
</content>
</entry>
<entry>
<title>rename: warn user when we have turned off rename detection</title>
<updated>2008-03-01T09:30:15Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2008-03-01T06:14:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=ee542ee3fc309fa95622b274c09eefbe394cd108'/>
<id>urn:sha1:ee542ee3fc309fa95622b274c09eefbe394cd108</id>
<content type='text'>
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Optimize rename detection for a huge diff</title>
<updated>2008-02-13T23:44:20Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2008-01-30T04:54:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=6d24ad971c8195b00cd9678fbff7c2aaddb00908'/>
<id>urn:sha1:6d24ad971c8195b00cd9678fbff7c2aaddb00908</id>
<content type='text'>
When there are N deleted paths and M created paths, we used to
allocate (N x M) "struct diff_score" entries that record how similar
each pair is, and picked the &lt;src,dst&gt; pair that gives
the best match first, and then went on to process worse matches.

This sorting is done so that when two new files in the postimage
are similar to the same file deleted from the preimage, we
can process the more similar one first, and when processing the
second one, it can notice "Ah, the source I was planning to say
I am a copy of is already taken by somebody else" and continue
on to match itself with another file in the preimage with a
lesser match.  This matters to a change introduced between the
1.5.3.X series and 1.5.4-rc that lets the code favor unused
matches first and then fall back to using already used
matches.

This instead allocates and keeps only a handful of rename source
candidates per new file in the postimage.  I.e. it reduces the
memory requirement from O(N x M) to O(M).

For each dst, we compute similarity with all sources (i.e. the
number of similarity estimate computations is still O(N x M)),
but we keep only a handful of the best src candidates for each dst.
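
A rough Python sketch of this bookkeeping, under stated assumptions: the
function best_candidates, the score callback and the NUM_CANDIDATES value are
hypothetical and only illustrate keeping a bounded number of candidates per
dst; they are not taken from the patch.

```python
import heapq

NUM_CANDIDATES = 4  # assumed value for "a handful"


def best_candidates(sources, destinations, score, keep=NUM_CANDIDATES):
    """Map each dst to at most `keep` (score, src) entries, best first.

    Every (src, dst) pair is still scored, so the work is O(N x M),
    but only `keep` entries per dst are ever held in memory.
    """
    result = {}
    for dst in destinations:
        heap = []  # min-heap: the worst retained candidate sits at heap[0]
        for src in sources:
            entry = (score(src, dst), src)
            if len(heap) == keep:
                # Heap is full: push the new entry and drop the smallest.
                heapq.heappushpop(heap, entry)
            else:
                heapq.heappush(heap, entry)
        result[dst] = sorted(heap, reverse=True)
    return result
```

With N sources and M destinations this stores at most keep x M entries in
total, instead of one entry per pair.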

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Fix a pathological case in git detecting proper renames</title>
<updated>2007-11-30T23:49:17Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2007-11-30T00:41:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=9ae8fcb36ac9fde8e048a304cc3717f2c7914e78'/>
<id>urn:sha1:9ae8fcb36ac9fde8e048a304cc3717f2c7914e78</id>
<content type='text'>
On Thu, 29 Nov 2007, Jeff King wrote:
&gt;
&gt; I think it will get worse, because you are simultaneously calculating
&gt; all of the similarity scores bit by bit rather than doing a loop. Though
&gt; perhaps you mean at the end you will end up with a list of src/dst pairs
&gt; sorted by score, and you can loop over that.

Well, after thinking about this a bit, I think there's a solution that may
work well with the current thing too: instead of looping just *once* over
the list of rename pairs, loop twice - and simply refuse to do copies on
the first loop.

This trivial patch does that, and turns Kumar's test-case into a perfect
rename list.

It's not pretty, it's not smart, but it seems to work. There's something
to be said for keeping it simple and stupid.

And it should not be nearly as expensive as it may _look_. Yes, the loop
is "(i = 0; i &lt; num_create * num_src; i++)", but the important part is
that the whole array is sorted by rename score, and we have a

	if (mx[i].score &lt; minimum_score)
		break;

in it, so the loop actually would tend to terminate rather quickly.
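
As a rough illustration of the two-pass idea, here is a Python sketch under
stated assumptions, not the actual patch: assign_renames, the scored_pairs
tuple layout and the "rename"/"copy" labels are hypothetical names chosen for
the example.

```python
from operator import lt


def assign_renames(scored_pairs, minimum_score):
    """Pick a source for each destination in two passes.

    scored_pairs is a list of (score, src, dst) tuples that is already
    sorted by score, best first. Returns {dst: (src, kind)}.
    """
    used_src = set()
    matched = {}
    for allow_copies in (False, True):
        for score, src, dst in scored_pairs:
            if lt(score, minimum_score):
                break  # sorted input: every later pair scores lower
            if dst in matched:
                continue
            if src in used_src and not allow_copies:
                continue  # first pass: refuse to reuse a taken source
            kind = "copy" if src in used_src else "rename"
            matched[dst] = (src, kind)
            used_src.add(src)
    return matched
```

For two identical sources A and B and the sorted pairs (100, A, A2),
(100, A, B2), (100, B, B2), a single pass that allowed copies would make B2 a
copy of A and leave B as a plain delete. With the first pass refusing copies,
B2 is matched to B, so both end up as renames.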

Anyway, Kumar, the thing to take away from this is:

 - git really doesn't even *care* about the whole "rename detection"
   internally, and any commits you have done with renames are totally
   independent of the heuristics we then use to *show* the renames.

 - the rename detection really is for just two reasons: (a) keep humans
   happy, and keep the diffs small and (b) help automatic merging across
   renames. So getting renames right is certainly good, but it's more of a
   "politeness" issue than a "correctness" issue, although the merge
   portion of it does matter a lot sometimes.

 - the important thing here is that you can commit your changes and not
   worry about them being somehow "corrupted" by lack of rename detection,
   even if you commit them with a version of git that doesn't do rename
   detection the way you expected it. The rename detection is an
   "after-the-fact" thing, not something that actually gets saved in the
   repository, which is why we can change the heuristics _after_ seeing
   examples, and the examples magically correct themselves!

 - try out the two patches I've posted, and see if they work for you. They
   pass the test-suite, and the output for your example commit looks sane,
   but hey, if you have other test-cases, try them out.

Here's Kumar's pretty diffstat with both my patches:

	 Makefile                                         |    6 +++---
	 board/{cds =&gt; freescale}/common/cadmus.c         |    0
	 board/{cds =&gt; freescale}/common/cadmus.h         |    0
	 board/{cds =&gt; freescale}/common/eeprom.c         |    0
	 board/{cds =&gt; freescale}/common/eeprom.h         |    0
	 board/{cds =&gt; freescale}/common/ft_board.c       |    0
	 board/{cds =&gt; freescale}/common/via.c            |    0
	 board/{cds =&gt; freescale}/common/via.h            |    0
	 board/{cds =&gt; freescale}/mpc8541cds/Makefile     |    0
	 board/{cds =&gt; freescale}/mpc8541cds/config.mk    |    0
	 board/{cds =&gt; freescale}/mpc8541cds/init.S       |    0
	 board/{cds =&gt; freescale}/mpc8541cds/mpc8541cds.c |    0
	 board/{cds =&gt; freescale}/mpc8541cds/u-boot.lds   |    4 ++--
	 board/{cds =&gt; freescale}/mpc8548cds/Makefile     |    0
	 board/{cds =&gt; freescale}/mpc8548cds/config.mk    |    0
	 board/{cds =&gt; freescale}/mpc8548cds/init.S       |    0
	 board/{cds =&gt; freescale}/mpc8548cds/mpc8548cds.c |    0
	 board/{cds =&gt; freescale}/mpc8548cds/u-boot.lds   |    4 ++--
	 board/{cds =&gt; freescale}/mpc8555cds/Makefile     |    0
	 board/{cds =&gt; freescale}/mpc8555cds/config.mk    |    0
	 board/{cds =&gt; freescale}/mpc8555cds/init.S       |    0
	 board/{cds =&gt; freescale}/mpc8555cds/mpc8555cds.c |    0
	 board/{cds =&gt; freescale}/mpc8555cds/u-boot.lds   |    4 ++--
	 23 files changed, 9 insertions(+), 9 deletions(-)

and here it is before:

	 Makefile                                           |    6 +-
	 board/cds/mpc8548cds/Makefile                      |   60 -----
	 board/cds/mpc8555cds/Makefile                      |   60 -----
	 board/cds/mpc8555cds/init.S                        |  255 --------------------
	 board/cds/mpc8555cds/u-boot.lds                    |  150 ------------
	 board/{cds =&gt; freescale}/common/cadmus.c           |    0
	 board/{cds =&gt; freescale}/common/cadmus.h           |    0
	 board/{cds =&gt; freescale}/common/eeprom.c           |    0
	 board/{cds =&gt; freescale}/common/eeprom.h           |    0
	 board/{cds =&gt; freescale}/common/ft_board.c         |    0
	 board/{cds =&gt; freescale}/common/via.c              |    0
	 board/{cds =&gt; freescale}/common/via.h              |    0
	 board/{cds =&gt; freescale}/mpc8541cds/Makefile       |    0
	 board/{cds =&gt; freescale}/mpc8541cds/config.mk      |    0
	 board/{cds =&gt; freescale}/mpc8541cds/init.S         |    0
	 board/{cds =&gt; freescale}/mpc8541cds/mpc8541cds.c   |    0
	 board/{cds =&gt; freescale}/mpc8541cds/u-boot.lds     |    4 +-
	 .../mpc8541cds =&gt; freescale/mpc8548cds}/Makefile   |    0
	 board/{cds =&gt; freescale}/mpc8548cds/config.mk      |    0
	 board/{cds =&gt; freescale}/mpc8548cds/init.S         |    0
	 board/{cds =&gt; freescale}/mpc8548cds/mpc8548cds.c   |    0
	 board/{cds =&gt; freescale}/mpc8548cds/u-boot.lds     |    4 +-
	 .../mpc8541cds =&gt; freescale/mpc8555cds}/Makefile   |    0
	 board/{cds =&gt; freescale}/mpc8555cds/config.mk      |    0
	 .../mpc8541cds =&gt; freescale/mpc8555cds}/init.S     |    0
	 board/{cds =&gt; freescale}/mpc8555cds/mpc8555cds.c   |    0
	 .../mpc8541cds =&gt; freescale/mpc8555cds}/u-boot.lds |    4 +-
	 27 files changed, 9 insertions(+), 534 deletions(-)

so it certainly makes the diffs prettier.

		Linus

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
