<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/git.git/diffcore-delta.c, branch v1.6.4.5</title>
<subtitle>Git
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v1.6.4.5</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v1.6.4.5'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/'/>
<updated>2007-10-04T07:05:36Z</updated>
<entry>
<title>optimize diffcore-delta by sorting hash entries.</title>
<updated>2007-10-04T07:05:36Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2007-10-03T02:28:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=eb4d0e3f4515e5508fa9c0a695f7a45812a76296'/>
<id>urn:sha1:eb4d0e3f4515e5508fa9c0a695f7a45812a76296</id>
<content type='text'>
Here's a test-patch. I don't guarantee anything, except that when I did
the timings I also did a "wc" on the result, and they matched..

Before:
	[torvalds@woody linux]$ time git diff -l0 --stat -C v2.6.22.. | wc
	   7104   28574  438020

	real    0m10.526s
	user    0m10.401s
	sys     0m0.136s

After:
	[torvalds@woody linux]$ time ~/git/git diff -l0 --stat -C v2.6.22.. | wc
	   7104   28574  438020

	real    0m8.876s
	user    0m8.761s
	sys     0m0.128s

but the diff is fairly simple, so if somebody will go over it and say
whether it's likely to be *correct* too, that 15% may well be worth it.

[ Side note, without rename detection, that diff takes just under three
  seconds for me, so in that sense the improvement to the rename detection
  itself is larger than the overall 15% - it brings the cost of just
  rename detection from 7.5s to 5.9s, which would be on the order of just
  over a 20% performance improvement. ]

Hmm. The patch depends on half-way subtle issues like the fact that the
hashtables are guaranteed to not be full =&gt; we're guaranteed to have zero
counts at the end =&gt; we don't need to do any steenking iterator count in
the loop. A few comments might be in order.

		Linus
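
A minimal sketch of the sentinel idea described above, written in Python
rather than git's C. It is an illustration under stated assumptions, not
the code from this patch: each table is assumed to be a list of
(hashval, count) pairs sorted by hashval and terminated by an entry whose
count is zero, and the names count_changes, copied and added are
hypothetical.

```python
from operator import lt  # function form of "less than"; keeps the listing free of characters that need XML escaping


def count_changes(src, dst):
    """Return (copied, added) for two sorted, sentinel-terminated tables.

    Assumption of this sketch: every table ends with at least one entry
    whose count is 0, so the loops need no separate length check.
    """
    copied = added = 0
    i = j = 0
    while src[i][1]:  # a zero count marks the end of src
        s_hash, s_cnt = src[i]
        # Entries that appear only in dst are pure additions.
        while dst[j][1] and lt(dst[j][0], s_hash):
            added += dst[j][1]
            j += 1
        d_cnt = 0
        if dst[j][1] and dst[j][0] == s_hash:
            d_cnt = dst[j][1]
            j += 1
        copied += min(s_cnt, d_cnt)
        added += max(d_cnt - s_cnt, 0)
        i += 1
    while dst[j][1]:  # whatever is left in dst was added
        added += dst[j][1]
        j += 1
    return copied, added
```

Because both loops stop on a zero count, neither needs an index bound,
which is the shortcut the message says depends on the tables never being
completely full.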
</content>
</entry>
<entry>
<title>Introduce diff_filespec_is_binary()</title>
<updated>2007-07-06T07:21:41Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2007-07-06T07:18:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=29a3eefde111f6a24292163c4308f00ab3572627'/>
<id>urn:sha1:29a3eefde111f6a24292163c4308f00ab3572627</id>
<content type='text'>
This replaces an explicit initialization of filespec-&gt;is_binary
field used for rename/break followed by direct access to that
field with a wrapper function that lazily initializes and
accesses the field.  We would add more attribute accesses for
the use of diff routines, and it would be better to make this
abstraction earlier.

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
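
A minimal sketch of the lazy-initialization pattern described above, in
Python rather than git's C. FileSpec, filespec_is_binary and the NUL-byte
heuristic are hypothetical stand-ins chosen for illustration; they are
not git's actual structure, function or detection rule.

```python
class FileSpec:
    """Hypothetical stand-in for a diff filespec; not git's real structure."""

    def __init__(self, data):
        self.data = data
        self.is_binary = None  # None means "not computed yet"


def filespec_is_binary(spec, probe=8000):
    """Compute the flag on first use, then return the cached value."""
    if spec.is_binary is None:
        # Assumed heuristic for this sketch: a NUL byte near the start.
        spec.is_binary = b"\0" in spec.data[:probe]
    return spec.is_binary
```

Callers go through the wrapper instead of reading the field directly, so
no caller has to remember to initialize it first.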
</content>
</entry>
<entry>
<title>diffcore-delta.c: Ignore CR in CRLF for text files</title>
<updated>2007-07-01T03:51:31Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2007-06-29T06:14:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=b9905fed7a028cc9749cf8ad479cbb07940c8638'/>
<id>urn:sha1:b9905fed7a028cc9749cf8ad479cbb07940c8638</id>
<content type='text'>
This ignores the CR byte in a CRLF sequence in text files when
computing the similarity of two blobs.

Usually this should not matter as nobody sane would be checking
in a file with CRLF line endings to the repository (they would
use autocrlf so that the repository copy would have LF line
endings).

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
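
A minimal sketch of the effect described above, in Python rather than
git's C. It assumes content is cut into chunks that end at a newline or
after 64 bytes (the limit is borrowed from the "64-byte-or-EOL" commit
title), and the name normalized_chunks is hypothetical.

```python
def normalized_chunks(data, is_text, limit=64):
    """Yield chunks ending at LF or after `limit` bytes.

    For text content, a CR that is immediately followed by LF is dropped,
    so CRLF and LF versions of a line produce the same chunk.
    """
    chunk = bytearray()
    for i, byte in enumerate(data):
        if is_text and byte == 0x0D and data[i + 1:i + 2] == b"\n":
            continue  # skip the CR of a CRLF pair
        chunk.append(byte)
        if byte == 0x0A or len(chunk) == limit:
            yield bytes(chunk)
            chunk.clear()
    if chunk:
        yield bytes(chunk)  # trailing data without a final newline
```

With is_text set, a CRLF file and its LF counterpart yield identical
chunks, so a similarity score based on hashing those chunks would not be
lowered by the line-ending difference alone. A lone CR that is not
followed by LF is kept.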
</content>
</entry>
<entry>
<title>diffcore-delta.c: update the comment on the algorithm.</title>
<updated>2007-07-01T03:51:31Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2007-06-29T06:11:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=af3abef94af9c821a0c192c693c3e5342ab8729f'/>
<id>urn:sha1:af3abef94af9c821a0c192c693c3e5342ab8729f</id>
<content type='text'>
The comment at the top of the file described an old algorithm
that was neutral to text/binary differences (it hashed sliding
window of N-byte sequences and counted overlaps), but a long time
ago we switched to new heuristics that are much faster and more
suitable for line-oriented (read: text) files.

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diffcore_count_changes: pass diffcore_filespec</title>
<updated>2007-07-01T03:51:31Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2007-06-29T05:54:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=d8c3d03a0b7f10977dd508a5a965a417b7f1b065'/>
<id>urn:sha1:d8c3d03a0b7f10977dd508a5a965a417b7f1b065</id>
<content type='text'>
We may want to use richer information on the data we are dealing
with in this function, so instead of passing a buffer address
and length, just pass the diffcore_filespec structure.  Existing
callers always call this function with parameters taken from a
filespec anyway, so there is no change in functionality.

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diffcore-delta: 64-byte-or-EOL ultrafast replacement (hash fix).</title>
<updated>2006-03-15T21:19:27Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@osdl.org</email>
</author>
<published>2006-03-15T08:37:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=e31c9f241ae5469c820cde2a54987a1075e52a43'/>
<id>urn:sha1:e31c9f241ae5469c820cde2a54987a1075e52a43</id>
<content type='text'>
The rotating 64-bit number was not really rotating, and worse
yet ulong was longer than 64-bit on 64-bit architectures X-&lt;.

Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>diffcore-delta: 64-byte-or-EOL ultrafast replacement.</title>
<updated>2006-03-15T08:37:57Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@osdl.org</email>
</author>
<published>2006-03-15T08:37:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=3c7ceba4f1e8be1a536dd0e21bc4f422873d08d2'/>
<id>urn:sha1:3c7ceba4f1e8be1a536dd0e21bc4f422873d08d2</id>
<content type='text'>
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>diffcore-delta: tweak hashbase value.</title>
<updated>2006-03-13T04:42:12Z</updated>
<author>
<name>Junio C Hamano</name>
<email>junkio@cox.net</email>
</author>
<published>2006-03-13T04:32:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=fc66d213f8b2f13b9ffd643f01de25ddc95e0972'/>
<id>urn:sha1:fc66d213f8b2f13b9ffd643f01de25ddc95e0972</id>
<content type='text'>
This tweaks the maximum hash value we hash the string into
without letting the maximum size of the hashtable grow beyond
the current limit.  With this, rename detection becomes a bit
more precise without incurring additional paging cost.

Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>diffcore-delta: make the hash a bit denser.</title>
<updated>2006-03-13T01:26:32Z</updated>
<author>
<name>Junio C Hamano</name>
<email>junkio@cox.net</email>
</author>
<published>2006-03-13T00:39:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=2821104db7fabdfac105ae757228b0eac107047c'/>
<id>urn:sha1:2821104db7fabdfac105ae757228b0eac107047c</id>
<content type='text'>
To reduce wasted memory, wait until the hash fills up more
densely before we rehash.  This reduces the working set size a
bit further.

Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>diffcore-rename: somewhat optimized.</title>
<updated>2006-03-12T11:22:10Z</updated>
<author>
<name>Junio C Hamano</name>
<email>junkio@cox.net</email>
</author>
<published>2006-03-12T11:22:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=c06c79667c9514aed00d29bcd80bd0cee7cc5a25'/>
<id>urn:sha1:c06c79667c9514aed00d29bcd80bd0cee7cc5a25</id>
<content type='text'>
This changes diffcore-rename to reuse statistics information
gathered during similarity estimation, and updates the hashtable
implementation used to keep track of the statistics to be
denser.  This seems to give better performance.

Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
</feed>
