<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/git.git/diffcore-pickaxe.c, branch v2.26.0-rc2</title>
<subtitle>Git
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v2.26.0-rc2</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v2.26.0-rc2'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/'/>
<updated>2019-01-14T23:29:31Z</updated>
<entry>
<title>Merge branch 'tb/log-G-binary'</title>
<updated>2019-01-14T23:29:31Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-01-14T23:29:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=ecdc7cbbac04241f1e5a90fb1fff13cff2ed7a82'/>
<id>urn:sha1:ecdc7cbbac04241f1e5a90fb1fff13cff2ed7a82</id>
<content type='text'>
"git log -G&lt;regex&gt;" looked for a hunk in the "git log -p" patch
output that contained a string that matches the given pattern.
Optimize this code to ignore binary files, which by default will
not show any hunk that would match any pattern (unless textconv or
the --text option is in effect, that is).

* tb/log-G-binary:
  log -G: ignore binary files
</content>
</entry>
<entry>
<title>Merge branch 'nd/the-index'</title>
<updated>2019-01-04T21:33:33Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-01-04T21:33:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=cde555480b95c4311819dc1f7a38cc856a9aed23'/>
<id>urn:sha1:cde555480b95c4311819dc1f7a38cc856a9aed23</id>
<content type='text'>
More codepaths become aware of working with in-core repository
instance other than the default "the_repository".

* nd/the-index: (22 commits)
  rebase-interactive.c: remove the_repository references
  rerere.c: remove the_repository references
  pack-*.c: remove the_repository references
  pack-check.c: remove the_repository references
  notes-cache.c: remove the_repository references
  line-log.c: remove the_repository reference
  diff-lib.c: remove the_repository references
  delta-islands.c: remove the_repository references
  cache-tree.c: remove the_repository references
  bundle.c: remove the_repository references
  branch.c: remove the_repository reference
  bisect.c: remove the_repository reference
  blame.c: remove implicit dependency the_repository
  sequencer.c: remove implicit dependency on the_repository
  sequencer.c: remove implicit dependency on the_index
  transport.c: remove implicit dependency on the_index
  notes-merge.c: remove implicit dependency the_repository
  notes-merge.c: remove implicit dependency on the_index
  list-objects.c: reduce the_repository references
  list-objects-filter.c: remove implicit dependency on the_index
  ...
</content>
</entry>
<entry>
<title>log -G: ignore binary files</title>
<updated>2018-12-26T22:59:37Z</updated>
<author>
<name>Thomas Braun</name>
<email>thomas.braun@virtuell-zuhause.de</email>
</author>
<published>2018-12-14T18:49:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=e0e7cb8080cb4a0b3a24f288bf363fe517b49790'/>
<id>urn:sha1:e0e7cb8080cb4a0b3a24f288bf363fe517b49790</id>
<content type='text'>
The -G&lt;regex&gt; option of log looks for the differences whose patch text
contains added/removed lines that match regex.

Currently -G looks also into patches of binary files (which
according to [1]) is binary as well.

This has a couple of issues:

- It makes the pickaxe search slow. In a proprietary repository of the
  author with only ~5500 commits and a total .git size of ~300MB
  searching takes ~13 seconds

    $time git log -Gwave &gt; /dev/null

    real    0m13,241s
    user    0m12,596s
    sys     0m0,644s

  whereas when we ignore binary files with this patch it takes ~4s

    $time ~/devel/git/git log -Gwave &gt; /dev/null

    real    0m3,713s
    user    0m3,608s
    sys     0m0,105s

  which is a speedup of more than fourfold.

- The internally used algorithm for generating patch text is based on
  xdiff and its states in [1]

  &gt; The output format of the binary patch file is proprietary
  &gt; (and binary) and it is basically a collection of copy and insert
  &gt; commands [..]

  which means that the current format could change once the internal
  algorithm is changed as the format is not standardized. In addition
  the git binary patch format used for preparing patches for git apply
  is *different* from the xdiff format as can be seen by comparing

  git log -p -a

    commit 6e95bf4bafccf14650d02ab57f3affe669be10cf
    Author: A U Thor &lt;author@example.com&gt;
    Date:   Thu Apr 7 15:14:13 2005 -0700

        modify binary file

    diff --git a/data.bin b/data.bin
    index f414c84..edfeb6f 100644
    --- a/data.bin
    +++ b/data.bin
    @@ -1,2 +1,4 @@
     a
     a^@a
    +a
    +a^@a

  with git log --binary

    commit 6e95bf4bafccf14650d02ab57f3affe669be10cf
    Author: A U Thor &lt;author@example.com&gt;
    Date:   Thu Apr 7 15:14:13 2005 -0700

        modify binary file

    diff --git a/data.bin b/data.bin
    index f414c84bd3aa25fa07836bb1fb73db784635e24b..edfeb6f501[..]
    GIT binary patch
    literal 12
    QcmYe~N@Pgn0zx1O01)N^ZvX%Q

    literal 6
    NcmYe~N@Pgn0ssWg0XP5v

  which seems unexpected.

To resolve these issues this patch makes -G&lt;regex&gt; ignore binary files
by default. Textconv filters are supported and also -a/--text for
getting the old and broken behaviour back.

The -S&lt;block of text&gt; option of log looks for differences that changes
the number of occurrences of the specified block of text (i.e.
addition/deletion) in a file. As we want to keep the current behaviour,
add a test to ensure it stays that way.

[1]: http://www.xmailserver.org/xdiff.html

Signed-off-by: Thomas Braun &lt;thomas.braun@virtuell-zuhause.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/xdiff-interface'</title>
<updated>2018-11-13T13:37:27Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2018-11-13T13:37:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=39d23dfa409e51844ddda99599927092c8300f12'/>
<id>urn:sha1:39d23dfa409e51844ddda99599927092c8300f12</id>
<content type='text'>
The interface into "xdiff" library used to discover the offset and
size of a generated patch hunk by first formatting it into the
textual hunk header "@@ -n,m +k,l @@" and then parsing the numbers
out.  A new interface has been introduced to allow callers a more
direct access to them.

* jk/xdiff-interface:
  xdiff-interface: drop parse_hunk_header()
  range-diff: use a hunk callback
  diff: convert --check to use a hunk callback
  combine-diff: use an xdiff hunk callback
  diff: use hunk callback for word-diff
  diff: discard hunk headers for patch-ids earlier
  diff: avoid generating unused hunk header lines
  xdiff-interface: provide a separate consume callback for hunks
  xdiff: provide a separate emit callback for hunks
</content>
</entry>
<entry>
<title>notes-cache.c: remove the_repository references</title>
<updated>2018-11-12T05:50:06Z</updated>
<author>
<name>Nguyễn Thái Ngọc Duy</name>
<email>pclouds@gmail.com</email>
</author>
<published>2018-11-10T05:49:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=bd7ad45b64bf8b5a5256bd1eeb3907f8516244a8'/>
<id>urn:sha1:bd7ad45b64bf8b5a5256bd1eeb3907f8516244a8</id>
<content type='text'>
Signed-off-by: Nguyễn Thái Ngọc Duy &lt;pclouds@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff: avoid generating unused hunk header lines</title>
<updated>2018-11-05T04:14:35Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2018-11-02T06:36:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=3b40a090fd4e441e88897dfa96f50039952ed45b'/>
<id>urn:sha1:3b40a090fd4e441e88897dfa96f50039952ed45b</id>
<content type='text'>
Some callers of xdi_diff_outf() do not look at the generated hunk header
lines at all. By plugging in a no-op hunk callback, this tells xdiff not
to even bother formatting them.

This patch introduces a stock no-op callback and uses it with a few
callers whose line callbacks explicitly ignore hunk headers (because
they look only for +/- lines).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>xdiff-interface: provide a separate consume callback for hunks</title>
<updated>2018-11-02T11:43:02Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2018-11-02T06:35:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=9346d6d14dddc7989ba879839d58f6c2426cffbb'/>
<id>urn:sha1:9346d6d14dddc7989ba879839d58f6c2426cffbb</id>
<content type='text'>
The previous commit taught xdiff to optionally provide the hunk header
data to a specialized callback. But most users of xdiff actually use our
more convenient xdi_diff_outf() helper, which ensures that our callbacks
are always fed whole lines.

Let's plumb the special hunk-callback through this interface, too. It
will follow the same rule as xdiff when the hunk callback is NULL (i.e.,
continue to pass a stringified hunk header to the line callback). Since
we add NULL to each caller, there should be no behavior change yet.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>userdiff.c: remove implicit dependency on the_index</title>
<updated>2018-09-21T16:50:58Z</updated>
<author>
<name>Nguyễn Thái Ngọc Duy</name>
<email>pclouds@gmail.com</email>
</author>
<published>2018-09-21T15:57:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=acd00ea04998ce469d1775c658134b097e18f5a3'/>
<id>urn:sha1:acd00ea04998ce469d1775c658134b097e18f5a3</id>
<content type='text'>
[jc: squashed in missing forward decl in userdiff.h found by Ramsay]

Helped-by: Ramsay Jones &lt;ramsay@ramsayjones.plus.com&gt;
Signed-off-by: Nguyễn Thái Ngọc Duy &lt;pclouds@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff.c: remove the_index dependency in textconv() functions</title>
<updated>2018-09-21T16:48:10Z</updated>
<author>
<name>Nguyễn Thái Ngọc Duy</name>
<email>pclouds@gmail.com</email>
</author>
<published>2018-09-21T15:57:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=6afaf807859bd671a3f8e9101952e648a1a5e1a9'/>
<id>urn:sha1:6afaf807859bd671a3f8e9101952e648a1a5e1a9</id>
<content type='text'>
Signed-off-by: Nguyễn Thái Ngọc Duy &lt;pclouds@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>regex: do not call `regfree()` if compilation fails</title>
<updated>2018-05-21T04:58:32Z</updated>
<author>
<name>Martin Ågren</name>
<email>martin.agren@gmail.com</email>
</author>
<published>2018-05-20T10:50:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=17154b1576f8be5aa3044f3adadb5473da618a42'/>
<id>urn:sha1:17154b1576f8be5aa3044f3adadb5473da618a42</id>
<content type='text'>
It is apparently undefined behavior to call `regfree()` on a regex where
`regcomp()` failed. The language in [1] is a bit muddy, at least to me,
but the clearest hint is this (`preg` is the `regex_t *`):

    Upon successful completion, the regcomp() function shall return 0.
    Otherwise, it shall return an integer value indicating an error as
    described in &lt;regex.h&gt;, and the content of preg is undefined.

Funnily enough, there is also the `regerror()` function which should be
given a pointer to such a "failed" `regex_t` -- the content of which
would supposedly be undefined -- and which may investigate it to come up
with a detailed error message.

In any case, the example in that document shows how `regfree()` is not
called after `regcomp()` fails.

We have quite a few users of this API and most get this right. These
three users do not.

Several implementations can handle this just fine [2] and these code paths
supposedly have not wreaked havoc or we'd have heard about it. (These
are all in code paths where git got bad input and is just about to die
anyway.) But let's just avoid the issue altogether.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/regcomp.html

[2] https://www.redhat.com/archives/libvir-list/2013-September/msg00262.html

Researched-by: Eric Sunshine &lt;sunshine@sunshineco.com&gt;
Signed-off-byi Martin Ågren &lt;martin.agren@gmail.com&gt;

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
