<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/git.git/grep.c, branch v2.26.0-rc2</title>
<subtitle>Git
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v2.26.0-rc2</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v2.26.0-rc2'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/'/>
<updated>2020-01-17T21:52:14Z</updated>
<entry>
<title>grep: replace grep_read_mutex by internal obj read lock</title>
<updated>2020-01-17T21:52:14Z</updated>
<author>
<name>Matheus Tavares</name>
<email>matheus.bernardino@usp.br</email>
</author>
<published>2020-01-16T02:39:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=1d1729caebd41b340dd8dd61057f613da4df526c'/>
<id>urn:sha1:1d1729caebd41b340dd8dd61057f613da4df526c</id>
<content type='text'>
git-grep uses 'grep_read_mutex' to protect its calls to object reading
operations. But these have their own internal lock now, which ensures a
better performance (allowing parallel access to more regions). So, let's
remove the former and, instead, activate the latter with
enable_obj_read_lock().

Sections that are currently protected by 'grep_read_mutex' but are not
internally protected by the object reading lock should be surrounded by
obj_read_lock() and obj_read_unlock(). These guarantee mutual exclusion
with object reading operations, keeping the current behavior and
avoiding race conditions. Namely, these places are:

  In grep.c:

  - fill_textconv() at fill_textconv_grep().
  - userdiff_get_textconv() at grep_source_1().

  In builtin/grep.c:

  - parse_object_or_die() and the submodule functions at
    grep_submodule().
  - deref_tag() and gitmodules_config_oid() at grep_objects().

If these functions become thread-safe, in the future, we might remove
the locking and probably get some speedup.

Note that some of the submodule functions will already be thread-safe
(or close to being thread-safe) with the internal object reading lock.
However, as some of them will require additional modifications to be
removed from the critical section, this will be done in its own patch.

Signed-off-by: Matheus Tavares &lt;matheus.bernardino@usp.br&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: fix race conditions on userdiff calls</title>
<updated>2020-01-17T21:52:14Z</updated>
<author>
<name>Matheus Tavares</name>
<email>matheus.bernardino@usp.br</email>
</author>
<published>2020-01-16T02:39:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=c3a5bb31c1567edd1671b54726d4acbb10563a2c'/>
<id>urn:sha1:c3a5bb31c1567edd1671b54726d4acbb10563a2c</id>
<content type='text'>
git-grep uses an internal grep_read_mutex to protect object reading
operations. Similarly, there's a grep_attr_mutex to protect calls to the
gitattributes machinery. However, two of the three functions protected
by the last mutex may also perform object reading, as seen below:

- userdiff_get_textconv() &gt; notes_cache_init() &gt;
  notes_cache_match_validity() &gt; lookup_commit_reference_gently() &gt;
  parse_object() &gt; repo_has_object_file() &gt;
  repo_has_object_file_with_flags() &gt; oid_object_info_extended()

- userdiff_find_by_path() &gt; git_check_attr() &gt; collect_some_attrs() &gt;
  prepare_attr_stack() &gt; read_attr() &gt; read_attr_from_index() &gt;
  read_blob_data_from_index() &gt; read_object_file()

As these calls are not protected by grep_read_mutex, there might be race
conditions with other threads performing object reading (e.g. threads
calling fill_textconv() at grep.c:fill_textconv_grep()). To prevent
that, let's make sure to acquire the lock before both of these calls.

Note: this patch might slow down the threaded grep in worktree, for the
sake of thread-safeness. However, in the following patches, we should
regain performance by replacing grep_read_mutex for an internal object
reading lock and allowing parallel inflation during object reading.

Signed-off-by: Matheus Tavares &lt;matheus.bernardino@usp.br&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: don't return an expression from pcre2_free()</title>
<updated>2019-11-30T22:06:58Z</updated>
<author>
<name>Hans Jerry Illikainen</name>
<email>hji@dyntopia.com</email>
</author>
<published>2019-11-27T20:24:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=867fc7f31010f668e18caa3e0825a36c308d3e3b'/>
<id>urn:sha1:867fc7f31010f668e18caa3e0825a36c308d3e3b</id>
<content type='text'>
Previously, the void pcre2_free() function in grep.c returned free().
While free() itself is void, afaict it's still an expression as per
section A.2.3, subsection 6.8.6 (jump-statement) in both C99 [1] and C11
[2]:

&gt; return expression

Section 6.8.6.4 in C99 [1] and C11 [2] says that:

&gt; A return statement with an expression shall not appear in a function
&gt; whose return type is void.

The consequence of the old behavior was that developer builds with
pedantic errors enabled broke Git if PCRE2 was enabled and a
smart-enough compiler to detect these errors was used.  This commit
fixes pedantic builds of Git that enables --with-libpcre.

[1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
[2] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf

Signed-off-by: Hans Jerry Illikainen &lt;hji@dyntopia.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'cb/pcre2-chartables-leakfix'</title>
<updated>2019-10-23T05:43:11Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-10-23T05:43:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=e0ff2d4c7ec338e30ea5e0340cda7f5fe8a187dc'/>
<id>urn:sha1:e0ff2d4c7ec338e30ea5e0340cda7f5fe8a187dc</id>
<content type='text'>
Leakfix.

* cb/pcre2-chartables-leakfix:
  grep: avoid leak of chartables in PCRE2
  grep: make PCRE2 aware of custom allocator
  grep: make PCRE1 aware of custom allocator
</content>
</entry>
<entry>
<title>grep: avoid leak of chartables in PCRE2</title>
<updated>2019-10-18T01:33:18Z</updated>
<author>
<name>Carlo Marcelo Arenas Belón</name>
<email>carenas@gmail.com</email>
</author>
<published>2019-10-16T12:10:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=10da030ab757c38a01507bf18e2484698000b791'/>
<id>urn:sha1:10da030ab757c38a01507bf18e2484698000b791</id>
<content type='text'>
94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) introduced
a small memory leak visible with valgrind in t7813.

Complete the creation of a PCRE2 specific variable that was missing from
the original change and free the generated table just like it is done
for PCRE1.

Signed-off-by: Carlo Marcelo Arenas Belón &lt;carenas@gmail.com&gt;
Signed-off-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: make PCRE2 aware of custom allocator</title>
<updated>2019-10-18T01:33:18Z</updated>
<author>
<name>Carlo Marcelo Arenas Belón</name>
<email>carenas@gmail.com</email>
</author>
<published>2019-10-16T12:10:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=513f2b0bbd47015640723a10210c527839e8946a'/>
<id>urn:sha1:513f2b0bbd47015640723a10210c527839e8946a</id>
<content type='text'>
94da9193a6 (grep: add support for PCRE v2, 2017-06-01) didn't include
a way to override the system allocator, and so it is incompatible with
custom allocators (e.g. nedmalloc). This problem became obvious when we
tried to plug a memory leak by `free()`ing a data structure allocated by
PCRE2, triggering a segfault in Windows (where we use nedmalloc by
default).

PCRE2 requires the use of a general context to override the allocator
and therefore, there is a lot more code needed than in PCRE1, including
a couple of wrapper functions.

Extend the grep API with a "destructor" that could be called to cleanup
any objects that were created and used globally.

Update `builtin/grep.c` to use that new API, but any other future users
should make sure to have matching `grep_init()`/`grep_destroy()` calls
if they are using the pattern matching functionality.

Move some of the logic that was before done per thread (in the workers)
into an earlier phase to avoid degrading performance, but as the use of
PCRE2 with custom allocators is better understood it is expected more of
its functions will be instructed to use the custom allocator as well as
was done in the original code[1] this work was based on.

[1] https://public-inbox.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/

Reported-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Signed-off-by: Carlo Marcelo Arenas Belón &lt;carenas@gmail.com&gt;
Signed-off-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>grep: make PCRE1 aware of custom allocator</title>
<updated>2019-10-18T01:33:16Z</updated>
<author>
<name>Carlo Marcelo Arenas Belón</name>
<email>carenas@gmail.com</email>
</author>
<published>2019-10-16T12:10:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=57d4660468ef2674bb6afc2c5814d4b397a5eda9'/>
<id>urn:sha1:57d4660468ef2674bb6afc2c5814d4b397a5eda9</id>
<content type='text'>
63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) didn't include a way
to override the system alocator, and so it is incompatible with
USE_NED_ALLOCATOR as reported by Dscho[1] (in similar code from PCRE2)

Note that nedmalloc, as well as other custom allocators like jemalloc
and mi-malloc, can be configured at runtime (via `LD_PRELOAD`),
therefore we cannot know at compile time whether a custom allocator is
used or not.

Make the minimum change possible to ensure this combination is supported
by extending `grep_init()` to set the PCRE1 specific functions to Git's
idea of `malloc()` and `free()` and therefore making sure all
allocations are done inside PCRE1 with the same allocator than the rest
of Git.

This change has negligible performance impact: PCRE needs to allocate
memory once per program run for the character table and for each pattern
compilation. These are both rare events compared to matching patterns
against lines. Actual measurements[2] show that the impact is lost in
the noise.

[1] https://public-inbox.org/git/pull.306.git.gitgitgadget@gmail.com
[2] https://public-inbox.org/git/7f42007f-911b-c570-17f6-1c6af0429586@web.de

Signed-off-by: Carlo Marcelo Arenas Belón &lt;carenas@gmail.com&gt;
Signed-off-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'cb/pcre1-cleanup'</title>
<updated>2019-10-11T05:24:47Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-10-11T05:24:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=93424f1f7d790944168d46faf5222a388898d031'/>
<id>urn:sha1:93424f1f7d790944168d46faf5222a388898d031</id>
<content type='text'>
PCRE fixes.

* cb/pcre1-cleanup:
  grep: refactor and simplify PCRE1 support
  grep: make sure NO_LIBPCRE1_JIT disable JIT in PCRE1
</content>
</entry>
<entry>
<title>Merge branch 'ab/pcre-jit-fixes'</title>
<updated>2019-10-11T05:24:47Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-10-11T05:24:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=a73f91774cbfb6f31bca328ebf200498fe92d97a'/>
<id>urn:sha1:a73f91774cbfb6f31bca328ebf200498fe92d97a</id>
<content type='text'>
A few simplification and bugfixes to PCRE interface.

* ab/pcre-jit-fixes:
  grep: under --debug, show whether PCRE JIT is enabled
  grep: do not enter PCRE2_UTF mode on fixed matching
  grep: stess test PCRE v2 on invalid UTF-8 data
  grep: create a "is_fixed" member in "grep_pat"
  grep: consistently use "p-&gt;fixed" in compile_regexp()
  grep: stop using a custom JIT stack with PCRE v1
  grep: stop "using" a custom JIT stack with PCRE v2
  grep: remove overly paranoid BUG(...) code
  grep: use PCRE v2 for optimized fixed-string search
  grep: remove the kwset optimization
  grep: drop support for \0 in --fixed-strings &lt;pattern&gt;
  grep: make the behavior for NUL-byte in patterns sane
  grep tests: move binary pattern tests into their own file
  grep tests: move "grep binary" alongside the rest
  grep: inline the return value of a function call used only once
  t4210: skip more command-line encoding tests on MinGW
  grep: don't use PCRE2?_UTF8 with "log --encoding=&lt;non-utf8&gt;"
  log tests: test regex backends in "--encode=&lt;enc&gt;" tests
</content>
</entry>
<entry>
<title>Merge branch 'cb/skip-utf8-check-with-pcre1'</title>
<updated>2019-10-06T03:25:17Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-10-06T03:25:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=f00b57e5bc56042e21384e3af0e77e6855b213df'/>
<id>urn:sha1:f00b57e5bc56042e21384e3af0e77e6855b213df</id>
<content type='text'>
Make sure the grep machinery does not abort when seeing a payload
that is not UTF-8 even when JIT is not in use with PCRE1.

* cb/skip-utf8-check-with-pcre1:
  grep: skip UTF8 checks explicitly
</content>
</entry>
</feed>
