<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/git.git/compat/regex, branch master</title>
<subtitle>Git
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=master</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/'/>
<updated>2025-03-29T00:38:11Z</updated>
<entry>
<title>compat/regex: explicitly mark intentional use of the comma operator</title>
<updated>2025-03-29T00:38:11Z</updated>
<author>
<name>Johannes Schindelin</name>
<email>johannes.schindelin@gmx.de</email>
</author>
<published>2025-03-27T11:53:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=88c91d7d742b802a8774383641f8d997cfd1cd0c'/>
<id>urn:sha1:88c91d7d742b802a8774383641f8d997cfd1cd0c</id>
<content type='text'>
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. That is why the
`-Wcomma` option of clang was introduced: To identify unintentional uses
of the comma operator.

In the `compat/regex/` code, the comma operator is used twice, once to
avoid surrounding two conditional statements with curly brackets, the
other one to increment two counters simultaneously in a `do ... while`
condition.

The first one is replaced with a proper conditional block, surrounded by
curly brackets.

The second one would be harder to replace because the loop contains two
`continue`s. Therefore, the second one is marked as intentional by
casting the value-to-discard to `void`.

Signed-off-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Acked-by: Phillip Wood &lt;phillip.wood@dunelm.org.uk&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>compat/regex: explicitly ignore "-Wsign-compare" warnings</title>
<updated>2024-12-06T11:20:01Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-12-06T10:27:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=6e1d0ce47014f2d8434c54ef70dc9b43602652a5'/>
<id>urn:sha1:6e1d0ce47014f2d8434c54ef70dc9b43602652a5</id>
<content type='text'>
Explicitly ignore "-Wsign-compare" warnings in our bundled copy of the
regcomp implementation. We don't use the macro introduced in the
preceding commit because this code does not include "git-compat-util.h"
in the first place.

Note that we already directly use "#pragma GCC diagnostic ignored" in
"regcomp.c", so it shouldn't be an issue to use it directly in the new
spot, either.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>compat: fix typos</title>
<updated>2024-10-10T20:31:12Z</updated>
<author>
<name>Andrew Kreimer</name>
<email>algonell@gmail.com</email>
</author>
<published>2024-10-10T15:11:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=54ee29cfd521b0e043dcc1857e4a7c8d993b5e59'/>
<id>urn:sha1:54ee29cfd521b0e043dcc1857e4a7c8d993b5e59</id>
<content type='text'>
Fix typos and grammar.

Reported-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Andrew Kreimer &lt;algonell@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>compat: disable -Wunused-parameter in 3rd-party code</title>
<updated>2024-08-28T16:51:18Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-08-28T03:58:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=4550c16434017b2bcec50967845c044fbcbf0ff6'/>
<id>urn:sha1:4550c16434017b2bcec50967845c044fbcbf0ff6</id>
<content type='text'>
We carry some vendored 3rd-party code in compat/ that does not build
cleanly with -Wunused-parameters. We could mark these with UNUSED, but
there are two reasons not to:

  1. This is code imported from elsewhere, so we'd prefer to avoid
     modifying it in an invasive way that could create conflicts if we
     tried to pull in a new version.

  2. These files don't include git-compat-util.h at all, so we'd need to
     factor out (or repeat) our UNUSED macro.

In theory we could modify the build process to invoke the compiler with
the extra warning disabled for these files, but there are tricky corner
cases there (e.g., for NO_REGEX we cannot assume that the compiler
understands -Wno-unused-parameter as an option, so we'd have to use our
detect-compiler script).

Instead, let's rely on the gcc diagnostic #pragma. This is horribly
unportable, of course, but it should do what we want.  Compilers which
don't understand this particular pragma should ignore it (per the
standard), and compilers which do care about "-Wunused-parameter" will
hopefully respect it, even if they are not gcc (e.g., clang does).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>global: improve const correctness when assigning string constants</title>
<updated>2024-06-07T17:30:48Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-06-07T06:37:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=b567004b4b43f9b0d88aa1f0b15698eae8f15836'/>
<id>urn:sha1:b567004b4b43f9b0d88aa1f0b15698eae8f15836</id>
<content type='text'>
We're about to enable `-Wwrite-strings`, which changes the type of
string constants to `const char[]`. Fix various sites where we assign
such constants to non-const variables.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>compat/regex: fix argument order to calloc(3)</title>
<updated>2024-05-13T17:19:08Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-05-12T06:25:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=f01301aabe1771473b7c97fec72a566963c5fe34'/>
<id>urn:sha1:f01301aabe1771473b7c97fec72a566963c5fe34</id>
<content type='text'>
Windows compiler suddenly started complaining that calloc(3) takes
its arguments in &lt;nmemb, size&gt; order.  Indeed, there are many calls
that has their arguments in a _wrong_ order.

Fix them all.

A sample breakage can be seen at

  https://github.com/git/git/actions/runs/9046793153/job/24857988702#step:4:272

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>compat/regex: move stdlib.h up in inclusion chain</title>
<updated>2020-04-27T18:21:16Z</updated>
<author>
<name>Đoàn Trần Công Danh</name>
<email>congdanhqx@gmail.com</email>
</author>
<published>2020-04-27T14:22:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=3bc1f9e48cb9640c901d19f193f4417ca805a86a'/>
<id>urn:sha1:3bc1f9e48cb9640c901d19f193f4417ca805a86a</id>
<content type='text'>
In Linux with musl libc, we have this inclusion chain:

compat/regex/regex.c:69
`-&gt; compat/regex/regex_internal.h
   `-&gt; /usr/include/stdlib.h
      `-&gt; /usr/include/features.h
      `-&gt; /usr/include/alloca.h

In that inclusion chain, `&lt;features.h&gt;` claims it's _BSD_SOURCE
compatible when it's NOT asked to be either
{_POSIX,_GNU,_XOPEN,_BSD}_SOURCE, or __STRICT_ANSI__.
And, `&lt;stdlib.h&gt;` will include `&lt;alloca.h&gt;` to be compatible with
software written for GNU and BSD. Thus, redefine `alloca` macro,
which was defined before at compat/regex/regex.c:66.

Considering this is only compat code, we've taken from other project,
it's not our business to decide which source should we adhere to.

Include `&lt;stdlib.h&gt;` early to prevent the redefinition of alloca.
This also remove a potential warning about alloca not defined on:
	#undef alloca

Helped-by: Ramsay Jones &lt;ramsay@ramsayjones.plus.com&gt;
Signed-off-by: Đoàn Trần Công Danh &lt;congdanhqx@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/asan-build-fix'</title>
<updated>2020-01-30T22:17:09Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-01-30T22:17:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=808dab2b58bdf8fb28ad932f32659ede97bb2387'/>
<id>urn:sha1:808dab2b58bdf8fb28ad932f32659ede97bb2387</id>
<content type='text'>
Work around test breakages caused by custom regex engine used in
libasan, when address sanitizer is used with more recent versions
of gcc and clang.

* jk/asan-build-fix:
  Makefile: use compat regex with SANITIZE=address
</content>
</entry>
<entry>
<title>Makefile: use compat regex with SANITIZE=address</title>
<updated>2020-01-16T22:19:39Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-01-16T17:51:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=f65d07fffaadc556b1a18cda7bb611c7f68717ea'/>
<id>urn:sha1:f65d07fffaadc556b1a18cda7bb611c7f68717ea</id>
<content type='text'>
Recent versions of the gcc and clang Address Sanitizer produce test
failures related to regexec(). This triggers with gcc-10 and clang-8
(but not gcc-9 nor clang-7). Running:

  make CC=gcc-10 SANITIZE=address test

results in failures in t4018, t3206, and t4062.

The cause seems to be that when built with ASan, we use a different
version of regexec() than normal. And this version doesn't understand
the REG_STARTEND flag. Here's my evidence supporting that.

The failure in t4062 is an ASan warning:

  expecting success of 4062.2 '-G matches':
  	git diff --name-only -G "^(0{64}){64}$" HEAD^ &gt;out &amp;&amp;
  	test 4096-zeroes.txt = "$(cat out)"

  =================================================================
  ==672994==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fa76f672000 at pc 0x7fa7726f75b6 bp 0x7ffe41bdda70 sp 0x7ffe41bdd220
  READ of size 4097 at 0x7fa76f672000 thread T0
      #0 0x7fa7726f75b5  (/lib/x86_64-linux-gnu/libasan.so.6+0x4f5b5)
      #1 0x562ae0c9c40e in regexec_buf /home/peff/compile/git/git-compat-util.h:1117
      #2 0x562ae0c9c40e in diff_grep /home/peff/compile/git/diffcore-pickaxe.c:52
      #3 0x562ae0c9cc28 in pickaxe_match /home/peff/compile/git/diffcore-pickaxe.c:166
      [...]

In this case we're looking in a buffer which was mmap'd via
reuse_worktree_file(), and whose size is 4096 bytes. But libasan's
regex tries to look at byte 4097 anyway! If we tweak Git like this:

  diff --git a/diff.c b/diff.c
  index 8e2914c031..cfae60c120 100644
  --- a/diff.c
  +++ b/diff.c
  @@ -3880,7 +3880,7 @@ static int reuse_worktree_file(struct index_state *istate,
           */
          if (ce_uptodate(ce) ||
              (!lstat(name, &amp;st) &amp;&amp; !ie_match_stat(istate, ce, &amp;st, 0)))
  -               return 1;
  +               return 0;

          return 0;
   }

to use a regular buffer (with a trailing NUL) instead of an mmap, then
the complaint goes away.

The other failures are actually diff output with an incorrect funcname
header. If I instrument xdiff to show the funcname matching like so:

  diff --git a/xdiff-interface.c b/xdiff-interface.c
  index 8509f9ea22..f6c3dc1986 100644
  --- a/xdiff-interface.c
  +++ b/xdiff-interface.c
  @@ -197,6 +197,7 @@ struct ff_regs {
   	struct ff_reg {
   		regex_t re;
   		int negate;
  +		char *printable;
   	} *array;
   };

  @@ -218,7 +219,12 @@ static long ff_regexp(const char *line, long len,

   	for (i = 0; i &lt; regs-&gt;nr; i++) {
   		struct ff_reg *reg = regs-&gt;array + i;
  -		if (!regexec_buf(&amp;reg-&gt;re, line, len, 2, pmatch, 0)) {
  +		int ret = regexec_buf(&amp;reg-&gt;re, line, len, 2, pmatch, 0);
  +		warning("regexec %s:\n  regex: %s\n  buf: %.*s",
  +			ret == 0 ? "matched" : "did not match",
  +			reg-&gt;printable,
  +			(int)len, line);
  +		if (!ret) {
   			if (reg-&gt;negate)
   				return -1;
   			break;
  @@ -264,6 +270,7 @@ void xdiff_set_find_func(xdemitconf_t *xecfg, const char *value, int cflags)
   			expression = value;
   		if (regcomp(&amp;reg-&gt;re, expression, cflags))
   			die("Invalid regexp to look for hunk header: %s", expression);
  +		reg-&gt;printable = xstrdup(expression);
   		free(buffer);
   		value = ep + 1;
   	}

then when compiling with ASan and gcc-10, running the diff from t4018.66
produces this:

  $ git diff -U1 cpp-skip-access-specifiers
  warning: regexec did not match:
    regex: ^[     ]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*])
    buf: private:
  warning: regexec matched:
    regex: ^((::[[:space:]]*)?[A-Za-z_].*)$
    buf: private:
  diff --git a/cpp-skip-access-specifiers b/cpp-skip-access-specifiers
  index 4d4a9db..ebd6f42 100644
  --- a/cpp-skip-access-specifiers
  +++ b/cpp-skip-access-specifiers
  @@ -6,3 +6,3 @@ private:
          void DoSomething();
          int ChangeMe;
  };
          void DoSomething();
  -       int ChangeMe;
  +       int IWasChanged;
   };

That first regex should match (and is negated, so it should be telling
us _not_ to match "private:"). But it wouldn't if regexec() is looking
at the whole buffer, and not just the length-limited line we've fed to
regexec_buf(). So this is consistent again with REG_STARTEND being
ignored.

The correct output (compiling without ASan, or gcc-9 with Asan) looks
like this:

  warning: regexec matched:
    regex: ^[     ]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*])
    buf: private:
  [...more lines that we end up not using...]
  warning: regexec matched:
    regex: ^((::[[:space:]]*)?[A-Za-z_].*)$
    buf: class RIGHT : public Baseclass
  diff --git a/cpp-skip-access-specifiers b/cpp-skip-access-specifiers
  index 4d4a9db..ebd6f42 100644
  --- a/cpp-skip-access-specifiers
  +++ b/cpp-skip-access-specifiers
  @@ -6,3 +6,3 @@ class RIGHT : public Baseclass
          void DoSomething();
  -       int ChangeMe;
  +       int IWasChanged;
   };

So it really does seem like libasan's regex engine is ignoring
REG_STARTEND. We should be able to work around it by compiling with
NO_REGEX, which would use our local regexec(). But to make matters even
more interesting, this isn't enough by itself.

Because ASan has support from the compiler, it doesn't seem to intercept
our call to regexec() at the dynamic library level. It actually
recognizes when we are compiling a call to regexec() and replaces it
with ASan-specific code at that point. And unlike most of our other
compat code, where we might have git_mmap() or similar, the actual
symbol name in the compiled compat/regex code is regexec(). So just
compiling with NO_REGEX isn't enough; we still end up in libasan!

We can work around that by having the preprocessor replace regexec with
git_regexec (both in the callers and in the actual implementation), and
we truly end up with a call to our custom regex code, even when
compiling with ASan. That's probably a good thing to do anyway, as it
means anybody looking at the symbols later (e.g., in a debugger) would
have a better indication of which function is which. So we'll do the
same for the other common regex functions (even though just regexec() is
enough to fix this ASan problem).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Fix spelling errors in no-longer-updated-from-upstream modules</title>
<updated>2019-11-10T07:00:55Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2019-11-05T17:07:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=03670c8b2375062a5bcc50ce8891a4406fbc88e2'/>
<id>urn:sha1:03670c8b2375062a5bcc50ce8891a4406fbc88e2</id>
<content type='text'>
We have several modules originally taken from some upstream source,
and which as far as I can tell we no longer update from the upstream
anymore.  As such, I have not submitted these spelling fixes to any
external projects but just include them directly here.

Reported-by: Jens Schleusener &lt;Jens.Schleusener@fossies.org&gt;
Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
