author | René Scharfe <l.s.r@web.de> | 2024-10-20 13:02:32 +0200 |
---|---|---|
committer | Taylor Blau <me@ttaylorr.com> | 2024-10-22 12:45:49 -0400 |
commit | ce025ae4f61e8e32b2ae6589e43e03e60f713f2d (patch) | |
tree | d2b9a99998130fad389b6a4b236bc325387c4857 /commit.c | |
parent | 34b6ce9b30747131b6e781ff718a45328aa887d0 (diff) |
grep: disable lookahead on error
regexec(3) can fail. E.g. on macOS it fails if it is used with a UTF-8
locale to match a valid regex against a buffer containing invalid UTF-8
characters.
git grep has two ways to search for matches in a file: Either it splits
its contents into lines and matches them separately, or it matches the
whole content and figures out line boundaries later. The latter is done
by look_ahead() and it's quicker in the common case where most files
don't contain a match.
Fall back to line-by-line matching if look_ahead() encounters a
regexec(3) error by propagating errors out of patmatch() and bailing out
of look_ahead() if there is one. This way we at least can find matches
in lines that contain only valid characters. That matches the behavior
of grep(1) on macOS.
pcre2match() dies if pcre2_jit_match() or pcre2_match() fails, but since
we use the flag PCRE2_MATCH_INVALID_UTF it handles invalid UTF-8
characters gracefully. So implement the fall-back only for regexec(3)
and leave the PCRE2 matching unchanged.
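The fall-back described above can be illustrated with a minimal,
standalone sketch in plain POSIX C. It is not the code from grep.c:
try_match(), count_matching_lines() and grep_count() are hypothetical
names invented for this example, and the sketch assumes the pattern was
compiled with REG_NEWLINE so that a whole-buffer match agrees with
per-line matching. It also assumes that a line which makes regexec(3)
fail is simply treated as not matching.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: 1 = match, 0 = no match, -1 = regexec(3) error. */
static int try_match(const regex_t *re, const char *text)
{
    int rc = regexec(re, text, 0, NULL, 0);

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}

/* Slow path: match every line on its own; returns -1 if out of memory. */
static int count_matching_lines(const regex_t *re, const char *buf)
{
    size_t len = strlen(buf);
    char *copy = malloc(len + 1);
    char *line = copy;
    int hits = 0;

    if (!copy)
        return -1;
    memcpy(copy, buf, len + 1);

    while (line && *line) {
        char *eol = strchr(line, '\n');

        if (eol)
            *eol = '\0';
        /* Assumption: a line that makes regexec(3) fail is not counted. */
        if (try_match(re, line) == 1)
            hits++;
        line = eol ? eol + 1 : NULL;
    }
    free(copy);
    return hits;
}

/* Fast path first, then fall back to the slow path when needed. */
static int grep_count(const regex_t *re, const char *buf)
{
    assert(re != NULL && buf != NULL);

    /* Look-ahead: one match attempt over the whole buffer. */
    if (try_match(re, buf) == 0)
        return 0; /* definite "no match": skip the per-line work */

    /* Either a match exists somewhere or the look-ahead itself failed. */
    return count_matching_lines(re, buf);
}
```

The point of the design is that only a definite "no match" from the
whole-buffer attempt is allowed to short-circuit the search. An error
is handled like a possible match, so lines made up of valid characters
are still examined one by one.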
Reported-by: David Gstir <david@sigma-star.at>
Signed-off-by: René Scharfe <l.s.r@web.de>
Tested-by: David Gstir <david@sigma-star.at>
Signed-off-by: Taylor Blau <me@ttaylorr.com>