summaryrefslogtreecommitdiff
path: root/grep.c
AgeCommit message (Collapse)Author
2025-07-01odb: rename `repo_read_object_file()`Patrick Steinhardt
Rename `repo_read_object_file()` to `odb_read_object()` to match other functions related to the object database and our modern coding guidelines. Introduce a compatibility wrapper so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01object-store: rename files to "odb.{c,h}"Patrick Steinhardt
In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15object-store: merge "object-store-ll.h" and "object-store.h"Patrick Steinhardt
The "object-store-ll.h" header has been introduced to keep transitive header dependendcies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10hash: stop depending on `the_repository` in `null_oid()`Patrick Steinhardt
The `null_oid()` function returns the object ID that only consists of zeroes. Naturally, this ID also depends on the hash algorithm used, as the number of zeroes is different between SHA1 and SHA256. Consequently, the function returns the hash-algorithm-specific null object ID. This is currently done by depending on `the_hash_algo`, which implicitly makes us depend on `the_repository`. Refactor the function to instead pass in the hash algorithm for which we want to retrieve the null object ID. Adapt callsites accordingly by passing in `the_repository`, thus bubbling up the dependency on that global variable by one layer. There are a couple of trivial exceptions for subsystems that already got rid of `the_repository`. These subsystems instead use the repository that is available via the calling context: - "builtin/grep.c" - "grep.c" - "refs/debug.c" There are also two non-trivial exceptions: - "diff-no-index.c": Here we know that we may not have a repository initialized at all, so we cannot rely on `the_repository`. Instead, we adapt `diff_no_index()` to get a `struct git_hash_algo` as parameter. The only caller is located in "builtin/diff.c", where we know to call `repo_set_hash_algo()` in case we're running outside of a Git repository. Consequently, it is fine to continue passing `the_repository->hash_algo` even in this case. - "builtin/ls-files.c": There is an in-flight patch series that drops `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic conflict because we use `null_oid()` in `show_submodule()`. The value is passed to `repo_submodule_init()`, which may use the object ID to resolve a tree-ish in the superproject from which we want to read the submodule config. As such, the object ID should refer to an object in the superproject, and consequently we need to use its hash algorithm. This means that we could in theory just not bother about this edge case at all and just use `the_repository` in "diff-no-index.c". But doing so would feel misdesigned. Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in "hash.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06global: mark code units that generate warnings with `-Wsign-compare`Patrick Steinhardt
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-13Merge branch 'ps/leakfixes-part-9'Junio C Hamano
More leakfixes. * ps/leakfixes-part-9: (22 commits) list-objects-filter-options: work around reported leak on error builtin/merge: release output buffer after performing merge dir: fix leak when parsing "status.showUntrackedFiles" t/helper: fix leaking buffer in "dump-untracked-cache" t/helper: stop re-initialization of `the_repository` sparse-index: correctly free EWAH contents dir: release untracked cache data combine-diff: fix leaking lost lines builtin/tag: fix leaking key ID on failure to sign transport-helper: fix leaking import/export marks builtin/commit: fix leaking cleanup config trailer: fix leaking strbufs when formatting trailers trailer: fix leaking trailer values builtin/commit: fix leaking change data contents upload-pack: fix leaking URI protocols pretty: clear signature check diff-lib: fix leaking diffopts in `do_diff_cache()` revision: fix leaking bloom filters builtin/grep: fix leak with `--max-count=0` grep: fix leak in `grep_splice_or()` ...
2024-11-04grep: fix leak in `grep_splice_or()`Patrick Steinhardt
In `grep_splice_or()` we search for the next `TRUE` node in our tree of grep expressions and replace it with the given new expression. But we don't free the old node, which causes a memory leak. Plug it. This leak is exposed by t7810, but plugging it alone isn't sufficient to make the test suite pass. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-22grep: disable lookahead on errorRené Scharfe
regexec(3) can fail. E.g. on macOS it fails if it is used with an UTF-8 locale to match a valid regex against a buffer containing invalid UTF-8 characters. git grep has two ways to search for matches in a file: Either it splits its contents into lines and matches them separately, or it matches the whole content and figures out line boundaries later. The latter is done by look_ahead() and it's quicker in the common case where most files don't contain a match. Fall back to line-by-line matching if look_ahead() encounters an regexec(3) error by propagating errors out of patmatch() and bailing out of look_ahead() if there is one. This way we at least can find matches in lines that contain only valid characters. That matches the behavior of grep(1) on macOS. pcre2match() dies if pcre2_jit_match() or pcre2_match() fail, but since we use the flag PCRE2_MATCH_INVALID_UTF it handles invalid UTF-8 characters gracefully. So implement the fall-back only for regexec(3) and leave the PCRE2 matching unchanged. Reported-by: David Gstir <david@sigma-star.at> Signed-off-by: René Scharfe <l.s.r@web.de> Tested-by: David Gstir <david@sigma-star.at> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-09-27grep: fix leaking grep patternPatrick Steinhardt
When creating a pattern via `create_grep_pat()` we allocate the pattern member of the structure regardless of the token type. But later, when we try to free the structure, we free the pattern member conditionally on the token type and thus leak memory. Plug this leak. The leak is exposed by t7814, but plugging it alone does not make the whole test suite pass. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-29grep: prefer UNUSED to MAYBE_UNUSED for pcre allocatorsJeff King
We provide custom malloc/free callbacks for the pcre library to use. Those take an extra "data" parameter, but we don't use it. Back when these were added in 513f2b0bbd (grep: make PCRE2 aware of custom allocator, 2019-10-16), we only had MAYBE_UNUSED. But these days we have UNUSED, which we should prefer, as it will let the compiler inform us if the code changes to actually use the parameters. I also moved the annotations to come after the variable name, which is how we typically spell it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-30grep: -W: skip trailing empty lines at EOF, tooRené Scharfe
4aa2c4753d (grep: -W: don't extend context to trailing empty lines, 2016-05-28) stopped showing empty lines at the end of function context when using -W. Do the same for trailing empty lines at the end of files, for consistency -- it doesn't matter whether a function section is ended by the next function or the end of the file. Test it by adding a trailing empty line to the file used by the test "grep -W" and leave its expected output the same. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-25grep: improve errors for unmatched ( and )Ahelenia Ziemiańska
Imagine you want to grep for (. Easy: $ git grep '(' fatal: unmatched parenthesis uhoh. This is plainly wrong. Unless you know specifically that (a) git grep has expression groups and '(' ... ')' are used for them. (b) you can use -e '(' to explicitly say '(' is what you are looking for, not the beginning of a group. Similarly, $ git grep ')' fatal: incomplete pattern expression: ) is somehow worse. ")" is a complete regular expression pattern. Of course, the error wants to say "group" here. In this case it is also not "incomplete", it is unmatched. Make them say $ ./git grep '(' fatal: unmatched ( for expression group $ ./git grep ')' fatal: incomplete pattern expression group: ) which are clearer in indicating that it is not the expression that is wrong (since no pattern had been parsed at all), but rather that it is been misconstrued as a grouping operator. Link: https://bugs.debian.org/1051205 Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren
Each of these were checked with gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-29grep: mark unused parmaeters in pcre fallbacksJeff King
When USE_LIBPCRE2 is not defined, we compile several noop fallbacks. These need to have their parameters annotated to avoid -Wunused-parameter warnings (and obviously we cannot remove the parameters, since the functions must match the non-fallback versions). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-29grep: mark unused parameter in output functionJeff King
This is a callback used with grep_options.output, but we don't look at the grep_opt parameter, as we're just writing the output to stdout. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-17Merge branch 'cw/compat-util-header-cleanup'Junio C Hamano
Further shuffling of declarations across header files to streamline file dependencies. * cw/compat-util-header-cleanup: git-compat-util: move alloc macros to git-compat-util.h treewide: remove unnecessary includes for wrapper.h kwset: move translation table from ctype sane-ctype.h: create header for sane-ctype macros git-compat-util: move wrapper.c funcs to its header git-compat-util: move strbuf.c funcs to its header
2023-07-06Merge branch 'gc/config-context'Junio C Hamano
Reduce reliance on a global state in the config reading API. * gc/config-context: config: pass source to config_parser_event_fn_t config: add kvi.path, use it to evaluate includes config.c: remove config_reader from configsets config: pass kvi to die_bad_number() trace2: plumb config kvi config.c: pass ctx with CLI config config: pass ctx with config files config.c: pass ctx in configsets config: add ctx arg to config_fn_t urlmatch.h: use config_fn_t type config: inline git_color_default_config
2023-07-05treewide: remove unnecessary includes for wrapper.hCalvin Wan
Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28config: add ctx arg to config_fn_tGlen Choo
Add a new "const struct config_context *ctx" arg to config_fn_t to hold additional information about the config iteration operation. config_context has a "struct key_value_info kvi" member that holds metadata about the config source being read (e.g. what kind of config source it is, the filename, etc). In this series, we're only interested in .kvi, so we could have just used "struct key_value_info" as an arg, but config_context makes it possible to add/adjust members in the future without changing the config_fn_t signature. We could also consider other ways of organizing the args (e.g. moving the config name and value into config_context or key_value_info), but in my experiments, the incremental benefit doesn't justify the added complexity (e.g. a config_fn_t will sometimes invoke another config_fn_t but with a different config value). In subsequent commits, the .kvi member will replace the global "struct config_reader" in config.c, making config iteration a global-free operation. It requires much more work for the machinery to provide meaningful values of .kvi, so for now, merely change the signature and call sites, pass NULL as a placeholder value, and don't rely on the arg in any meaningful way. Most of the changes are performed by contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every config_fn_t: - Modifies the signature to accept "const struct config_context *ctx" - Passes "ctx" to any inner config_fn_t, if needed - Adds UNUSED attributes to "ctx", if needed Most config_fn_t instances are easily identified by seeing if they are called by the various config functions. Most of the remaining ones are manually named in the .cocci patch. Manual cleanups are still needed, but the majority of it is trivial; it's either adjusting config_fn_t that the .cocci patch didn't catch, or adding forward declarations of "struct config_context ctx" to make the signatures make sense. The non-trivial changes are in cases where we are invoking a config_fn_t outside of config machinery, and we now need to decide what value of "ctx" to pass. These cases are: - trace2/tr2_cfg.c:tr2_cfg_set_fl() This is indirectly called by git_config_set() so that the trace2 machinery can notice the new config values and update its settings using the tr2 config parsing function, i.e. tr2_cfg_cb(). - builtin/checkout.c:checkout_main() This calls git_xmerge_config() as a shorthand for parsing a CLI arg. This might be worth refactoring away in the future, since git_xmerge_config() can call git_default_config(), which can do much more than just parsing. Handle them by creating a KVI_INIT macro that initializes "struct key_value_info" to a reasonable default, and use that to construct the "ctx" arg. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21object-store-ll.h: split this header out of object-store.hElijah Newren
The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store | sort | uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-09Merge branch 'en/header-split-cache-h-part-2'Junio C Hamano
More header clean-up. * en/header-split-cache-h-part-2: (22 commits) reftable: ensure git-compat-util.h is the first (indirect) include diff.h: reduce unnecessary includes object-store.h: reduce unnecessary includes commit.h: reduce unnecessary includes fsmonitor: reduce includes of cache.h cache.h: remove unnecessary headers treewide: remove cache.h inclusion due to previous changes cache,tree: move basic name compare functions from read-cache to tree cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c hash-ll.h: split out of hash.h to remove dependency on repository.h tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h dir.h: move DTYPE defines from cache.h versioncmp.h: move declarations for versioncmp.c functions from cache.h ws.h: move declarations for ws.c functions from cache.h match-trees.h: move declarations for match-trees.c functions from cache.h pkt-line.h: move declarations for pkt-line.c functions from cache.h base85.h: move declarations for base85.c functions from cache.h copy.h: move declarations for copy.c functions from cache.h server-info.h: move declarations for server-info.c functions from cache.h packfile.h: move pack_window and pack_entry from cache.h ...
2023-04-25Merge branch 'en/header-split-cache-h'Junio C Hamano
Header clean-up. * en/header-split-cache-h: (24 commits) protocol.h: move definition of DEFAULT_GIT_PORT from cache.h mailmap, quote: move declarations of global vars to correct unit treewide: reduce includes of cache.h in other headers treewide: remove double forward declaration of read_in_full cache.h: remove unnecessary includes treewide: remove cache.h inclusion due to pager.h changes pager.h: move declarations for pager.c functions from cache.h treewide: remove cache.h inclusion due to editor.h changes editor: move editor-related functions and declarations into common file treewide: remove cache.h inclusion due to object.h changes object.h: move some inline functions and defines from cache.h treewide: remove cache.h inclusion due to object-file.h changes object-file.h: move declarations for object-file.c functions from cache.h treewide: remove cache.h inclusion due to git-zlib changes git-zlib: move declarations for git-zlib functions from cache.h treewide: remove cache.h inclusion due to object-name.h changes object-name.h: move declarations for object-name.c functions from cache.h treewide: remove unnecessary cache.h inclusion treewide: be explicit about dependence on mem-pool.h treewide: be explicit about dependence on oid-array.h ...
2023-04-24commit.h: reduce unnecessary includesElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: remove double forward declaration of read_in_fullElijah Newren
cache.h's nature of a dumping ground of includes prevented it from being included in some compat/ files, forcing us into a workaround of having a double forward declaration of the read_in_full() function (see commit 14086b0a13 ("compat/pread.c: Add a forward declaration to fix a warning", 2007-11-17)). Now that we have moved functions like read_in_full() from cache.h to wrapper.h, and wrapper.h isn't littered with unrelated and scary #defines, get rid of the extra forward declaration and just have compat/pread.c include wrapper.h. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-06Merge branch 'en/header-split-cleanup'Junio C Hamano
Split key function and data structure definitions out of cache.h to new header files and adjust the users. * en/header-split-cleanup: csum-file.h: remove unnecessary inclusion of cache.h write-or-die.h: move declarations for write-or-die.c functions from cache.h treewide: remove cache.h inclusion due to setup.h changes setup.h: move declarations for setup.c functions from cache.h treewide: remove cache.h inclusion due to environment.h changes environment.h: move declarations for environment.c functions from cache.h treewide: remove unnecessary includes of cache.h wrapper.h: move declarations for wrapper.c functions from cache.h path.h: move function declarations for path.c functions from cache.h cache.h: remove expand_user_path() abspath.h: move absolute path functions from cache.h environment: move comment_line_char from cache.h treewide: remove unnecessary cache.h inclusion from several sources treewide: remove unnecessary inclusion of gettext.h treewide: be explicit about dependence on gettext.h treewide: remove unnecessary cache.h inclusion from a few headers
2023-03-30Merge branch 'mk/workaround-pcre-jit-ucp-bug'Junio C Hamano
A recent-ish change to allow unicode character classes to be used with "grep -P" triggered a JIT bug in older pcre2 libraries. The problematic change in Git built with these older libraries has been disabled to work around the bug. * mk/workaround-pcre-jit-ucp-bug: grep: work around UTF-8 related JIT bug in PCRE2 <= 10.34
2023-03-23grep: work around UTF-8 related JIT bug in PCRE2 <= 10.34Mathias Krause
Stephane is reporting[1] a regression introduced in git v2.40.0 that leads to 'git grep' segfaulting in his CI pipeline. It turns out, he's using an older version of libpcre2 that triggers a wild pointer dereference in the generated JIT code that was fixed in PCRE2 10.35. Instead of completely disabling the JIT compiler for the buggy version, just mask out the Unicode property handling as we used to do prior to commit acabd2048ee0 ("grep: correctly identify utf-8 characters with \{b,w} in -P"). [1] https://lore.kernel.org/git/7E83DAA1-F9A9-4151-8D07-D80EA6D59EEA@clumio.com/ Reported-by: Stephane Odul <stephane@clumio.com> Signed-off-by: Mathias Krause <minipli@grsecurity.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: remove unnecessary cache.h inclusion from several sourcesElijah Newren
A number of files were apparently including cache.h solely to get gettext.h. By making those files explicitly include gettext.h, we can already drop the include of cache.h in these files. On top of that, there were some files using cache.h that didn't need to for any reason. Remove these unnecessary includes. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-22Merge branch 'ab/various-leak-fixes'Junio C Hamano
Leak fixes. * ab/various-leak-fixes: push: free_refs() the "local_refs" in set_refspecs() push: refactor refspec_append_mapped() for subsequent leak-fix receive-pack: release the linked "struct command *" list grep API: plug memory leaks by freeing "header_list" grep.c: refactor free_grep_patterns() builtin/merge.c: free "&buf" on "Your local changes..." error builtin/merge.c: use fixed strings, not "strbuf", fix leak show-branch: free() allocated "head" before return commit-graph: fix a parse_options_concat() leak http-backend.c: fix cmd_main() memory leak, refactor reg{exec,free}() http-backend.c: fix "dir" and "cmd_arg" leaks in cmd_main() worktree: fix a trivial leak in prune_worktrees() repack: fix leaks on error with "goto cleanup" name-rev: don't xstrdup() an already dup'd string various: add missing clear_pathspec(), fix leaks clone: use free() instead of UNLEAK() commit-graph: use free_commit_graph() instead of UNLEAK() bundle.c: don't leak the "args" in the "struct child_process" tests: mark tests as passing with SANITIZE=leak
2023-02-15Merge branch 'cb/grep-fallback-failing-jit'Junio C Hamano
In an environment where dynamically generated code is prohibited to run (e.g. SELinux), failure to JIT pcre patterns is expected. Fall back to interpreted execution in such a case. * cb/grep-fallback-failing-jit: grep: fall back to interpreter if JIT memory allocation fails
2023-02-06grep API: plug memory leaks by freeing "header_list"Ævar Arnfjörð Bjarmason
When the "header_list" struct member was added in [1], freeing this field was neglected. Fix that now, so that commands like ./git -P log -1 --color=always --author=A origin/master will run leak-free. 1. 80235ba79ef ("log --author=me --grep=it" should find intersection, not union, 2010-01-17) Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06grep.c: refactor free_grep_patterns()Ævar Arnfjörð Bjarmason
Refactor the free_grep_patterns() function to split out the freeing of the "struct grep_pat" it contains. Right now we're only freeing the "pattern_list", but we should be freeing another member of the same type, which we'll do in the subsequent commit. Let's also replace the "return" if we don't have an "opt->pattern_expression" with a conditional call of free_pattern_expr(). Before db84376f981 (grep.c: remove "extended" in favor of "pattern_expression", fix segfault, 2022-10-11) the pattern here was: if (!x) return; free_pattern_expr(y); While at it, instead of: if (!x) return; free_pattern_expr(x); Let's instead do: if (x) free_pattern_expr(x); This will make it easier to free additional members from free_grep_patterns() in the future. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31grep: fall back to interpreter if JIT memory allocation failsMathias Krause
Under Linux systems with SELinux's 'deny_execmem' or PaX's MPROTECT enabled, the allocation of PCRE2's JIT rwx memory may be prohibited, making pcre2_jit_compile() fail with PCRE2_ERROR_NOMEMORY (-48): [user@fedora git]$ git grep -c PCRE2_JIT grep.c:1 [user@fedora git]$ # Enable SELinux's W^X policy [user@fedora git]$ sudo semanage boolean -m -1 deny_execmem [user@fedora git]$ # JIT memory allocation fails, breaking 'git grep' [user@fedora git]$ git grep -c PCRE2_JIT fatal: Couldn't JIT the PCRE2 pattern 'PCRE2_JIT', got '-48' Instead of failing hard in this case and making 'git grep' unusable on such systems, simply fall back to interpreter mode, leading to a much better user experience. As having a functional PCRE2 JIT compiler is a legitimate use case for performance reasons, we'll only do the fallback if the supposedly available JIT is found to be non-functional by attempting to JIT compile a very simple pattern. If this fails, JIT is deemed to be non-functional and we do the interpreter fallback. For all other cases, i.e. the simple pattern can be compiled but the user provided cannot, we fail hard as we do now as the reason for the failure must be the pattern itself. To aid users in helping themselves change the error message to include a hint about the '(*NO_JIT)' prefix. Also clip the pattern at 64 characters to ensure the hint will be seen by the user and not internally truncated by the die() function. Cc: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Mathias Krause <minipli@grsecurity.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-18grep: correctly identify utf-8 characters with \{b,w} in -PCarlo Marcelo Arenas Belón
When UTF is enabled for a PCRE match, the corresponding flags are added to the pcre2_compile() call, but PCRE2_UCP wasn't included. This prevents extending the meaning of the character classes to include those new valid characters and therefore result in failed matches for expressions that rely on that extention, for ex: $ git grep -P '\bÆvar' Add PCRE2_UCP so that \w will include Æ and therefore \b could correctly match the beginning of that word. This has an impact on performance that has been estimated to be between 20% to 40% and that is shown through the added performance test. Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-21Merge branch 'ab/grep-simplify-extended-expression'Junio C Hamano
Giving "--invert-grep" and "--all-match" without "--grep" to the "git log" command resulted in an attempt to access grep pattern expression structure that has not been allocated, which has been corrected. * ab/grep-simplify-extended-expression: grep.c: remove "extended" in favor of "pattern_expression", fix segfault
2022-10-11grep.c: remove "extended" in favor of "pattern_expression", fix segfaultÆvar Arnfjörð Bjarmason
Since 79d3696cfb4 (git-grep: boolean expression on pattern matching., 2006-06-30) the "pattern_expression" member has been used for complex queries (AND/OR...), with "pattern_list" being used for the simple OR queries. Since then we've used both "pattern_expression" and its associated boolean "extended" member to see if we have a complex expression. Since f41fb662f57 (revisions API: have release_revisions() release "grep_filter", 2022-04-13) we've had a subtle bug relating to that: If we supplied options that were only used for "complex queries", but didn't supply the query itself we'd set "opt->extended", but would have a NULL "pattern_expression". As a result these would segfault as we tried to call "free_grep_patterns()" from "release_revisions()": git -P log -1 --invert-grep git -P log -1 --all-match The root cause of this is that we were conflating the state management we needed in "compile_grep_patterns()" itself with whether or not we had an "opt->pattern_expression" later on. In this cases as we're going through "compile_grep_patterns()" we have no "opt->pattern_list" but have "opt->no_body_match" or "opt->all_match". So we'd set "opt->extended = 1", but not "return" on "opt->extended" as that's an "else if" in the same "if" statement. That behavior is intentional and required, as the common case is that we have an "opt->pattern_list" that we're about to parse into the "opt->pattern_expression". But we don't need to keep track of this "extended" flag beyond the state management in compile_grep_patterns() itself. It needs it, but once we're out of that function we can rely on "opt->pattern_expression" being non-NULL instead for using these extended patterns. As 79d3696cfb4 itself shows we've assumed that there's a one-to-one mapping between the two since the very beginning. I.e. "match_line()" would check "opt->extended" to see if it should call "match_expr()", and the first thing we do in that function is assume that we have a "opt->pattern_expression". We'd then call "match_expr_eval()", which would have died if that "opt->pattern_expression" was NULL. The "die" was added in c922b01f54c (grep: fix segfault when "git grep '('" is given, 2009-04-27), and can now be removed as it's now clearly unreachable. We still do the right thing in the case that prompted that fix: git grep '(' fatal: unmatched parenthesis Arguably neither the "--invert-grep" option added in [1] nor the earlier "--all-match" option added in [2] were intended to be used stand-alone, and another approach[3] would be to error out in those cases. But since we've been treating them as a NOOP when given without --grep for a long time let's keep doing that. We could also return in "free_pattern_expr()" if the argument is non-NULL, as an alternative fix for this segfault does [4]. That would be more elegant in making the "free_*()" function behave like "free()", but it would also remove a sanity check: The "free_pattern_expr()" function calls itself recursively, and only the top-level is allowed to be NULL, let's not conflate those two conditions. 1. 22dfa8a23de (log: teach --invert-grep option, 2015-01-12) 2. 0ab7befa31d (grep --all-match, 2006-09-27) 3. https://lore.kernel.org/git/patch-1.1-f4b90799fce-20221010T165711Z-avarab@gmail.com/ 4. http://lore.kernel.org/git/7e094882c2a71894416089f894557a9eae07e8f8.1665423686.git.me@ttaylorr.com Reported-by: orygaw <orygaw@protonmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22grep: add --max-count command line optionCarlos López
This patch adds a command line option analogous to that of GNU grep(1)'s -m / --max-count, which users might already be used to. This makes it possible to limit the amount of matches shown in the output while keeping the functionality of other options such as -C (show code context) or -p (show containing function), which would be difficult to do with a shell pipeline (e.g. head(1)). Signed-off-by: Carlos López 00xc@protonmail.com Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25Merge branch 'rs/pcre-invalid-utf8-fix-fix'Junio C Hamano
Workaround we have for versions of PCRE2 before their version 10.36 were in effect only for their versions newer than 10.36 by mistake, which has been corrected. * rs/pcre-invalid-utf8-fix-fix: grep: fix triggering PCRE2_NO_START_OPTIMIZE workaround
2022-02-25Merge branch 'ab/grep-patterntype'Junio C Hamano
Some code clean-up in the "git grep" machinery. * ab/grep-patterntype: grep: simplify config parsing and option parsing grep.c: do "if (bool && memchr())" not "if (memchr() && bool)" grep.h: make "grep_opt.pattern_type_option" use its enum grep API: call grep_config() after grep_init() grep.c: don't pass along NULL callback value built-ins: trust the "prefix" from run_builtin() grep tests: add missing "grep.patternType" config tests grep tests: create a helper function for "BRE" or "ERE" log tests: check if grep_config() is called by "log"-like cmds grep.h: remove unused "regex_t regexp" from grep_opt
2022-02-17grep: fix triggering PCRE2_NO_START_OPTIMIZE workaroundRené Scharfe
PCRE2 bug 2642 was fixed in version 10.36. Our 95ca1f987e (grep/pcre2: better support invalid UTF-8 haystacks, 2021-01-24) worked around it on older versions by setting the flag PCRE2_NO_START_OPTIMIZE. 797c359978 (grep/pcre2: use compile-time PCREv2 version test, 2021-02-18) switched it around to set the flag on 10.36 and higher instead, while it claimed to use "the same test done at compile-time". Switch the condition back to apply the workaround on PCRE2 versions _before_ 10.36. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-15grep: simplify config parsing and option parsingÆvar Arnfjörð Bjarmason
Simplify the parsing of "grep.patternType" and "grep.extendedRegexp". This changes no behavior, but gets rid of complex parsing logic that isn't needed anymore. When "grep.patternType" was introduced in 84befcd0a4a (grep: add a grep.patternType configuration setting, 2012-08-03) we promised that: 1. You can set "grep.patternType", and "[setting it to] 'default' will return to the default matching behavior". In that context "the default" meant whatever the configuration system specified before that change, i.e. via grep.extendedRegexp. 2. We'd support the existing "grep.extendedRegexp" option, but ignore it when the new "grep.patternType" option is set. We said we'd only ignore the older "grep.extendedRegexp" option "when the `grep.patternType` option is set to a value other than 'default'". In a preceding commit we changed grep_config() to be called after grep_init(), which means that much of the complexity here can go away. As before both "grep.patternType" and "grep.extendedRegexp" are last-one-wins variable, with "grep.extendedRegexp" yielding to "grep.patternType", except when "grep.patternType=default". Note that as the previously added tests indicate this cannot be done on-the-fly as we see the config variables, without introducing more state keeping. I.e. if we see: -c grep.extendedRegexp=false -c grep.patternType=default -c extendedRegexp=true We need to select ERE, since grep.patternType=default unselects that variable, which normally has higher precedence, but we also need to select BRE in cases of: -c grep.extendedRegexp=true \ -c grep.extendedRegexp=false Which would not be the case for this, which select ERE: -c grep.patternType=extended \ -c grep.extendedRegexp=false Therefore we cannot do this on-the-fly in grep_config without also introducing tracking variables for not only the pattern type, but what the source of that pattern type was. So we need to decide on the pattern after our config was fully parsed. Let's do that by deferring the decision on the pattern type until it's time to compile it in compile_regexp(). By that time we've not only parsed the config, but also handled the command-line options. Those will set "opt.pattern_type_option" (*not* "opt.extended_regexp_option"!). At that point all we need to do is see if "grep.patternType" was UNSPECIFIED in the end (including an explicit "=default"), if so we'll use the "grep.extendedRegexp" configuration, if any. See my 07a3d411739 (grep: remove regflags from the public grep_opt API, 2017-06-29) for addition of the two comments being removed here, i.e. the complexity noted in that commit is now going away. 1. https://lore.kernel.org/git/patch-v8-09.10-c211bb0c69d-20220118T155211Z-avarab@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-15grep.c: do "if (bool && memchr())" not "if (memchr() && bool)"Ævar Arnfjörð Bjarmason
Change code in compile_regexp() to check the cheaper boolean "!opt->pcre2" condition before the "memchr()" search. This doesn't noticeably optimize anything, but makes the code more obvious and conventional. The line wrapping being added here also makes a subsequent commit smaller. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-15grep API: call grep_config() after grep_init()Ævar Arnfjörð Bjarmason
The grep_init() function used the odd pattern of initializing the passed-in "struct grep_opt" with a statically defined "grep_defaults" struct, which would be modified in-place when we invoked grep_config(). So we effectively (b) initialized config, (a) then defaults, (c) followed by user options. Usually those are ordered as "a", "b" and "c" instead. As the comments being removed here show the previous behavior needed to be carefully explained as we'd potentially share the populated configuration among different instances of grep_init(). In practice we didn't do that, but now that it can't be a concern anymore let's remove those comments. This does not change the behavior of any of the configuration variables or options. That would have been the case if we didn't move around the grep_config() call in "builtin/log.c". But now that we call "grep_config" after "git_log_config" and "git_format_config" we'll need to pass in the already initialized "struct grep_opt *". See 6ba9bb76e02 (grep: copy struct in one fell swoop, 2020-11-29) and 7687a0541e0 (grep: move the configuration parsing logic to grep.[ch], 2012-10-09) for the commits that added the comments. The memcpy() pattern here will be optimized away and follows the convention of other *_init() functions. See 5726a6b4012 (*.c *_init(): define in terms of corresponding *_INIT macro, 2021-07-01). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-15built-ins: trust the "prefix" from run_builtin()Ævar Arnfjörð Bjarmason
Change code in "builtin/grep.c" and "builtin/ls-tree.c" to trust the "prefix" passed from "run_builtin()". The "prefix" we get from setup.c is either going to be NULL or a string of length >0, never "". So we can drop the "prefix && *prefix" checks added for "builtin/grep.c" in 0d042fecf2f (git-grep: show pathnames relative to the current directory, 2006-08-11), and for "builtin/ls-tree.c" in a69dd585fca (ls-tree: chomp leading directories when run from a subdirectory, 2005-12-23). As seen in code in revision.c that was added in cd676a51367 (diff --relative: output paths as relative to the current subdirectory, 2008-02-12) we already have existing code that does away with this assertion. This makes it easier to reason about a subsequent change to the "prefix_length" code in grep.c in a subsequent commit, and since we're going to the trouble of doing that let's leave behind an assert() to promise this to any future callers. For "builtin/grep.c" it would be painful to pass the "prefix" down the callchain of: cmd_grep -> grep_tree -> grep_submodule -> grep_cache -> grep_oid -> grep_source_name So for the code that needs it in grep_source_name() let's add a "grep_prefix" variable similar to the existing "ls_tree_prefix". While at it let's move the code in cmd_ls_tree() around so that we assign to the "ls_tree_prefix" right after declaring the variables, and stop assigning to "prefix". We only subsequently used that variable later in the function after clobbering it. Let's just use our own "grep_prefix" instead. Let's also add an assert() in git.c, so that we'll make this promise about the "prefix" to any current and future callers, as well as to any readers of the code. Code history: * The strlen() in "grep.c" hasn't been used since 493b7a08d80 (grep: accept relative paths outside current working directory, 2009-09-05). When that code was added in 0d042fecf2f (git-grep: show pathnames relative to the current directory, 2006-08-11) we used the length. But since 493b7a08d80 we haven't used it for anything except a boolean check that we could have done on the "prefix" member itself. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-05Merge branch 'rs/grep-expr-cleanup'Junio C Hamano
Code clean-up. * rs/grep-expr-cleanup: grep: use grep_and_expr() in compile_pattern_and() grep: extract grep_binexp() from grep_or_expr() grep: use grep_not_expr() in compile_pattern_not() grep: use grep_or_expr() in compile_pattern_or()
2022-01-10Merge branch 'lh/use-gnu-color-in-grep'Junio C Hamano
The color palette used by "git grep" has been updated to match that of GNU grep. * lh/use-gnu-color-in-grep: grep: align default colors with GNU grep ones
2022-01-06grep: use grep_and_expr() in compile_pattern_and()Taylor Blau
In a similar spirit as a previous commit, use the new `grep_and_expr()` to construct the AND node in `compile_pattern_and()`. Unlike the aforementioned previous commit, this is not about code duplication, since this is the only spot in grep.c where an AND node is constructed. Rather, this is about visual consistency with the other `compile_pattern_xyz()` functions. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-06grep: extract grep_binexp() from grep_or_expr()Taylor Blau
When constructing an OR node, the grep.c code uses `grep_or_expr()` to make a node, assign its kind, and set its left and right children. The same is not done for AND nodes. Prepare to introduce a new `grep_and_expr()` function which will share code with the existing implementation of `grep_or_expr()` by introducing a new function which compiles either kind of binary expression, and reimplement `grep_or_expr()` in terms of it. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>