| author | Paulo Casaretto <pcasaretto@gmail.com> | 2025-08-29 16:02:54 +0000 |
|---|---|---|
| committer | Junio C Hamano <gitster@pobox.com> | 2025-08-29 09:46:07 -0700 |
| commit | 00727249ec8404c68391ec58e9c9f0d8a88d5ca0 | |
| tree | 466a3bbb6ff3c13198bf303df28df227bc23e98e /builtin/range-diff.c | |
| parent | f814da676ae46aac5be0a98b99373a76dee6cedb | |
range-diff: add configurable memory limit for cost matrix
When comparing large commit ranges (e.g., 250,000+ commits), range-diff
attempts to allocate an n×n cost matrix that can exhaust available
memory. For example, with 256,784 commits (n = 513,568), the matrix
would require roughly 1TB of memory (513,568² × 4 bytes ≈ 1.05 × 10¹² bytes),
causing either immediate segmentation faults due to integer overflow or
system hangs.
Add a memory limit check in get_correspondences() before allocating the
cost matrix. This check uses the total size in bytes (n² × sizeof(int))
and compares it against a configurable maximum, preventing both
excessive memory usage and integer overflow issues.
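The check itself lives in get_correspondences(), which is outside this diff. The following is only a minimal sketch of how such a check can avoid overflow; the helper name cost_matrix_fits is hypothetical and not taken from the patch. It divides instead of multiplying, so n² × sizeof(int) is never computed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): returns 1 if an n x n matrix
 * of int fits within max_memory bytes, 0 otherwise.
 *
 * The product n * n * sizeof(int) is never formed, so it cannot wrap
 * around size_t even for very large n.
 */
static int cost_matrix_fits(size_t n, size_t max_memory)
{
	/* Number of int cells that fit in the budget. */
	size_t max_cells = max_memory / sizeof(int);

	if (n == 0)
		return 1;

	/* For positive integers, n * n <= max_cells iff n <= max_cells / n. */
	return n <= max_cells / n;
}
```

A caller would report an error and bail out when the helper returns 0, rather than attempting the allocation.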
The limit is configurable via a new --max-memory option that accepts
human-readable sizes (e.g., "1G", "500M"). The default is 4GB for 64-bit
systems and 2GB for 32-bit systems. This allows comparing ranges of
approximately 32,000 (16,000) commits, which is generous for real-world use cases
while preventing impractical operations.
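The figures above can be checked with a short calculation. This sketch assumes the defaults mean 4 GiB and 2 GiB and that each matrix cell is a 4-byte int, as the message's own arithmetic suggests; the function name max_matrix_dimension is hypothetical. It finds the largest n with n² × cell_size ≤ limit by binary search.

```c
#include <stdint.h>

/*
 * Hypothetical helper: largest n such that n * n * cell_size <= limit_bytes.
 * cell_size must be non-zero. Uses division in the comparison so that
 * mid * mid is never computed and cannot overflow.
 */
static uint64_t max_matrix_dimension(uint64_t limit_bytes, uint64_t cell_size)
{
	uint64_t cells = limit_bytes / cell_size;
	uint64_t lo = 0, hi = UINT32_MAX; /* n can never exceed 2^32 - 1 here */

	while (lo < hi) {
		/* Round up so that mid >= 1 and the loop always makes progress. */
		uint64_t mid = lo + (hi - lo + 1) / 2;

		if (mid <= cells / mid)
			lo = mid;      /* mid * mid <= cells: mid is feasible */
		else
			hi = mid - 1;  /* mid is too large */
	}
	return lo;
}
```

Under these assumptions a 4 GiB limit gives n = 32,768 and a 2 GiB limit gives n = 23,170. If n counts the commits of both ranges combined, as in the 256,784-commit example (n = 513,568), then 32,768 corresponds to roughly 16,000 commits per range.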
When the limit is exceeded, range-diff now displays a clear error
message showing both the requested memory size and the maximum allowed,
formatted in human-readable units for better user experience.
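The exact wording of the error message is not shown in this diff. As an illustration only, a byte count can be rendered in human-readable units along these lines; format_size is a hypothetical standalone helper, not the routine the patch uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: writes a human-readable form of `bytes` into `out`,
 * using binary units. The fraction is derived from the remainder of the
 * last division only and is truncated, so the result is approximate.
 */
static void format_size(uint64_t bytes, char *out, size_t outlen)
{
	static const char *const units[] = { "B", "KiB", "MiB", "GiB", "TiB" };
	uint64_t whole = bytes, rem = 0;
	size_t u = 0;

	/* Scale down by 1024 until the value is small or units run out. */
	while (whole >= 1024 && u < 4) {
		rem = whole % 1024;
		whole /= 1024;
		u++;
	}

	if (u == 0)
		snprintf(out, outlen, "%llu B", (unsigned long long)whole);
	else
		snprintf(out, outlen, "%llu.%02llu %s",
			 (unsigned long long)whole,
			 (unsigned long long)(rem * 100 / 1024),
			 units[u]);
}
```

An error path could format both the requested size and the configured maximum this way before printing them side by side.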
Example usage:
git range-diff --max-memory=1G branch1...branch2
git range-diff --max-memory=500M base..topic1 base..topic2
This approach was chosen over alternatives:
- Pre-counting commits: Would require spawning additional git processes
and reading all commits twice
- Limiting by commit count: Less precise than actual memory usage
- Streaming approach: Would require significant refactoring of the
current algorithm
This issue was previously discussed in:
https://lore.kernel.org/git/RFC-cover-v2-0.5-00000000000-20211210T122901Z-avarab@gmail.com/
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Paulo Casaretto <pcasaretto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'builtin/range-diff.c')
| -rw-r--r-- | builtin/range-diff.c | 21 |
1 files changed, 21 insertions, 0 deletions
diff --git a/builtin/range-diff.c b/builtin/range-diff.c
index a563abff5f..aafcc99b96 100644
--- a/builtin/range-diff.c
+++ b/builtin/range-diff.c
@@ -6,6 +6,7 @@
 #include "parse-options.h"
 #include "range-diff.h"
 #include "config.h"
+#include "parse.h"
 
 
 static const char * const builtin_range_diff_usage[] = {
@@ -15,6 +16,21 @@ N_("git range-diff [<options>] <base> <old-tip> <new-tip>"),
 NULL
 };
 
+static int parse_max_memory(const struct option *opt, const char *arg, int unset)
+{
+	size_t *max_memory = opt->value;
+	uintmax_t val;
+
+	if (unset)
+		return 0;
+
+	if (!git_parse_unsigned(arg, &val, SIZE_MAX))
+		return error(_("invalid max-memory value: %s"), arg);
+
+	*max_memory = (size_t)val;
+	return 0;
+}
+
 int cmd_range_diff(int argc,
 		   const char **argv,
 		   const char *prefix,
@@ -25,6 +41,7 @@ int cmd_range_diff(int argc,
 	struct strvec diff_merges_arg = STRVEC_INIT;
 	struct range_diff_options range_diff_opts = {
 		.creation_factor = RANGE_DIFF_CREATION_FACTOR_DEFAULT,
+		.max_memory = RANGE_DIFF_MAX_MEMORY_DEFAULT,
 		.diffopt = &diffopt,
 		.other_arg = &other_arg
 	};
@@ -40,6 +57,10 @@ int cmd_range_diff(int argc,
 				  PARSE_OPT_OPTARG),
 		OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg, N_("style"),
 				  N_("passed to 'git log'"), 0),
+		OPT_CALLBACK(0, "max-memory", &range_diff_opts.max_memory,
+			     N_("size"),
+			     N_("maximum memory for cost matrix (default 4G)"),
+			     parse_max_memory),
 		OPT_PASSTHRU_ARGV(0, "remerge-diff", &diff_merges_arg, NULL,
 				  N_("passed to 'git log'"), PARSE_OPT_NOARG),
 		OPT_BOOL(0, "left-only", &left_only,
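The parse_max_memory callback delegates the actual parsing to git_parse_unsigned. For readers without the Git source at hand, the following standalone sketch approximates that kind of parsing under stated assumptions: a decimal number followed by an optional k, m, or g suffix (case-insensitive, powers of 1024), rejected if it exceeds a caller-supplied maximum. The name parse_size is hypothetical, and Git's own parser may differ in details.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical standalone parser (an approximation, not Git's code).
 * Returns 1 and stores the value in *out on success, 0 on failure.
 */
static int parse_size(const char *arg, uint64_t max, uint64_t *out)
{
	char *end;
	unsigned long long val;
	uint64_t factor = 1;

	/* Reject NULL, empty input, signs and leading whitespace. */
	if (!arg || *arg < '0' || *arg > '9')
		return 0;

	errno = 0;
	val = strtoull(arg, &end, 10);
	if (errno == ERANGE)
		return 0;

	/* Optional single-letter unit suffix. */
	switch (*end) {
	case 'k': case 'K': factor = 1024ULL; end++; break;
	case 'm': case 'M': factor = 1024ULL * 1024; end++; break;
	case 'g': case 'G': factor = 1024ULL * 1024 * 1024; end++; break;
	}

	/* Anything left over (e.g. "1GB", "12x") is an error. */
	if (*end != '\0')
		return 0;

	/* Overflow-safe range check before multiplying. */
	if (val > max / factor)
		return 0;

	*out = (uint64_t)val * factor;
	return 1;
}
```

Passing SIZE_MAX as the maximum, as the callback does with git_parse_unsigned, ensures the parsed value can be stored in a size_t without truncation on 32-bit systems.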
