From c4c9089584d0ed04978e8d0945b2ba2985e67bd3 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Tue, 18 Nov 2025 04:12:18 -0500 Subject: cache-tree: avoid strtol() on non-string buffer A cache-tree extension entry in the index looks like this: NUL SPACE NEWLINE where the "_nr" items are human-readable base-10 ASCII. We parse them with strtol(), even though we do not have a NUL-terminated string (we'd generally have an mmap() of the on-disk index file). For a well-formed entry, this is not a problem; strtol() will stop when it sees the newline. But there are two problems: 1. A corrupted entry could omit the newline, causing us to read further. You'd mostly get stopped by seeing non-digits in the oid field (and if it is likewise truncated, there will still be 20 or more bytes of the index checksum). So it's possible, though unlikely, to read off the end of the mmap'd buffer. Of course a malicious index file can fake the oid and the index checksum to all (ASCII) 0's. This is further complicated by the fact that mmap'd buffers tend to be zero-padded up to the page boundary. So to run off the end, the index size also has to be a multiple of the page size. This is also unlikely, though you can construct a malicious index file that matches this. The security implications aren't too interesting. The index file is a local file anyway (so you can't attack somebody by cloning, but only if you convince them to operate in a .git directory you made, at which point attacking .git/config is much easier). And it's just a read overflow via strtol(), which is unlikely to buy you much beyond a crash. 2. ASan has a strict_string_checks option, which tells it to make sure that options to string functions (like strtol) have some eventual NUL, without regard to what the function would actually do (like stopping at a newline here). This option sometimes has false positives, but it can point to sketchy areas (like this one) where the input we use doesn't exhibit a problem, but different input _could_ cause us to misbehave. Let's fix it by just parsing the values ourselves with a helper function that is careful not to go past the end of the buffer. There are a few behavior changes here that should not matter: - We do not consider overflow, as strtol() would. But nor did the original code. However, we don't trust the value we get from the on-disk file, and if it says to read 2^30 entries, we would notice that we do not have that many and bail before reading off the end of the buffer. - Our helper does not skip past extra leading whitespace as strtol() would, but according to gitformat-index(5) there should not be any. - The original quit parsing at a newline or a NUL byte, but now we insist on a newline (which is what the documentation says, and what Git has always produced). Since we are providing our own helper function, we can tweak the interface a bit to make our lives easier. The original code does not use strtol's "end" pointer to find the end of the parsed data, but rather uses a separate loop to advance our "buf" pointer to the trailing newline. We can instead provide a helper that advances "buf" as it parses, letting us read strictly left-to-right through the buffer. I didn't add a new test here. It's surprisingly difficult to construct an index of exactly the right size due to the way we pad entries. But it is easy to trigger the problem in existing tests when using ASan's strict string checking, coupled with a recent change to use NO_MMAP with ASan builds. So: make SANITIZE=address cd t ASAN_OPTIONS=strict_string_checks=1 ./t0090-cache-tree.sh triggers it reliably. Technically it is not deterministic because there is ~8% chance (it's 1-(255/256)^20, or ^32 for sha256) that the trailing checksum hash has a NUL byte in it. But we compute enough cache-trees in the course of that script that we are very likely to hit the problem in one of them. We can look at making strict_string_checks the default for ASan builds, but there are some other cases we'd want to fix first. Reported-by: correctmost Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- cache-tree.c | 50 +++++++++++++++++++++++++++++++++++++------------- 1 file changed, 37 insertions(+), 13 deletions(-) (limited to 'cache-tree.c') diff --git a/cache-tree.c b/cache-tree.c index fa3858e282..2309911dfa 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -548,12 +548,41 @@ void cache_tree_write(struct strbuf *sb, struct cache_tree *root) trace2_region_leave("cache_tree", "write", the_repository); } +static int parse_int(const char **ptr, unsigned long *len_p, int *out) +{ + const char *s = *ptr; + unsigned long len = *len_p; + int ret = 0; + int sign = 1; + + while (len && *s == '-') { + sign *= -1; + s++; + len--; + } + + while (len) { + if (!isdigit(*s)) + break; + ret *= 10; + ret += *s - '0'; + s++; + len--; + } + + if (s == *ptr) + return -1; + + *ptr = s; + *len_p = len; + *out = sign * ret; + return 0; +} + static struct cache_tree *read_one(const char **buffer, unsigned long *size_p) { const char *buf = *buffer; unsigned long size = *size_p; - const char *cp; - char *ep; struct cache_tree *it; int i, subtree_nr; const unsigned rawsz = the_hash_algo->rawsz; @@ -569,19 +598,14 @@ static struct cache_tree *read_one(const char **buffer, unsigned long *size_p) buf++; size--; it = cache_tree(); - cp = buf; - it->entry_count = strtol(cp, &ep, 10); - if (cp == ep) + if (parse_int(&buf, &size, &it->entry_count) < 0) goto free_return; - cp = ep; - subtree_nr = strtol(cp, &ep, 10); - if (cp == ep) + if (!size || *buf != ' ') goto free_return; - while (size && *buf && *buf != '\n') { - size--; - buf++; - } - if (!size) + buf++; size--; + if (parse_int(&buf, &size, &subtree_nr) < 0) + goto free_return; + if (!size || *buf != '\n') goto free_return; buf++; size--; if (0 <= it->entry_count) { -- cgit v1.2.3