From 5173099aae25bedf7a87225891d124569cba7076 Mon Sep 17 00:00:00 2001
From: Jeff King
Date: Thu, 9 Jan 2025 03:33:10 -0500
Subject: tree-diff: clear parent array in path_appendnew()

All of the other functions which allocate a combine_diff_path struct
zero out the parent array, but this code path does not. There's no bug,
since our caller will fill in most of the fields. But leaving the unused
fields (like combine_diff_parent.path) uninitialized makes working with
the struct more error-prone than it needs to be.

Let's just zero the parent field to be consistent with the
combine_diff_path_new() allocator.

Signed-off-by: Jeff King
Signed-off-by: Junio C Hamano
---
 tree-diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tree-diff.c b/tree-diff.c
index d9237ffd9b..24f7b5912c 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -151,8 +151,6 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_
  *	process(p);
  *	p = pprev;
  *	; don't forget to free tail->next in the end
- *
- * p->parent[] remains uninitialized.
  */
 static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
 	int nparent, const struct strbuf *base, const char *path, int pathlen,
@@ -187,6 +185,8 @@ static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
 	p->mode = mode;
 	oidcpy(&p->oid, oid ? oid : null_oid());
 
+	memset(p->parent, 0, sizeof(p->parent[0]) * nparent);
+
 	return p;
 }
 
-- 
cgit v1.2.3

From a8dda1af6ab400d45b7524bc46b64e04d14fc912 Mon Sep 17 00:00:00 2001
From: Jeff King
Date: Thu, 9 Jan 2025 03:46:49 -0500
Subject: tree-diff: drop path_appendnew() alloc optimization
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When we're diffing trees, we create a list of combine_diff_path structs
that represent changed paths. We allocate each struct and add it to the
list with path_appendnew(), which we then feed to opt->pathchange().
That function tells us whether the path is of interest or not; if not,
then we can throw away the struct we allocated. So there's an
optimization to avoid extra allocations: instead of throwing away the
new entry, we try to reuse it. If it was large enough to store the next
path we care about, we can do so. And if not, we fall back to freeing
and re-allocating a new struct.

This comes from 72441af7c4 (tree-diff: rework diff_tree() to generate
diffs for multiparent cases as well, 2014-04-07), where the goal was to
have even the 2-parent diff code use the combine-diff infrastructure,
but without taking a performance hit.

The implementation causes some complexities in the interface (as we
store the allocation length inside the "next" pointer), and prevents us
from using the regular combine_diff_path_new() constructor. The
complexity is mostly contained inside two functions, but it's worth
re-evaluating how much it's helping.

That commit claims it helps ~1% on generating two-parent diffs in
linux.git. Here are the timings I get on the same command today ("old"
is the current tip of master, and "new" has this patch applied):

  Benchmark 1: ./git.old log --raw --no-abbrev --no-renames v3.10..v3.11
    Time (mean ± σ):     532.9 ms ±   5.8 ms    [User: 472.7 ms, System: 59.6 ms]
    Range (min … max):   525.9 ms … 543.3 ms    10 runs

  Benchmark 2: ./git.new log --raw --no-abbrev --no-renames v3.10..v3.11
    Time (mean ± σ):     538.3 ms ±   5.7 ms    [User: 478.0 ms, System: 59.7 ms]
    Range (min … max):   528.5 ms … 545.3 ms    10 runs

  Summary
    ./git.old log --raw --no-abbrev --no-renames v3.10..v3.11 ran
      1.01 ± 0.02 times faster than ./git.new log --raw --no-abbrev --no-renames v3.10..v3.11

So we do end up on average 1% faster, but with 2% of noise. I tried to
focus more on diff performance by running the commit traversal
separately, like:

  git rev-list v3.10..v3.11 >in

and then timing just the diffs:

  Benchmark 1: ./git.old diff-tree --stdin -r