diff options
author | Elijah Newren <newren@gmail.com> | 2021-12-04 05:35:52 +0000 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2021-12-04 23:39:34 -0800 |
commit | 3656f842789d25d75da41c6c029470052a573b54 (patch) | |
tree | d93cc0a2fe086f91f56ada18d0f158c2450e373d /grep.c | |
parent | abe6bb3905392d5eb6b01fa6e54d7e784e0522aa (diff) |
name-rev: prefer shorter names over following merges
name-rev has a MERGE_TRAVERSAL_WEIGHT to say that traversing a second or
later parent of a merge should be 65535 times more expensive than a
first-parent traversal, as per ac076c29ae8d (name-rev: Fix non-shortest
description, 2007-08-27). The point of this weight is to prefer names
like
v2.32.0~1471^2
over names like
v2.32.0~43^2~15^2~11^2~20^2~31^2
which are two equally valid names in git.git for the same commit. Note
that the first follows 1472 parent traversals compared to a mere 125 for
the second. Weighting all traversals equally would clearly prefer the
second name since it has fewer parent traversals, but humans aren't
going to be traversing commits and they tend to have an easier time
digesting names with fewer segments. The fact that the former only has
two segments (~1471, ^2) makes it much simpler than the latter which has
six segments (~43, ^2, ~15, etc.). Since name-rev is meant to "find
symbolic names suitable for human digestion", we prefer fewer segments.
However, the particular rule implemented in name-rev would actually
prefer
v2.33.0-rc0~11^2~1
over
v2.33.0-rc0~20^2
because both have precisely one second parent traversal, and it gives
the tie breaker to shortest number of total parent traversals. Fewer
segments is more important for human consumption than number of hops, so
we'd rather see the latter which has one fewer segment.
Include the generation in is_better_name() and use a new
effective_distance() calculation so that we prefer fewer segments in
the printed name over fewer total parent traversals performed to get the
answer.
== Side-note on tie-breakers ==
When there are the same number of segments for two different names, we
actually use the name of an ancestor commit as a tie-breaker as well.
For example, for the commit cbdca289fb in the git.git repository, we
prefer the name v2.33.0-rc0~112^2~1 over v2.33.0-rc0~57^2~5. This is
because:
* cbdca289fb is the parent of 25e65b6dd5, which implies the name for
cbdca289fb should be the first parent of the preferred name for
25e65b6dd5
* 25e65b6dd5 could be named either v2.33.0-rc0~112^2 or
v2.33.0-rc0~57^2~4, but the former is preferred over the latter due
to fewer segments
* combine the two previous facts, and the name we get for cbdca289fb
is "v2.33.0-rc0~112^2~1" rather than "v2.33.0-rc0~57^2~5".
Technically, we get this for free out of the implementation since we
only keep track of one name for each commit as we walk history (and
re-add parents to the queue if we find a better name for those parents),
but the first bullet point above ensures users get results that feel
more consistent.
== Alternative Ideas and Meanings Discussed ==
One suggestion that came up during review was that shortest
string-length might be easiest for users to consume. However, such a
scheme would be rather computationally expensive (we'd have to track all
names for each commit as we traversed the graph) and would additionally
come with the possibly perplexing result that on a linear segment of
history we could rapidly swap back and forth on names:
MYTAG~3^2 would be preferred over MYTAG~9998
MYTAG~3^2~1 would NOT be preferred over MYTAG~9999
MYTAG~3^2~2 might be preferred over MYTAG~10000
Another item that came up was possible auxiliary semantic meanings for
name-rev results either before or after this patch. The basic answer
was that the previous implementation had no known useful auxiliary
semantics, but that for many repositories (most in my experience), the
new scheme does. In particular, the new name-rev output can often be
used to answer the question, "How or when did this commit get merged?"
Since that usefulness depends on how merges happen within the repository
and thus isn't universally applicable, details are omitted here but you
can see them at [1].
[1] https://lore.kernel.org/git/CABPp-BEeUM+3NLKDVdak90_UUeNghYCx=Dgir6=8ixvYmvyq3Q@mail.gmail.com/
Finally, it was noted that the algorithm could be improved by just
explicitly tracking the number of segments and using both it and
distance in the comparison, instead of giving a magic number that tries
to blend the two (and which therefore might give suboptimal results in
repositories with really huge numbers of commits that periodically merge
older code). However, "[this patch] seems to give us a much better
results than the current code, so let's take it and leave further
futzing outside the scope."
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'grep.c')
0 files changed, 0 insertions, 0 deletions