<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/git.git/list-objects.c, branch v2.7.0-rc2</title>
<subtitle>Git
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v2.7.0-rc2</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v2.7.0-rc2'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/'/>
<updated>2015-11-20T13:02:05Z</updated>
<entry>
<title>Convert struct object to object_id</title>
<updated>2015-11-20T13:02:05Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2015-11-10T02:22:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=f2fd0760f62e79609fef7bfd7ecebb002e8e4ced'/>
<id>urn:sha1:f2fd0760f62e79609fef7bfd7ecebb002e8e4ced</id>
<content type='text'>
struct object is one of the major data structures dealing with object
IDs.  Convert it to use struct object_id instead of an unsigned char
array.  Convert get_object_hash to refer to the new member as well.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/squelch-missing-link-warning-for-unreachable'</title>
<updated>2015-06-11T16:29:59Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2015-06-11T16:29:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=43262d8d6511212f49d519779505f8a557c8dc84'/>
<id>urn:sha1:43262d8d6511212f49d519779505f8a557c8dc84</id>
<content type='text'>
Recent "git prune" traverses young unreachable objects to safekeep
old objects in the reachability chain from them, which sometimes
caused error messages that are unnecessarily alarming.

* jk/squelch-missing-link-warning-for-unreachable:
  suppress errors on missing UNINTERESTING links
  silence broken link warnings with revs-&gt;ignore_missing_links
  add quieter versions of parse_{tree,commit}
</content>
</entry>
<entry>
<title>silence broken link warnings with revs-&gt;ignore_missing_links</title>
<updated>2015-06-01T16:29:50Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2015-06-01T09:56:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=daf7d86783b1bd2065881a3f0957f69c79a52fd7'/>
<id>urn:sha1:daf7d86783b1bd2065881a3f0957f69c79a52fd7</id>
<content type='text'>
We set revs-&gt;ignore_missing_links to instruct the
revision-walking machinery that we know the history graph
may be incomplete. For example, we use it when walking
unreachable but recent objects; we want to add what we can,
but it's OK if the history is incomplete.

However, we still print error messages for the missing
objects, which can be confusing. This is not an error, but
just a normal situation when transitioning from a repository
last pruned by an older git (which can leave broken segments
of history) to a more recent one (where we try to preserve
whole reachable segments).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>rev-list: add an option to mark fewer edges as uninteresting</title>
<updated>2014-12-29T17:57:55Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2014-12-24T23:05:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=1684c1b219e02c91655ad929f752f4f864c72faf'/>
<id>urn:sha1:1684c1b219e02c91655ad929f752f4f864c72faf</id>
<content type='text'>
In commit fbd4a70 (list-objects: mark more commits as edges in
mark_edges_uninteresting - 2013-08-16), we marked an increasing number
of edges uninteresting.  This change, and the subsequent change to make
this conditional on --objects-edge, are used by --thin to make much
smaller packs for shallow clones.

Unfortunately, they cause a significant performance regression when
pushing non-shallow clones with lots of refs (23.322 seconds vs.
4.785 seconds with 22400 refs).  Add an option to git rev-list,
--objects-edge-aggressive, that preserves this more aggressive behavior,
while leaving --objects-edge to provide more performant behavior.
Preserve the current behavior for the moment by using the aggressive
option.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>traverse_commit_list: support pending blobs/trees with paths</title>
<updated>2014-10-19T22:06:31Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2014-10-15T22:43:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=207394908e9465d0169608725aeaa5bb355086e0'/>
<id>urn:sha1:207394908e9465d0169608725aeaa5bb355086e0</id>
<content type='text'>
When we call traverse_commit_list, we may have trees and
blobs in the pending array. As we process these, we pass the
"name" field from the pending entry as the path of the
object within the tree (which then becomes the root path if
we recurse into subtrees).

When we set up the traversal in prepare_revision_walk,
though, the "name" field of any pending trees and blobs is
likely to be the ref at which we found the object. We would
not want to make this part of the path (e.g., doing so would
make "git rev-list --objects v2.6.11-tree" in linux.git show
paths like "v2.6.11-tree/Makefile", which is nonsensical).
Therefore prepare_revision_walk sets the name field of each
pending tree and blobs to the empty string.

However, this leaves no room for a caller who does know the
correct path of a pending object to propagate that
information to the revision walker. We can fix this by
making two related changes:

  1. Use the "path" field as the path instead of the "name"
     field in traverse_commit_list. If the path is not set,
     default to "" (which is what we always ended up with in
     the current code, because of prepare_revision_walk).

  2. In prepare_revision_walk, make a complete copy of the
     entry. This makes the path field available to the
     walker (if there is one), solving our problem.
     Leaving the name field intact is now OK, as we do not
     use it as a path due to point (1) above (and we can use
     it to make more meaningful error messages if we want).
     We also make the original "mode" field available to the
     walker, though it does not actually use it.

Note that we still re-add the pending objects and free the
old ones (so we may strdup the path and name only to free
the old ones). This could be made more efficient by simply
copying the object_array entries that we are keeping.
However, that would require more restructuring of the code,
and is not done here.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object_array: add a "clear" function</title>
<updated>2014-10-16T17:10:37Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2014-10-15T22:34:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=46be823124bb6a6ff0e06dc19c327b599ed97c72'/>
<id>urn:sha1:46be823124bb6a6ff0e06dc19c327b599ed97c72</id>
<content type='text'>
There's currently no easy way to free the memory associated
with an object_array (and in most cases, we simply leak the
memory in a rev_info's pending array). Let's provide a
helper to make this easier to handle.

We can make use of it in list-objects.c, which does the same
thing by hand (but fails to free the "name" field of each
entry, potentially leaking memory).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/pack-bitmap'</title>
<updated>2014-04-08T19:00:33Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2014-04-08T19:00:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=967f8c918465312cc6cc1bcbcfacafcf95152bd8'/>
<id>urn:sha1:967f8c918465312cc6cc1bcbcfacafcf95152bd8</id>
<content type='text'>
* jk/pack-bitmap:
  pack-objects: do not reuse packfiles without --delta-base-offset
  add `ignore_missing_links` mode to revwalk
</content>
</entry>
<entry>
<title>add `ignore_missing_links` mode to revwalk</title>
<updated>2014-04-04T20:31:38Z</updated>
<author>
<name>Vicent Marti</name>
<email>tanoku@gmail.com</email>
</author>
<published>2014-03-28T10:00:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=2db1a43f41880bb4aeea9dee8a7d13c5ad76db3f'/>
<id>urn:sha1:2db1a43f41880bb4aeea9dee8a7d13c5ad76db3f</id>
<content type='text'>
When pack-objects is computing the reachability bitmap to
serve a fetch request, it can erroneously die() if some of
the UNINTERESTING objects are not present. Upload-pack
throws away HAVE lines from the client for objects we do not
have, but we may have a tip object without all of its
ancestors (e.g., if the tip is no longer reachable and was
new enough to survive a `git prune`, but some of its
reachable objects did get pruned).

In the non-bitmap case, we do a revision walk with the HAVE
objects marked as UNINTERESTING. The revision walker
explicitly ignores errors in accessing UNINTERESTING commits
to handle this case (and we do not bother looking at
UNINTERESTING trees or blobs at all).

When we have bitmaps, however, the process is quite
different.  The bitmap index for a pack-objects run is
calculated in two separate steps:

First, we perform an extensive walk from all the HAVEs to
find the full set of objects reachable from them. This walk
is usually optimized away because we are expected to hit an
object with a bitmap during the traversal, which allows us
to terminate early.

Secondly, we perform an extensive walk from all the WANTs,
which usually also terminates early because we hit a commit
with an existing bitmap.

Once we have the resulting bitmaps from the two walks, we
AND-NOT them together to obtain the resulting set of objects
we need to pack.

When we are walking the HAVE objects, the revision walker
does not know that we are walking it only to mark the
results as uninteresting. We strip out the UNINTERESTING flag,
because those objects _are_ interesting to us during the
first walk. We want to keep going to get a complete set of
reachable objects if we can.

We need some way to tell the revision walker that it's OK to
silently truncate the HAVE walk, just like it does for the
UNINTERESTING case. This patch introduces a new
`ignore_missing_links` flag to the `rev_info` struct, which
we set only for the HAVE walk.

It also adds tests to cover UNINTERESTING objects missing
from several positions: a missing blob, a missing tree, and
a missing parent commit. The missing blob already worked (as
we do not care about its contents at all), but the other two
cases caused us to die().

Note that there are a few cases we do not need to test:

  1. We do not need to test a missing tree, with the blob
     still present. Without the tree that refers to it, we
     would not know that the blob is relevant to our walk.

  2. We do not need to test a tip commit that is missing.
     Upload-pack omits these for us (and in fact, we
     complain even in the non-bitmap case if it fails to do
     so).

Reported-by: Siddharth Agarwal &lt;sid0@fb.com&gt;
Signed-off-by: Vicent Marti &lt;tanoku@gmail.com&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/mark-edges-uninteresting'</title>
<updated>2014-01-27T18:45:08Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2014-01-27T18:45:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=a6bec00145da3013e693072122f2fa53076e73cd'/>
<id>urn:sha1:a6bec00145da3013e693072122f2fa53076e73cd</id>
<content type='text'>
Fix performance regression in v1.8.4.x and later.

* jk/mark-edges-uninteresting:
  list-objects: only look at cmdline trees with edge_hint
  t/perf: time rev-list with UNINTERESTING commits
</content>
</entry>
<entry>
<title>list-objects: only look at cmdline trees with edge_hint</title>
<updated>2014-01-21T22:46:24Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2014-01-21T02:25:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=200abe7458357c83f3859ce6dbf89ea5d4d09b3d'/>
<id>urn:sha1:200abe7458357c83f3859ce6dbf89ea5d4d09b3d</id>
<content type='text'>
When rev-list is given a command-line like:

  git rev-list --objects $commit --not --all

the most accurate answer is the difference between the set
of objects reachable from $commit and the set reachable from
all of the existing refs. However, we have not historically
provided that answer, because it is very expensive to
calculate. We would have to open every tree of every commit
in the entire history.

Instead, we find the accurate set difference of the
reachable commits, and then mark the trees at the boundaries
as uninteresting. This misses objects which appear in the
trees of both the interesting commits and deep within the
uninteresting history.

Commit fbd4a70 (list-objects: mark more commits as edges in
mark_edges_uninteresting, 2013-08-16) noticed that we miss
those objects during pack-objects, and added code to examine
the trees of all of the "--not" refs given on the
command-line.  Note that this is still not the complete set
difference, because we look only at the tips of the
command-line arguments, not all of their reachable commits.
But it increases the set of boundary objects we consider,
which is especially important for shallow fetches.  So we
are trading extra CPU time for a larger set of boundary
objects, which can improve the resulting pack size for a
--thin pack.

This tradeoff probably makes sense in the context of
pack-objects, where we have set revs-&gt;edge_hint to have the
traversal feed us the set of boundary objects.  For a
regular rev-list, though, it is probably not a good
tradeoff. It is true that it makes our list slightly closer
to a true set difference, but it is a rare case where this
is important. And because we do not have revs-&gt;edge_hint
set, we do nothing useful with the larger set of boundary
objects.

This patch therefore ties the extra tree examination to the
revs-&gt;edge_hint flag; it is the presence of that flag that
makes the tradeoff worthwhile.

Here is output from the p0001-rev-list showing the
improvement in performance:

Test                                             HEAD^             HEAD
-----------------------------------------------------------------------------------------
0001.1: rev-list --all                           0.69(0.65+0.02)   0.69(0.66+0.02) +0.0%
0001.2: rev-list --all --objects                 3.22(3.19+0.03)   3.23(3.20+0.03) +0.3%
0001.4: rev-list $commit --not --all             0.04(0.04+0.00)   0.04(0.04+0.00) +0.0%
0001.5: rev-list --objects $commit --not --all   0.27(0.26+0.01)   0.04(0.04+0.00) -85.2%

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
