<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/git.git/pack-bitmap.c, branch v2.26.0-rc2</title>
<subtitle>Git
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v2.26.0-rc2</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v2.26.0-rc2'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/'/>
<updated>2020-03-05T18:43:03Z</updated>
<entry>
<title>Merge branch 'jk/nth-packed-object-id'</title>
<updated>2020-03-05T18:43:03Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-03-05T18:43:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=e8e71848ea866d7dc34eacffc20b9c3826ae29a1'/>
<id>urn:sha1:e8e71848ea866d7dc34eacffc20b9c3826ae29a1</id>
<content type='text'>
Code cleanup to use "struct object_id" more, replacing uses of
"char *sha1".

* jk/nth-packed-object-id:
  packfile: drop nth_packed_object_sha1()
  packed_object_info(): use object_id internally for delta base
  packed_object_info(): use object_id for returning delta base
  pack-check: push oid lookup into loop
  pack-check: convert "internal error" die to a BUG()
  pack-bitmap: use object_id when loading on-disk bitmaps
  pack-objects: use object_id struct in pack-reuse code
  pack-objects: convert oe_set_delta_ext() to use object_id
  pack-objects: read delta base oid into object_id struct
  nth_packed_object_oid(): use customary integer return
</content>
</entry>
<entry>
<title>Merge branch 'jk/object-filter-with-bitmap'</title>
<updated>2020-03-02T23:07:18Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-03-02T23:07:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=0df82d99dae85dbd4f667e95020a146ea0167975'/>
<id>urn:sha1:0df82d99dae85dbd4f667e95020a146ea0167975</id>
<content type='text'>
The object reachability bitmap machinery and the partial cloning
machinery were not prepared to work well together, because some
object-filtering criteria that partial clones use inherently rely
on object traversal, but the bitmap machinery is an optimization
to bypass that object traversal.  There are, however, some cases
where they can work together, and the two have been taught to
cooperate in those cases.

* jk/object-filter-with-bitmap:
  rev-list --count: comment on the use of count_right++
  pack-objects: support filters with bitmaps
  pack-bitmap: implement BLOB_LIMIT filtering
  pack-bitmap: implement BLOB_NONE filtering
  bitmap: add bitmap_unset() function
  rev-list: use bitmap filters for traversal
  pack-bitmap: basic noop bitmap filter infrastructure
  rev-list: allow commit-only bitmap traversals
  t5310: factor out bitmap traversal comparison
  rev-list: allow bitmaps when counting objects
  rev-list: make --count work with --objects
  rev-list: factor out bitmap-optimized routines
  pack-bitmap: refuse to do a bitmap traversal with pathspecs
  rev-list: fallback to non-bitmap traversal when filtering
  pack-bitmap: fix leak of haves/wants object lists
  pack-bitmap: factor out type iterator initialization
</content>
</entry>
<entry>
<title>pack-bitmap: use object_id when loading on-disk bitmaps</title>
<updated>2020-02-24T20:55:53Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-02-24T04:32:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=500e4f236606684467b0b34b86e319dfa40747c4'/>
<id>urn:sha1:500e4f236606684467b0b34b86e319dfa40747c4</id>
<content type='text'>
A pack bitmap file contains the index position of the commit for each
bitmap, which we then translate into an object id via
nth_packed_object_sha1(). In preparation for that function going away,
we can switch to the more type-safe nth_packed_object_id().

Note that even though the result ends up in an object_id, this does incur
an extra copy of the hash (into our temporary object_id, and then into
the final malloc'd stored_bitmap struct). This shouldn't make any
measurable difference. If it did, we could avoid this copy _and_ the
copy of the rest of the items by allocating the stored_bitmap struct
beforehand and reading directly into it from the bitmap file. Or better
still, if this is a bottleneck, we could introduce an on-disk index to
the bitmap file so we don't have to read every single entry to use just
one of them. So it's not worth worrying about micro-optimizing out this
one hash copy.
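
To make the translation step concrete, here is a rough Python model of
the loading loop. It is an illustration only, not git's C code: it
assumes the pack's sorted object ids are available as a list, so an
index position maps straight to an id, and StoredBitmap and
load_entries() are hypothetical stand-ins for the stored_bitmap struct
and the real loader.

```python
from dataclasses import dataclass


@dataclass
class StoredBitmap:
    """Hypothetical stand-in for git's stored_bitmap (fields simplified)."""
    oid: bytes
    bits: int


def load_entries(entries, sorted_oids):
    """Load (index_pos, bits) pairs as read from a bitmap file.

    Each entry names its commit by position in the pack's sorted
    object-id table (an unsigned value on disk), so we translate that
    position into an id, the job nth_packed_object_id() does in git.
    """
    loaded = {}
    for index_pos, bits in entries:
        if index_pos >= len(sorted_oids):
            raise ValueError("corrupt bitmap: index position out of range")
        oid = sorted_oids[index_pos]  # copy the hash out of the index
        loaded[oid] = StoredBitmap(oid=oid, bits=bits)
    return loaded


pack_index = [b"\x11" * 20, b"\x22" * 20, b"\x33" * 20]
bitmaps = load_entries([(2, 0b101), (0, 0b1)], pack_index)
print(bitmaps[b"\x33" * 20].bits)  # prints 5
```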

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>nth_packed_object_oid(): use customary integer return</title>
<updated>2020-02-24T20:55:42Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-02-24T04:27:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=0763671b8e0b3ef873df13c741a911b809e6813d'/>
<id>urn:sha1:0763671b8e0b3ef873df13c741a911b809e6813d</id>
<content type='text'>
Our nth_packed_object_sha1() function returns NULL for error. So when we
wrapped it with nth_packed_object_oid(), we kept the same semantics. But
it's a bit funny, because the caller actually passes in an out
parameter, and the pointer we return is just that same struct they
passed to us (or NULL).

It's not too terrible, but it does make the interface a little
non-idiomatic. Let's switch to our usual "0 for success, negative for
error" return value. Most callers either don't check it, or are
trivially converted. The one that requires the biggest change is
actually improved, as we can ditch an extra aliased pointer variable.
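
A minimal Python sketch of the new convention follows. It only models
the shape of the interface (an out parameter plus an integer status);
the ObjectId class and the plain list standing in for the pack index
are assumptions for illustration, not git's data structures.

```python
class ObjectId:
    """Hypothetical stand-in for git's struct object_id."""

    def __init__(self):
        self.hash = b""


def nth_packed_object_id(oid, sorted_hashes, n):
    """Fill `oid` with the n-th hash of a pack index.

    Returns 0 on success and a negative value on error, rather than
    returning a pointer that merely aliases `oid` (or NULL).
    """
    if n >= len(sorted_hashes) or 0 > n:
        return -1
    oid.hash = sorted_hashes[n]
    return 0


index = [b"\x11" * 20, b"\x22" * 20]
oid = ObjectId()
# Any non-zero (negative) return value signals failure.
if nth_packed_object_id(oid, index, 1):
    raise SystemExit("bad index position")
print(oid.hash.hex())
```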

Since we are changing the interface in a subtle way that the compiler
wouldn't catch, let's also change the name to catch any topics in
flight. We can drop the 'o' and make it nth_packed_object_id(). That's
slightly shorter, but also less redundant since the 'o' stands for
"object" already.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/packfile-reuse-cleanup'</title>
<updated>2020-02-14T20:54:19Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-02-14T20:54:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=a14aebeac330e6d58f9628a02521ea780daf0a5b'/>
<id>urn:sha1:a14aebeac330e6d58f9628a02521ea780daf0a5b</id>
<content type='text'>
The way "git pack-objects" reuses objects stored in an existing pack
to generate its result has been improved.

* jk/packfile-reuse-cleanup:
  pack-bitmap: don't rely on bitmap_git-&gt;reuse_objects
  pack-objects: add checks for duplicate objects
  pack-objects: improve partial packfile reuse
  builtin/pack-objects: introduce obj_is_packed()
  pack-objects: introduce pack.allowPackReuse
  csum-file: introduce hashfile_total()
  pack-bitmap: simplify bitmap_has_oid_in_uninteresting()
  pack-bitmap: uninteresting oid can be outside bitmapped packfile
  pack-bitmap: introduce bitmap_walk_contains()
  ewah/bitmap: introduce bitmap_word_alloc()
  packfile: expose get_delta_base()
  builtin/pack-objects: report reused packfile objects
</content>
</entry>
<entry>
<title>pack-bitmap: implement BLOB_LIMIT filtering</title>
<updated>2020-02-14T18:46:22Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-02-14T18:22:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=84243da1298890bd7370df66b754c2b252a08346'/>
<id>urn:sha1:84243da1298890bd7370df66b754c2b252a08346</id>
<content type='text'>
Just as the previous commit implemented BLOB_NONE, we can support
BLOB_LIMIT filters by looking at the sizes of any blobs in the result
and unsetting their bits as appropriate. This is slightly more expensive
than BLOB_NONE, but still produces a noticeable speedup (these results
are on git.git):

  Test                                         HEAD~2            HEAD
  ------------------------------------------------------------------------------------
  5310.9:  rev-list count with blob:none       1.80(1.77+0.02)   0.22(0.20+0.02) -87.8%
  5310.10: rev-list count with blob:limit=1k   1.99(1.96+0.03)   0.29(0.25+0.03) -85.4%

The implementation is similar to the BLOB_NONE one, with the exception
that we have to go object-by-object while walking the blob-type bitmap
(since we can't mask out the matches, but must look up the size
individually for each blob). The trick with using ctz64() is taken from
show_objects_for_type(), which likewise needs to find individual bits
(but wants to quickly skip over big chunks without blobs).
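
The per-bit loop can be sketched in Python as follows. This is an
illustrative model rather than git's code: the result and blob bitmaps
are lists of 64-bit words, size_of() is a hypothetical callback for the
size lookup, and the handling of explicitly requested blobs and
ext_index objects is left out. The sketch assumes blobs whose size is
at least the limit are dropped, matching git's non-bitmap blob:limit
filter.

```python
WORD_BITS = 64


def ctz(word):
    """Count trailing zeros of a nonzero word (the job of git's ctz64())."""
    return (word ^ (word - 1)).bit_length() - 1


def filter_blob_limit(result_words, blob_words, size_of, limit):
    """Return a copy of result_words with blobs of size >= limit unset.

    result_words: result bitmap as a list of 64-bit words
    blob_words:   blob type bitmap in the same layout
    size_of:      hypothetical callback returning a blob's size by position
    """
    out = list(result_words)
    for i in range(min(len(out), len(blob_words))):
        word = blob_words[i]
        while word:  # a word with no blobs is skipped in a single test
            offset = ctz(word)
            word ^= 2 ** offset  # consume the lowest remaining blob bit
            if (out[i] >> offset) % 2 == 0:
                continue  # this blob is not in the result
            if size_of(i * WORD_BITS + offset) >= limit:
                out[i] ^= 2 ** offset  # like bitmap_unset()
    return out


sizes = {0: 10, 1: 5000, 3: 999}
print(filter_blob_limit([0b1111], [0b1011], sizes.get, 1024))  # prints [13]
```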

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-bitmap: implement BLOB_NONE filtering</title>
<updated>2020-02-14T18:46:22Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-02-14T18:22:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=4f3bd5606a02260274555f41fd7d6368f2bea1d8'/>
<id>urn:sha1:4f3bd5606a02260274555f41fd7d6368f2bea1d8</id>
<content type='text'>
We can easily support BLOB_NONE filters with bitmaps. Since we know the
types of all of the objects, we just need to clear the result bits of
any blobs.

Note two subtleties in the implementation (which I also called out in
comments):

  - we have to include any blobs that were specifically asked for (and
    not reached through graph traversal) to match the non-bitmap version

  - we have to handle in-pack and "ext_index" objects separately.
    Arguably prepare_bitmap_walk() could be adding these ext_index
    objects to the type bitmaps. But it doesn't for now, so let's match
    the rest of the bitmap code here (it probably wouldn't be an
    efficiency improvement to do so since the cost of extending those
    bitmaps is about the same as our loop here, but it might make the
    code a bit simpler).
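
As a rough illustration, the Python model below treats bitmaps as
arbitrary-precision integers (bit i is the object at bitmap position i)
and shows both subtleties: explicitly requested blobs survive, and
ext_index objects are checked one at a time. The function names and
argument shapes are hypothetical; git's version works on its EWAH
bitmaps in C.

```python
def clear_bits(result, mask):
    """Return `result` with every bit that is set in `mask` cleared."""
    return (result | mask) ^ mask


def filter_blob_none(result, blob_bits, ext_types, wanted_blobs):
    """Drop blobs from a bitmap result, roughly as BLOB_NONE does.

    result:       bitmap of every reachable object
    blob_bits:    type bitmap marking which in-pack positions are blobs
    ext_types:    {position: type} for "ext_index" objects, which sit
                  past the end of the pack and are not in the type bitmaps
    wanted_blobs: positions of blobs asked for directly (not reached
                  through graph traversal); these must be kept
    """
    to_clear = blob_bits  # all in-pack blobs, handled in one mask step
    for pos, obj_type in ext_types.items():  # ext_index: one at a time
        if obj_type == "blob":
            to_clear |= 2 ** pos
    for pos in wanted_blobs:  # keep explicitly requested blobs
        to_clear = clear_bits(to_clear, 2 ** pos)
    return clear_bits(result, to_clear)


# Positions 1 and 3 are in-pack blobs; position 4 is an ext_index blob.
print(bin(filter_blob_none(0b111111, 0b1010, {4: "blob"}, [])))  # prints 0b100101
```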

Here are perf results for the new test on git.git:

  Test                                    HEAD^             HEAD
  --------------------------------------------------------------------------------
  5310.9: rev-list count with blob:none   1.67(1.62+0.05)   0.22(0.21+0.02) -86.8%

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-bitmap: basic noop bitmap filter infrastructure</title>
<updated>2020-02-14T18:46:22Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-02-14T18:22:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=6663ae0a0818aba5d4de289b1a37e1961ad6c367'/>
<id>urn:sha1:6663ae0a0818aba5d4de289b1a37e1961ad6c367</id>
<content type='text'>
Currently you can't use object filters with bitmaps, but we plan to
support at least some filters with bitmaps. Let's introduce some
infrastructure that will help us do that:

  - prepare_bitmap_walk() now accepts a list_objects_filter_options
    parameter (which can be NULL for no filtering; all the current
    callers pass NULL)

  - we'll bail early if the filter is incompatible with bitmaps (just as
    we would if there were no bitmaps at all). Currently all filters are
    incompatible.

  - we'll filter the resulting bitmap; since there are no supported
    filters yet, this is always a noop.
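
The control flow described in the list above can be modeled roughly as
follows. This is a hypothetical Python sketch, not git's API: reachable
stands in for the bitmap computed by the walk (None when the repository
has no bitmaps), and a filter is a plain string such as "blob:none".

```python
# Filters the bitmap code knows how to apply. Empty for now, so every
# real filter is treated as incompatible.
SUPPORTED_FILTERS = set()


def can_filter_bitmap(filter_spec):
    """No filter at all (None) is always compatible."""
    return filter_spec is None or filter_spec in SUPPORTED_FILTERS


def filter_bitmap(result, filter_spec):
    """Apply the filter to the result bitmap; a no-op until filters exist."""
    return result


def prepare_bitmap_walk(reachable, filter_spec=None):
    """Return the (filtered) bitmap, or None so the caller falls back."""
    if reachable is None:
        return None  # no bitmaps in the repository
    if not can_filter_bitmap(filter_spec):
        return None  # bail early, just as if there were no bitmaps
    return filter_bitmap(reachable, filter_spec)


print(prepare_bitmap_walk(0b1011))               # prints 11
print(prepare_bitmap_walk(0b1011, "blob:none"))  # prints None
```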

There should be no behavior change yet, but we'll support some actual
filters in a future patch.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>rev-list: allow commit-only bitmap traversals</title>
<updated>2020-02-14T18:46:22Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-02-14T18:22:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=4eb707ebd681eb85306071db33ed70186d1642ac'/>
<id>urn:sha1:4eb707ebd681eb85306071db33ed70186d1642ac</id>
<content type='text'>
Ever since we added reachability bitmap support, we've been able to use
it with rev-list to get the full list of objects, like:

  git rev-list --objects --use-bitmap-index --all

But you can't do so without --objects, since we weren't ready to just
show the commits. However, the internals of the bitmap code are mostly
ready for this: they avoid opening up trees when walking to fill in the
bitmaps. We just need to actually pass in the rev_info to
traverse_bitmap_commit_list() so it knows which types to bother
triggering our callback for.
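
Here is a rough Python model of the idea (hypothetical names, not git's
C API): the walk visits every set bit in the result once, and a flag
standing in for the relevant rev_info settings decides whether
non-commit objects reach the callback. git itself tracks trees, blobs,
and tags with separate rev_info flags; a single want_objects flag is a
simplification.

```python
def traverse_bitmap(result_bits, type_of, want_objects, show):
    """Call show(pos) for each set bit whose object type was requested.

    result_bits:  int bitmap of reachable objects
    type_of:      maps a bit position to "commit", "tree", "blob" or "tag"
    want_objects: True for `rev-list --objects`, False for commits only
    """
    pos = 0
    while result_bits:
        if result_bits % 2 and (want_objects or type_of(pos) == "commit"):
            show(pos)
        result_bits >>= 1
        pos += 1


types = ["commit", "tree", "blob", "commit"]
traverse_bitmap(0b1111, lambda p: types[p], False, print)  # prints 0, then 3
```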

For completeness, the perf test now covers both the existing --objects
case, as well as the new commits-only behavior (the objects one got way
faster when we introduced bitmaps, but obviously isn't improved now).

Here are numbers for linux.git:

  Test                         HEAD^               HEAD
  ------------------------------------------------------------------------
  5310.7: rev-list (commits)   8.29(8.10+0.19)     1.76(1.72+0.04) -78.8%
  5310.8: rev-list (objects)   8.06(7.94+0.12)     8.14(7.94+0.13) +1.0%

That run was cheating a little, as I didn't have any commit-graph in the
repository, and these days git-gc builds one by default.
Here are numbers with a commit-graph:

  Test                         HEAD^               HEAD
  ------------------------------------------------------------------------
  5310.7: rev-list (commits)   0.70(0.58+0.12)     0.51(0.46+0.04) -27.1%
  5310.8: rev-list (objects)   6.20(6.09+0.10)     6.27(6.16+0.11) +1.1%

Still an improvement, but a lot less impressive.

We could have the perf script remove any commit-graph to show the
out-sized effect, but it probably makes sense to leave it in what would
be a more typical setup.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-bitmap: refuse to do a bitmap traversal with pathspecs</title>
<updated>2020-02-14T18:46:22Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-02-14T18:22:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=d90fe06ea7dd15bdbd555ad2f4bfdd069032b697'/>
<id>urn:sha1:d90fe06ea7dd15bdbd555ad2f4bfdd069032b697</id>
<content type='text'>
rev-list has refused to use bitmaps with pathspec limiting since
c8a70d3509 (rev-list: disable --use-bitmap-index when pruning commits,
2015-07-01). But this is true not just for rev-list, but for anyone who
calls prepare_bitmap_walk(); the code isn't equipped to handle this
case.  We never noticed because the only other callers would never pass
a pathspec limiter.

But let's push the check down into prepare_bitmap_walk() anyway. That's
a more logical place for it to live, as callers shouldn't need to know
the details (and must be prepared to fall back to a regular traversal
anyway, since there might not be bitmaps in the repository).
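
A hypothetical Python sketch of the resulting division of labor:
prepare_bitmap_walk() makes the decision, and every caller handles a
None return the same way, whether the cause was missing bitmaps or a
pathspec. The names and return values are illustrative only.

```python
def prepare_bitmap_walk(have_bitmaps, pathspecs):
    """Return a bitmap result, or None if a bitmap walk can't be used."""
    if pathspecs:
        return None  # bitmaps can't prune commits by path
    if not have_bitmaps:
        return None
    return "bitmap walk"


def list_objects(have_bitmaps, pathspecs):
    """Caller: try bitmaps first and fall back to a regular traversal."""
    result = prepare_bitmap_walk(have_bitmaps, pathspecs)
    if result is None:
        return "regular traversal"
    return result


print(list_objects(True, ["Documentation/"]))  # prints regular traversal
```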

It would also prepare us for a day where this case _is_ handled, but
that's pretty unlikely. E.g., we could use bitmaps to generate the set
of commits, and then diff each commit to see if it matches the pathspec.
That would be slightly faster than a naive traversal that actually walks
the commits. But you'd probably do better still to make use of the newer
commit-graph feature to make walking the commits very cheap.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
