<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/git.git/object-store.h, branch v2.26.0-rc2</title>
<subtitle>Git
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v2.26.0-rc2</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v2.26.0-rc2'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/'/>
<updated>2020-02-24T20:55:53Z</updated>
<entry>
<title>packed_object_info(): use object_id for returning delta base</title>
<updated>2020-02-24T20:55:53Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-02-24T04:36:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=b99b6bcc57faf5c989fc18c3b8d4d92df3407cec'/>
<id>urn:sha1:b99b6bcc57faf5c989fc18c3b8d4d92df3407cec</id>
<content type='text'>
If a caller sets the object_info.delta_base_sha1 to a non-NULL pointer,
we'll write the oid of the object's delta base to it. But we can
increase our type safety by switching this to a real object_id struct.
All of our callers are just pointing into the hash member of an
object_id anyway, so there's no inconvenience.

Note that we do still keep it as a pointer-to-struct, because the NULL
sentinel value tells us whether the caller is even interested in the
information.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'mt/use-passed-repo-more-in-funcs'</title>
<updated>2020-02-14T20:54:22Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-02-14T20:54:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=78e67cda42a440f216fc4cbfd8a9e190c5eece5a'/>
<id>urn:sha1:78e67cda42a440f216fc4cbfd8a9e190c5eece5a</id>
<content type='text'>
Some codepaths were given a repository instance as a parameter to
work in the repository, but passed the_repository instance to its
callees, which has been cleaned up (somewhat).

* mt/use-passed-repo-more-in-funcs:
  sha1-file: allow check_object_signature() to handle any repo
  sha1-file: pass git_hash_algo to hash_object_file()
  sha1-file: pass git_hash_algo to write_object_file_prepare()
  streaming: allow open_istream() to handle any repo
  pack-check: use given repo's hash_algo at verify_packfile()
  cache-tree: use given repo's hash_algo at verify_one()
  diff: make diff_populate_filespec() honor its repo argument
</content>
</entry>
<entry>
<title>Merge branch 'mt/threaded-grep-in-object-store'</title>
<updated>2020-02-14T20:54:20Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-02-14T20:54:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=56ceb64eb0377acb2884226d4fb7d7c3f4032354'/>
<id>urn:sha1:56ceb64eb0377acb2884226d4fb7d7c3f4032354</id>
<content type='text'>
Traditionally, we avoided threaded grep while searching in objects
(as opposed to files in the working tree) as accesses to the object
layer is not thread-safe.  This limitation is getting lifted.

* mt/threaded-grep-in-object-store:
  grep: use no. of cores as the default no. of threads
  grep: move driver pre-load out of critical section
  grep: re-enable threads in non-worktree case
  grep: protect packed_git [re-]initialization
  grep: allow submodule functions to run in parallel
  submodule-config: add skip_if_read option to repo_read_gitmodules()
  grep: replace grep_read_mutex by internal obj read lock
  object-store: allow threaded access to object reading
  replace-object: make replace operations thread-safe
  grep: fix racy calls in grep_objects()
  grep: fix race conditions at grep_submodule()
  grep: fix race conditions on userdiff calls
</content>
</entry>
<entry>
<title>Merge branch 'jn/pretend-object-doc'</title>
<updated>2020-02-12T20:41:35Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-02-12T20:41:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=b486d2ee811d7471e7a31cd315eddca1be79fc19'/>
<id>urn:sha1:b486d2ee811d7471e7a31cd315eddca1be79fc19</id>
<content type='text'>
Warn programmers about pretend_object_file() that allows the code
to tentatively use in-core objects.

* jn/pretend-object-doc:
  sha1-file: document how to use pretend_object_file
</content>
</entry>
<entry>
<title>sha1-file: pass git_hash_algo to hash_object_file()</title>
<updated>2020-01-31T18:45:39Z</updated>
<author>
<name>Matheus Tavares</name>
<email>matheus.bernardino@usp.br</email>
</author>
<published>2020-01-30T20:32:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=2dcde20e1c55fc2e3f9e9e6d48e93c39ec5661d2'/>
<id>urn:sha1:2dcde20e1c55fc2e3f9e9e6d48e93c39ec5661d2</id>
<content type='text'>
Allow hash_object_file() to work on arbitrary repos by introducing a
git_hash_algo parameter. Change callers which have a struct repository
pointer in their scope to pass on the git_hash_algo from the said repo.
For all other callers, pass on the_hash_algo, which was already being
used internally at hash_object_file(). This functionality will be used
in the following patch to make check_object_signature() be able to work
on arbitrary repos (which, in turn, will be used to fix an
inconsistency at object.c:parse_object()).

Signed-off-by: Matheus Tavares &lt;matheus.bernardino@usp.br&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jt/sha1-file-remove-oi-skip-cached'</title>
<updated>2020-01-22T23:07:30Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-01-22T23:07:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=e26bd14c8d7c7983d741cd59c076d33306b7050e'/>
<id>urn:sha1:e26bd14c8d7c7983d741cd59c076d33306b7050e</id>
<content type='text'>
has_object_file() said "no" given an object registered to the
system via pretend_object_file(), making it inconsistent with
read_object_file(), causing lazy fetch to attempt fetching an
empty tree from promisor remotes.

* jt/sha1-file-remove-oi-skip-cached:
  sha1-file: remove OBJECT_INFO_SKIP_CACHED
</content>
</entry>
<entry>
<title>object-store: allow threaded access to object reading</title>
<updated>2020-01-17T21:52:14Z</updated>
<author>
<name>Matheus Tavares</name>
<email>matheus.bernardino@usp.br</email>
</author>
<published>2020-01-16T02:39:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=31877c9aec21e0824fd4fcf415069cf8dfae4b72'/>
<id>urn:sha1:31877c9aec21e0824fd4fcf415069cf8dfae4b72</id>
<content type='text'>
Allow object reading to be performed by multiple threads protecting it
with an internal lock, the obj_read_mutex. The lock usage can be toggled
with enable_obj_read_lock() and disable_obj_read_lock(). Currently, the
functions which can be safely called in parallel are:
read_object_file_extended(), repo_read_object_file(),
read_object_file(), read_object_with_reference(), read_object(),
oid_object_info() and oid_object_info_extended(). It's also possible
to use obj_read_lock() and obj_read_unlock() to protect other sections
that cannot execute in parallel with object reading.

Probably there are many spots in the functions listed above that could
be executed unlocked (and thus, in parallel). But, for now, we are most
interested in allowing parallel access to zlib inflation. This is one of
the sections where object reading spends most of the time in (e.g. up to
one-third of git-grep's execution time in the chromium repo corresponds
to inflation) and it's already thread-safe. So, to take advantage of
that, the obj_read_mutex is released when calling git_inflate() and
re-acquired right after, for every calling spot in
oid_object_info_extended()'s call chain. We may refine this lock to also
exploit other possible parallel spots in the future, but for now,
threaded zlib inflation should already give great speedups for threaded
object reading callers.

Note that add_delta_base_cache() was also modified to skip adding
already present entries to the cache. This wasn't possible before, but
it would be now, with the parallel inflation. Take for example the
following situation, where two threads - A and B - are executing the
code at unpack_entry():

1. Thread A is performing the decompression of a base O (which is not
   yet in the cache) at PHASE II. Thread B is simultaneously trying to
   unpack O, but just starting at PHASE I.
2. Since O is not yet in the cache, B will go to PHASE II to also
   perform the decompression.
3. When they finish decompressing, one of them will get the object
   reading mutex and go to PHASE III while the other waits for the
   mutex. Let’s say A got the mutex first.
4. Thread A will add O to the cache, go throughout the rest of PHASE III
   and return.
5. Thread B gets the mutex, also add O to the cache (if the check wasn't
   there) and returns.

Finally, it is also important to highlight that the object reading lock
can only ensure thread-safety in the mentioned functions thanks to two
complementary mechanisms: the use of 'struct raw_object_store's
replace_mutex, which guards sections in the object reading machinery
that would otherwise be thread-unsafe; and the 'struct pack_window's
inuse_cnt, which protects window reading operations (such as the one
performed during the inflation of a packed object), allowing them to
execute without the acquisition of the obj_read_mutex.

Signed-off-by: Matheus Tavares &lt;matheus.bernardino@usp.br&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>replace-object: make replace operations thread-safe</title>
<updated>2020-01-17T21:52:14Z</updated>
<author>
<name>Matheus Tavares</name>
<email>matheus.bernardino@usp.br</email>
</author>
<published>2020-01-16T02:39:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=b1fc9da1c84a94ef03eb07df361f3ec43006b39f'/>
<id>urn:sha1:b1fc9da1c84a94ef03eb07df361f3ec43006b39f</id>
<content type='text'>
replace-object functions are very close to being thread-safe: the only
current racy section is the lazy initialization at
prepare_replace_object(). The following patches will protect some object
reading operations to be called threaded, but before that, replace
functions must be protected. To do so, add a mutex to struct
raw_object_store and acquire it before lazy initializing the
replace_map. This won't cause any noticeable performance drop as the
mutex will no longer be used after the replace_map is initialized.

Later, when the replace functions are called in parallel, thread
debuggers might point our use of the added replace_map_initialized flag
as a data race. However, as this boolean variable is initialized as
false and it's only updated once, there's no real harm. It's perfectly
fine if the value is updated right after a thread read it in
replace-map.h:lookup_replace_object() (there'll only be a performance
penalty for the affected threads at that moment). We could cease the
debugger warning protecting the variable reading at the said function.
However, this would negatively affect performance for all threads
calling it, at any time, so it's not really worthy since the warning
doesn't represent a real problem. Instead, to make sure we don't get
false positives (at ThreadSanitizer, at least) an entry for the
respective function is added to .tsan-suppressions.

Signed-off-by: Matheus Tavares &lt;matheus.bernardino@usp.br&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>sha1-file: document how to use pretend_object_file</title>
<updated>2020-01-06T16:44:24Z</updated>
<author>
<name>Jonathan Nieder</name>
<email>jrnieder@gmail.com</email>
</author>
<published>2020-01-04T00:13:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=60440d72db4c9f2fc05b841813e72159c4f08928'/>
<id>urn:sha1:60440d72db4c9f2fc05b841813e72159c4f08928</id>
<content type='text'>
Like in-memory alternates, pretend_object_file contains a trap for the
unwary: careless callers can use it to create references to an object
that does not exist in the on-disk object store.

Add a comment documenting how to use the function without risking such
problems.

The only current caller is blame, which uses pretend_object_file to
create an in-memory commit representing the working tree state.
Noticed during a discussion of how to safely use this function in
operations like "git merge" which, unlike blame, are not read-only.

Inspired-by: Junio C Hamano &lt;gitster@pobox.com&gt;
Signed-off-by: Jonathan Nieder &lt;jrnieder@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>sha1-file: remove OBJECT_INFO_SKIP_CACHED</title>
<updated>2020-01-02T20:53:30Z</updated>
<author>
<name>Jonathan Tan</name>
<email>jonathantanmy@google.com</email>
</author>
<published>2020-01-02T20:16:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=9c8a294a1ae1335511475db9c0eb8841c0ec9738'/>
<id>urn:sha1:9c8a294a1ae1335511475db9c0eb8841c0ec9738</id>
<content type='text'>
In a partial clone, if a user provides the hash of the empty tree ("git
mktree &lt;/dev/null" - for SHA-1, this is 4b825d...) to a command which
requires that that object be parsed, for example:

  git diff-tree 4b825d &lt;a non-empty tree&gt;

then Git will lazily fetch the empty tree, unnecessarily, because
parsing of that object invokes repo_has_object_file(), which does not
special-case the empty tree.

Instead, teach repo_has_object_file() to consult find_cached_object()
(which handles the empty tree), thus bringing it in line with the rest
of the object-store-accessing functions. A cost is that
repo_has_object_file() will now need to oideq upon each invocation, but
that is trivial compared to the filesystem lookup or the pack index
search required anyway. (And if find_cached_object() needs to do more
because of previous invocations to pretend_object_file(), all the more
reason to be consistent in whether we present cached objects.)

As a historical note, the function now known as repo_read_object_file()
was taught the empty tree in 346245a1bb ("hard-code the empty tree
object", 2008-02-13), and the function now known as oid_object_info()
was taught the empty tree in c4d9986f5f ("sha1_object_info: examine
cached_object store too", 2011-02-07). repo_has_object_file() was never
updated, perhaps due to oversight. The flag OBJECT_INFO_SKIP_CACHED,
introduced later in dfdd4afcf9 ("sha1_file: teach
sha1_object_info_extended more flags", 2017-06-26) and used in
e83e71c5e1 ("sha1_file: refactor has_sha1_file_with_flags", 2017-06-26),
was introduced to preserve this difference in empty-tree handling, but
now it can be removed.

Signed-off-by: Jonathan Tan &lt;jonathantanmy@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
