<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/git.git/builtin-pack-objects.c, branch v1.6.4.5</title>
<subtitle>Git
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v1.6.4.5</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v1.6.4.5'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/'/>
<updated>2009-08-06T03:14:54Z</updated>
<entry>
<title>don't let the delta cache grow unbounded in 'git repack'</title>
<updated>2009-08-06T03:14:54Z</updated>
<author>
<name>Nicolas Pitre</name>
<email>nico@cam.org</email>
</author>
<published>2009-08-05T20:55:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=5749b0b2f935dc418c98ba8e7c52c3291451bc4a'/>
<id>urn:sha1:5749b0b2f935dc418c98ba8e7c52c3291451bc4a</id>
<content type='text'>
I have 4GB of RAM on my system which should, in theory, be quite enough
to repack a 600 MB repository.  However the unbounded delta cache size
always pushes it into swap, at which point everything virtually comes to
a halt.  So unbounded caches are never a good idea.

A default of 256MB should be a good compromize between memory usage and
speed where medium sized repositories are still likely to fit in the
cache with a reasonable memory usage, and larger repositories are going
to take quite some time to repack already anyway.

While at it, clarify the associated config variable documentation
entries a bit.

Signed-off-by: Nicolas Pitre &lt;nico@cam.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'js/maint-graft-unhide-true-parents'</title>
<updated>2009-07-25T07:45:03Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2009-07-25T07:45:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=130b04ab37d74e574d525df7948b963b13c6bdbf'/>
<id>urn:sha1:130b04ab37d74e574d525df7948b963b13c6bdbf</id>
<content type='text'>
* js/maint-graft-unhide-true-parents:
  git repack: keep commits hidden by a graft
  Add a test showing that 'git repack' throws away grafted-away parents

Conflicts:
	git-repack.sh
</content>
</entry>
<entry>
<title>git repack: keep commits hidden by a graft</title>
<updated>2009-07-24T16:10:16Z</updated>
<author>
<name>Johannes Schindelin</name>
<email>johannes.schindelin@gmx.de</email>
</author>
<published>2009-07-23T15:33:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=7f3140cd23f126e578ccaaea8c2cebe36824a7ac'/>
<id>urn:sha1:7f3140cd23f126e578ccaaea8c2cebe36824a7ac</id>
<content type='text'>
When you have grafts that pretend that a given commit has different
parents than the ones recorded in the commit object, it is dangerous
to let 'git repack' remove those hidden parents, as you can easily
remove the graft and end up with a broken repository.

So let's play it safe and keep those parent objects and everything
that is reachable by them, in addition to the grafted parents.

As this behavior can only be triggered by git pack-objects, and as that
command handles duplicate parents gracefully, we do not bother to cull
duplicated parents that may result by using both true and grafted
parents.

Signed-off-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'tr/die_errno'</title>
<updated>2009-07-06T16:39:46Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2009-07-06T16:39:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=dd787c19c4f011cc3adb422ef856f2c58d989640'/>
<id>urn:sha1:dd787c19c4f011cc3adb422ef856f2c58d989640</id>
<content type='text'>
* tr/die_errno:
  Use die_errno() instead of die() when checking syscalls
  Convert existing die(..., strerror(errno)) to die_errno()
  die_errno(): double % in strerror() output just in case
  Introduce die_errno() that appends strerror(errno) to die()
</content>
</entry>
<entry>
<title>Convert existing die(..., strerror(errno)) to die_errno()</title>
<updated>2009-06-27T18:14:53Z</updated>
<author>
<name>Thomas Rast</name>
<email>trast@student.ethz.ch</email>
</author>
<published>2009-06-27T15:58:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=d824cbba02a4061400a0e382f9bd241fbbff34f0'/>
<id>urn:sha1:d824cbba02a4061400a0e382f9bd241fbbff34f0</id>
<content type='text'>
Change calls to die(..., strerror(errno)) to use the new die_errno().

In the process, also make slight style adjustments: at least state
_something_ about the function that failed (instead of just printing
the pathname), and put paths in single quotes.

Signed-off-by: Thomas Rast &lt;trast@student.ethz.ch&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Fix big left-shifts of unsigned char</title>
<updated>2009-06-18T16:22:46Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-06-18T00:22:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=48fb7deb5bbd87933e7d314b73d7c1b52667f80f'/>
<id>urn:sha1:48fb7deb5bbd87933e7d314b73d7c1b52667f80f</id>
<content type='text'>
Shifting 'unsigned char' or 'unsigned short' left can result in sign
extension errors, since the C integer promotion rules means that the
unsigned char/short will get implicitly promoted to a signed 'int' due to
the shift (or due to other operations).

This normally doesn't matter, but if you shift things up sufficiently, it
will now set the sign bit in 'int', and a subsequent cast to a bigger type
(eg 'long' or 'unsigned long') will now sign-extend the value despite the
original expression being unsigned.

One example of this would be something like

	unsigned long size;
	unsigned char c;

	size += c &lt;&lt; 24;

where despite all the variables being unsigned, 'c &lt;&lt; 24' ends up being a
signed entity, and will get sign-extended when then doing the addition in
an 'unsigned long' type.

Since git uses 'unsigned char' pointers extensively, we actually have this
bug in a couple of places.

I may have missed some, but this is the result of looking at

	git grep '[^0-9 	][ 	]*&lt;&lt;[ 	][a-z]' -- '*.c' '*.h'
	git grep '&lt;&lt;[   ]*24'

which catches at least the common byte cases (shifting variables by a
variable amount, and shifting by 24 bits).

I also grepped for just 'unsigned char' variables in general, and
converted the ones that most obviously ended up getting implicitly cast
immediately anyway (eg hash_name(), encode_85()).

In addition to just avoiding 'unsigned char', this patch also tries to use
a common idiom for the delta header size thing. We had three different
variations on it: "&amp; 0x7fUL" in one place (getting the sign extension
right), and "&amp; ~0x80" and "&amp; 0x7f" in two other places (not getting it
right). Apart from making them all just avoid using "unsigned char" at
all, I also unified them to then use a simple "&amp; 0x7f".

I considered making a sparse extension which warns about doing implicit
casts from unsigned types to signed types, but it gets rather complex very
quickly, so this is just a hack.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Fix typos / spelling in comments</title>
<updated>2009-04-23T02:02:12Z</updated>
<author>
<name>Mike Ralphson</name>
<email>mike@abacus.co.uk</email>
</author>
<published>2009-04-17T18:13:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=3ea3c215c02dc4a4e7d0881c25b2223540960797'/>
<id>urn:sha1:3ea3c215c02dc4a4e7d0881c25b2223540960797</id>
<content type='text'>
Signed-off-by: Mike Ralphson &lt;mike@abacus.co.uk&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'lt/pack-object-memuse'</title>
<updated>2009-04-18T21:46:17Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2009-04-18T21:46:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=9824a388e53ba0951e38f246038fa0ef6fda3397'/>
<id>urn:sha1:9824a388e53ba0951e38f246038fa0ef6fda3397</id>
<content type='text'>
* lt/pack-object-memuse:
  show_object(): push path_name() call further down
  process_{tree,blob}: show objects without buffering

Conflicts:
	builtin-pack-objects.c
	builtin-rev-list.c
	list-objects.c
	list-objects.h
	upload-pack.c
</content>
</entry>
<entry>
<title>show_object(): push path_name() call further down</title>
<updated>2009-04-13T00:28:31Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-04-11T01:15:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=cf2ab916afa4231f7e9db31796e7c0f712ff6ad1'/>
<id>urn:sha1:cf2ab916afa4231f7e9db31796e7c0f712ff6ad1</id>
<content type='text'>
In particular, pushing the "path_name()" call _into_ the show() function
would seem to allow

 - more clarity into who "owns" the name (ie now when we free the name in
   the show_object callback, it's because we generated it ourselves by
   calling path_name())

 - not calling path_name() at all, either because we don't care about the
   name in the first place, or because we are actually happy walking the
   linked list of "struct name_path *" and the last component.

Now, I didn't do that latter optimization, because it would require some
more coding, but especially looking at "builtin-pack-objects.c", we really
don't even want the whole pathname, we really would be better off with the
list of path components.

Why? We use that name for two things:
 - add_preferred_base_object(), which actually _wants_ to traverse the
   path, and now does it by looking for '/' characters!
 - for 'name_hash()', which only cares about the last 16 characters of a
   name, so again, generating the full name seems to be just unnecessary
   work.

Anyway, so I didn't look any closer at those things, but it did convince
me that the "show_object()" calling convention was crazy, and we're
actually better off doing _less_ in list-objects.c, and giving people
access to the internal data structures so that they can decide whether
they want to generate a path-name or not.

This patch does that, and then for people who did use the name (even if
they might do something more clever in the future), it just does the
straightforward "name = path_name(path, component); .. free(name);" thing.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>process_{tree,blob}: show objects without buffering</title>
<updated>2009-04-13T00:28:31Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-04-11T00:27:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=8d2dfc49b199c7da6faefd7993630f24bd37fee0'/>
<id>urn:sha1:8d2dfc49b199c7da6faefd7993630f24bd37fee0</id>
<content type='text'>
Here's a less trivial thing, and slightly more dubious one.

I was looking at that "struct object_array objects", and wondering why we
do that. I have honestly totally forgotten. Why not just call the "show()"
function as we encounter the objects? Rather than add the objects to the
object_array, and then at the very end going through the array and doing a
'show' on all, just do things more incrementally.

Now, there are possible downsides to this:

 - the "buffer using object_array" _can_ in theory result in at least
   better I-cache usage (two tight loops rather than one more spread out
   one). I don't think this is a real issue, but in theory..

 - this _does_ change the order of the objects printed. Instead of doing a
   "process_tree(revs, commit-&gt;tree, &amp;objects, NULL, "");" in the loop
   over the commits (which puts all the root trees _first_ in the object
   list, this patch just adds them to the list of pending objects, and
   then we'll traverse them in that order (and thus show each root tree
   object together with the objects we discover under it)

   I _think_ the new ordering actually makes more sense, but the object
   ordering is actually a subtle thing when it comes to packing
   efficiency, so any change in order is going to have implications for
   packing. Good or bad, I dunno.

 - There may be some reason why we did it that odd way with the object
   array, that I have simply forgotten.

Anyway, now that we don't buffer up the objects before showing them
that may actually result in lower memory usage during that whole
traverse_commit_list() phase.

This is seriously not very deeply tested. It makes sense to me, it seems
to pass all the tests, it looks ok, but...

Does anybody remember why we did that "object_array" thing? It used to be
an "object_list" a long long time ago, but got changed into the array due
to better memory usage patterns (those linked lists of obejcts are
horrible from a memory allocation standpoint). But I wonder why we didn't
do this back then. Maybe there's a reason for it.

Or maybe there _used_ to be a reason, and no longer is.

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
