<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/git.git/tree.c, branch v1.4.4.4</title>
<subtitle>Git
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v1.4.4.4</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/atom?h=v1.4.4.4'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/'/>
<updated>2006-08-23T20:53:10Z</updated>
<entry>
<title>Convert memcpy(a,b,20) to hashcpy(a,b).</title>
<updated>2006-08-23T20:53:10Z</updated>
<author>
<name>Shawn Pearce</name>
<email>spearce@spearce.org</email>
</author>
<published>2006-08-23T06:49:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=e702496e434a160f798447b95b9cea3cd138c140'/>
<id>urn:sha1:e702496e434a160f798447b95b9cea3cd138c140</id>
<content type='text'>
This abstracts away the size of the hash values when copying them
from memory location to memory location, much as the introduction
of hashcmp abstracted away hash value comparsion.

A few call sites were using char* rather than unsigned char* so
I added the cast rather than open hashcpy to be void*.  This is a
reasonable tradeoff as most call sites already use unsigned char*
and the existing hashcmp is also declared to be unsigned char*.

[jc: Splitted the patch to "master" part, to be followed by a
 patch for merge-recursive.c which is not in "master" yet.

 Fixed the cast in the latter hunk to combine-diff.c which was
 wrong in the original.

 Also converted ones left-over in combine-diff.c, diff-lib.c and
 upload-pack.c ]

Signed-off-by: Shawn O. Pearce &lt;spearce@spearce.org&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>Make track_tree_refs void.</title>
<updated>2006-08-15T01:59:03Z</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2006-08-14T20:40:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=74b504f67173a9f4dd06f01d109c58ea13f279b4'/>
<id>urn:sha1:74b504f67173a9f4dd06f01d109c58ea13f279b4</id>
<content type='text'>
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>Remove TYPE_* constant macros and use object_type enums consistently.</title>
<updated>2006-07-13T06:18:03Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@osdl.org</email>
</author>
<published>2006-07-12T03:45:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=1974632c664c2d573b36a00fa993c1c13dd8a967'/>
<id>urn:sha1:1974632c664c2d573b36a00fa993c1c13dd8a967</id>
<content type='text'>
This updates the type-enumeration constants introduced to reduce
the memory footprint of "struct object" to match the type bits
already used in the packfile format, by removing the former
(i.e. TYPE_* constant macros) and using the latter (i.e. enum
object_type) throughout the code for consistency.

Eventually we can stop passing around the "type strings"
entirely, and this will help - no confusion about two different
integer enumeration.

Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>Add specialized object allocator</title>
<updated>2006-06-20T01:42:21Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@osdl.org</email>
</author>
<published>2006-06-19T17:44:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=855419f764a65e92f1d5dd1b3d50ee987db1d9de'/>
<id>urn:sha1:855419f764a65e92f1d5dd1b3d50ee987db1d9de</id>
<content type='text'>
This creates a simple specialized object allocator for basic
objects.

This avoids wasting space with malloc overhead (metadata and
extra alignment), since the specialized allocator knows the
alignment, and that objects, once allocated, are never freed.

It also allows us to track some basic statistics about object
allocations. For example, for the mozilla import, it shows
object usage as follows:

     blobs:   627629 (14710 kB)
     trees:  1119035 (34969 kB)
   commits:   196423  (8440 kB)
      tags:     1336    (46 kB)

and the simpler allocator shaves off about 2.5% off the memory
footprint off a "git-rev-list --all --objects", and is a bit
faster too.

[ Side note: this concludes the series of "save memory in object storage".
  The thing is, there simply isn't much more to be saved on the objects.

  Doing "git-rev-list --all --objects" on the mozilla archive has a final
  total RSS of 131498 pages for me: that's about 513MB. Of that, the
  object overhead is now just 56MB, the rest is going somewhere else (put
  another way: the fact that this patch shaves off 2.5% of the total
  memory overhead, considering that objects are now not much more than 10%
  of the total shows how big the wasted space really was: this makes
  object allocations much more memory- and time-efficient).

  I haven't looked at where the rest is, but I suspect the bulk of it is
  just the pack-file loading. It may be that we should pack the tree
  objects separately from the blob objects: for git-rev-list --objects, we
  don't actually ever need to even look at the blobs, but since trees and
  blobs are interspersed in the pack-file, we end up not being dense in
  the tree accesses, so we end up looking at more pages than we strictly
  need to.

  So with a 535MB pack-file, it's entirely possible - even likely - that
  most of the remaining RSS is just the mmap of the pack-file itself. We
  don't need to map in _all_ of it, but we do end up mapping a fair
  amount. ]

Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>Shrink "struct object" a bit</title>
<updated>2006-06-18T01:49:18Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@osdl.org</email>
</author>
<published>2006-06-14T23:45:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=885a86abe2e9f7b96a4e2012183c6751635840aa'/>
<id>urn:sha1:885a86abe2e9f7b96a4e2012183c6751635840aa</id>
<content type='text'>
This shrinks "struct object" by a small amount, by getting rid of the
"struct type *" pointer and replacing it with a 3-bit bitfield instead.

In addition, we merge the bitfields and the "flags" field, which
incidentally should also remove a useless 4-byte padding from the object
when in 64-bit mode.

Now, our "struct object" is still too damn large, but it's now less
obviously bloated, and of the remaining fields, only the "util" (which is
not used by most things) is clearly something that should be eventually
discarded.

This shrinks the "git-rev-list --all" memory use by about 2.5% on the
kernel archive (and, perhaps more importantly, on the larger mozilla
archive). That may not sound like much, but I suspect it's more on a
64-bit platform.

There are other remaining inefficiencies (the parent lists, for example,
probably have horrible malloc overhead), but this was pretty obvious.

Most of the patch is just changing the comparison of the "type" pointer
from one of the constant string pointers to the appropriate new TYPE_xxx
small integer constant.

Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>tree_entry(): new tree-walking helper function</title>
<updated>2006-05-31T06:03:01Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@osdl.org</email>
</author>
<published>2006-05-30T16:45:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=4c068a983150b740c3fcf6a33f342ac923abd3f4'/>
<id>urn:sha1:4c068a983150b740c3fcf6a33f342ac923abd3f4</id>
<content type='text'>
This adds a "tree_entry()" function that combines the common operation of
doing a "tree_entry_extract()" + "update_tree_entry()".

It also has a simplified calling convention, designed for simple loops
that traverse over a whole tree: the arguments are pointers to the tree
descriptor and a name_entry structure to fill in, and it returns a boolean
"true" if there was an entry left to be gotten in the tree.

This allows tree traversal with

	struct tree_desc desc;
	struct name_entry entry;

	desc.buf = tree-&gt;buffer;
	desc.size = tree-&gt;size;
	while (tree_entry(&amp;desc, &amp;entry) {
		... use "entry.{path, sha1, mode, pathlen}" ...
	}

which is not only shorter than writing it out in full, it's hopefully less
error prone too.

[ It's actually a tad faster too - we don't need to recalculate the entry
  pathlength in both extract and update, but need to do it only once.
  Also, some callers can avoid doing a "strlen()" on the result, since
  it's returned as part of the name_entry structure.

  However, by now we're talking just 1% speedup on "git-rev-list --objects
  --all", and we're definitely at the point where tree walking is no
  longer the issue any more. ]

NOTE! Not everybody wants to use this new helper function, since some of
the tree walkers very much on purpose do the descriptor update separately
from the entry extraction. So the "extract + update" sequence still
remains as the core sequence, this is just a simplified interface.

We should probably add a silly two-line inline helper function for
initializing the descriptor from the "struct tree" too, just to cut down
on the noise from that common "desc" initializer.

Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>Remove last vestiges of generic tree_entry_list</title>
<updated>2006-05-30T02:08:37Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@osdl.org</email>
</author>
<published>2006-05-29T19:21:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=15b5536ee47c6684806edd7725adbbdede9fb95c'/>
<id>urn:sha1:15b5536ee47c6684806edd7725adbbdede9fb95c</id>
<content type='text'>
The old tree_entry_list is dead, long live the unified single tree
parser.

Yes, we now still have a compatibility function to create a bogus
tree_entry_list in builtin-read-tree.c, but that is now entirely local
to that very messy piece of code.

I'd love to clean read-tree.c up too, but I'm too scared right now, so
the best I can do is to just contain the damage, and try to make sure
that no new users of the tree_entry_list sprout up by not having it as
an exported interface any more.

Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>Remove unused "zeropad" entry from tree_list_entry</title>
<updated>2006-05-30T02:08:25Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@osdl.org</email>
</author>
<published>2006-05-29T19:19:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=3bc1eca91e5230739cfb488e63fae35a166a07de'/>
<id>urn:sha1:3bc1eca91e5230739cfb488e63fae35a166a07de</id>
<content type='text'>
That was a hack, only needed because 'git fsck-objects' didn't look at
the raw tree format.  Now that fsck traverses the tree itself, we can
drop it.

Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>Remove "tree-&gt;entries" tree-entry list from tree parser</title>
<updated>2006-05-30T02:06:59Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@osdl.org</email>
</author>
<published>2006-05-29T19:18:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=2d9c58c69d1bab601e67b036d0546e85abcee7eb'/>
<id>urn:sha1:2d9c58c69d1bab601e67b036d0546e85abcee7eb</id>
<content type='text'>
Instead, just use the tree buffer directly, and use the tree-walk
infrastructure to walk the buffers instead of the tree-entry list.

The tree-entry list is inefficient, and generates tons of small
allocations for no good reason. The tree-walk infrastructure is
generally no harder to use than following a linked list, and allows
us to do most tree parsing in-place.

Some programs still use the old tree-entry lists, and are a bit
painful to convert without major surgery. For them we have a helper
function that creates a temporary tree-entry list on demand.

Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>Switch "read_tree_recursive()" over to tree-walk functionality</title>
<updated>2006-05-30T02:05:11Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@osdl.org</email>
</author>
<published>2006-05-29T19:17:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/git.git/commit/?id=0790a42a502701c7b58e9ad4123e46bf46bbf319'/>
<id>urn:sha1:0790a42a502701c7b58e9ad4123e46bf46bbf319</id>
<content type='text'>
Don't use the tree_entry list any more.

Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
</feed>
