<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/cgroup.c, branch v2.6.30.1</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v2.6.30.1</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v2.6.30.1'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2009-05-09T14:49:40Z</updated>
<entry>
<title>Convert obvious places to deactivate_locked_super()</title>
<updated>2009-05-09T14:49:40Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2009-05-06T05:34:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6f5bbff9a1b7d6864a495763448a363bbfa96324'/>
<id>urn:sha1:6f5bbff9a1b7d6864a495763448a363bbfa96324</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>memcg: fix OOM killer under memcg</title>
<updated>2009-04-03T02:04:55Z</updated>
<author>
<name>KAMEZAWA Hiroyuki</name>
<email>kamezawa.hiroyu@jp.fujitsu.com</email>
</author>
<published>2009-04-02T23:57:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0b7f569e45bb6be142d87017030669a6a7d327a1'/>
<id>urn:sha1:0b7f569e45bb6be142d87017030669a6a7d327a1</id>
<content type='text'>
This patch tries to fix OOM Killer problems caused by hierarchy.
Now, memcg itself has OOM KILL function (in oom_kill.c) and tries to
kill a task in memcg.

But, when hierarchy is used, it's broken and correct task cannot
be killed. For example, in following cgroup

	/groupA/	hierarchy=1, limit=1G,
		01	nolimit
		02	nolimit
All tasks' memory usage under /groupA, /groupA/01, groupA/02 is limited to
groupA's 1Gbytes but OOM Killer just kills tasks in groupA.

This patch provides makes the bad process be selected from all tasks
under hierarchy. BTW, currently, oom_jiffies is updated against groupA
in above case. oom_jiffies of tree should be updated.

To see how oom_jiffies is used, please check mem_cgroup_oom_called()
callers.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: const fix]
Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Balbir Singh &lt;balbir@in.ibm.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroups: don't change release_agent when remount failed</title>
<updated>2009-04-03T02:04:54Z</updated>
<author>
<name>Li Zefan</name>
<email>lizf@cn.fujitsu.com</email>
</author>
<published>2009-04-02T23:57:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0670e08bdfc67272f8c3087030417465629b8073'/>
<id>urn:sha1:0670e08bdfc67272f8c3087030417465629b8073</id>
<content type='text'>
Remount can fail in either case:
  - wrong mount options is specified, or option 'noprefix' is changed.
  - a to-be-added subsys is already mounted/active.

When using remount to change 'release_agent', for the above former failure
case, remount will return errno with release_agent unchanged, but for the
latter case, remount will return EBUSY with relase_agent changed, which is
unexpected I think:

 # mount -t cgroup -o cpu xxx /cgrp1
 # mount -t cgroup -o cpuset,release_agent=agent1 yyy /cgrp2
 # cat /cgrp2/release_agent
 agent1
 # mount -t cgroup -o remount,cpuset,noprefix,release_agent=agent2 yyy /cgrp2
 mount: /cgrp2 not mounted already, or bad option
 # cat /cgrp2/release_agent
 agent1     &lt;-- ok
 # mount -t cgroup -o remount,cpu,cpuset,release_agent=agent2 yyy /cgrp2
 mount: /cgrp2 is busy
 # cat /cgrp2/release_agent
 agent2     &lt;-- unexpected!

Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroups: show correct file mode</title>
<updated>2009-04-03T02:04:54Z</updated>
<author>
<name>Li Zefan</name>
<email>lizf@cn.fujitsu.com</email>
</author>
<published>2009-04-02T23:57:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=099fca3225b39f7a3ed853036038054172b55581'/>
<id>urn:sha1:099fca3225b39f7a3ed853036038054172b55581</id>
<content type='text'>
We have some read-only files and write-only files, but currently they are
all set to 0644, which is counter-intuitive and cause trouble for some
cgroup tools like libcgroup.

This patch adds 'mode' to struct cftype to allow cgroup subsys to set it's
own files' file mode, and for the most cases cft-&gt;mode can be default to 0
and cgroup will figure out proper mode.

Acked-by: Paul Menage &lt;menage@google.com&gt;
Reviewed-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kernel/cgroup.c: kfree(NULL) is legal</title>
<updated>2009-04-03T02:04:54Z</updated>
<author>
<name>Jesper Juhl</name>
<email>jj@chaosbits.net</email>
</author>
<published>2009-04-02T23:57:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=66bdc9cfc77ba89a9ee6c82d28375b646ab4bb1d'/>
<id>urn:sha1:66bdc9cfc77ba89a9ee6c82d28375b646ab4bb1d</id>
<content type='text'>
Reduces object file size a bit:

Before:
$ size kernel/cgroup.o
   text    data     bss     dec     hex filename
  21593    7804    4924   34321    8611 kernel/cgroup.o
After:
$ size kernel/cgroup.o
   text    data     bss     dec     hex filename
  21537    7744    4924   34205    859d kernel/cgroup.o

Signed-off-by: Jesper Juhl &lt;jj@chaosbits.net&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup: fix frequent -EBUSY at rmdir</title>
<updated>2009-04-03T02:04:54Z</updated>
<author>
<name>KAMEZAWA Hiroyuki</name>
<email>kamezawa.hiroyu@jp.fujitsu.com</email>
</author>
<published>2009-04-02T23:57:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ec64f51545fffbc4cb968f0cea56341a4b07e85a'/>
<id>urn:sha1:ec64f51545fffbc4cb968f0cea56341a4b07e85a</id>
<content type='text'>
In following situation, with memory subsystem,

	/groupA use_hierarchy==1
		/01 some tasks
		/02 some tasks
		/03 some tasks
		/04 empty

When tasks under 01/02/03 hit limit on /groupA, hierarchical reclaim
is triggered and the kernel walks tree under groupA. In this case,
rmdir /groupA/04 fails with -EBUSY frequently because of temporal
refcnt from the kernel.

In general. cgroup can be rmdir'd if there are no children groups and
no tasks. Frequent fails of rmdir() is not useful to users.
(And the reason for -EBUSY is unknown to users.....in most cases)

This patch tries to modify above behavior, by
	- retries if css_refcnt is got by someone.
	- add "return value" to pre_destroy() and allows subsystem to
	  say "we're really busy!"

Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Balbir Singh &lt;balbir@in.ibm.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup: CSS ID support</title>
<updated>2009-04-03T02:04:53Z</updated>
<author>
<name>KAMEZAWA Hiroyuki</name>
<email>kamezawa.hiroyu@jp.fujitsu.com</email>
</author>
<published>2009-04-02T23:57:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=38460b48d06440de46b34cb778bd6c4855030754'/>
<id>urn:sha1:38460b48d06440de46b34cb778bd6c4855030754</id>
<content type='text'>
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.

This patch attaches unique ID to each css and provides following.

 - css_lookup(subsys, id)
   returns pointer to struct cgroup_subysys_state of id.
 - css_get_next(subsys, id, rootid, depth, foundid)
   returns the next css under "root" by scanning

When cgroup_subsys-&gt;use_id is set, an id for css is maintained.

The cgroup framework only parepares
	- css_id of root css for subsys
	- id is automatically attached at creation of css.
	- id is *not* freed automatically. Because the cgroup framework
	  don't know lifetime of cgroup_subsys_state.
	  free_css_id() function is provided. This must be called by subsys.

There are several reasons to develop this.
	- Saving space .... For example, memcg's swap_cgroup is array of
	  pointers to cgroup. But it is not necessary to be very fast.
	  By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
	  reduce much amount of memory usage.

	- Scanning without lock.
	  CSS_ID provides "scan id under this ROOT" function. By this, scanning
	  css under root can be written without locks.
	  ex)
	  do {
		rcu_read_lock();
		next = cgroup_get_next(subsys, id, root, &amp;found);
		/* check sanity of next here */
		css_tryget();
		rcu_read_unlock();
		id = found + 1
	 } while(...)

Characteristics:
	- Each css has unique ID under subsys.
	- Lifetime of ID is controlled by subsys.
	- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
	- Allowed ID is 1-65535, ID 0 is UNUSED ID.

Design Choices:
	- scan-by-ID v.s. scan-by-tree-walk.
	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
	  by following kind of routine.
	  scan -&gt; rest a while(release a lock) -&gt; conitunue from interrupted
	  memcg's hierarchical reclaim does this.

	- When subsys-&gt;use_id is set, # of css in the system is limited to
	  65535.

[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Acked-by: Paul Menage &lt;menage@google.com&gt;
Cc: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Balbir Singh &lt;balbir@in.ibm.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Bharata B Rao &lt;bharata@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroups: relax ns_can_attach checks to allow attaching to grandchild cgroups</title>
<updated>2009-04-03T02:04:53Z</updated>
<author>
<name>Grzegorz Nosek</name>
<email>root@localdomain.pl</email>
</author>
<published>2009-04-02T23:57:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=313e924c0852943e67335fad9d2608701f0dfe8e'/>
<id>urn:sha1:313e924c0852943e67335fad9d2608701f0dfe8e</id>
<content type='text'>
The ns_proxy cgroup allows moving processes to child cgroups only one
level deep at a time.  This commit relaxes this restriction and makes it
possible to attach tasks directly to grandchild cgroups, e.g.:

($pid is in the root cgroup)
echo $pid &gt; /cgroup/CG1/CG2/tasks

Previously this operation would fail with -EPERM and would have to be
performed as two steps:
echo $pid &gt; /cgroup/CG1/tasks
echo $pid &gt; /cgroup/CG1/CG2/tasks

Also, the target cgroup no longer needs to be empty to move a task there.

Signed-off-by: Grzegorz Nosek &lt;root@localdomain.pl&gt;
Acked-by: Serge Hallyn &lt;serue@us.ibm.com&gt;
Reviewed-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>vfs: simple_set_mnt() should return void</title>
<updated>2009-03-27T18:44:03Z</updated>
<author>
<name>Sukadev Bhattiprolu</name>
<email>sukadev@linux.vnet.ibm.com</email>
</author>
<published>2009-03-04T20:06:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a3ec947c85ec339884b30ef6a08133e9311fdae1'/>
<id>urn:sha1:a3ec947c85ec339884b30ef6a08133e9311fdae1</id>
<content type='text'>
simple_set_mnt() is defined as returning 'int' but always returns 0.
Callers assume simple_set_mnt() never fails and don't properly cleanup if
it were to _ever_ fail.  For instance, get_sb_single() and get_sb_nodev()
should:

        up_write(sb-&gt;s_unmount);
        deactivate_super(sb);

if simple_set_mnt() fails.

Since simple_set_mnt() never fails, would be cleaner if it did not
return anything.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Sukadev Bhattiprolu &lt;sukadev@linux.vnet.ibm.com&gt;
Acked-by: Serge Hallyn &lt;serue@us.ibm.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>constify dentry_operations: rest</title>
<updated>2009-03-27T18:44:03Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2009-02-20T06:02:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3ba13d179e8c24c68eac32b93593a6b10fcd1572'/>
<id>urn:sha1:3ba13d179e8c24c68eac32b93593a6b10fcd1572</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
</feed>
