<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/cgroup.c, branch v2.6.26.8</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v2.6.26.8</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v2.6.26.8'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2008-10-09T03:23:12Z</updated>
<entry>
<title>mm owner: fix race between swapoff and exit</title>
<updated>2008-10-09T03:23:12Z</updated>
<author>
<name>Balbir Singh</name>
<email>balbir@linux.vnet.ibm.com</email>
</author>
<published>2008-10-05T16:43:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=553d7dd7336a3c1f3dd12085b5c42451c17225e1'/>
<id>urn:sha1:553d7dd7336a3c1f3dd12085b5c42451c17225e1</id>
<content type='text'>
[Here's a backport of 2.6.27-rc8's 31a78f23bac0069004e69f98808b6988baccb6b6
 to 2.6.26 or 2.6.26.5: I wouldn't trouble -stable for the (root only)
 swapoff case which uncovered the bug, but the /proc/&lt;pid&gt;/&lt;mmstats&gt; case
 is open to all, so I think worth plugging in the next 2.6.26-stable.
 - Hugh]


There's a race between mm-&gt;owner assignment and swapoff, more easily
seen when task slab poisoning is turned on.  The condition occurs when
try_to_unuse() runs in parallel with an exiting task.  A similar race
can occur with callers of get_task_mm(), such as /proc/&lt;pid&gt;/&lt;mmstats&gt;
or ptrace or page migration.

CPU0                                    CPU1
                                        try_to_unuse
                                        looks at mm = task0-&gt;mm
                                        increments mm-&gt;mm_users
task 0 exits
mm-&gt;owner needs to be updated, but no
new owner is found (mm_users &gt; 1, but
no other task has task-&gt;mm = task0-&gt;mm)
mm_update_next_owner() leaves
                                        mmput(mm) decrements mm-&gt;mm_users
task0 freed
                                        dereferencing mm-&gt;owner fails

The fix is to notify the subsystem via mm_owner_changed callback(),
if no new owner is found, by specifying the new task as NULL.

Jiri Slaby:
mm-&gt;owner was set to NULL prior to calling cgroup_mm_owner_callbacks(), but
must be set after that, so as not to pass NULL as old owner causing oops.

Daisuke Nishimura:
mm_update_next_owner() may set mm-&gt;owner to NULL, but mem_cgroup_from_task()
and its callers need to take account of this situation to avoid oops.

Hugh Dickins:
Lockdep warning and hang below exec_mmap() when testing these patches.
exit_mm() up_reads mmap_sem before calling mm_update_next_owner(),
so exec_mmap() now needs to do the same.  And with that repositioning,
there's now no point in mm_need_new_owner() allowing for NULL mm.

Reported-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Signed-off-by: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Signed-off-by: Jiri Slaby &lt;jirislaby@gmail.com&gt;
Signed-off-by: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>cgroups: remove node_ prefix_from ns subsystem</title>
<updated>2008-05-24T16:56:14Z</updated>
<author>
<name>Cedric Le Goater</name>
<email>clg@fr.ibm.com</email>
</author>
<published>2008-05-23T20:05:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5c02b575780d0d785815a1e7b79a98edddee895a'/>
<id>urn:sha1:5c02b575780d0d785815a1e7b79a98edddee895a</id>
<content type='text'>
This is a slight change in the namespace cgroup subsystem api.

The change is that previously when cgroup_clone() was called (currently
only from the unshare path in ns_proxy cgroup, you'd get a new group named
"node_$pid" whereas now you'll get a group named after just your pid.)

The only users who would notice it are those who are using the ns_proxy
cgroup subsystem to auto-create cgroups when namespaces are unshared -
something of an experimental feature, which I think really needs more
complete container/namespace support in order to be useful.  I suspect the
only users are Cedric and Serge, or maybe a few others on
containers@lists.linux-foundation.org.  And in fact it would only be
noticed by the users who make the assumption about how the name is
generated, rather than getting it from the /proc/&lt;pid&gt;/cgroups file for
the process in question.

Whether the change is actually needed or not I'm fairly agnostic on, but I
guess it is more elegant to just use the pid as the new group name rather
than adding a fairly arbitrary "node_" prefix on the front.

[menage@google.com: provided changelog]
Signed-off-by: Cedric Le Goater &lt;clg@fr.ibm.com&gt;
Cc: "Paul Menage" &lt;menage@google.com&gt;
Cc: "Serge E. Hallyn" &lt;serue@us.ibm.com&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: bdi: add separate writeback accounting capability</title>
<updated>2008-04-30T15:29:50Z</updated>
<author>
<name>Miklos Szeredi</name>
<email>mszeredi@suse.cz</email>
</author>
<published>2008-04-30T07:54:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e4ad08fe64afca4ef79ecc4c624e6e871688da0d'/>
<id>urn:sha1:e4ad08fe64afca4ef79ecc4c624e6e871688da0d</id>
<content type='text'>
Add a new BDI capability flag: BDI_CAP_NO_ACCT_WB.  If this flag is
set, then don't update the per-bdi writeback stats from
test_set_page_writeback() and test_clear_page_writeback().

Misc cleanups:

 - convert bdi_cap_writeback_dirty() and friends to static inline functions
 - create a flag that includes all three dirty/writeback related flags,
   since almst all users will want to have them toghether

Signed-off-by: Miklos Szeredi &lt;mszeredi@suse.cz&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroups: add an owner to the mm_struct</title>
<updated>2008-04-29T15:06:10Z</updated>
<author>
<name>Balbir Singh</name>
<email>balbir@linux.vnet.ibm.com</email>
</author>
<published>2008-04-29T08:00:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cf475ad28ac35cc9ba612d67158f29b73b38b05d'/>
<id>urn:sha1:cf475ad28ac35cc9ba612d67158f29b73b38b05d</id>
<content type='text'>
Remove the mem_cgroup member from mm_struct and instead adds an owner.

This approach was suggested by Paul Menage.  The advantage of this approach
is that, once the mm-&gt;owner is known, using the subsystem id, the cgroup
can be determined.  It also allows several control groups that are
virtually grouped by mm_struct, to exist independent of the memory
controller i.e., without adding mem_cgroup's for each controller, to
mm_struct.

A new config option CONFIG_MM_OWNER is added and the memory resource
controller selects this config option.

This patch also adds cgroup callbacks to notify subsystems when mm-&gt;owner
changes.  The mm_cgroup_changed callback is called with the task_lock() of
the new task held and is called just prior to changing the mm-&gt;owner.

I am indebted to Paul Menage for the several reviews of this patchset and
helping me make it lighter and simpler.

This patch was tested on a powerpc box, it was compiled with both the
MM_OWNER config turned on and off.

After the thread group leader exits, it's moved to init_css_state by
cgroup_exit(), thus all future charges from runnings threads would be
redirected to the init_css_set's subsystem.

Signed-off-by: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Pavel Emelianov &lt;xemul@openvz.org&gt;
Cc: Hugh Dickins &lt;hugh@veritas.com&gt;
Cc: Sudhir Kumar &lt;skumar@linux.vnet.ibm.com&gt;
Cc: YAMAMOTO Takashi &lt;yamamoto@valinux.co.jp&gt;
Cc: Hirokazu Takahashi &lt;taka@valinux.co.jp&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;,
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Acked-by: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Reviewed-by: Paul Menage &lt;menage@google.com&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroups: introduce cft-&gt;read_seq()</title>
<updated>2008-04-29T15:06:10Z</updated>
<author>
<name>Serge E. Hallyn</name>
<email>serue@us.ibm.com</email>
</author>
<published>2008-04-29T08:00:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=29486df325e1fe6e1764afcb19e3370804c2b002'/>
<id>urn:sha1:29486df325e1fe6e1764afcb19e3370804c2b002</id>
<content type='text'>
Introduce a read_seq() helper in cftype, which uses seq_file to print out
lists.  Use it in the devices cgroup.  Also split devices.allow into two
files, so now devices.deny and devices.allow are the ones to use to manipulate
the whitelist, while devices.list outputs the cgroup's current whitelist.

Signed-off-by: Serge E. Hallyn &lt;serue@us.ibm.com&gt;
Acked-by: Paul Menage &lt;menage@google.com&gt;
Cc: Balbir Singh &lt;balbir@in.ibm.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroups: remove the css_set linked-list</title>
<updated>2008-04-29T15:06:10Z</updated>
<author>
<name>Li Zefan</name>
<email>lizf@cn.fujitsu.com</email>
</author>
<published>2008-04-29T08:00:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=28fd5dfc12bde391981dfdcf20755952b6e916af'/>
<id>urn:sha1:28fd5dfc12bde391981dfdcf20755952b6e916af</id>
<content type='text'>
Now we can run through the hash table instead of running through the
linked-list.

Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Reviewed-by: Paul Menage &lt;menage@google.com&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroups: simplify init_subsys()</title>
<updated>2008-04-29T15:06:10Z</updated>
<author>
<name>Li Zefan</name>
<email>lizf@cn.fujitsu.com</email>
</author>
<published>2008-04-29T08:00:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e8d55fdeb882cfcb5e8db5a5ce16edfba78aafc5'/>
<id>urn:sha1:e8d55fdeb882cfcb5e8db5a5ce16edfba78aafc5</id>
<content type='text'>
We are at system boot and there is only 1 cgroup group (i,e, init_css_set), so
we don't need to run through the css_set linked list.  Neither do we need to
run through the task list, since no processes have been created yet.

Also referring to a comment in cgroup.h:

struct css_set
{
	...
	/*
	 * Set of subsystem states, one for each subsystem. This array
	 * is immutable after creation apart from the init_css_set
	 * during subsystem registration (at boot time).
	 */
	struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT];
}

Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Reviewed-by: Paul Menage &lt;menage@google.com&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroups: use a hash table for css_set finding</title>
<updated>2008-04-29T15:06:09Z</updated>
<author>
<name>Li Zefan</name>
<email>lizf@cn.fujitsu.com</email>
</author>
<published>2008-04-29T08:00:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=472b1053f3c319cc60bfb2a0bb062fed77a93eb6'/>
<id>urn:sha1:472b1053f3c319cc60bfb2a0bb062fed77a93eb6</id>
<content type='text'>
When we attach a process to a different cgroup, the css_set linked-list will
be run through to find a suitable existing css_set to use.  This patch
implements a hash table for better performance.

The following benchmarks have been tested:

For N in 1, 5, 10, 50, 100, 500, 1000, create N cgroups with one sleeping
task in each, and then move an additional task through each cgroup in
turn.

Here is a test result:

N	Loop	orig - Time(s)	hash - Time(s)
----------------------------------------------
1	10000	1.201231728	1.196311177
5	2000	1.065743872	1.040566424
10	1000	0.991054735	0.986876440
50	200	0.976554203	0.969608733
100	100	0.998504680	0.969218270
500	20	1.157347764	0.962602963
1000	10	1.619521852	1.085140172

Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Reviewed-by: Paul Menage &lt;menage@google.com&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroups: add the trigger callback to struct cftype</title>
<updated>2008-04-29T15:06:09Z</updated>
<author>
<name>Pavel Emelyanov</name>
<email>xemul@openvz.org</email>
</author>
<published>2008-04-29T08:00:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d447ea2f30ec60370ddb99a668e5ac12995f043d'/>
<id>urn:sha1:d447ea2f30ec60370ddb99a668e5ac12995f043d</id>
<content type='text'>
Trigger callback can be used to receive a kick-up from the user space.  The
string written is ignored.

The cftype-&gt;private is used for multiplexing events.

Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Acked-by: Paul Menage &lt;menage@google.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Balbir Singh &lt;balbir@in.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup: switch to proc_create()</title>
<updated>2008-04-29T15:06:09Z</updated>
<author>
<name>Li Zefan</name>
<email>lizf@cn.fujitsu.com</email>
</author>
<published>2008-04-29T08:00:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=46ae220bea40bd1cf4abec2d5cdfb4f9396c7115'/>
<id>urn:sha1:46ae220bea40bd1cf4abec2d5cdfb4f9396c7115</id>
<content type='text'>
There is a race between create_proc_entry() and the assignment of file ops.
proc_create() is invented to fix it.

Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Acked-by: Paul Menage &lt;menage@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
