user/sven/linux.git/kernel/cgroup.c, branch v2.6.34.6

cgroups: alloc_css_id() increments hierarchy depth

2010-07-05T18:22:46Z

commit 94b3dd0f7bb393d93e84a173b1df9b8b64c83ac4 upstream. Child groups should have a greater depth than their parents. Prior to this change, the parent would incorrectly report zero memory usage for child cgroups when use_hierarchy is enabled. test script: mount -t cgroup none /cgroups -o memory cd /cgroups mkdir cg1 echo 1 > cg1/memory.use_hierarchy mkdir cg1/cg11 echo $$ > cg1/cg11/tasks dd if=/dev/zero of=/tmp/foo bs=1M count=1 echo echo CHILD grep cache cg1/cg11/memory.stat echo echo PARENT grep cache cg1/memory.stat echo $$ > tasks rmdir cg1/cg11 cg1 cd / umount /cgroups Using fae9c79, a recent patch that changed alloc_css_id() depth computation, the parent incorrectly reports zero usage: root@ubuntu:~# ./test 1+0 records in 1+0 records out 1048576 bytes (1.0 MB) copied, 0.0151844 s, 69.1 MB/s CHILD cache 1048576 total_cache 1048576 PARENT cache 0 total_cache 0 With this patch, the parent correctly includes child usage: root@ubuntu:~# ./test 1+0 records in 1+0 records out 1048576 bytes (1.0 MB) copied, 0.0136827 s, 76.6 MB/s CHILD cache 1052672 total_cache 1052672 PARENT cache 0 total_cache 1052672 Signed-off-by: Greg Thelen Acked-by: Paul Menage Acked-by: KAMEZAWA Hiroyuki Acked-by: Li Zefan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman

memcg: fix css_is_ancestor() RCU locking

2010-05-12T00:33:42Z

Some callers (in memcontrol.c) calls css_is_ancestor() without rcu_read_lock. Because css_is_ancestor() has to access RCU protected data, it should be under rcu_read_lock(). This makes css_is_ancestor() itself does safe access to RCU protected area. (At least, "root" can have refcnt==0 if it's not an ancestor of "child". So, we need rcu_read_lock().) Signed-off-by: KAMEZAWA Hiroyuki Cc: "Paul E. McKenney" Cc: Daisuke Nishimura Cc: Balbir Singh Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

memcg: fix css_id() RCU locking for real

2010-05-12T00:33:42Z

Commit ad4ba375373937817404fd92239ef4cadbded23b ("memcg: css_id() must be called under rcu_read_lock()") modifies memcontol.c for fixing RCU check message. But Andrew Morton pointed out that the fix doesn't seems sane and it was just for hidining lockdep messages. This is a patch for do proper things. Checking again, all places, accessing without rcu_read_lock, that commit fixies was intentional.... all callers of css_id() has reference count on it. So, it's not necessary to be under rcu_read_lock(). Considering again, we can use rcu_dereference_check for css_id(). We know css->id is valid if css->refcnt > 0. (css->id never changes and freed after css->refcnt going to be 0.) This patch makes use of rcu_dereference_check() in css_id/depth and remove unnecessary rcu-read-lock added by the commit. Signed-off-by: KAMEZAWA Hiroyuki Cc: "Paul E. McKenney" Cc: Daisuke Nishimura Cc: Balbir Singh Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

cgroup: Fix an RCU warning in alloc_css_id()

2010-05-04T16:25:00Z

With CONFIG_PROVE_RCU=y, a warning can be triggered: # mount -t cgroup -o memory xxx /mnt # mkdir /mnt/0 ... kernel/cgroup.c:4442 invoked rcu_dereference_check() without protection! ... This is a false-positive. It's safe to directly access parent_css->id. Signed-off-by: Li Zefan Signed-off-by: Paul E. McKenney

cgroup: Fix an RCU warning in cgroup_path()

2010-05-04T16:24:59Z

with CONFIG_PROVE_RCU=y, a warning can be triggered: # mount -t cgroup -o debug xxx /mnt # cat /proc/$$/cgroup ... kernel/cgroup.c:1649 invoked rcu_dereference_check() without protection! ... This is a false-positive, because cgroup_path() can be called with either rcu_read_lock() held or cgroup_mutex held. Signed-off-by: Li Zefan Signed-off-by: Paul E. McKenney

cgroups: remove duplicate include

2010-03-24T23:31:19Z

commit e6a1105b ("cgroups: subsystem module loading interface") and commit c50cc752 ("sched, cgroups: Fix module export") result in duplicate including of module.h Signed-off-by: Li Zefan Acked-by: Paul Menage Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

cgroups: remove events before destroying subsystem state objects

2010-03-12T23:52:37Z

Events should be removed after rmdir of cgroup directory, but before destroying subsystem state objects. Let's take reference to cgroup directory dentry to do that. Signed-off-by: Kirill A. Shutemov Acked-by: KAMEZAWA Hiroyuki Cc: Paul Menage Acked-by: Li Zefan Cc: Balbir Singh Cc: Pavel Emelyanov Cc: Dan Malek Cc: Daisuke Nishimura Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

cgroups: fix race between userspace and kernelspace

2010-03-12T23:52:37Z

Notify userspace about cgroup removing only after rmdir of cgroup directory to avoid race between userspace and kernelspace. eventfd are used to notify about two types of event: - control file-specific, like crossing memory threshold; - cgroup removing. To understand what really happen, userspace can check if the cgroup still exists. To avoid race beetween userspace and kernelspace we have to notify userspace about cgroup removing only after rmdir of cgroup directory. Signed-off-by: Kirill A. Shutemov Reviewed-by: KAMEZAWA Hiroyuki Cc: Paul Menage Acked-by: Li Zefan Cc: Balbir Singh Cc: Pavel Emelyanov Cc: Dan Malek Cc: Daisuke Nishimura Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

cgroup: implement eventfd-based generic API for notifications

2010-03-12T23:52:37Z

This patchset introduces eventfd-based API for notifications in cgroups and implements memory notifications on top of it. It uses statistics in memory controler to track memory usage. Output of time(1) on building kernel on tmpfs: Root cgroup before changes: make -j2 506.37 user 60.93s system 193% cpu 4:52.77 total Non-root cgroup before changes: make -j2 507.14 user 62.66s system 193% cpu 4:54.74 total Root cgroup after changes (0 thresholds): make -j2 507.13 user 62.20s system 193% cpu 4:53.55 total Non-root cgroup after changes (0 thresholds): make -j2 507.70 user 64.20s system 193% cpu 4:55.70 total Root cgroup after changes (1 thresholds, never crossed): make -j2 506.97 user 62.20s system 193% cpu 4:53.90 total Non-root cgroup after changes (1 thresholds, never crossed): make -j2 507.55 user 64.08s system 193% cpu 4:55.63 total This patch: Introduce the write-only file "cgroup.event_control" in every cgroup. To register new notification handler you need: - create an eventfd; - open a control file to be monitored. Callbacks register_event() and unregister_event() must be defined for the control file; - write " " to cgroup.event_control. Interpretation of args is defined by control file implementation; eventfd will be woken up by control file implementation or when the cgroup is removed. To unregister notification handler just close eventfd. If you need notification functionality for a control file you have to implement callbacks register_event() and unregister_event() in the struct cftype. [kamezawa.hiroyu@jp.fujitsu.com: Kconfig fix] Signed-off-by: Kirill A. Shutemov Reviewed-by: KAMEZAWA Hiroyuki Paul Menage Cc: Li Zefan Cc: Balbir Singh Cc: Pavel Emelyanov Cc: Dan Malek Cc: Vladislav Buzov Cc: Daisuke Nishimura Cc: Alexander Shishkin Cc: Davide Libenzi Signed-off-by: KAMEZAWA Hiroyuki Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

cgroups: clean up cgroup_pidlist_find() a bit

2010-03-12T23:52:36Z

Don't call get_pid_ns() before we locate/alloc the ns. Signed-off-by: Li Zefan Cc: Serge Hallyn Acked-by: Paul Menage Cc: KAMEZAWA Hiroyuki Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds