<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/cgroup.c, branch v3.16.52</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.16.52</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.16.52'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2016-02-25T00:18:35Z</updated>
<entry>
<title>cgroup: make sure a parent css isn't offlined before its children</title>
<updated>2016-02-25T00:18:35Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2016-01-21T20:31:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d9892cc2d91b31ea58c775531813a21dc24e50d0'/>
<id>urn:sha1:d9892cc2d91b31ea58c775531813a21dc24e50d0</id>
<content type='text'>
commit aa226ff4a1ce79f229c6b7a4c0a14e17fececd01 upstream.

There are three subsystem callbacks in css shutdown path -
css_offline(), css_released() and css_free().  Except for
css_released(), cgroup core didn't guarantee the order of invocation.
css_offline() or css_free() could be called on a parent css before its
children.  This behavior is unexpected and led to bugs in cpu and
memory controller.

This patch updates offline path so that a parent css is never offlined
before its children.  Each css keeps online_cnt which reaches zero iff
itself and all its children are offline and offline_css() is invoked
only after online_cnt reaches zero.

This fixes the memory controller bug and allows the fix for cpu
controller.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-and-tested-by: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Reported-by: Brian Christiansen &lt;brian.o.christiansen@gmail.com&gt;
Link: http://lkml.kernel.org/g/5698A023.9070703@de.ibm.com
Link: http://lkml.kernel.org/g/CAKB58ikDkzc8REt31WBkD99+hxNzjK4+FBmhkgS+NVrC9vjMSg@mail.gmail.com
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
[ luis: backported to 3.16:
  - file rename: cgroup-defs.h -&gt; cgroup.h
  - adjusted context ]
Signed-off-by: Luis Henriques &lt;luis.henriques@canonical.com&gt;
</content>
</entry>
<entry>
<title>fs: create and use seq_show_option for escaping</title>
<updated>2015-09-29T15:44:47Z</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2015-09-04T22:44:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7cf259c95caa9de7238fafeb0cd252e445b42c07'/>
<id>urn:sha1:7cf259c95caa9de7238fafeb0cd252e445b42c07</id>
<content type='text'>
commit a068acf2ee77693e0bf39d6e07139ba704f461c3 upstream.

Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g.  new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files.  This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else.  This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.

Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink).  Imagine the use
of "sudo" is something more sneaky:

  $ BASE="ovl"
  $ MNT="$BASE/mnt"
  $ LOW="$BASE/lower"
  $ UP="$BASE/upper"
  $ WORK="$BASE/work/ 0 0
  none /proc fuse.pwn user_id=1000"
  $ mkdir -p "$LOW" "$UP" "$WORK"
  $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
  $ cat /proc/mounts
  none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
  none /proc fuse.pwn user_id=1000 0 0
  $ fusermount -u /proc
  $ cat /proc/mounts
  cat: /proc/mounts: No such file or directory

This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed.  Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.

[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Acked-by: Jan Kara &lt;jack@suse.com&gt;
Acked-by: Paul Moore &lt;paul@paul-moore.com&gt;
Cc: J. R. Okajima &lt;hooanon05g@gmail.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[ luis: backported to 3.16:
  - dropped changes to fs/overlayfs/super.c, net/ceph/ceph_common.c and to
    cgroup_show_options() in kernel/cgroup.c
  - adjusted context ]
Signed-off-by: Luis Henriques &lt;luis.henriques@canonical.com&gt;
</content>
</entry>
<entry>
<title>sysfs: Create mountpoints with sysfs_create_mount_point</title>
<updated>2015-08-20T17:24:03Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2015-08-20T16:52:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b8ad501a4d34e99d8a448a34e144402d9a501b8b'/>
<id>urn:sha1:b8ad501a4d34e99d8a448a34e144402d9a501b8b</id>
<content type='text'>
commit f9bb48825a6b5d02f4cabcc78967c75db903dcdc upstream.

This allows for better documentation in the code and
it allows for a simpler and fully correct version of
fs_fully_visible to be written.

The mount points converted and their filesystems are:
/sys/hypervisor/s390/       s390_hypfs
/sys/kernel/config/         configfs
/sys/kernel/debug/          debugfs
/sys/firmware/efi/efivars/  efivarfs
/sys/fs/fuse/connections/   fusectl
/sys/fs/pstore/             pstore
/sys/kernel/tracing/        tracefs
/sys/fs/cgroup/             cgroup
/sys/kernel/security/       securityfs
/sys/fs/selinux/            selinuxfs
/sys/fs/smackfs/            smackfs

Acked-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Seth Forshee &lt;seth.forshee@canonical.com&gt;
Signed-off-by: Luis Henriques &lt;luis.henriques@canonical.com&gt;
</content>
</entry>
<entry>
<title>cgroup: fix unbalanced locking</title>
<updated>2014-10-05T20:41:03Z</updated>
<author>
<name>Zefan Li</name>
<email>lizefan@huawei.com</email>
</author>
<published>2014-09-18T09:28:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e8c35a7604d19cd4f9b3ff4015faab8acb7b782c'/>
<id>urn:sha1:e8c35a7604d19cd4f9b3ff4015faab8acb7b782c</id>
<content type='text'>
commit eb4aec84d6bdf98d00cedb41c18000f7a31e648a upstream.

cgroup_pidlist_start() holds cgrp-&gt;pidlist_mutex and then calls
pidlist_array_load(), and cgroup_pidlist_stop() releases the mutex.

It is wrong that we release the mutex in the failure path in
pidlist_array_load(), because cgroup_pidlist_stop() will be called
no matter if cgroup_pidlist_start() returns errno or not.

Fixes: 4bac00d16a8760eae7205e41d2c246477d42a210
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>cgroup: delay the clearing of cgrp-&gt;kn-&gt;priv</title>
<updated>2014-10-05T20:41:03Z</updated>
<author>
<name>Li Zefan</name>
<email>lizefan@huawei.com</email>
</author>
<published>2014-09-04T06:43:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7d73a89ba1408035381e19a0e890a4aaf15d0ec5'/>
<id>urn:sha1:7d73a89ba1408035381e19a0e890a4aaf15d0ec5</id>
<content type='text'>
commit a4189487da1b4f8260c6006b9dc47c3c4107a5ae upstream.

Run these two scripts concurrently:

    for ((; ;))
    {
        mkdir /cgroup/sub
        rmdir /cgroup/sub
    }

    for ((; ;))
    {
        echo $$ &gt; /cgroup/sub/cgroup.procs
        echo $$ &gt; /cgroup/cgroup.procs
    }

A kernel bug will be triggered:

BUG: unable to handle kernel NULL pointer dereference at 00000038
IP: [&lt;c10bbd69&gt;] cgroup_put+0x9/0x80
...
Call Trace:
 [&lt;c10bbe19&gt;] cgroup_kn_unlock+0x39/0x50
 [&lt;c10bbe91&gt;] cgroup_kn_lock_live+0x61/0x70
 [&lt;c10be3c1&gt;] __cgroup_procs_write.isra.26+0x51/0x230
 [&lt;c10be5b2&gt;] cgroup_tasks_write+0x12/0x20
 [&lt;c10bb7b0&gt;] cgroup_file_write+0x40/0x130
 [&lt;c11aee71&gt;] kernfs_fop_write+0xd1/0x160
 [&lt;c1148e58&gt;] vfs_write+0x98/0x1e0
 [&lt;c114934d&gt;] SyS_write+0x4d/0xa0
 [&lt;c16f656b&gt;] sysenter_do_call+0x12/0x12

We clear cgrp-&gt;kn-&gt;priv in the end of cgroup_rmdir(), but another
concurrent thread can access kn-&gt;priv after the clearing.

We should move the clearing to css_release_work_fn(). At that time
no one is holding reference to the cgroup and no one can gain a new
reference to access it.

v2:
- move RCU_INIT_POINTER() into the else block. (Tejun)
- remove the cgroup_parent() check. (Tejun)
- update the comment in css_tryget_online_from_dir().

Reported-by: Toralf Förster &lt;toralf.foerster@gmx.de&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>cgroup: reject cgroup names with '\n'</title>
<updated>2014-10-05T20:41:03Z</updated>
<author>
<name>Alban Crequy</name>
<email>alban.crequy@collabora.co.uk</email>
</author>
<published>2014-08-18T11:20:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=767f5ccd772e406f493e85edcb3503e4ef8d1ca7'/>
<id>urn:sha1:767f5ccd772e406f493e85edcb3503e4ef8d1ca7</id>
<content type='text'>
commit 71b1fb5c4473a5b1e601d41b109bdfe001ec82e0 upstream.

/proc/&lt;pid&gt;/cgroup contains one cgroup path on each line. If cgroup names are
allowed to contain "\n", applications cannot parse /proc/&lt;pid&gt;/cgroup safely.

Signed-off-by: Alban Crequy &lt;alban.crequy@collabora.co.uk&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>cgroup: check cgroup liveliness before unbreaking kernfs</title>
<updated>2014-10-05T20:41:00Z</updated>
<author>
<name>Li Zefan</name>
<email>lizefan@huawei.com</email>
</author>
<published>2014-09-04T06:43:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3d5cd32aec76ad53e2aefc72ee67fed7d2819019'/>
<id>urn:sha1:3d5cd32aec76ad53e2aefc72ee67fed7d2819019</id>
<content type='text'>
commit aa32362f011c6e863132b16c1761487166a4bad2 upstream.

When cgroup_kn_lock_live() is called through some kernfs operation and
another thread is calling cgroup_rmdir(), we'll trigger the warning in
cgroup_get().

------------[ cut here ]------------
WARNING: CPU: 1 PID: 1228 at kernel/cgroup.c:1034 cgroup_get+0x89/0xa0()
...
Call Trace:
 [&lt;c16ee73d&gt;] dump_stack+0x41/0x52
 [&lt;c10468ef&gt;] warn_slowpath_common+0x7f/0xa0
 [&lt;c104692d&gt;] warn_slowpath_null+0x1d/0x20
 [&lt;c10bb999&gt;] cgroup_get+0x89/0xa0
 [&lt;c10bbe58&gt;] cgroup_kn_lock_live+0x28/0x70
 [&lt;c10be3c1&gt;] __cgroup_procs_write.isra.26+0x51/0x230
 [&lt;c10be5b2&gt;] cgroup_tasks_write+0x12/0x20
 [&lt;c10bb7b0&gt;] cgroup_file_write+0x40/0x130
 [&lt;c11aee71&gt;] kernfs_fop_write+0xd1/0x160
 [&lt;c1148e58&gt;] vfs_write+0x98/0x1e0
 [&lt;c114934d&gt;] SyS_write+0x4d/0xa0
 [&lt;c16f656b&gt;] sysenter_do_call+0x12/0x12
---[ end trace 6f2e0c38c2108a74 ]---

Fix this by calling css_tryget() instead of cgroup_get().

v2:
- move cgroup_tryget() right below cgroup_get() definition. (Tejun)

Reported-by: Toralf Förster &lt;toralf.foerster@gmx.de&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>cgroup: fix a race between cgroup_mount() and cgroup_kill_sb()</title>
<updated>2014-06-30T14:16:26Z</updated>
<author>
<name>Li Zefan</name>
<email>lizefan@huawei.com</email>
</author>
<published>2014-06-30T03:50:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3a32bd72d77058d768dbb38183ad517f720dd1bc'/>
<id>urn:sha1:3a32bd72d77058d768dbb38183ad517f720dd1bc</id>
<content type='text'>
We've converted cgroup to kernfs so cgroup won't be intertwined with
vfs objects and locking, but there are dark areas.

Run two instances of this script concurrently:

    for ((; ;))
    {
    	mount -t cgroup -o cpuacct xxx /cgroup
    	umount /cgroup
    }

After a while, I saw two mount processes were stuck at retrying, because
they were waiting for a subsystem to become free, but the root associated
with this subsystem never got freed.

This can happen, if thread A is in the process of killing superblock but
hasn't called percpu_ref_kill(), and at this time thread B is mounting
the same cgroup root and finds the root in the root list and performs
percpu_ref_try_get().

To fix this, we try to increase both the refcnt of the superblock and the
percpu refcnt of cgroup root.

v2:
- we should try to get both the superblock refcnt and cgroup_root refcnt,
  because cgroup_root may have no superblock assosiated with it.
- adjust/add comments.

tj: Updated comments.  Renamed @sb to @pinned_sb.

Cc: &lt;stable@vger.kernel.org&gt; # 3.15
Signed-off-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: fix mount failure in a corner case</title>
<updated>2014-06-30T14:16:25Z</updated>
<author>
<name>Li Zefan</name>
<email>lizefan@huawei.com</email>
</author>
<published>2014-06-30T03:49:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=970317aa48c6ef66cd023c039c2650c897bad927'/>
<id>urn:sha1:970317aa48c6ef66cd023c039c2650c897bad927</id>
<content type='text'>
  # cat test.sh
  #! /bin/bash

  mount -t cgroup -o cpu xxx /cgroup
  umount /cgroup

  mount -t cgroup -o cpu,cpuacct xxx /cgroup
  umount /cgroup
  # ./test.sh
  mount: xxx already mounted or /cgroup busy
  mount: according to mtab, xxx is already mounted on /cgroup

It's because the cgroupfs_root of the first mount was under destruction
asynchronously.

Fix this by delaying and then retrying mount for this case.

v3:
- put the refcnt immediately after getting it. (Tejun)

v2:
- use percpu_ref_tryget_live() rather that introducing
  percpu_ref_alive(). (Tejun)
- adjust comment.

tj: Updated the comment a bit.

Cc: &lt;stable@vger.kernel.org&gt; # 3.15
Signed-off-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: fix broken css_has_online_children()</title>
<updated>2014-06-17T22:52:53Z</updated>
<author>
<name>Li Zefan</name>
<email>lizefan@huawei.com</email>
</author>
<published>2014-06-12T06:31:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=99bae5f94185c2cc65701e95c54e31e2f4345b88'/>
<id>urn:sha1:99bae5f94185c2cc65701e95c54e31e2f4345b88</id>
<content type='text'>
After running:

  # mount -t cgroup cpu xxx /cgroup &amp;&amp; mkdir /cgroup/sub &amp;&amp; \
    rmdir /cgroup/sub &amp;&amp; umount /cgroup

I found the cgroup root still existed:

  # cat /proc/cgroups
  #subsys_name    hierarchy       num_cgroups     enabled
  cpuset  0       1       1
  cpu     1       1       1
  ...

It turned out css_has_online_children() is broken.

Signed-off-by: Li Zefan &lt;lizefan@huawei.com&gt;
Sigend-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
</feed>
