<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/cgroup.c, branch v4.1.18</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.1.18</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.1.18'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2015-09-21T17:05:45Z</updated>
<entry>
<title>fs: create and use seq_show_option for escaping</title>
<updated>2015-09-21T17:05:45Z</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2015-09-04T22:44:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d3b428f0361d6dcbe7c6665ae0a824517a0b1ca9'/>
<id>urn:sha1:d3b428f0361d6dcbe7c6665ae0a824517a0b1ca9</id>
<content type='text'>
commit a068acf2ee77693e0bf39d6e07139ba704f461c3 upstream.

Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g.  new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files.  This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else.  This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.

Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink).  Imagine the use
of "sudo" is something more sneaky:

  $ BASE="ovl"
  $ MNT="$BASE/mnt"
  $ LOW="$BASE/lower"
  $ UP="$BASE/upper"
  $ WORK="$BASE/work/ 0 0
  none /proc fuse.pwn user_id=1000"
  $ mkdir -p "$LOW" "$UP" "$WORK"
  $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
  $ cat /proc/mounts
  none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
  none /proc fuse.pwn user_id=1000 0 0
  $ fusermount -u /proc
  $ cat /proc/mounts
  cat: /proc/mounts: No such file or directory

This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed.  Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.

[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Acked-by: Jan Kara &lt;jack@suse.com&gt;
Acked-by: Paul Moore &lt;paul@paul-moore.com&gt;
Cc: J. R. Okajima &lt;hooanon05g@gmail.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sysfs: Create mountpoints with sysfs_create_mount_point</title>
<updated>2015-07-21T17:10:01Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2015-05-13T22:35:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=28dd1f346b2f0fc2ab8285046ed0bd91e9b808d3'/>
<id>urn:sha1:28dd1f346b2f0fc2ab8285046ed0bd91e9b808d3</id>
<content type='text'>
commit f9bb48825a6b5d02f4cabcc78967c75db903dcdc upstream.

This allows for better documentation in the code and
it allows for a simpler and fully correct version of
fs_fully_visible to be written.

The mount points converted and their filesystems are:
/sys/hypervisor/s390/       s390_hypfs
/sys/kernel/config/         configfs
/sys/kernel/debug/          debugfs
/sys/firmware/efi/efivars/  efivarfs
/sys/fs/fuse/connections/   fusectl
/sys/fs/pstore/             pstore
/sys/kernel/tracing/        tracefs
/sys/fs/cgroup/             cgroup
/sys/kernel/security/       securityfs
/sys/fs/selinux/            selinuxfs
/sys/fs/smackfs/            smackfs

Acked-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>cgroup: remove use of seq_printf return value</title>
<updated>2015-04-15T23:35:25Z</updated>
<author>
<name>Joe Perches</name>
<email>joe@perches.com</email>
</author>
<published>2015-04-15T23:18:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=94ff212d096d96afea29e41585b0d8ce41087c0f'/>
<id>urn:sha1:94ff212d096d96afea29e41585b0d8ce41087c0f</id>
<content type='text'>
The seq_printf return value, because it's frequently misused,
will eventually be converted to void.

See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to
     seq_has_overflowed() and make public")

Signed-off-by: Joe Perches &lt;joe@perches.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: zap mem_cgroup_lookup()</title>
<updated>2015-04-15T23:35:16Z</updated>
<author>
<name>Vladimir Davydov</name>
<email>vdavydov@parallels.com</email>
</author>
<published>2015-04-15T23:13:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=adbe427b92d18cf3f801e82e9cd664fbe31cea3c'/>
<id>urn:sha1:adbe427b92d18cf3f801e82e9cd664fbe31cea3c</id>
<content type='text'>
mem_cgroup_lookup() is a wrapper around mem_cgroup_from_id(), which
checks that id != 0 before issuing the function call.  Today, there is
no point in this additional check apart from optimization, because there
is no css with id &lt;= 0, so that css_from_id, called by
mem_cgroup_from_id, will return NULL for any id &lt;= 0.

Since mem_cgroup_from_id is only called from mem_cgroup_lookup, let us
zap mem_cgroup_lookup, substituting calls to it with mem_cgroup_from_id
and moving the check if id &gt; 0 to css_from_id.

Signed-off-by: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Use kvfree in pidlist_free()</title>
<updated>2015-03-03T13:47:25Z</updated>
<author>
<name>Bandan Das</name>
<email>bsd@redhat.com</email>
</author>
<published>2015-03-02T22:51:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=587945147cfe48c84dd92c022e25aad6d92c0da8'/>
<id>urn:sha1:587945147cfe48c84dd92c022e25aad6d92c0da8</id>
<content type='text'>
The wrapper already calls the appropriate free
function, use it instead of spinning our own.

Signed-off-by: Bandan Das &lt;bsd@redhat.com&gt;
Acked-by: Zefan Li &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: call cgroup_subsys-&gt;bind on cgroup subsys initialization</title>
<updated>2015-03-02T17:11:01Z</updated>
<author>
<name>Vladimir Davydov</name>
<email>vdavydov@parallels.com</email>
</author>
<published>2015-02-19T14:34:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=295458e67284f57d154ec8156a22797c0cfb044a'/>
<id>urn:sha1:295458e67284f57d154ec8156a22797c0cfb044a</id>
<content type='text'>
Currently, we call cgroup_subsys-&gt;bind only on unmount, remount, and
when creating a new root on mount. Since the default hierarchy root is
created in cgroup_init, we will not call cgroup_subsys-&gt;bind if the
default hierarchy is freshly mounted. As a result, some controllers will
behave incorrectly (most notably, the "memory" controller will not
enable hierarchy support). Fix this by calling cgroup_subsys-&gt;bind right
after initializing a cgroup subsystem.

Signed-off-by: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>kernfs: remove KERNFS_STATIC_NAME</title>
<updated>2015-02-14T05:21:36Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-02-13T22:36:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=dfeb0750b630b72b5d4fb2461bc7179eceb54666'/>
<id>urn:sha1:dfeb0750b630b72b5d4fb2461bc7179eceb54666</id>
<content type='text'>
When a new kernfs node is created, KERNFS_STATIC_NAME is used to avoid
making a separate copy of its name.  It's currently only used for sysfs
attributes whose filenames are required to stay accessible and unchanged.
There are rare exceptions where these names are allocated and formatted
dynamically but for the vast majority of cases they're consts in the
rodata section.

Now that kernfs is converted to use kstrdup_const() and kfree_const(),
there's little point in keeping KERNFS_STATIC_NAME around.  Remove it.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Andrzej Hajda &lt;a.hajda@samsung.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup: release css-&gt;id after css_free</title>
<updated>2015-02-13T02:54:09Z</updated>
<author>
<name>Vladimir Davydov</name>
<email>vdavydov@parallels.com</email>
</author>
<published>2015-02-12T22:59:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=01e586598b224d1a427acd8a7afa0b21e879d3a7'/>
<id>urn:sha1:01e586598b224d1a427acd8a7afa0b21e879d3a7</id>
<content type='text'>
Currently, we release css-&gt;id in css_release_work_fn, right before calling
css_free callback, so that when css_free is called, the id may have
already been reused for a new cgroup.

I am going to use css-&gt;id to create unique names for per memcg kmem
caches.  Since kmem caches are destroyed only on css_free, I need css-&gt;id
to be freed after css_free was called to avoid name clashes.  This patch
therefore moves css-&gt;id removal to css_free_work_fn.  To prevent
css_from_id from returning a pointer to a stale css, it makes
css_release_work_fn replace the css ptr at css_idr:css-&gt;id with NULL.

Signed-off-by: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup: prevent mount hang due to memory controller lifetime</title>
<updated>2015-01-22T15:26:43Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2015-01-22T15:19:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3c606d35fe9766ede86a45273a628b320be13b45'/>
<id>urn:sha1:3c606d35fe9766ede86a45273a628b320be13b45</id>
<content type='text'>
Since b2052564e66d ("mm: memcontrol: continue cache reclaim from
offlined groups"), re-mounting the memory controller after using it is
very likely to hang.

The cgroup core assumes that any remaining references after deleting a
cgroup are temporary in nature, and synchroneously waits for them, but
the above-mentioned commit has left-over page cache pin its css until
it is reclaimed naturally.  That being said, swap entries and charged
kernel memory have been doing the same indefinite pinning forever, the
bug is just more likely to trigger with left-over page cache.

Reparenting kernel memory is highly impractical, which leaves changing
the cgroup assumptions to reflect this: once a controller has been
mounted and used, it has internal state that is independent from mount
and cgroup lifetime.  It can be unmounted and remounted, but it can't
be reconfigured during subsequent mounts.

Don't offline the controller root as long as there are any children,
dead or alive.  A remount will no longer wait for these old references
to drain, it will simply mount the persistent controller state again.

Reported-by: "Suzuki K. Poulose" &lt;Suzuki.Poulose@arm.com&gt;
Reported-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: implement cgroup_get_e_css()</title>
<updated>2014-11-18T07:49:52Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2014-11-18T07:49:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=eeecbd1971517103e06f11750dd1a9a1dc37e4e6'/>
<id>urn:sha1:eeecbd1971517103e06f11750dd1a9a1dc37e4e6</id>
<content type='text'>
Implement cgroup_get_e_css() which finds and gets the effective css
for the specified cgroup and subsystem combination.  This function
always returns a valid pinned css.  This will be used by cgroup
writeback support.

While at it, add comment to cgroup_e_css() to explain why that
function is different from cgroup_get_e_css() and has to test
cgrp-&gt;child_subsys_mask instead of cgroup_css(cgrp, ss).

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
</entry>
</feed>
