<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/cgroup, branch v5.4.189</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.189</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.189'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2022-04-15T12:18:41Z</updated>
<entry>
<title>cgroup: Use open-time cgroup namespace for process migration perm checks</title>
<updated>2022-04-15T12:18:41Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2022-04-14T08:44:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8a887060af61b451c46938149c426defe16add77'/>
<id>urn:sha1:8a887060af61b451c46938149c426defe16add77</id>
<content type='text'>
commit e57457641613fef0d147ede8bd6a3047df588b95 upstream.

cgroup process migration permission checks are performed at write time as
whether a given operation is allowed or not is dependent on the content of
the write - the PID. This currently uses current's cgroup namespace which is
a potential security weakness as it may allow scenarios where a less
privileged process tricks a more privileged one into writing into a fd that
it created.

This patch makes cgroup remember the cgroup namespace at the time of open
and uses it for migration permission checks instad of current's. Note that
this only applies to cgroup2 as cgroup1 doesn't have namespace support.

This also fixes a use-after-free bug on cgroupns reported in

 https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com

Note that backporting this fix also requires the preceding patch.

Reported-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Suggested-by: Linus Torvalds &lt;torvalds@linuxfoundation.org&gt;
Cc: Michal Koutný &lt;mkoutny@suse.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reviewed-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Reported-by: syzbot+50f5cf33a284ce738b62@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com
Fixes: 5136f6365ce3 ("cgroup: implement "nsdelegate" mount option")
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
[mkoutny: v5.10: duplicate ns check in procs/threads write handler, adjust context]
Signed-off-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
[OP: backport to v5.4: drop changes to cgroup_attach_permissions() and
cgroup_css_set_fork(), adjust cgroup_procs_write_permission() calls]
Signed-off-by: Ovidiu Panait &lt;ovidiu.panait@windriver.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Allocate cgroup_file_ctx for kernfs_open_file-&gt;priv</title>
<updated>2022-04-15T12:18:41Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2022-04-14T08:44:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9bd1ced6466e71dcb08b24b59b8dd87bb2369d07'/>
<id>urn:sha1:9bd1ced6466e71dcb08b24b59b8dd87bb2369d07</id>
<content type='text'>
commit 0d2b5955b36250a9428c832664f2079cbf723bec upstream.

of-&gt;priv is currently used by each interface file implementation to store
private information. This patch collects the current two private data usages
into struct cgroup_file_ctx which is allocated and freed by the common path.
This allows generic private data which applies to multiple files, which will
be used to in the following patch.

Note that cgroup_procs iterator is now embedded as procs.iter in the new
cgroup_file_ctx so that it doesn't need to be allocated and freed
separately.

v2: union dropped from cgroup_file_ctx and the procs iterator is embedded in
    cgroup_file_ctx as suggested by Linus.

v3: Michal pointed out that cgroup1's procs pidlist uses of-&gt;priv too.
    Converted. Didn't change to embedded allocation as cgroup1 pidlists get
    stored for caching.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reviewed-by: Michal Koutný &lt;mkoutny@suse.com&gt;
[mkoutny: v5.10: modify cgroup.pressure handlers, adjust context]
Signed-off-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Ovidiu Panait &lt;ovidiu.panait@windriver.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Use open-time credentials for process migraton perm checks</title>
<updated>2022-04-15T12:18:41Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2022-04-14T08:44:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=691a0fd625e06c138f7662286a87ffba48773f34'/>
<id>urn:sha1:691a0fd625e06c138f7662286a87ffba48773f34</id>
<content type='text'>
commit 1756d7994ad85c2479af6ae5a9750b92324685af upstream.

cgroup process migration permission checks are performed at write time as
whether a given operation is allowed or not is dependent on the content of
the write - the PID. This currently uses current's credentials which is a
potential security weakness as it may allow scenarios where a less
privileged process tricks a more privileged one into writing into a fd that
it created.

This patch makes both cgroup2 and cgroup1 process migration interfaces to
use the credentials saved at the time of open (file-&gt;f_cred) instead of
current's.

Reported-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Suggested-by: Linus Torvalds &lt;torvalds@linuxfoundation.org&gt;
Fixes: 187fe84067bd ("cgroup: require write perm on common ancestor when moving processes on the default hierarchy")
Reviewed-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
[OP: backport to 5.4: apply original __cgroup_procs_write() changes to
cgroup_threads_write() and cgroup_procs_write()]
Signed-off-by: Ovidiu Panait &lt;ovidiu.panait@windriver.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug</title>
<updated>2022-03-02T10:41:00Z</updated>
<author>
<name>Zhang Qiao</name>
<email>zhangqiao22@huawei.com</email>
</author>
<published>2022-01-21T10:12:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d1f1de5dff78f8baeef948fedde00bfffc0dd7c4'/>
<id>urn:sha1:d1f1de5dff78f8baeef948fedde00bfffc0dd7c4</id>
<content type='text'>
commit 05c7b7a92cc87ff8d7fde189d0fade250697573c upstream.

As previously discussed(https://lkml.org/lkml/2022/1/20/51),
cpuset_attach() is affected with similar cpu hotplug race,
as follow scenario:

     cpuset_attach()				cpu hotplug
    ---------------------------            ----------------------
    down_write(cpuset_rwsem)
    guarantee_online_cpus() // (load cpus_attach)
					sched_cpu_deactivate
					  set_cpu_active()
					  // will change cpu_active_mask
    set_cpus_allowed_ptr(cpus_attach)
      __set_cpus_allowed_ptr_locked()
       // (if the intersection of cpus_attach and
         cpu_active_mask is empty, will return -EINVAL)
    up_write(cpuset_rwsem)

To avoid races such as described above, protect cpuset_attach() call
with cpu_hotplug_lock.

Fixes: be367d099270 ("cgroups: let ss-&gt;can_attach and ss-&gt;attach do whole threadgroups at a time")
Cc: stable@vger.kernel.org # v2.6.32+
Reported-by: Zhao Gongyi &lt;zhaogongyi@huawei.com&gt;
Signed-off-by: Zhang Qiao &lt;zhangqiao22@huawei.com&gt;
Acked-by: Waiman Long &lt;longman@redhat.com&gt;
Reviewed-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning</title>
<updated>2022-02-08T17:24:35Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2022-02-03T03:31:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ed339069725a44633bd7922434176fd43a9c21ac'/>
<id>urn:sha1:ed339069725a44633bd7922434176fd43a9c21ac</id>
<content type='text'>
commit 2bdfd2825c9662463371e6691b1a794e97fa36b4 upstream.

It was found that a "suspicious RCU usage" lockdep warning was issued
with the rcu_read_lock() call in update_sibling_cpumasks().  It is
because the update_cpumasks_hier() function may sleep. So we have
to release the RCU lock, call update_cpumasks_hier() and reacquire
it afterward.

Also add a percpu_rwsem_assert_held() in update_sibling_cpumasks()
instead of stating that in the comment.

Fixes: 4716909cc5c5 ("cpuset: Track cpusets that use parent's effective_cpus")
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Tested-by: Phil Auld &lt;pauld@redhat.com&gt;
Reviewed-by: Phil Auld &lt;pauld@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()</title>
<updated>2022-02-05T11:35:37Z</updated>
<author>
<name>Tianchen Ding</name>
<email>dtcccc@linux.alibaba.com</email>
</author>
<published>2022-01-18T10:05:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=db6c57d2666d4da4f033253f678e0b3df8b65fe1'/>
<id>urn:sha1:db6c57d2666d4da4f033253f678e0b3df8b65fe1</id>
<content type='text'>
commit c80d401c52a2d1baf2a5afeb06f0ffe678e56d23 upstream.

subparts_cpus should be limited as a subset of cpus_allowed, but it is
updated wrongly by using cpumask_andnot(). Use cpumask_and() instead to
fix it.

Fixes: ee8dde0cd2ce ("cpuset: Add new v2 cpuset.sched.partition flag")
Signed-off-by: Tianchen Ding &lt;dtcccc@linux.alibaba.com&gt;
Reviewed-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup-v1: Require capabilities to set release_agent</title>
<updated>2022-02-05T11:35:36Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2022-01-20T17:04:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0e8283cbe4996ae046cd680b3ed598a8f2b0d5d8'/>
<id>urn:sha1:0e8283cbe4996ae046cd680b3ed598a8f2b0d5d8</id>
<content type='text'>
commit 24f6008564183aa120d07c03d9289519c2fe02af upstream.

The cgroup release_agent is called with call_usermodehelper.  The function
call_usermodehelper starts the release_agent with a full set fo capabilities.
Therefore require capabilities when setting the release_agaent.

Reported-by: Tabitha Sable &lt;tabitha.c.sable@gmail.com&gt;
Tested-by: Tabitha Sable &lt;tabitha.c.sable@gmail.com&gt;
Fixes: 81a6a5cdd2c5 ("Task Control Groups: automatic userspace notification of idle cgroups")
Cc: stable@vger.kernel.org # v2.6.24+
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>psi: Fix uaf issue when psi trigger is destroyed while being polled</title>
<updated>2022-02-05T11:35:36Z</updated>
<author>
<name>Suren Baghdasaryan</name>
<email>surenb@google.com</email>
</author>
<published>2022-01-11T23:23:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2fd752ed77ab9880da927257b73294f29a199f1a'/>
<id>urn:sha1:2fd752ed77ab9880da927257b73294f29a199f1a</id>
<content type='text'>
commit a06247c6804f1a7c86a2e5398a4c1f1db1471848 upstream.

With write operation on psi files replacing old trigger with a new one,
the lifetime of its waitqueue is totally arbitrary. Overwriting an
existing trigger causes its waitqueue to be freed and pending poll()
will stumble on trigger-&gt;event_wait which was destroyed.
Fix this by disallowing to redefine an existing psi trigger. If a write
operation is used on a file descriptor with an already existing psi
trigger, the operation will fail with EBUSY error.
Also bypass a check for psi_disabled in the psi_trigger_destroy as the
flag can be flipped after the trigger is created, leading to a memory
leak.

Fixes: 0e94682b73bf ("psi: introduce psi monitor")
Reported-by: syzbot+cdb5dd11c97cc532efad@syzkaller.appspotmail.com
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Analyzed-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
Signed-off-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220111232309.1786347-1-surenb@google.com
[surenb: backported to 5.4 kernel]
CC: stable@vger.kernel.org # 5.4
Signed-off-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Make rebind_subsystems() disable v2 controllers all at once</title>
<updated>2021-11-17T08:48:34Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2021-09-18T22:53:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1713b856345d282a8faee5441662a9127e345936'/>
<id>urn:sha1:1713b856345d282a8faee5441662a9127e345936</id>
<content type='text'>
[ Upstream commit 7ee285395b211cad474b2b989db52666e0430daf ]

It was found that the following warning was displayed when remounting
controllers from cgroup v2 to v1:

[ 8042.997778] WARNING: CPU: 88 PID: 80682 at kernel/cgroup/cgroup.c:3130 cgroup_apply_control_disable+0x158/0x190
   :
[ 8043.091109] RIP: 0010:cgroup_apply_control_disable+0x158/0x190
[ 8043.096946] Code: ff f6 45 54 01 74 39 48 8d 7d 10 48 c7 c6 e0 46 5a a4 e8 7b 67 33 00 e9 41 ff ff ff 49 8b 84 24 e8 01 00 00 0f b7 40 08 eb 95 &lt;0f&gt; 0b e9 5f ff ff ff 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3
[ 8043.115692] RSP: 0018:ffffba8a47c23d28 EFLAGS: 00010202
[ 8043.120916] RAX: 0000000000000036 RBX: ffffffffa624ce40 RCX: 000000000000181a
[ 8043.128047] RDX: ffffffffa63c43e0 RSI: ffffffffa63c43e0 RDI: ffff9d7284ee1000
[ 8043.135180] RBP: ffff9d72874c5800 R08: ffffffffa624b090 R09: 0000000000000004
[ 8043.142314] R10: ffffffffa624b080 R11: 0000000000002000 R12: ffff9d7284ee1000
[ 8043.149447] R13: ffff9d7284ee1000 R14: ffffffffa624ce70 R15: ffffffffa6269e20
[ 8043.156576] FS:  00007f7747cff740(0000) GS:ffff9d7a5fc00000(0000) knlGS:0000000000000000
[ 8043.164663] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8043.170409] CR2: 00007f7747e96680 CR3: 0000000887d60001 CR4: 00000000007706e0
[ 8043.177539] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8043.184673] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8043.191804] PKRU: 55555554
[ 8043.194517] Call Trace:
[ 8043.196970]  rebind_subsystems+0x18c/0x470
[ 8043.201070]  cgroup_setup_root+0x16c/0x2f0
[ 8043.205177]  cgroup1_root_to_use+0x204/0x2a0
[ 8043.209456]  cgroup1_get_tree+0x3e/0x120
[ 8043.213384]  vfs_get_tree+0x22/0xb0
[ 8043.216883]  do_new_mount+0x176/0x2d0
[ 8043.220550]  __x64_sys_mount+0x103/0x140
[ 8043.224474]  do_syscall_64+0x38/0x90
[ 8043.228063]  entry_SYSCALL_64_after_hwframe+0x44/0xae

It was caused by the fact that rebind_subsystem() disables
controllers to be rebound one by one. If more than one disabled
controllers are originally from the default hierarchy, it means that
cgroup_apply_control_disable() will be called multiple times for the
same default hierarchy. A controller may be killed by css_kill() in
the first round. In the second round, the killed controller may not be
completely dead yet leading to the warning.

To avoid this problem, we collect all the ssid's of controllers that
needed to be disabled from the default hierarchy and then disable them
in one go instead of one by one.

Fixes: 334c3679ec4b ("cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends")
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Fix a partition bug with hotplug</title>
<updated>2021-09-15T07:47:32Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2021-07-20T14:18:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9fdac650c41314ae02095bdebc5f7bc6ce8a4452'/>
<id>urn:sha1:9fdac650c41314ae02095bdebc5f7bc6ce8a4452</id>
<content type='text'>
[ Upstream commit 15d428e6fe77fffc3f4fff923336036f5496ef17 ]

In cpuset_hotplug_workfn(), the detection of whether the cpu list
has been changed is done by comparing the effective cpus of the top
cpuset with the cpu_active_mask. However, in the rare case that just
all the CPUs in the subparts_cpus are offlined, the detection fails
and the partition states are not updated correctly. Fix it by forcing
the cpus_updated flag to true in this particular case.

Fixes: 4b842da276a8 ("cpuset: Make CPU hotplug work with partition")
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
