<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/cgroup, branch v6.1.45</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.45</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.45'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2023-07-27T06:50:38Z</updated>
<entry>
<title>sched/psi: use kernfs polling functions for PSI trigger polling</title>
<updated>2023-07-27T06:50:38Z</updated>
<author>
<name>Suren Baghdasaryan</name>
<email>surenb@google.com</email>
</author>
<published>2023-06-30T00:56:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=92cc0153324b6ae8577a39f5bf2cd83c9a34ea6a'/>
<id>urn:sha1:92cc0153324b6ae8577a39f5bf2cd83c9a34ea6a</id>
<content type='text'>
[ Upstream commit aff037078ecaecf34a7c2afab1341815f90fba5e ]

Destroying psi trigger in cgroup_file_release causes UAF issues when
a cgroup is removed from under a polling process. This is happening
because cgroup removal causes a call to cgroup_file_release while the
actual file is still alive. Destroying the trigger at this point would
also destroy its waitqueue head and if there is still a polling process
on that file accessing the waitqueue, it will step on the freed pointer:

do_select
  vfs_poll
                           do_rmdir
                             cgroup_rmdir
                               kernfs_drain_open_files
                                 cgroup_file_release
                                   cgroup_pressure_release
                                     psi_trigger_destroy
                                       wake_up_pollfree(&amp;t-&gt;event_wait)
// vfs_poll is unblocked
                                       synchronize_rcu
                                       kfree(t)
  poll_freewait -&gt; UAF access to the trigger's waitqueue head

Patch [1] fixed this issue for epoll() case using wake_up_pollfree(),
however the same issue exists for synchronous poll() case.
The root cause of this issue is that the lifecycles of the psi trigger's
waitqueue and of the file associated with the trigger are different. Fix
this by using kernfs_generic_poll function when polling on cgroup-specific
psi triggers. It internally uses kernfs_open_node-&gt;poll waitqueue head
with its lifecycle tied to the file's lifecycle. This also renders the
fix in [1] obsolete, so revert it.

[1] commit c2dbe32d5db5 ("sched/psi: Fix use-after-free in ep_remove_wait_queue()")

Fixes: 0e94682b73bf ("psi: introduce psi monitor")
Closes: https://lore.kernel.org/all/20230613062306.101831-1-lujialin4@huawei.com/
Reported-by: Lu Jialin &lt;lujialin4@huawei.com&gt;
Signed-off-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20230630005612.1014540-1-surenb@google.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/psi: Allow unprivileged polling of N*2s period</title>
<updated>2023-07-27T06:50:38Z</updated>
<author>
<name>Domenico Cerasuolo</name>
<email>cerasuolodomenico@gmail.com</email>
</author>
<published>2023-03-30T10:54:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d5dca1977685c3ec7ee7490e8f6736e35ca2ee70'/>
<id>urn:sha1:d5dca1977685c3ec7ee7490e8f6736e35ca2ee70</id>
<content type='text'>
[ Upstream commit d82caa273565b45fcf103148950549af76c314b0 ]

PSI offers 2 mechanisms to get information about a specific resource
pressure. One is reading from /proc/pressure/&lt;resource&gt;, which gives
average pressures aggregated every 2s. The other is creating a pollable
fd for a specific resource and cgroup.

The trigger creation requires CAP_SYS_RESOURCE, and gives the
possibility to pick specific time window and threshold, spawing an RT
thread to aggregate the data.

Systemd would like to provide containers the option to monitor pressure
on their own cgroup and sub-cgroups. For example, if systemd launches a
container that itself then launches services, the container should have
the ability to poll() for pressure in individual services. But neither
the container nor the services are privileged.

This patch implements a mechanism to allow unprivileged users to create
pressure triggers. The difference with privileged triggers creation is
that unprivileged ones must have a time window that's a multiple of 2s.
This is so that we can avoid unrestricted spawning of rt threads, and
use instead the same aggregation mechanism done for the averages, which
runs independently of any triggers.

Suggested-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Domenico Cerasuolo &lt;cerasuolodomenico@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Link: https://lore.kernel.org/r/20230330105418.77061-5-cerasuolodomenico@gmail.com
Stable-dep-of: aff037078eca ("sched/psi: use kernfs polling functions for PSI trigger polling")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex in freezer_css_{online,offline}()</title>
<updated>2023-06-28T09:12:25Z</updated>
<author>
<name>Tetsuo Handa</name>
<email>penguin-kernel@I-love.SAKURA.ne.jp</email>
</author>
<published>2023-06-11T13:48:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0d7a4e6589a02fd87d286f7f0713ba6d8fc344e6'/>
<id>urn:sha1:0d7a4e6589a02fd87d286f7f0713ba6d8fc344e6</id>
<content type='text'>
commit f0cc749254d12c78e93dae3b27b21dc9546843d0 upstream.

syzbot is again reporting circular locking dependency between
cpu_hotplug_lock and freezer_mutex. Do like what we did with
commit 57dcd64c7e036299 ("cgroup,freezer: hold cpu_hotplug_lock
before freezer_mutex").

Reported-by: syzbot &lt;syzbot+2ab700fe1829880a2ec6@syzkaller.appspotmail.com&gt;
Closes: https://syzkaller.appspot.com/bug?extid=2ab700fe1829880a2ec6
Signed-off-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Tested-by: syzbot &lt;syzbot+2ab700fe1829880a2ec6@syzkaller.appspotmail.com&gt;
Fixes: f5d39b020809 ("freezer,sched: Rewrite core freezer logic")
Cc: stable@vger.kernel.org # v6.1+
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Do not corrupt task iteration when rebinding subsystem</title>
<updated>2023-06-28T09:12:25Z</updated>
<author>
<name>Xiu Jianfeng</name>
<email>xiujianfeng@huawei.com</email>
</author>
<published>2023-06-10T09:26:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7b162a18d332bd5ca7ff9170ded674b4cdad31f7'/>
<id>urn:sha1:7b162a18d332bd5ca7ff9170ded674b4cdad31f7</id>
<content type='text'>
commit 6f363f5aa845561f7ea496d8b1175e3204470486 upstream.

We found a refcount UAF bug as follows:

refcount_t: addition on 0; use-after-free.
WARNING: CPU: 1 PID: 342 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x148
Workqueue: events cpuset_hotplug_workfn
Call trace:
 refcount_warn_saturate+0xa0/0x148
 __refcount_add.constprop.0+0x5c/0x80
 css_task_iter_advance_css_set+0xd8/0x210
 css_task_iter_advance+0xa8/0x120
 css_task_iter_next+0x94/0x158
 update_tasks_root_domain+0x58/0x98
 rebuild_root_domains+0xa0/0x1b0
 rebuild_sched_domains_locked+0x144/0x188
 cpuset_hotplug_workfn+0x138/0x5a0
 process_one_work+0x1e8/0x448
 worker_thread+0x228/0x3e0
 kthread+0xe0/0xf0
 ret_from_fork+0x10/0x20

then a kernel panic will be triggered as below:

Unable to handle kernel paging request at virtual address 00000000c0000010
Call trace:
 cgroup_apply_control_disable+0xa4/0x16c
 rebind_subsystems+0x224/0x590
 cgroup_destroy_root+0x64/0x2e0
 css_free_rwork_fn+0x198/0x2a0
 process_one_work+0x1d4/0x4bc
 worker_thread+0x158/0x410
 kthread+0x108/0x13c
 ret_from_fork+0x10/0x18

The race that cause this bug can be shown as below:

(hotplug cpu)                | (umount cpuset)
mutex_lock(&amp;cpuset_mutex)    | mutex_lock(&amp;cgroup_mutex)
cpuset_hotplug_workfn        |
 rebuild_root_domains        |  rebind_subsystems
  update_tasks_root_domain   |   spin_lock_irq(&amp;css_set_lock)
   css_task_iter_start       |    list_move_tail(&amp;cset-&gt;e_cset_node[ss-&gt;id]
   while(css_task_iter_next) |                  &amp;dcgrp-&gt;e_csets[ss-&gt;id]);
   css_task_iter_end         |   spin_unlock_irq(&amp;css_set_lock)
mutex_unlock(&amp;cpuset_mutex)  | mutex_unlock(&amp;cgroup_mutex)

Inside css_task_iter_start/next/end, css_set_lock is hold and then
released, so when iterating task(left side), the css_set may be moved to
another list(right side), then it-&gt;cset_head points to the old list head
and it-&gt;cset_pos-&gt;next points to the head node of new list, which can't
be used as struct css_set.

To fix this issue, switch from all css_sets to only scgrp's css_sets to
patch in-flight iterators to preserve correct iteration, and then
update it-&gt;cset_head as well.

Reported-by: Gaosheng Cui &lt;cuigaosheng1@huawei.com&gt;
Link: https://www.spinics.net/lists/cgroups/msg37935.html
Suggested-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Link: https://lore.kernel.org/all/20230526114139.70274-1-xiujianfeng@huaweicloud.com/
Signed-off-by: Xiu Jianfeng &lt;xiujianfeng@huawei.com&gt;
Fixes: 2d8f243a5e6e ("cgroup: implement cgroup-&gt;e_csets[]")
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup: fix missing cpus_read_{lock,unlock}() in cgroup_transfer_tasks()</title>
<updated>2023-06-21T14:00:51Z</updated>
<author>
<name>Qi Zheng</name>
<email>zhengqi.arch@bytedance.com</email>
</author>
<published>2023-05-17T07:45:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=314e973f36a501268401079931f75cefa342cb0a'/>
<id>urn:sha1:314e973f36a501268401079931f75cefa342cb0a</id>
<content type='text'>
[ Upstream commit ab1de7ead871ebe6d12a774c3c25de0388cde082 ]

The commit 4f7e7236435c ("cgroup: Fix threadgroup_rwsem &lt;-&gt; cpus_read_lock()
deadlock") fixed the deadlock between cgroup_threadgroup_rwsem and
cpus_read_lock() by introducing cgroup_attach_{lock,unlock}() and removing
cpus_read_{lock,unlock}() from cpuset_attach(). But cgroup_transfer_tasks()
was missed and not handled, which will cause th following warning:

 WARNING: CPU: 0 PID: 589 at kernel/cpu.c:526 lockdep_assert_cpus_held+0x32/0x40
 CPU: 0 PID: 589 Comm: kworker/1:4 Not tainted 6.4.0-rc2-next-20230517 #50
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
 Workqueue: events cpuset_hotplug_workfn
 RIP: 0010:lockdep_assert_cpus_held+0x32/0x40
 &lt;...&gt;
 Call Trace:
  &lt;TASK&gt;
  cpuset_attach+0x40/0x240
  cgroup_migrate_execute+0x452/0x5e0
  ? _raw_spin_unlock_irq+0x28/0x40
  cgroup_transfer_tasks+0x1f3/0x360
  ? find_held_lock+0x32/0x90
  ? cpuset_hotplug_workfn+0xc81/0xed0
  cpuset_hotplug_workfn+0xcb1/0xed0
  ? process_one_work+0x248/0x5b0
  process_one_work+0x2b9/0x5b0
  worker_thread+0x56/0x3b0
  ? process_one_work+0x5b0/0x5b0
  kthread+0xf1/0x120
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  &lt;/TASK&gt;

So just use the cgroup_attach_{lock,unlock}() helper to fix it.

Reported-by: Zhao Gongyi &lt;zhaogongyi@bytedance.com&gt;
Signed-off-by: Qi Zheng &lt;zhengqi.arch@bytedance.com&gt;
Acked-by: Muchun Song &lt;songmuchun@bytedance.com&gt;
Fixes: 05c7b7a92cc8 ("cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug")
Cc: stable@vger.kernel.org # v5.17+
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: always put cset in cgroup_css_set_put_fork</title>
<updated>2023-06-21T14:00:51Z</updated>
<author>
<name>John Sperbeck</name>
<email>jsperbeck@google.com</email>
</author>
<published>2023-05-21T19:29:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c68b4db581386440dfb760b1e0d79ce10738047c'/>
<id>urn:sha1:c68b4db581386440dfb760b1e0d79ce10738047c</id>
<content type='text'>
[ Upstream commit 2bd110339288c18823dcace602b63b0d8627e520 ]

A successful call to cgroup_css_set_fork() will always have taken
a ref on kargs-&gt;cset (regardless of CLONE_INTO_CGROUP), so always
do a corresponding put in cgroup_css_set_put_fork().

Without this, a cset and its contained css structures will be
leaked for some fork failures.  The following script reproduces
the leak for a fork failure due to exceeding pids.max in the
pids controller.  A similar thing can happen if we jump to the
bad_fork_cancel_cgroup label in copy_process().

[ -z "$1" ] &amp;&amp; echo "Usage $0 pids-root" &amp;&amp; exit 1
PID_ROOT=$1
CGROUP=$PID_ROOT/foo

[ -e $CGROUP ] &amp;&amp; rmdir -f $CGROUP
mkdir $CGROUP
echo 5 &gt; $CGROUP/pids.max
echo $$ &gt; $CGROUP/cgroup.procs

fork_bomb()
{
	set -e
	for i in $(seq 10); do
		/bin/sleep 3600 &amp;
	done
}

(fork_bomb) &amp;
wait
echo $$ &gt; $PID_ROOT/cgroup.procs
kill $(cat $CGROUP/cgroup.procs)
rmdir $CGROUP

Fixes: ef2c41cf38a7 ("clone3: allow spawning processes into cgroups")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: John Sperbeck &lt;jsperbeck@google.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: bpf: use cgroup_lock()/cgroup_unlock() wrappers</title>
<updated>2023-06-21T14:00:51Z</updated>
<author>
<name>Kamalesh Babulal</name>
<email>kamalesh.babulal@oracle.com</email>
</author>
<published>2023-03-03T09:53:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7a2e2ca9ad21c770398c1e6b6d63c93e40bc0cfe'/>
<id>urn:sha1:7a2e2ca9ad21c770398c1e6b6d63c93e40bc0cfe</id>
<content type='text'>
[ Upstream commit 4cdb91b0dea7d7f59fa84a13c7753cd434fdedcf ]

Replace mutex_[un]lock() with cgroup_[un]lock() wrappers to stay
consistent across cgroup core and other subsystem code, while
operating on the cgroup_mutex.

Signed-off-by: Kamalesh Babulal &lt;kamalesh.babulal@oracle.com&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Reviewed-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Stable-dep-of: 2bd110339288 ("cgroup: always put cset in cgroup_css_set_put_fork")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods</title>
<updated>2023-04-20T10:35:14Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2023-04-11T13:35:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=243d9f3a1101dfe48805e1f7c7c2661618b299dd'/>
<id>urn:sha1:243d9f3a1101dfe48805e1f7c7c2661618b299dd</id>
<content type='text'>
[ Upstream commit eee87853794187f6adbe19533ed79c8b44b36a91 ]

In the case of CLONE_INTO_CGROUP, not all cpusets are ready to accept
new tasks. It is too late to check that in cpuset_fork(). So we need
to add the cpuset_can_fork() and cpuset_cancel_fork() methods to
pre-check it before we can allow attachment to a different cpuset.

We also need to set the attach_in_progress flag to alert other code
that a new task is going to be added to the cpuset.

Fixes: ef2c41cf38a7 ("clone3: allow spawning processes into cgroups")
Suggested-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly</title>
<updated>2023-04-20T10:35:14Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2023-04-11T13:35:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e33ab28395d3f509460b808b1fd9a7840deb4898'/>
<id>urn:sha1:e33ab28395d3f509460b808b1fd9a7840deb4898</id>
<content type='text'>
[ Upstream commit 42a11bf5c5436e91b040aeb04063be1710bb9f9c ]

By default, the clone(2) syscall spawn a child process into the same
cgroup as its parent. With the use of the CLONE_INTO_CGROUP flag
introduced by commit ef2c41cf38a7 ("clone3: allow spawning processes
into cgroups"), the child will be spawned into a different cgroup which
is somewhat similar to writing the child's tid into "cgroup.threads".

The current cpuset_fork() method does not properly handle the
CLONE_INTO_CGROUP case where the cpuset of the child may be different
from that of its parent.  Update the cpuset_fork() method to treat the
CLONE_INTO_CGROUP case similar to cpuset_attach().

Since the newly cloned task has not been running yet, its actual
memory usage isn't known. So it is not necessary to make change to mm
in cpuset_fork().

Fixes: ef2c41cf38a7 ("clone3: allow spawning processes into cgroups")
Reported-by: Giuseppe Scrivano &lt;gscrivan@redhat.com&gt;
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Skip spread flags update on v2</title>
<updated>2023-04-20T10:35:14Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2022-11-12T22:19:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ff5a4fe2598e83dc113baba9a12addbc077e030b'/>
<id>urn:sha1:ff5a4fe2598e83dc113baba9a12addbc077e030b</id>
<content type='text'>
[ Upstream commit 18f9a4d47527772515ad6cbdac796422566e6440 ]

Cpuset v2 has no spread flags to set. So we can skip spread
flags update if cpuset v2 is being used. Also change the name to
cpuset_update_task_spread_flags() to indicate that there are multiple
spread flags.

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Stable-dep-of: 42a11bf5c543 ("cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
