<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/cgroup, branch v6.1.124</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.124</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.124'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2024-12-14T18:53:21Z</updated>
<entry>
<title>cgroup/bpf: only cgroup v2 can be attached by bpf programs</title>
<updated>2024-12-14T18:53:21Z</updated>
<author>
<name>Chen Ridong</name>
<email>chenridong@huawei.com</email>
</author>
<published>2024-10-18T08:15:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8c222df370ed49dd8a7183de3a1a3af92e61d87b'/>
<id>urn:sha1:8c222df370ed49dd8a7183de3a1a3af92e61d87b</id>
<content type='text'>
[ Upstream commit 2190df6c91373fdec6db9fc07e427084f232f57e ]

Only cgroup v2 can be attached by bpf programs, so this patch introduces
that cgroup_bpf_inherit and cgroup_bpf_offline can only be called in
cgroup v2, and this can fix the memleak mentioned by commit 04f8ef5643bc
("cgroup: Fix memory leak caused by missing cgroup_bpf_offline"), which
has been reverted.

Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path")
Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
Link: https://lore.kernel.org/cgroups/aka2hk5jsel5zomucpwlxsej6iwnfw4qu5jkrmjhyfhesjlfdw@46zxhg5bdnr7/
Signed-off-by: Chen Ridong &lt;chenridong@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>Revert "cgroup: Fix memory leak caused by missing cgroup_bpf_offline"</title>
<updated>2024-12-14T18:53:21Z</updated>
<author>
<name>Chen Ridong</name>
<email>chenridong@huawei.com</email>
</author>
<published>2024-10-18T08:15:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c4e5e64d28071d207d561bb5197e39a2118782bf'/>
<id>urn:sha1:c4e5e64d28071d207d561bb5197e39a2118782bf</id>
<content type='text'>
[ Upstream commit feb301c60970bd2a1310a53ce2d6e4375397a51b ]

This reverts commit 04f8ef5643bcd8bcde25dfdebef998aea480b2ba.

Only cgroup v2 can be attached by cgroup by BPF programs. Revert this
commit and cgroup_bpf_inherit and cgroup_bpf_offline won't be called in
cgroup v1. The memory leak issue will be fixed with next patch.

Fixes: 04f8ef5643bc ("cgroup: Fix memory leak caused by missing cgroup_bpf_offline")
Link: https://lore.kernel.org/cgroups/aka2hk5jsel5zomucpwlxsej6iwnfw4qu5jkrmjhyfhesjlfdw@46zxhg5bdnr7/
Signed-off-by: Chen Ridong &lt;chenridong@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Fix potential overflow issue when checking max_depth</title>
<updated>2024-11-08T15:26:40Z</updated>
<author>
<name>Xiu Jianfeng</name>
<email>xiujianfeng@huawei.com</email>
</author>
<published>2024-10-12T07:22:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c22c7488340e260667498d6f6ff340d2a6183985'/>
<id>urn:sha1:c22c7488340e260667498d6f6ff340d2a6183985</id>
<content type='text'>
[ Upstream commit 3cc4e13bb1617f6a13e5e6882465984148743cf4 ]

cgroup.max.depth is the maximum allowed descent depth below the current
cgroup. If the actual descent depth is equal or larger, an attempt to
create a new child cgroup will fail. However due to the cgroup-&gt;max_depth
is of int type and having the default value INT_MAX, the condition
'level &gt; cgroup-&gt;max_depth' will never be satisfied, and it will cause
an overflow of the level after it reaches to INT_MAX.

Fix it by starting the level from 0 and using '&gt;=' instead.

It's worth mentioning that this issue is unlikely to occur in reality,
as it's impossible to have a depth of INT_MAX hierarchy, but should be
be avoided logically.

Fixes: 1a926e0bbab8 ("cgroup: implement hierarchy limits")
Signed-off-by: Xiu Jianfeng &lt;xiujianfeng@huawei.com&gt;
Reviewed-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Protect css-&gt;cgroup write under css_set_lock</title>
<updated>2024-09-12T09:10:22Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2024-07-03T18:52:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3cff144cb31b8a7c409474352ef169d6a1a99909'/>
<id>urn:sha1:3cff144cb31b8a7c409474352ef169d6a1a99909</id>
<content type='text'>
[ Upstream commit 57b56d16800e8961278ecff0dc755d46c4575092 ]

The writing of css-&gt;cgroup associated with the cgroup root in
rebind_subsystems() is currently protected only by cgroup_mutex.
However, the reading of css-&gt;cgroup in both proc_cpuset_show() and
proc_cgroup_show() is protected just by css_set_lock. That makes the
readers susceptible to racing problems like data tearing or caching.
It is also a problem that can be reported by KCSAN.

This can be fixed by using READ_ONCE() and WRITE_ONCE() to access
css-&gt;cgroup. Alternatively, the writing of css-&gt;cgroup can be moved
under css_set_lock as well which is done by this patch.

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Avoid extra dereference in css_populate_dir()</title>
<updated>2024-08-29T15:30:28Z</updated>
<author>
<name>Kamalesh Babulal</name>
<email>kamalesh.babulal@oracle.com</email>
</author>
<published>2023-09-12T07:04:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7f3c6b50252edc3eb8c95c77468afea22c24fbc0'/>
<id>urn:sha1:7f3c6b50252edc3eb8c95c77468afea22c24fbc0</id>
<content type='text'>
[ Upstream commit d24f05987ce8bf61e62d86fedbe47523dc5c3393 ]

Use css directly instead of dereferencing it from &amp;cgroup-&gt;self, while
adding the cgroup v2 cft base and psi files in css_populate_dir(). Both
points to the same css, when css-&gt;ss is NULL, this avoids extra deferences
and makes code consistent in usage across the function.

Signed-off-by: Kamalesh Babulal &lt;kamalesh.babulal@oracle.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Make operations on the cgroup root_list RCU safe</title>
<updated>2024-08-19T04:00:06Z</updated>
<author>
<name>Yafang Shao</name>
<email>laoar.shao@gmail.com</email>
</author>
<published>2023-10-29T06:14:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f5b7a9792041f16c8e32a34e58be2f8d0d612aa1'/>
<id>urn:sha1:f5b7a9792041f16c8e32a34e58be2f8d0d612aa1</id>
<content type='text'>
commit d23b5c577715892c87533b13923306acc6243f93 upstream.

At present, when we perform operations on the cgroup root_list, we must
hold the cgroup_mutex, which is a relatively heavyweight lock. In reality,
we can make operations on this list RCU-safe, eliminating the need to hold
the cgroup_mutex during traversal. Modifications to the list only occur in
the cgroup root setup and destroy paths, which should be infrequent in a
production environment. In contrast, traversal may occur frequently.
Therefore, making it RCU-safe would be beneficial.

Signed-off-by: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Prevent UAF in proc_cpuset_show()</title>
<updated>2024-08-03T06:48:54Z</updated>
<author>
<name>Chen Ridong</name>
<email>chenridong@huawei.com</email>
</author>
<published>2024-06-28T01:36:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=29a8d4e02fd4840028c38ceb1536cc8f82a257d4'/>
<id>urn:sha1:29a8d4e02fd4840028c38ceb1536cc8f82a257d4</id>
<content type='text'>
[ Upstream commit 1be59c97c83ccd67a519d8a49486b3a8a73ca28a ]

An UAF can happen when /proc/cpuset is read as reported in [1].

This can be reproduced by the following methods:
1.add an mdelay(1000) before acquiring the cgroup_lock In the
 cgroup_path_ns function.
2.$cat /proc/&lt;pid&gt;/cpuset   repeatly.
3.$mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset/
$umount /sys/fs/cgroup/cpuset/   repeatly.

The race that cause this bug can be shown as below:

(umount)		|	(cat /proc/&lt;pid&gt;/cpuset)
css_release		|	proc_cpuset_show
css_release_work_fn	|	css = task_get_css(tsk, cpuset_cgrp_id);
css_free_rwork_fn	|	cgroup_path_ns(css-&gt;cgroup, ...);
cgroup_destroy_root	|	mutex_lock(&amp;cgroup_mutex);
rebind_subsystems	|
cgroup_free_root 	|
			|	// cgrp was freed, UAF
			|	cgroup_path_ns_locked(cgrp,..);

When the cpuset is initialized, the root node top_cpuset.css.cgrp
will point to &amp;cgrp_dfl_root.cgrp. In cgroup v1, the mount operation will
allocate cgroup_root, and top_cpuset.css.cgrp will point to the allocated
&amp;cgroup_root.cgrp. When the umount operation is executed,
top_cpuset.css.cgrp will be rebound to &amp;cgrp_dfl_root.cgrp.

The problem is that when rebinding to cgrp_dfl_root, there are cases
where the cgroup_root allocated by setting up the root for cgroup v1
is cached. This could lead to a Use-After-Free (UAF) if it is
subsequently freed. The descendant cgroups of cgroup v1 can only be
freed after the css is released. However, the css of the root will never
be released, yet the cgroup_root should be freed when it is unmounted.
This means that obtaining a reference to the css of the root does
not guarantee that css.cgrp-&gt;root will not be freed.

Fix this problem by using rcu_read_lock in proc_cpuset_show().
As cgroup_root is kfree_rcu after commit d23b5c577715
("cgroup: Make operations on the cgroup root_list RCU safe"),
css-&gt;cgroup won't be freed during the critical section.
To call cgroup_path_ns_locked, css_set_lock is needed, so it is safe to
replace task_get_css with task_css.

[1] https://syzkaller.appspot.com/bug?extid=9b1ff7be974a403aa4cd

Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces")
Signed-off-by: Chen Ridong &lt;chenridong@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>kernfs: Convert kernfs_path_from_node_locked() from strlcpy() to strscpy()</title>
<updated>2024-08-03T06:48:54Z</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2023-12-12T21:17:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fa203531aa397fd1b1a74f111ba5361badc979b7'/>
<id>urn:sha1:fa203531aa397fd1b1a74f111ba5361badc979b7</id>
<content type='text'>
[ Upstream commit ff6d413b0b59466e5acf2e42f294b1842ae130a1 ]

One of the last remaining users of strlcpy() in the kernel is
kernfs_path_from_node_locked(), which passes back the problematic "length
we _would_ have copied" return value to indicate truncation.  Convert the
chain of all callers to use the negative return value (some of which
already doing this explicitly). All callers were already also checking
for negative return values, so the risk to missed checks looks very low.

In this analysis, it was found that cgroup1_release_agent() actually
didn't handle the "too large" condition, so this is technically also a
bug fix. :)

Here's the chain of callers, and resolution identifying each one as now
handling the correct return value:

kernfs_path_from_node_locked()
        kernfs_path_from_node()
                pr_cont_kernfs_path()
                        returns void
                kernfs_path()
                        sysfs_warn_dup()
                                return value ignored
                        cgroup_path()
                                blkg_path()
                                        bfq_bic_update_cgroup()
                                                return value ignored
                                TRACE_IOCG_PATH()
                                        return value ignored
                                TRACE_CGROUP_PATH()
                                        return value ignored
                                perf_event_cgroup()
                                        return value ignored
                                task_group_path()
                                        return value ignored
                                damon_sysfs_memcg_path_eq()
                                        return value ignored
                                get_mm_memcg_path()
                                        return value ignored
                                lru_gen_seq_show()
                                        return value ignored
                        cgroup_path_from_kernfs_id()
                                return value ignored
                cgroup_show_path()
                        already converted "too large" error to negative value
                cgroup_path_ns_locked()
                        cgroup_path_ns()
                                bpf_iter_cgroup_show_fdinfo()
                                        return value ignored
                                cgroup1_release_agent()
                                        wasn't checking "too large" error
                        proc_cgroup_show()
                                already converted "too large" to negative value

Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Zefan Li &lt;lizefan.x@bytedance.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: &lt;cgroups@vger.kernel.org&gt;
Co-developed-by: Azeem Shaikh &lt;azeemshaikh38@gmail.com&gt;
Signed-off-by: Azeem Shaikh &lt;azeemshaikh38@gmail.com&gt;
Link: https://lore.kernel.org/r/20231116192127.1558276-3-keescook@chromium.org
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Link: https://lore.kernel.org/r/20231212211741.164376-3-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Stable-dep-of: 1be59c97c83c ("cgroup/cpuset: Prevent UAF in proc_cpuset_show()")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level</title>
<updated>2024-06-12T09:03:33Z</updated>
<author>
<name>Vitalii Bursov</name>
<email>vitaly@bursov.com</email>
</author>
<published>2024-04-30T15:05:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e72a3ebf809847688df54b1a05c36fb27cd7d061'/>
<id>urn:sha1:e72a3ebf809847688df54b1a05c36fb27cd7d061</id>
<content type='text'>
[ Upstream commit a1fd0b9d751f840df23ef0e75b691fc00cfd4743 ]

Change relax_domain_level checks so that it would be possible
to include or exclude all domains from newidle balancing.

This matches the behavior described in the documentation:

  -1   no request. use system default or follow request of others.
   0   no search.
   1   search siblings (hyperthreads in a core).

"2" enables levels 0 and 1, level_max excludes the last (level_max)
level, and level_max+1 includes all levels.

Fixes: 1d3504fcf560 ("sched, cpuset: customize sched domains, core")
Signed-off-by: Vitalii Bursov &lt;vitaly@bursov.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Tested-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Reviewed-by: Valentin Schneider &lt;vschneid@redhat.com&gt;
Link: https://lore.kernel.org/r/bd6de28e80073c79466ec6401cdeae78f0d4423d.1714488502.git.vitaly@bursov.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup_freezer: cgroup_freezing: Check if not frozen</title>
<updated>2023-12-13T17:39:19Z</updated>
<author>
<name>Tim Van Patten</name>
<email>timvp@google.com</email>
</author>
<published>2023-11-15T16:20:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4c00f01832f27eb90e3e5a803a91ac28049d281f'/>
<id>urn:sha1:4c00f01832f27eb90e3e5a803a91ac28049d281f</id>
<content type='text'>
commit cff5f49d433fcd0063c8be7dd08fa5bf190c6c37 upstream.

__thaw_task() was recently updated to warn if the task being thawed was
part of a freezer cgroup that is still currently freezing:

	void __thaw_task(struct task_struct *p)
	{
	...
		if (WARN_ON_ONCE(freezing(p)))
			goto unlock;

This has exposed a bug in cgroup1 freezing where when CGROUP_FROZEN is
asserted, the CGROUP_FREEZING bits are not also cleared at the same
time. Meaning, when a cgroup is marked FROZEN it continues to be marked
FREEZING as well. This causes the WARNING to trigger, because
cgroup_freezing() thinks the cgroup is still freezing.

There are two ways to fix this:

1. Whenever FROZEN is set, clear FREEZING for the cgroup and all
children cgroups.
2. Update cgroup_freezing() to also verify that FROZEN is not set.

This patch implements option (2), since it's smaller and more
straightforward.

Signed-off-by: Tim Van Patten &lt;timvp@google.com&gt;
Tested-by: Mark Hasemeyer &lt;markhas@chromium.org&gt;
Fixes: f5d39b020809 ("freezer,sched: Rewrite core freezer logic")
Cc: stable@vger.kernel.org # v6.1+
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
