<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel, branch v6.9.11</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.9.11</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.9.11'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2024-07-25T07:53:31Z</updated>
<entry>
<title>workqueue: Refactor worker ID formatting and make wq_worker_comm() use full ID string</title>
<updated>2024-07-25T07:53:31Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2024-05-20T23:28:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f3d73b877aa5a6f07875a97b885bedfb6448be97'/>
<id>urn:sha1:f3d73b877aa5a6f07875a97b885bedfb6448be97</id>
<content type='text'>
[ Upstream commit 2a1b02bcba78f8498ab00d6142e1238d85b01591 ]

Currently, worker ID formatting is open coded in create_worker(),
init_rescuer() and worker_thread() (for %WORKER_DIE case). The formatted ID
is saved into task-&gt;comm and wq_worker_comm() uses it as the base name to
append extra information to when generating the name to be shown to
userspace.

However, TASK_COMM_LEN is only 16 leading to badly truncated names for
rescuers. For example, the rescuer for the inet_frag_wq workqueue becomes:

  $ ps -ef | grep '[k]worker/R-inet'
  root         483       2  0 Apr26 ?        00:00:00 [kworker/R-inet_]

Even for non-rescue workers, it's easy to run over 15 characters on
moderately large machines.

Fit it by consolidating worker ID formatting into a new helper
format_worker_id() and calling it from wq_worker_comm() to obtain the
untruncated worker ID string.

  $ ps -ef | grep '[k]worker/R-inet'
  root          60       2  0 12:10 ?        00:00:00 [kworker/R-inet_frag_wq]

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-and-tested-by: Jan Engelhardt &lt;jengelh@inai.de&gt;
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/deadline: Fix task_struct reference leak</title>
<updated>2024-07-18T11:22:55Z</updated>
<author>
<name>Wander Lairson Costa</name>
<email>wander@redhat.com</email>
</author>
<published>2024-06-20T12:56:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7a54d31face626f62de415ebe77b43f76c3ffaf4'/>
<id>urn:sha1:7a54d31face626f62de415ebe77b43f76c3ffaf4</id>
<content type='text'>
[ Upstream commit b58652db66c910c2245f5bee7deca41c12d707b9 ]

During the execution of the following stress test with linux-rt:

stress-ng --cyclic 30 --timeout 30 --minimize --quiet

kmemleak frequently reported a memory leak concerning the task_struct:

unreferenced object 0xffff8881305b8000 (size 16136):
  comm "stress-ng", pid 614, jiffies 4294883961 (age 286.412s)
  object hex dump (first 32 bytes):
    02 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00  .@..............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  debug hex dump (first 16 bytes):
    53 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00  S...............
  backtrace:
    [&lt;00000000046b6790&gt;] dup_task_struct+0x30/0x540
    [&lt;00000000c5ca0f0b&gt;] copy_process+0x3d9/0x50e0
    [&lt;00000000ced59777&gt;] kernel_clone+0xb0/0x770
    [&lt;00000000a50befdc&gt;] __do_sys_clone+0xb6/0xf0
    [&lt;000000001dbf2008&gt;] do_syscall_64+0x5d/0xf0
    [&lt;00000000552900ff&gt;] entry_SYSCALL_64_after_hwframe+0x6e/0x76

The issue occurs in start_dl_timer(), which increments the task_struct
reference count and sets a timer. The timer callback, dl_task_timer,
is supposed to decrement the reference count upon expiration. However,
if enqueue_task_dl() is called before the timer expires and cancels it,
the reference count is not decremented, leading to the leak.

This patch fixes the reference leak by ensuring the task_struct
reference count is properly decremented when the timer is canceled.

Fixes: feff2e65efd8 ("sched/deadline: Unthrottle PI boosted threads while enqueuing")
Signed-off-by: Wander Lairson Costa &lt;wander@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Link: https://lore.kernel.org/r/20240620125618.11419-1-wander@redhat.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>Revert "sched/fair: Make sure to try to detach at least one movable task"</title>
<updated>2024-07-18T11:22:44Z</updated>
<author>
<name>Josh Don</name>
<email>joshdon@google.com</email>
</author>
<published>2024-06-20T21:44:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0fa6dcbfa2e2b97c1e6febbea561badf0931a38b'/>
<id>urn:sha1:0fa6dcbfa2e2b97c1e6febbea561badf0931a38b</id>
<content type='text'>
commit 2feab2492deb2f14f9675dd6388e9e2bf669c27a upstream.

This reverts commit b0defa7ae03ecf91b8bfd10ede430cff12fcbd06.

b0defa7ae03ec changed the load balancing logic to ignore env.max_loop if
all tasks examined to that point were pinned. The goal of the patch was
to make it more likely to be able to detach a task buried in a long list
of pinned tasks. However, this has the unfortunate side effect of
creating an O(n) iteration in detach_tasks(), as we now must fully
iterate every task on a cpu if all or most are pinned. Since this load
balance code is done with rq lock held, and often in softirq context, it
is very easy to trigger hard lockups. We observed such hard lockups with
a user who affined O(10k) threads to a single cpu.

When I discussed this with Vincent he initially suggested that we keep
the limit on the number of tasks to detach, but increase the number of
tasks we can search. However, after some back and forth on the mailing
list, he recommended we instead revert the original patch, as it seems
likely no one was actually getting hit by the original issue.

Fixes: b0defa7ae03e ("sched/fair: Make sure to try to detach at least one movable task")
Signed-off-by: Josh Don &lt;joshdon@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: https://lore.kernel.org/r/20240620214450.316280-1-joshdon@google.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: Defer work in bpf_timer_cancel_and_free</title>
<updated>2024-07-18T11:22:40Z</updated>
<author>
<name>Kumar Kartikeya Dwivedi</name>
<email>memxor@gmail.com</email>
</author>
<published>2024-07-09T18:54:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7aa5a19279c3639ae8b758b63f05d0c616a39fa1'/>
<id>urn:sha1:7aa5a19279c3639ae8b758b63f05d0c616a39fa1</id>
<content type='text'>
[ Upstream commit a6fcd19d7eac1335eb76bc16b6a66b7f574d1d69 ]

Currently, the same case as previous patch (two timer callbacks trying
to cancel each other) can be invoked through bpf_map_update_elem as
well, or more precisely, freeing map elements containing timers. Since
this relies on hrtimer_cancel as well, it is prone to the same deadlock
situation as the previous patch.

It would be sufficient to use hrtimer_try_to_cancel to fix this problem,
as the timer cannot be enqueued after async_cancel_and_free. Once
async_cancel_and_free has been done, the timer must be reinitialized
before it can be armed again. The callback running in parallel trying to
arm the timer will fail, and freeing bpf_hrtimer without waiting is
sufficient (given kfree_rcu), and bpf_timer_cb will return
HRTIMER_NORESTART, preventing the timer from being rearmed again.

However, there exists a UAF scenario where the callback arms the timer
before entering this function, such that if cancellation fails (due to
timer callback invoking this routine, or the target timer callback
running concurrently). In such a case, if the timer expiration is
significantly far in the future, the RCU grace period expiration
happening before it will free the bpf_hrtimer state and along with it
the struct hrtimer, that is enqueued.

Hence, it is clear cancellation needs to occur after
async_cancel_and_free, and yet it cannot be done inline due to deadlock
issues. We thus modify bpf_timer_cancel_and_free to defer work to the
global workqueue, adding a work_struct alongside rcu_head (both used at
_different_ points of time, so can share space).

Update existing code comments to reflect the new state of affairs.

Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
Signed-off-by: Kumar Kartikeya Dwivedi &lt;memxor@gmail.com&gt;
Link: https://lore.kernel.org/r/20240709185440.1104957-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Fail bpf_timer_cancel when callback is being cancelled</title>
<updated>2024-07-18T11:22:40Z</updated>
<author>
<name>Kumar Kartikeya Dwivedi</name>
<email>memxor@gmail.com</email>
</author>
<published>2024-07-09T18:54:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3e4e8178a8666c56813bd167b848fca0f4c9af0a'/>
<id>urn:sha1:3e4e8178a8666c56813bd167b848fca0f4c9af0a</id>
<content type='text'>
[ Upstream commit d4523831f07a267a943f0dde844bf8ead7495f13 ]

Given a schedule:

timer1 cb			timer2 cb

bpf_timer_cancel(timer2);	bpf_timer_cancel(timer1);

Both bpf_timer_cancel calls would wait for the other callback to finish
executing, introducing a lockup.

Add an atomic_t count named 'cancelling' in bpf_hrtimer. This keeps
track of all in-flight cancellation requests for a given BPF timer.
Whenever cancelling a BPF timer, we must check if we have outstanding
cancellation requests, and if so, we must fail the operation with an
error (-EDEADLK) since cancellation is synchronous and waits for the
callback to finish executing. This implies that we can enter a deadlock
situation involving two or more timer callbacks executing in parallel
and attempting to cancel one another.

Note that we avoid incrementing the cancelling counter for the target
timer (the one being cancelled) if bpf_timer_cancel is not invoked from
a callback, to avoid spurious errors. The whole point of detecting
cur-&gt;cancelling and returning -EDEADLK is to not enter a busy wait loop
(which may or may not lead to a lockup). This does not apply in case the
caller is in a non-callback context, the other side can continue to
cancel as it sees fit without running into errors.

Background on prior attempts:

Earlier versions of this patch used a bool 'cancelling' bit and used the
following pattern under timer-&gt;lock to publish cancellation status.

lock(t-&gt;lock);
t-&gt;cancelling = true;
mb();
if (cur-&gt;cancelling)
	return -EDEADLK;
unlock(t-&gt;lock);
hrtimer_cancel(t-&gt;timer);
t-&gt;cancelling = false;

The store outside the critical section could overwrite a parallel
requests t-&gt;cancelling assignment to true, to ensure the parallely
executing callback observes its cancellation status.

It would be necessary to clear this cancelling bit once hrtimer_cancel
is done, but lack of serialization introduced races. Another option was
explored where bpf_timer_start would clear the bit when (re)starting the
timer under timer-&gt;lock. This would ensure serialized access to the
cancelling bit, but may allow it to be cleared before in-flight
hrtimer_cancel has finished executing, such that lockups can occur
again.

Thus, we choose an atomic counter to keep track of all outstanding
cancellation requests and use it to prevent lockups in case callbacks
attempt to cancel each other while executing in parallel.

Reported-by: Dohyun Kim &lt;dohyunkim@google.com&gt;
Reported-by: Neel Natu &lt;neelnatu@google.com&gt;
Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
Signed-off-by: Kumar Kartikeya Dwivedi &lt;memxor@gmail.com&gt;
Link: https://lore.kernel.org/r/20240709185440.1104957-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: replace bpf_timer_init with a generic helper</title>
<updated>2024-07-18T11:22:39Z</updated>
<author>
<name>Benjamin Tissoires</name>
<email>bentiss@kernel.org</email>
</author>
<published>2024-04-20T09:09:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=107d34c9a5d9f4221437571f40c9cbca80155b14'/>
<id>urn:sha1:107d34c9a5d9f4221437571f40c9cbca80155b14</id>
<content type='text'>
[ Upstream commit 56b4a177ae6322173360a93ea828ad18570a5a14 ]

No code change except for the new flags argument being stored in the
local data struct.

Signed-off-by: Benjamin Tissoires &lt;bentiss@kernel.org&gt;
Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-2-6c986a5a741f@kernel.org
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Stable-dep-of: d4523831f07a ("bpf: Fail bpf_timer_cancel when callback is being cancelled")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: make timer data struct more generic</title>
<updated>2024-07-18T11:22:39Z</updated>
<author>
<name>Benjamin Tissoires</name>
<email>bentiss@kernel.org</email>
</author>
<published>2024-04-20T09:09:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c8ae85e1f15af389843d1a3b6b2cc6a211884fe7'/>
<id>urn:sha1:c8ae85e1f15af389843d1a3b6b2cc6a211884fe7</id>
<content type='text'>
[ Upstream commit be2749beff62e0d63cf97fe63cabc79a68443139 ]

To be able to add workqueues and reuse most of the timer code, we need
to make bpf_hrtimer more generic.

There is no code change except that the new struct gets a new u64 flags
attribute. We are still below 2 cache lines, so this shouldn't impact
the current running codes.

The ordering is also changed. Everything related to async callback
is now on top of bpf_hrtimer.

Signed-off-by: Benjamin Tissoires &lt;bentiss@kernel.org&gt;
Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-1-6c986a5a741f@kernel.org
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Stable-dep-of: d4523831f07a ("bpf: Fail bpf_timer_cancel when callback is being cancelled")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: fix order of args in call to bpf_map_kvcalloc</title>
<updated>2024-07-18T11:22:39Z</updated>
<author>
<name>Mohammad Shehar Yaar Tausif</name>
<email>sheharyaar48@gmail.com</email>
</author>
<published>2024-07-10T10:05:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1ca3bfe309393eeb3845541d65b8b1c4347c20f3'/>
<id>urn:sha1:1ca3bfe309393eeb3845541d65b8b1c4347c20f3</id>
<content type='text'>
[ Upstream commit af253aef183a31ce62d2e39fc520b0ebfb562bb9 ]

The original function call passed size of smap-&gt;bucket before the number of
buckets which raises the error 'calloc-transposed-args' on compilation.

Vlastimil Babka added:

The order of parameters can be traced back all the way to 6ac99e8f23d4
("bpf: Introduce bpf sk local storage") accross several refactorings,
and that's why the commit is used as a Fixes: tag.

In v6.10-rc1, a different commit 2c321f3f70bc ("mm: change inlined
allocation helpers to account at the call site") however exposed the
order of args in a way that gcc-14 has enough visibility to start
warning about it, because (in !CONFIG_MEMCG case) bpf_map_kvcalloc is
then a macro alias for kvcalloc instead of a static inline wrapper.

To sum up the warning happens when the following conditions are all met:

- gcc-14 is used (didn't see it with gcc-13)
- commit 2c321f3f70bc is present
- CONFIG_MEMCG is not enabled in .config
- CONFIG_WERROR turns this from a compiler warning to error

Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage")
Reviewed-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Tested-by: Christian Kujau &lt;lists@nerdbynature.de&gt;
Signed-off-by: Mohammad Shehar Yaar Tausif &lt;sheharyaar48@gmail.com&gt;
Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Link: https://lore.kernel.org/r/20240710100521.15061-2-vbabka@suse.cz
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>swap: yield device immediately</title>
<updated>2024-07-11T10:51:22Z</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2024-05-21T19:00:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d143321f8392f9598659ef2ac2dee592a03aae65'/>
<id>urn:sha1:d143321f8392f9598659ef2ac2dee592a03aae65</id>
<content type='text'>
[ Upstream commit 712182b67e831912f90259102ae334089e7bccd1 ]

Otherwise we can cause spurious EBUSY issues when trying to mount the
rootfs later on.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=218845
Reported-by: Petri Kaukasoina &lt;petri.kaukasoina@tuni.fi&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>dma-mapping: benchmark: avoid needless copy_to_user if benchmark fails</title>
<updated>2024-07-11T10:51:21Z</updated>
<author>
<name>Fedor Pchelkin</name>
<email>pchelkin@ispras.ru</email>
</author>
<published>2024-05-04T11:47:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8108b176ea37fa86c1c2b25335eec8b9177a212a'/>
<id>urn:sha1:8108b176ea37fa86c1c2b25335eec8b9177a212a</id>
<content type='text'>
[ Upstream commit f7c9ccaadffd13066353332c13d7e9bf73b8f92d ]

If do_map_benchmark() has failed, there is nothing useful to copy back
to userspace.

Suggested-by: Barry Song &lt;21cnbao@gmail.com&gt;
Signed-off-by: Fedor Pchelkin &lt;pchelkin@ispras.ru&gt;
Acked-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
