<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel, branch v4.19.324</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.19.324</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.19.324'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2024-11-17T13:58:08Z</updated>
<entry>
<title>bpf: use kvzalloc to allocate BPF verifier environment</title>
<updated>2024-11-17T13:58:08Z</updated>
<author>
<name>Rik van Riel</name>
<email>riel@surriel.com</email>
</author>
<published>2024-10-08T21:07:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=44d3e3d8644bf3534a376d6f1073a764c73b4781'/>
<id>urn:sha1:44d3e3d8644bf3534a376d6f1073a764c73b4781</id>
<content type='text'>
[ Upstream commit 434247637c66e1be2bc71a9987d4c3f0d8672387 ]

The kzalloc() call in bpf_check() can fail when memory is very fragmented,
which in turn can lead to an OOM kill.

Use kvzalloc() to fall back to vmalloc() when memory is too fragmented to
allocate the order-3 bpf verifier environment.

Admittedly this is not a very common case, and only happens on systems
where memory has already been squeezed close to the limit, but this does
not seem like much of a hot path, and it's a simple enough fix.
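
A minimal sketch of the change described above (not the verbatim hunk;
the error handling around the allocation is elided):

        struct bpf_verifier_env *env;

        /* kvzalloc() falls back to vmalloc() when a physically
         * contiguous order-3 allocation is unavailable, and still
         * returns zeroed memory.
         */
        env = kvzalloc(sizeof(*env), GFP_KERNEL);
        if (!env)
                return -ENOMEM;

        /* The matching free becomes kvfree(), which handles both the
         * kmalloc and the vmalloc case.
         */
        kvfree(env);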

Signed-off-by: Rik van Riel &lt;riel@surriel.com&gt;
Reviewed-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Link: https://lore.kernel.org/r/20241008170735.16766766@imladris.surriel.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Fix out-of-bounds write in trie_get_next_key()</title>
<updated>2024-11-08T15:19:22Z</updated>
<author>
<name>Byeonguk Jeong</name>
<email>jungbu2855@gmail.com</email>
</author>
<published>2024-10-26T05:02:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e8494ac079814a53fbc2258d2743e720907488ed'/>
<id>urn:sha1:e8494ac079814a53fbc2258d2743e720907488ed</id>
<content type='text'>
[ Upstream commit 13400ac8fb80c57c2bfb12ebd35ee121ce9b4d21 ]

trie_get_next_key() allocates a node stack of size trie-&gt;max_prefixlen,
but writes (trie-&gt;max_prefixlen + 1) nodes to that stack when the trie
contains a full path from the root to a leaf. For example, consider a
trie whose max_prefixlen is 8, with nodes for keys 0x00/0, 0x00/1,
0x00/2, ... 0x00/8 inserted. A subsequent call to trie_get_next_key()
with a _key whose .prefixlen is 8 then writes 9 nodes onto the size-8
node stack.
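
A sketch of the fix implied above: reserve one extra slot so a full
root-to-leaf path fits (allocation flags assumed from the existing
kmalloc_array() call):

        /* A full path visits max_prefixlen + 1 nodes, not
         * max_prefixlen, so size the stack accordingly.
         */
        node_stack = kmalloc_array(trie-&gt;max_prefixlen + 1,
                                   sizeof(struct lpm_trie_node *),
                                   GFP_ATOMIC | __GFP_NOWARN);
        if (!node_stack)
                return -ENOMEM;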

Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map")
Signed-off-by: Byeonguk Jeong &lt;jungbu2855@gmail.com&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@kernel.org&gt;
Tested-by: Hou Tao &lt;houtao1@huawei.com&gt;
Acked-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/r/Zxx384ZfdlFYnz6J@localhost.localdomain
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Fix potential overflow issue when checking max_depth</title>
<updated>2024-11-08T15:19:22Z</updated>
<author>
<name>Xiu Jianfeng</name>
<email>xiujianfeng@huawei.com</email>
</author>
<published>2024-10-12T07:22:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=339df130db47ae7e89fddce5729b0f0566405d1d'/>
<id>urn:sha1:339df130db47ae7e89fddce5729b0f0566405d1d</id>
<content type='text'>
[ Upstream commit 3cc4e13bb1617f6a13e5e6882465984148743cf4 ]

cgroup.max.depth is the maximum allowed descent depth below the current
cgroup. If the actual descent depth is equal to or larger than it, an
attempt to create a new child cgroup will fail. However, because
cgroup-&gt;max_depth is of int type and defaults to INT_MAX, the condition
'level &gt; cgroup-&gt;max_depth' is never satisfied, and level overflows
once it reaches INT_MAX.

Fix it by starting the level from 0 and using '&gt;=' instead.

It's worth mentioning that this issue is unlikely to occur in reality,
as a hierarchy INT_MAX levels deep is impossible, but it should still be
avoided logically.
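
A sketch of the check after the change (function and variable names
assumed from the hierarchy-limit code; the descendant-count check in the
same loop is elided):

        int level = 0;

        for (cgrp = parent; cgrp; cgrp = cgroup_parent(cgrp)) {
                /* Starting at 0 and using '&gt;=' keeps the comparison
                 * meaningful even when max_depth is INT_MAX, and level
                 * can no longer run past INT_MAX.
                 */
                if (level &gt;= cgrp-&gt;max_depth)
                        goto fail;
                level++;
        }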

Fixes: 1a926e0bbab8 ("cgroup: implement hierarchy limits")
Signed-off-by: Xiu Jianfeng &lt;xiujianfeng@huawei.com&gt;
Reviewed-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>posix-clock: Fix unbalanced locking in pc_clock_settime()</title>
<updated>2024-11-08T15:19:21Z</updated>
<author>
<name>Jinjie Ruan</name>
<email>ruanjinjie@huawei.com</email>
</author>
<published>2024-10-18T10:07:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d005400262ddaf1ca1666bbcd1acf42fe81d57ce'/>
<id>urn:sha1:d005400262ddaf1ca1666bbcd1acf42fe81d57ce</id>
<content type='text'>
[ Upstream commit 6e62807c7fbb3c758d233018caf94dfea9c65dbd ]

If get_clock_desc() succeeds, it calls fget() for the clockid's fd and
takes the clk-&gt;rwsem read lock, so the error path should release the
lock to keep the locking balanced, and fput() the clockid's fd to keep
the refcount balanced and release the fd-related resources.

However, the commit below left the error path with the lock still held,
resulting in unbalanced locking. Fix it by checking
timespec64_valid_strict() before get_clock_desc(), since "ts" is not
changed after that point.
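
A trimmed sketch of the resulting order (everything after the
get_clock_desc() call is elided):

static int pc_clock_settime(clockid_t id, const struct timespec64 *ts)
{
        struct posix_clock_desc cd;
        int err;

        /* Validate before taking the fd reference and the read lock,
         * so no error path has to undo them for an invalid timespec.
         */
        if (!timespec64_valid_strict(ts))
                return -EINVAL;

        err = get_clock_desc(id, &amp;cd);
        if (err)
                return err;
        ...
}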

Fixes: d8794ac20a29 ("posix-clock: Fix missing timespec64 check in pc_clock_settime()")
Acked-by: Richard Cochran &lt;richardcochran@gmail.com&gt;
Signed-off-by: Jinjie Ruan &lt;ruanjinjie@huawei.com&gt;
Acked-by: Anna-Maria Behnsen &lt;anna-maria@linutronix.de&gt;
[pabeni@redhat.com: fixed commit message typo]
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>posix-clock: Fix missing timespec64 check in pc_clock_settime()</title>
<updated>2024-11-08T15:19:18Z</updated>
<author>
<name>Jinjie Ruan</name>
<email>ruanjinjie@huawei.com</email>
</author>
<published>2024-10-09T07:23:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=29f085345cde24566efb751f39e5d367c381c584'/>
<id>urn:sha1:29f085345cde24566efb751f39e5d367c381c584</id>
<content type='text'>
commit d8794ac20a299b647ba9958f6d657051fc51a540 upstream.

As Andrew pointed out, it makes sense for the PTP core to check the
timespec64 struct's tv_sec and tv_nsec ranges before calling
ptp-&gt;info-&gt;settime64().

As the clock_settime() man page says, if tp.tv_sec is negative or
tp.tv_nsec is outside the range [0..999,999,999], the call should return
EINVAL; this also applies to dynamic clocks, which handle PTP clocks, and
the condition is consistent with timespec64_valid(). As Thomas suggested,
timespec64_valid() only checks that the timespec is well formed, but does
not ensure that the time is in a valid range, so check it up front with
timespec64_valid_strict() in pc_clock_settime() and return -EINVAL if it
is not valid.

Some drivers use tp-&gt;tv_sec and tp-&gt;tv_nsec directly to write registers
without validity checks, assuming the higher layer has already checked
them, which is dangerous; they will benefit from this, e.g.
hclge_ptp_settime(), igb_ptp_settime_i210() and _rcar_gen4_ptp_settime(),
and some drivers can now drop their own checks.
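
The added check boils down to the following sketch, placed in
pc_clock_settime() (argument name assumed):

        /* clock_settime(2) requires EINVAL for a negative tv_sec or a
         * tv_nsec outside [0, 999999999]; timespec64_valid_strict()
         * covers both and additionally rejects values past
         * KTIME_SEC_MAX.
         */
        if (!timespec64_valid_strict(ts))
                return -EINVAL;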

Cc: stable@vger.kernel.org
Fixes: 0606f422b453 ("posix clocks: Introduce dynamic clocks")
Acked-by: Richard Cochran &lt;richardcochran@gmail.com&gt;
Suggested-by: Andrew Lunn &lt;andrew@lunn.ch&gt;
Suggested-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Jinjie Ruan &lt;ruanjinjie@huawei.com&gt;
Link: https://patch.msgid.link/20241009072302.1754567-2-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: Check percpu map value size first</title>
<updated>2024-11-08T15:19:16Z</updated>
<author>
<name>Tao Chen</name>
<email>chen.dylane@gmail.com</email>
</author>
<published>2024-09-10T14:41:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b5ac877855a603c9434e37f04ec419af36cc465b'/>
<id>urn:sha1:b5ac877855a603c9434e37f04ec419af36cc465b</id>
<content type='text'>
[ Upstream commit 1d244784be6b01162b732a5a7d637dfc024c3203 ]

Percpu maps are widely used, but their value size limit is often
ignored, as in issue https://github.com/iovisor/bcc/issues/2519. A
percpu map's value size is bounded by PCPU_MIN_UNIT_SIZE, so we can
check up front whether the value size exceeds PCPU_MIN_UNIT_SIZE, as
the percpu local_storage map already does. The resulting error message
is clearer than "cannot allocate memory".

Signed-off-by: Jinke Han &lt;jinkehan@didiglobal.com&gt;
Signed-off-by: Tao Chen &lt;chen.dylane@gmail.com&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20240910144111.1464912-2-chen.dylane@gmail.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>tracing: Remove precision vsnprintf() check from print event</title>
<updated>2024-11-08T15:19:16Z</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2024-03-04T22:43:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f3de4b5d1ab8139aee39cc8afbd86a2cf260ad91'/>
<id>urn:sha1:f3de4b5d1ab8139aee39cc8afbd86a2cf260ad91</id>
<content type='text'>
[ Upstream commit 5efd3e2aef91d2d812290dcb25b2058e6f3f532c ]

This reverts commit 60be76eeabb3d ("tracing: Add size check when printing
trace_marker output"). The precision check was only added because of a bug
that miscalculated the write size of the string copied into the ring
buffer, truncating it and dropping the terminating nul byte; reading the
trace then crashed the kernel. But that was caused by a bug introduced
during development and should never happen in practice. If anything, the
precision can hide bugs where the string in the ring buffer isn't nul
terminated, because the string then goes unchecked.
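
Roughly, the revert goes back to printing the buffer without a precision
limit (a sketch only; the surrounding print-event code is simplified):

        /* With "%.*s" a string that lost its terminating nul is
         * silently clipped at the computed length; with plain "%s" the
         * corruption stays visible instead of being masked.
         */
        trace_seq_printf(s, "%s", field-&gt;buf);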

Link: https://lore.kernel.org/all/C7E7AF1A-D30F-4D18-B8E5-AF1EF58004F5@linux.ibm.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240227125706.04279ac2@gandalf.local.home
Link: https://lore.kernel.org/all/20240302111244.3a1674be@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20240304174341.2a561d9f@gandalf.local.home

Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Fixes: 60be76eeabb3d ("tracing: Add size check when printing trace_marker output")
Reported-by: Sachin Sant &lt;sachinp@linux.ibm.com&gt;
Tested-by: Sachin Sant &lt;sachinp@linux.ibm.com&gt;
Reviewed-by: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>uprobes: fix kernel info leak via "[uprobes]" vma</title>
<updated>2024-11-08T15:19:15Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2024-10-07T17:46:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f31f92107e5a8ecc8902705122c594e979a351fe'/>
<id>urn:sha1:f31f92107e5a8ecc8902705122c594e979a351fe</id>
<content type='text'>
commit 34820304cc2cd1804ee1f8f3504ec77813d29c8e upstream.

xol_add_vma() maps the uninitialized page allocated by __create_xol_area()
into userspace. On some architectures (x86) this memory is readable even
without VM_READ: VM_EXEC results in the same pgprot_t as VM_EXEC|VM_READ.
Not that this really matters, since a debugger can read the memory anyway.
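
The fix amounts to handing out a zeroed page, roughly as below (a sketch;
the field name differs across kernel versions):

        /* The slots are written on demand, so whatever the allocator
         * left in this page must not leak to userspace through the
         * [uprobes] vma.
         */
        area-&gt;page = alloc_page(GFP_HIGHUSER | __GFP_ZERO);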

Link: https://lore.kernel.org/all/20240929162047.GA12611@redhat.com/

Reported-by: Will Deacon &lt;will@kernel.org&gt;
Fixes: d4b3b6384f98 ("uprobes/core: Allocate XOL slots for uprobes use")
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf/core: Fix small negative period being ignored</title>
<updated>2024-11-08T15:19:13Z</updated>
<author>
<name>Luo Gengkun</name>
<email>luogengkun@huaweicloud.com</email>
</author>
<published>2024-08-31T07:43:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7fddba7b1bb6f1cc35269e510bc832feb3c54b11'/>
<id>urn:sha1:7fddba7b1bb6f1cc35269e510bc832feb3c54b11</id>
<content type='text'>
commit 62c0b1061593d7012292f781f11145b2d46f43ab upstream.

In perf_adjust_period(), we first calculate the period and then use it to
calculate delta. However, when delta is less than 0, the rounding deviates
from the case where delta is greater than or equal to 0. For example, when
delta is in the range [-14,-1], delta + 7 lies in [-7,6], so the final
value of delta/8 is 0 and the contribution of deltas like -1 and -2 is
ignored. This is unacceptable when the target period is very short,
because we lose a lot of samples.
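
The asymmetry comes from rounding (delta + 7) / 8 toward zero; a sketch
of the sign-aware rounding along the lines of the fix:

        s64 delta = (s64)(period - hwc-&gt;sample_period);

        /* Bias in the direction of the sign before dividing, so small
         * negative deltas such as -1 or -2 no longer collapse to 0.
         */
        if (delta &gt;= 0)
                delta += 7;
        else
                delta -= 7;
        delta /= 8;             /* low pass filter */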

Here are some tests and analyses:
before:
  # perf record -e cs -F 1000  ./a.out
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.022 MB perf.data (518 samples) ]

  # perf script
  ...
  a.out     396   257.956048:         23 cs:  ffffffff81f4eeec schedul&gt;
  a.out     396   257.957891:         23 cs:  ffffffff81f4eeec schedul&gt;
  a.out     396   257.959730:         23 cs:  ffffffff81f4eeec schedul&gt;
  a.out     396   257.961545:         23 cs:  ffffffff81f4eeec schedul&gt;
  a.out     396   257.963355:         23 cs:  ffffffff81f4eeec schedul&gt;
  a.out     396   257.965163:         23 cs:  ffffffff81f4eeec schedul&gt;
  a.out     396   257.966973:         23 cs:  ffffffff81f4eeec schedul&gt;
  a.out     396   257.968785:         23 cs:  ffffffff81f4eeec schedul&gt;
  a.out     396   257.970593:         23 cs:  ffffffff81f4eeec schedul&gt;
  ...

after:
  # perf record -e cs -F 1000  ./a.out
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.058 MB perf.data (1466 samples) ]

  # perf script
  ...
  a.out     395    59.338813:         11 cs:  ffffffff81f4eeec schedul&gt;
  a.out     395    59.339707:         12 cs:  ffffffff81f4eeec schedul&gt;
  a.out     395    59.340682:         13 cs:  ffffffff81f4eeec schedul&gt;
  a.out     395    59.341751:         13 cs:  ffffffff81f4eeec schedul&gt;
  a.out     395    59.342799:         12 cs:  ffffffff81f4eeec schedul&gt;
  a.out     395    59.343765:         11 cs:  ffffffff81f4eeec schedul&gt;
  a.out     395    59.344651:         11 cs:  ffffffff81f4eeec schedul&gt;
  a.out     395    59.345539:         12 cs:  ffffffff81f4eeec schedul&gt;
  a.out     395    59.346502:         13 cs:  ffffffff81f4eeec schedul&gt;
  ...

test.c

#include &lt;unistd.h&gt;

int main() {
        for (int i = 0; i &lt; 20000; i++)
                usleep(10);

        return 0;
}

  # time ./a.out
  real    0m1.583s
  user    0m0.040s
  sys     0m0.298s

The above results were collected on x86-64 qemu with KVM enabled, using
test.c as the test program. Ideally, we should see around 1500 samples,
but the previous algorithm produced only about 500, whereas the modified
algorithm now produces about 1400. Furthermore, the new version shows 1
sample per 0.001s, while the previous one showed 1 sample per 0.002s.
This indicates that the new algorithm is more sensitive to small negative
values than the old one.

Fixes: bd2b5b12849a ("perf_counter: More aggressive frequency adjustment")
Signed-off-by: Luo Gengkun &lt;luogengkun@huaweicloud.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Reviewed-by: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20240831074316.2106159-2-luogengkun@huaweicloud.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>signal: Replace BUG_ON()s</title>
<updated>2024-11-08T15:19:12Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2024-06-10T16:42:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0f9c27fbb8a52c50ff7d2659386f1f43e7fbddee'/>
<id>urn:sha1:0f9c27fbb8a52c50ff7d2659386f1f43e7fbddee</id>
<content type='text'>
[ Upstream commit 7f8af7bac5380f2d95a63a6f19964e22437166e1 ]

These really can be handled gracefully without killing the machine.
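
The commit message does not quote the affected sites; the general shape
of such a replacement (illustrative only, not the actual hunks) is:

        /* Instead of BUG_ON(cond) taking the whole machine down, warn
         * once and back out of the affected path.
         */
        if (WARN_ON_ONCE(cond))
                return;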

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
