<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/exit.c, branch v5.4.292</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.292</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.292'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2024-07-18T09:40:51Z</updated>
<entry>
<title>mm: optimize the redundant loop of mm_update_owner_next()</title>
<updated>2024-07-18T09:40:51Z</updated>
<author>
<name>Jinliang Zheng</name>
<email>alexjlzheng@tencent.com</email>
</author>
<published>2024-06-20T12:21:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=25ab2411cb91b00bfe26cd26585ffaf7bb64a028'/>
<id>urn:sha1:25ab2411cb91b00bfe26cd26585ffaf7bb64a028</id>
<content type='text'>
commit cf3f9a593dab87a032d2b6a6fb205e7f3de4f0a1 upstream.

When mm_update_owner_next() is racing with swapoff (try_to_unuse()) or
/proc or ptrace or page migration (get_task_mm()), it is impossible to
find an appropriate task_struct in the loop whose mm_struct is the same as
the target mm_struct.

If the above race condition is combined with the stress-ng-zombie and
stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in
write_lock_irq() for tasklist_lock.

Recognize this situation in advance and exit early.

Link: https://lkml.kernel.org/r/20240620122123.3877432-1-alexjlzheng@tencent.com
Signed-off-by: Jinliang Zheng &lt;alexjlzheng@tencent.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Mateusz Guzik &lt;mjguzik@gmail.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Tycho Andersen &lt;tandersen@netflix.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>treewide: Remove uninitialized_var() usage</title>
<updated>2023-06-09T08:29:01Z</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2020-06-03T20:09:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0638dcc7e75fbb766761e7b4694d0f0f141bbbd1'/>
<id>urn:sha1:0638dcc7e75fbb766761e7b4694d0f0f141bbbd1</id>
<content type='text'>
commit 3f649ab728cda8038259d8f14492fe400fbab911 upstream.

Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
	xargs perl -pi -e \
		's/\buninitialized_var\(([^\)]+)\)/\1/g;
		 s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky &lt;leonro@mellanox.com&gt; # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt; # IB
Acked-by: Kalle Valo &lt;kvalo@codeaurora.org&gt; # wireless drivers
Reviewed-by: Chao Yu &lt;yuchao0@huawei.com&gt; # erofs
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>exit: Use READ_ONCE() for all oops/warn limit reads</title>
<updated>2023-02-06T06:52:50Z</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2023-02-02T04:42:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a7cc1b5d76a21f4d2687c7eb4bf71ffa3921f6ac'/>
<id>urn:sha1:a7cc1b5d76a21f4d2687c7eb4bf71ffa3921f6ac</id>
<content type='text'>
commit 7535b832c6399b5ebfc5b53af5c51dd915ee2538 upstream.

Use a temporary variable to take full advantage of READ_ONCE() behavior.
Without this, the report (and even the test) might be out of sync with
the initial test.

Reported-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/lkml/Y5x7GXeluFmZ8E0E@hirez.programming.kicks-ass.net
Fixes: 9fc9e278a5c0 ("panic: Introduce warn_limit")
Fixes: d4ccd54d28d3 ("exit: Put an upper limit on how often we can oops")
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Cc: Marco Elver &lt;elver@google.com&gt;
Cc: tangmeng &lt;tangmeng@uniontech.com&gt;
Cc: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Cc: Tiezhu Yang &lt;yangtiezhu@loongson.cn&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>exit: Allow oops_limit to be disabled</title>
<updated>2023-02-06T06:52:50Z</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2023-02-02T04:42:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7020a9234e867764c09c22f54d8eddf66060dd66'/>
<id>urn:sha1:7020a9234e867764c09c22f54d8eddf66060dd66</id>
<content type='text'>
commit de92f65719cd672f4b48397540b9f9eff67eca40 upstream.

In preparation for keeping oops_limit logic in sync with warn_limit,
have oops_limit == 0 disable checking the Oops counter.

Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: "Jason A. Donenfeld" &lt;Jason@zx2c4.com&gt;
Cc: Eric Biggers &lt;ebiggers@google.com&gt;
Cc: Huang Ying &lt;ying.huang@intel.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: linux-doc@vger.kernel.org
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>exit: Expose "oops_count" to sysfs</title>
<updated>2023-02-06T06:52:50Z</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2023-02-02T04:42:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5a3482f2c1ccbee3537b99a63e4837cfd10257ce'/>
<id>urn:sha1:5a3482f2c1ccbee3537b99a63e4837cfd10257ce</id>
<content type='text'>
commit 9db89b41117024f80b38b15954017fb293133364 upstream.

Since Oops count is now tracked and is a fairly interesting signal, add
the entry /sys/kernel/oops_count to expose it to userspace.

Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Reviewed-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Link: https://lore.kernel.org/r/20221117234328.594699-3-keescook@chromium.org
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>exit: Put an upper limit on how often we can oops</title>
<updated>2023-02-06T06:52:49Z</updated>
<author>
<name>Jann Horn</name>
<email>jannh@google.com</email>
</author>
<published>2023-02-02T04:42:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=28facdf7b0971ecba43662e6395c761fd2931d43'/>
<id>urn:sha1:28facdf7b0971ecba43662e6395c761fd2931d43</id>
<content type='text'>
commit d4ccd54d28d3c8598e2354acc13e28c060961dbb upstream.

Many Linux systems are configured to not panic on oops; but allowing an
attacker to oops the system **really** often can make even bugs that look
completely unexploitable exploitable (like NULL dereferences and such) if
each crash elevates a refcount by one or a lock is taken in read mode, and
this causes a counter to eventually overflow.

The most interesting counters for this are 32 bits wide (like open-coded
refcounts that don't use refcount_t). (The ldsem reader count on 32-bit
platforms is just 16 bits, but probably nobody cares about 32-bit platforms
that much nowadays.)

So let's panic the system if the kernel is constantly oopsing.

The speed of oopsing 2^32 times probably depends on several factors, like
how long the stack trace is and which unwinder you're using; an empirically
important one is whether your console is showing a graphical environment or
a text console that oopses will be printed to.
In a quick single-threaded benchmark, it looks like oopsing in a vfork()
child with a very short stack trace only takes ~510 microseconds per run
when a graphical console is active; but switching to a text console that
oopses are printed to slows it down around 87x, to ~45 milliseconds per
run.
(Adding more threads makes this faster, but the actual oops printing
happens under &amp;die_lock on x86, so you can maybe speed this up by a factor
of around 2 and then any further improvement gets eaten up by lock
contention.)

It looks like it would take around 8-12 days to overflow a 32-bit counter
with repeated oopsing on a multi-core X86 system running a graphical
environment; both me (in an X86 VM) and Seth (with a distro kernel on
normal hardware in a standard configuration) got numbers in that ballpark.

12 days aren't *that* short on a desktop system, and you'd likely need much
longer on a typical server system (assuming that people don't run graphical
desktop environments on their servers), and this is a *very* noisy and
violent approach to exploiting the kernel; and it also seems to take orders
of magnitude longer on some machines, probably because stuff like EFI
pstore will slow it down a ton if that's active.

Signed-off-by: Jann Horn &lt;jannh@google.com&gt;
Link: https://lore.kernel.org/r/20221107201317.324457-1-jannh@google.com
Reviewed-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Link: https://lore.kernel.org/r/20221117234328.594699-2-keescook@chromium.org
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>exit: Add and use make_task_dead.</title>
<updated>2023-02-06T06:52:49Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2023-02-02T04:42:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9a18c9c8336fff473a2b72707bd14a52298417b8'/>
<id>urn:sha1:9a18c9c8336fff473a2b72707bd14a52298417b8</id>
<content type='text'>
commit 0e25498f8cd43c1b5aa327f373dd094e9a006da7 upstream.

There are two big uses of do_exit.  The first is it's design use to be
the guts of the exit(2) system call.  The second use is to terminate
a task after something catastrophic has happened like a NULL pointer
in kernel code.

Add a function make_task_dead that is initialy exactly the same as
do_exit to cover the cases where do_exit is called to handle
catastrophic failure.  In time this can probably be reduced to just a
light wrapper around do_task_dead. For now keep it exactly the same so
that there will be no behavioral differences introducing this new
concept.

Replace all of the uses of do_exit that use it for catastraphic
task cleanup with make_task_dead to make it clear what the code
is doing.

As part of this rename rewind_stack_do_exit
rewind_stack_and_make_dead.

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>don't dump the threads that had been already exiting when zapped.</title>
<updated>2020-11-18T18:20:31Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2020-10-28T20:39:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=974e3a7002a0c4adb921b8b92656463ae795b61c'/>
<id>urn:sha1:974e3a7002a0c4adb921b8b92656463ae795b61c</id>
<content type='text'>
commit 77f6ab8b7768cf5e6bdd0e72499270a0671506ee upstream.

Coredump logics needs to report not only the registers of the dumping
thread, but (since 2.5.43) those of other threads getting killed.

Doing that might require extra state saved on the stack in asm glue at
kernel entry; signal delivery logics does that (we need to be able to
save sigcontext there, at the very least) and so does seccomp.

That covers all callers of do_coredump().  Secondary threads get hit with
SIGKILL and caught as soon as they reach exit_mm(), which normally happens
in signal delivery, so those are also fine most of the time.  Unfortunately,
it is possible to end up with secondary zapped when it has already entered
exit(2) (or, worse yet, is oopsing).  In those cases we reach exit_mm()
when mm-&gt;core_state is already set, but the stack contents is not what
we would have in signal delivery.

At least on two architectures (alpha and m68k) it leads to infoleaks - we
end up with a chunk of kernel stack written into coredump, with the contents
consisting of normal C stack frames of the call chain leading to exit_mm()
instead of the expected copy of userland registers.  In case of alpha we
leak 312 bytes of stack.  Other architectures (including the regset-using
ones) might have similar problems - the normal user of regsets is ptrace
and the state of tracee at the time of such calls is special in the same
way signal delivery is.

Note that had the zapper gotten to the exiting thread slightly later,
it wouldn't have been included into coredump anyway - we skip the threads
that have already cleared their -&gt;mm.  So let's pretend that zapper always
loses the race.  IOW, have exit_mm() only insert into the dumper list if
we'd gotten there from handling a fatal signal[*]

As the result, the callers of do_exit() that have *not* gone through get_signal()
are not seen by coredump logics as secondary threads.  Which excludes voluntary
exit()/oopsen/traps/etc.  The dumper thread itself is unaffected by that,
so seccomp is fine.

[*] originally I intended to add a new flag in tsk-&gt;flags, but ebiederman pointed
out that PF_SIGNALED is already doing just what we need.

Cc: stable@vger.kernel.org
Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3")
History-tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Acked-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>exit: Move preemption fixup up, move blocking operations down</title>
<updated>2020-06-22T07:31:01Z</updated>
<author>
<name>Jann Horn</name>
<email>jannh@google.com</email>
</author>
<published>2020-03-05T22:06:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2e256dd5b05561b051e895121530d2a0062c7a0f'/>
<id>urn:sha1:2e256dd5b05561b051e895121530d2a0062c7a0f</id>
<content type='text'>
[ Upstream commit 586b58cac8b4683eb58a1446fbc399de18974e40 ]

With CONFIG_DEBUG_ATOMIC_SLEEP=y and CONFIG_CGROUPS=y, kernel oopses in
non-preemptible context look untidy; after the main oops, the kernel prints
a "sleeping function called from invalid context" report because
exit_signals() -&gt; cgroup_threadgroup_change_begin() -&gt; percpu_down_read()
can sleep, and that happens before the preempt_count_set(PREEMPT_ENABLED)
fixup.

It looks like the same thing applies to profile_task_exit() and
kcov_task_exit().

Fix it by moving the preemption fixup up and the calls to
profile_task_exit() and kcov_task_exit() down.

Fixes: 1dc0fffc48af ("sched/core: Robustify preemption leak checks")
Signed-off-by: Jann Horn &lt;jannh@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20200305220657.46800-1-jannh@google.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>exit: panic before exit_mm() on global init exit</title>
<updated>2020-01-09T09:20:01Z</updated>
<author>
<name>chenqiwu</name>
<email>chenqiwu@xiaomi.com</email>
</author>
<published>2019-12-19T06:29:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=66d9052e6c765f74fa4a1bce874fab29a6b9583d'/>
<id>urn:sha1:66d9052e6c765f74fa4a1bce874fab29a6b9583d</id>
<content type='text'>
commit 43cf75d96409a20ef06b756877a2e72b10a026fc upstream.

Currently, when global init and all threads in its thread-group have exited
we panic via:
do_exit()
-&gt; exit_notify()
   -&gt; forget_original_parent()
      -&gt; find_child_reaper()
This makes it hard to extract a useable coredump for global init from a
kernel crashdump because by the time we panic exit_mm() will have already
released global init's mm.
This patch moves the panic futher up before exit_mm() is called. As was the
case previously, we only panic when global init and all its threads in the
thread-group have exited.

Signed-off-by: chenqiwu &lt;chenqiwu@xiaomi.com&gt;
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
[christian.brauner@ubuntu.com: fix typo, rewrite commit message]
Link: https://lore.kernel.org/r/1576736993-10121-1-git-send-email-qiwuchen55@gmail.com
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
