<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/exit.c, branch v4.4.265</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.265</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.265'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2021-02-03T22:16:16Z</updated>
<entry>
<title>futex: Mark the begin of futex exit explicitly</title>
<updated>2021-02-03T22:16:16Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-02-01T15:12:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=40da511f787b75a80efa3ae5d3f90753a795a9e4'/>
<id>urn:sha1:40da511f787b75a80efa3ae5d3f90753a795a9e4</id>
<content type='text'>
commit 18f694385c4fd77a09851fd301236746ca83f3cb upstream.

Instead of relying on PF_EXITING use an explicit state for the futex exit
and set it in the futex exit function. This moves the smp barrier and the
lock/unlock serialization into the futex code.

As with the DEAD state this is restricted to the exit path as exec
continues to use the same task struct.

This allows to simplify that logic in a next step.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20191106224556.539409004@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Lee Jones &lt;lee.jones@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>futex: Set task::futex_state to DEAD right after handling futex exit</title>
<updated>2021-02-03T22:16:16Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-02-01T15:12:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=da81e2f3e1cb8bf5bb1323370c5e331606841260'/>
<id>urn:sha1:da81e2f3e1cb8bf5bb1323370c5e331606841260</id>
<content type='text'>
commit f24f22435dcc11389acc87e5586239c1819d217c upstream.

Setting task::futex_state in do_exit() is rather arbitrarily placed for no
reason. Move it into the futex code.

Note, this is only done for the exit cleanup as the exec cleanup cannot set
the state to FUTEX_STATE_DEAD because the task struct is still in active
use.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20191106224556.439511191@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Lee Jones &lt;lee.jones@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>exit/exec: Seperate mm_release()</title>
<updated>2021-02-03T22:16:16Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-02-01T15:12:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=93d2eb4077a953965ccc2f452cc3bd8483e32a21'/>
<id>urn:sha1:93d2eb4077a953965ccc2f452cc3bd8483e32a21</id>
<content type='text'>
commit 4610ba7ad877fafc0a25a30c6c82015304120426 upstream.

mm_release() contains the futex exit handling. mm_release() is called from
do_exit()-&gt;exit_mm() and from exec()-&gt;exec_mm().

In the exit_mm() case PF_EXITING and the futex state is updated. In the
exec_mm() case these states are not touched.

As the futex exit code needs further protections against exit races, this
needs to be split into two functions.

Preparatory only, no functional change.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20191106224556.240518241@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Lee Jones &lt;lee.jones@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>futex: Replace PF_EXITPIDONE with a state</title>
<updated>2021-02-03T22:16:16Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-02-01T15:12:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0838b8271ad99c8b1cfea47fc9054ccbce6c0e88'/>
<id>urn:sha1:0838b8271ad99c8b1cfea47fc9054ccbce6c0e88</id>
<content type='text'>
commit 3d4775df0a89240f671861c6ab6e8d59af8e9e41 upstream.

The futex exit handling relies on PF_ flags. That's suboptimal as it
requires a smp_mb() and an ugly lock/unlock of the exiting tasks pi_lock in
the middle of do_exit() to enforce the observability of PF_EXITING in the
futex code.

Add a futex_state member to task_struct and convert the PF_EXITPIDONE logic
over to the new state. The PF_EXITING dependency will be cleaned up in a
later step.

This prepares for handling various futex exit issues later.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20191106224556.149449274@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Lee Jones &lt;lee.jones@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>don't dump the threads that had been already exiting when zapped.</title>
<updated>2020-11-18T17:25:01Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2020-10-28T20:39:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=55c181227c190142656535364654c96ba50c50cc'/>
<id>urn:sha1:55c181227c190142656535364654c96ba50c50cc</id>
<content type='text'>
commit 77f6ab8b7768cf5e6bdd0e72499270a0671506ee upstream.

Coredump logics needs to report not only the registers of the dumping
thread, but (since 2.5.43) those of other threads getting killed.

Doing that might require extra state saved on the stack in asm glue at
kernel entry; signal delivery logics does that (we need to be able to
save sigcontext there, at the very least) and so does seccomp.

That covers all callers of do_coredump().  Secondary threads get hit with
SIGKILL and caught as soon as they reach exit_mm(), which normally happens
in signal delivery, so those are also fine most of the time.  Unfortunately,
it is possible to end up with secondary zapped when it has already entered
exit(2) (or, worse yet, is oopsing).  In those cases we reach exit_mm()
when mm-&gt;core_state is already set, but the stack contents is not what
we would have in signal delivery.

At least on two architectures (alpha and m68k) it leads to infoleaks - we
end up with a chunk of kernel stack written into coredump, with the contents
consisting of normal C stack frames of the call chain leading to exit_mm()
instead of the expected copy of userland registers.  In case of alpha we
leak 312 bytes of stack.  Other architectures (including the regset-using
ones) might have similar problems - the normal user of regsets is ptrace
and the state of tracee at the time of such calls is special in the same
way signal delivery is.

Note that had the zapper gotten to the exiting thread slightly later,
it wouldn't have been included into coredump anyway - we skip the threads
that have already cleared their -&gt;mm.  So let's pretend that zapper always
loses the race.  IOW, have exit_mm() only insert into the dumper list if
we'd gotten there from handling a fatal signal[*]

As the result, the callers of do_exit() that have *not* gone through get_signal()
are not seen by coredump logics as secondary threads.  Which excludes voluntary
exit()/oopsen/traps/etc.  The dumper thread itself is unaffected by that,
so seccomp is fine.

[*] originally I intended to add a new flag in tsk-&gt;flags, but ebiederman pointed
out that PF_SIGNALED is already doing just what we need.

Cc: stable@vger.kernel.org
Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3")
History-tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Acked-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>kernel/exit.c: release ptraced tasks before zap_pid_ns_processes</title>
<updated>2019-02-06T18:43:07Z</updated>
<author>
<name>Andrei Vagin</name>
<email>avagin@gmail.com</email>
</author>
<published>2019-02-01T22:20:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e790eeabc47cd4905606839fe3eb18df2bbd4d3e'/>
<id>urn:sha1:e790eeabc47cd4905606839fe3eb18df2bbd4d3e</id>
<content type='text'>
commit 8fb335e078378c8426fabeed1ebee1fbf915690c upstream.

Currently, exit_ptrace() adds all ptraced tasks in a dead list, then
zap_pid_ns_processes() waits on all tasks in a current pidns, and only
then are tasks from the dead list released.

zap_pid_ns_processes() can get stuck on waiting tasks from the dead
list.  In this case, we will have one unkillable process with one or
more dead children.

Thanks to Oleg for the advice to release tasks in find_child_reaper().

Link: http://lkml.kernel.org/r/20190110175200.12442-1-avagin@gmail.com
Fixes: 7c8bd2322c7f ("exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent()")
Signed-off-by: Andrei Vagin &lt;avagin@gmail.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>kernel/exit.c: avoid undefined behaviour when calling wait4()</title>
<updated>2018-05-26T06:48:51Z</updated>
<author>
<name>zhongjiang</name>
<email>zhongjiang@huawei.com</email>
</author>
<published>2017-07-10T22:53:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ea00b22b02f228cb58ee6c6707c86ec270e37fba'/>
<id>urn:sha1:ea00b22b02f228cb58ee6c6707c86ec270e37fba</id>
<content type='text'>
commit dd83c161fbcc5d8be637ab159c0de015cbff5ba4 upstream.

wait4(-2147483648, 0x20, 0, 0xdd0000) triggers:
UBSAN: Undefined behaviour in kernel/exit.c:1651:9

The related calltrace is as follows:

  negation of -2147483648 cannot be represented in type 'int':
  CPU: 9 PID: 16482 Comm: zj Tainted: G    B          ---- -------   3.10.0-327.53.58.71.x86_64+ #66
  Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285          /BC11BTSA              , BIOS CTSAV036 04/27/2011
  Call Trace:
    dump_stack+0x19/0x1b
    ubsan_epilogue+0xd/0x50
    __ubsan_handle_negate_overflow+0x109/0x14e
    SyS_wait4+0x1cb/0x1e0
    system_call_fastpath+0x16/0x1b

Exclude the overflow to avoid the UBSAN warning.

Link: http://lkml.kernel.org/r/1497264618-20212-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhongjiang &lt;zhongjiang@huawei.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Xishi Qiu &lt;qiuxishi@huawei.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>wait/ptrace: assume __WALL if the child is traced</title>
<updated>2016-06-08T01:14:35Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2016-05-23T23:23:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0eea2e24fc600fa20d357e022ef29029140bb714'/>
<id>urn:sha1:0eea2e24fc600fa20d357e022ef29029140bb714</id>
<content type='text'>
commit bf959931ddb88c4e4366e96dd22e68fa0db9527c upstream.

The following program (simplified version of generated by syzkaller)

	#include &lt;pthread.h&gt;
	#include &lt;unistd.h&gt;
	#include &lt;sys/ptrace.h&gt;
	#include &lt;stdio.h&gt;
	#include &lt;signal.h&gt;

	void *thread_func(void *arg)
	{
		ptrace(PTRACE_TRACEME, 0,0,0);
		return 0;
	}

	int main(void)
	{
		pthread_t thread;

		if (fork())
			return 0;

		while (getppid() != 1)
			;

		pthread_create(&amp;thread, NULL, thread_func, NULL);
		pthread_join(thread, NULL);
		return 0;
	}

creates an unreapable zombie if /sbin/init doesn't use __WALL.

This is not a kernel bug, at least in a sense that everything works as
expected: debugger should reap a traced sub-thread before it can reap the
leader, but without __WALL/__WCLONE do_wait() ignores sub-threads.

Unfortunately, it seems that /sbin/init in most (all?) distributions
doesn't use it and we have to change the kernel to avoid the problem.
Note also that most init's use sys_waitid() which doesn't allow __WALL, so
the necessary user-space fix is not that trivial.

This patch just adds the "ptrace" check into eligible_child().  To some
degree this matches the "tsk-&gt;ptrace" in exit_notify(), -&gt;exit_signal is
mostly ignored when the tracee reports to debugger.  Or WSTOPPED, the
tracer doesn't need to set this flag to wait for the stopped tracee.

This obviously means the user-visible change: __WCLONE and __WALL no
longer have any meaning for debugger.  And I can only hope that this won't
break something, but at least strace/gdb won't suffer.

We could make a more conservative change.  Say, we can take __WCLONE into
account, or !thread_group_leader().  But it would be nice to not
complicate these historical/confusing checks.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: Jan Kratochvil &lt;jan.kratochvil@redhat.com&gt;
Cc: "Michael Kerrisk (man-pages)" &lt;mtk.manpages@gmail.com&gt;
Cc: Pedro Alves &lt;palves@redhat.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: &lt;syzkaller@googlegroups.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2015-11-04T02:03:50Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-11-04T02:03:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=53528695ff6d8b77011bc818407c13e30914a946'/>
<id>urn:sha1:53528695ff6d8b77011bc818407c13e30914a946</id>
<content type='text'>
Pull scheduler changes from Ingo Molnar:
 "The main changes in this cycle were:

   - sched/fair load tracking fixes and cleanups (Byungchul Park)

   - Make load tracking frequency scale invariant (Dietmar Eggemann)

   - sched/deadline updates (Juri Lelli)

   - stop machine fixes, cleanups and enhancements for bugs triggered by
     CPU hotplug stress testing (Oleg Nesterov)

   - scheduler preemption code rework: remove PREEMPT_ACTIVE and related
     cleanups (Peter Zijlstra)

   - Rework the sched_info::run_delay code to fix races (Peter Zijlstra)

   - Optimize per entity utilization tracking (Peter Zijlstra)

   - ... misc other fixes, cleanups and smaller updates"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
  sched: Don't scan all-offline -&gt;cpus_allowed twice if !CONFIG_CPUSETS
  sched: Move cpu_active() tests from stop_two_cpus() into migrate_swap_stop()
  sched: Start stopper early
  stop_machine: Kill cpu_stop_threads-&gt;setup() and cpu_stop_unpark()
  stop_machine: Kill smp_hotplug_thread-&gt;pre_unpark, introduce stop_machine_unpark()
  stop_machine: Change cpu_stop_queue_two_works() to rely on stopper-&gt;enabled
  stop_machine: Introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works()
  stop_machine: Ensure that a queued callback will be called before cpu_stop_park()
  sched/x86: Fix typo in __switch_to() comments
  sched/core: Remove a parameter in the migrate_task_rq() function
  sched/core: Drop unlikely behind BUG_ON()
  sched/core: Fix task and run queue sched_info::run_delay inconsistencies
  sched/numa: Fix task_tick_fair() from disabling numa_balancing
  sched/core: Add preempt_count invariant check
  sched/core: More notrace annotations
  sched/core: Kill PREEMPT_ACTIVE
  sched/core, sched/x86: Kill thread_info::saved_preempt_count
  sched/core: Simplify preempt_count tests
  sched/core: Robustify preemption leak checks
  sched/core: Stop setting PREEMPT_ACTIVE
  ...
</content>
</entry>
<entry>
<title>rcu: Move preemption disabling out of __srcu_read_lock()</title>
<updated>2015-10-06T18:15:43Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2015-09-01T07:42:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=49f5903b473c5f63f3b57856d1bd4593db0a2eef'/>
<id>urn:sha1:49f5903b473c5f63f3b57856d1bd4593db0a2eef</id>
<content type='text'>
Currently, __srcu_read_lock() cannot be invoked from restricted
environments because it contains calls to preempt_disable() and
preempt_enable(), both of which can invoke lockdep, which is a bad
idea in some restricted execution modes.  This commit therefore moves
the preempt_disable() and preempt_enable() from __srcu_read_lock()
to srcu_read_lock().  It also inserts the preempt_disable() and
preempt_enable() around the call to __srcu_read_lock() in do_exit().

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Reviewed-by: Josh Triplett &lt;josh@joshtriplett.org&gt;
</content>
</entry>
</feed>
