<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/sched.h, branch v3.16.68</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.16.68</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.16.68'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2019-05-22T22:15:18Z</updated>
<entry>
<title>x86/speculation: Add prctl() control for indirect branch speculation</title>
<updated>2019-05-22T22:15:18Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2018-11-25T18:33:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fe123a2566a84453c387d8cb846622277380ef5e'/>
<id>urn:sha1:fe123a2566a84453c387d8cb846622277380ef5e</id>
<content type='text'>
commit 9137bb27e60e554dab694eafa4cca241fa3a694f upstream.

Add the PR_SPEC_INDIRECT_BRANCH option for the PR_GET_SPECULATION_CTRL and
PR_SET_SPECULATION_CTRL prctls to allow fine grained per task control of
indirect branch speculation via STIBP and IBPB.

Invocations:
 Check indirect branch speculation status with
 - prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0);

 Enable indirect branch speculation with
 - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0);

 Disable indirect branch speculation with
 - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0);

 Force disable indirect branch speculation with
 - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0);

See Documentation/userspace-api/spec_ctrl.rst.

Signed-off-by: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Jiri Kosina &lt;jkosina@suse.cz&gt;
Cc: Tom Lendacky &lt;thomas.lendacky@amd.com&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Casey Schaufler &lt;casey.schaufler@intel.com&gt;
Cc: Asit Mallick &lt;asit.k.mallick@intel.com&gt;
Cc: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
Cc: Jon Masters &lt;jcm@redhat.com&gt;
Cc: Waiman Long &lt;longman9394@gmail.com&gt;
Cc: Greg KH &lt;gregkh@linuxfoundation.org&gt;
Cc: Dave Stewart &lt;david.c.stewart@intel.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Link: https://lkml.kernel.org/r/20181125185005.866780996@linutronix.de
[bwh: Backported to 3.16:
 - Drop changes in tools/include/uapi/linux/prctl.h
 - Adjust filename, context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping</title>
<updated>2019-05-02T20:42:04Z</updated>
<author>
<name>Andrea Arcangeli</name>
<email>aarcange@redhat.com</email>
</author>
<published>2019-04-19T00:50:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a301e6a651037c11d2d9932a35fb56a04eedba8c'/>
<id>urn:sha1:a301e6a651037c11d2d9932a35fb56a04eedba8c</id>
<content type='text'>
commit 04f5866e41fb70690e28397487d8bd8eea7d712a upstream.

The core dumping code has always run without holding the mmap_sem for
writing, despite that is the only way to ensure that the entire vma
layout will not change from under it.  Only using some signal
serialization on the processes belonging to the mm is not nearly enough.
This was pointed out earlier.  For example in Hugh's post from Jul 2017:

  https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils

  "Not strictly relevant here, but a related note: I was very surprised
   to discover, only quite recently, how handle_mm_fault() may be called
   without down_read(mmap_sem) - when core dumping. That seems a
   misguided optimization to me, which would also be nice to correct"

In particular because the growsdown and growsup can move the
vm_start/vm_end the various loops the core dump does around the vma will
not be consistent if page faults can happen concurrently.

Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.

Adding mmap_sem for writing around the -&gt;core_dump invocation is a
viable long term fix, but it requires removing all copy user and page
faults and to replace them with get_dump_page() for all binary formats
which is not suitable as a short term fix.

For the time being this solution manually covers the places that can
confuse the core dump either by altering the vma layout or the vma flags
while it runs.  Once -&gt;core_dump runs under mmap_sem for writing the
function mmget_still_valid() can be dropped.

Allowing mmap_sem protected sections to run in parallel with the
coredump provides some minor parallelism advantage to the swapoff code
(which seems to be safe enough by never mangling any vma field and can
keep doing swapins in parallel to the core dumping) and to some other
corner case.

In order to facilitate the backporting I added "Fixes: 86039bd3b4e6"
however the side effect of this same race condition in /proc/pid/mem
should be reproducible since before 2.6.12-rc2 so I couldn't add any
other "Fixes:" because there's no hash beyond the git genesis commit.

Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, by adding the mmget_still_valid() check to it, all other cases
that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm().  The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.

Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Reported-by: Jann Horn &lt;jannh@google.com&gt;
Suggested-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Peter Xu &lt;peterx@redhat.com&gt;
Reviewed-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reviewed-by: Jann Horn &lt;jannh@google.com&gt;
Acked-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[bwh: Backported to 3.16:
 - Drop changes in Infiniband and userfaultfd
 - In clear_refs_write(), use up_read() as we never upgrade to a write lock
 - Adjust filename, context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>prctl: Add force disable speculation</title>
<updated>2018-10-03T03:09:42Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2018-05-03T20:09:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0b51874e298c685fd20d68f075c70b63b54633f2'/>
<id>urn:sha1:0b51874e298c685fd20d68f075c70b63b54633f2</id>
<content type='text'>
commit 356e4bfff2c5489e016fdb925adbf12a1e3950ee upstream.

For certain use cases it is desired to enforce mitigations so they cannot
be undone afterwards. That's important for loader stubs which want to
prevent a child from disabling the mitigation again. Will also be used for
seccomp(). The extra state preserving of the prctl state for SSB is a
preparatory step for EBPF dymanic speculation control.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
[bwh: Backported to 3.16: adjust filename, context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>mm: get rid of vmacache_flush_all() entirely</title>
<updated>2018-09-25T22:47:35Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-09-13T09:57:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=536c4d174c0402c5fbf6f7a995f7c9539d124410'/>
<id>urn:sha1:536c4d174c0402c5fbf6f7a995f7c9539d124410</id>
<content type='text'>
commit 7a9cdebdcc17e426fb5287e4a82db1dfe86339b2 upstream.

Jann Horn points out that the vmacache_flush_all() function is not only
potentially expensive, it's buggy too.  It also happens to be entirely
unnecessary, because the sequence number overflow case can be avoided by
simply making the sequence number be 64-bit.  That doesn't even grow the
data structures in question, because the other adjacent fields are
already 64-bit.

So simplify the whole thing by just making the sequence number overflow
case go away entirely, which gets rid of all the complications and makes
the code faster too.  Win-win.

[ Oleg Nesterov points out that the VMACACHE_FULL_FLUSHES statistics
  also just goes away entirely with this ]

Reported-by: Jann Horn &lt;jannh@google.com&gt;
Suggested-by: Will Deacon &lt;will.deacon@arm.com&gt;
Acked-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
[bwh: Backported to 3.16: drop changes to mm debug code]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>x86: pvclock: Really remove the sched notifier for cross-cpu migrations</title>
<updated>2018-02-13T18:42:34Z</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2015-04-23T11:20:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4070ed709318608f7beb23d93956fc32d7bf841e'/>
<id>urn:sha1:4070ed709318608f7beb23d93956fc32d7bf841e</id>
<content type='text'>
commit 73459e2a1ada09a68c02cc5b73f3116fc8194b3d upstream.

This reverts commits 0a4e6be9ca17c54817cf814b4b5aa60478c6df27
and 80f7fdb1c7f0f9266421f823964fd1962681f6ce.

The task migration notifier was originally introduced in order to support
the pvclock vsyscall with non-synchronized TSC, but KVM only supports it
with synchronized TSC.  Hence, on KVM the race condition is only needed
due to a bad implementation on the host side, and even then it's so rare
that it's mostly theoretical.

As far as KVM is concerned it's possible to fix the host, avoiding the
additional complexity in the vDSO and the (re)introduction of the task
migration notifier.

Xen, on the other hand, hasn't yet implemented vsyscall support at
all, so we do not care about its plans for non-synchronized TSC.

Reported-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Suggested-by: Marcelo Tosatti &lt;mtosatti@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>ptrace: Capture the ptracer's creds not PT_PTRACE_CAP</title>
<updated>2018-01-01T20:52:10Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2016-11-15T00:48:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d55a94ed03a24794d47f80d5300825f6c095a0a7'/>
<id>urn:sha1:d55a94ed03a24794d47f80d5300825f6c095a0a7</id>
<content type='text'>
commit 64b875f7ac8a5d60a4e191479299e931ee949b67 upstream.

When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was
overlooked.  This can result in incorrect behavior when an application
like strace traces an exec of a setuid executable.

Further PT_PTRACE_CAP does not have enough information for making good
security decisions as it does not report which user namespace the
capability is in.  This has already allowed one mistake through
insufficient granulariy.

I found this issue when I was testing another corner case of exec and
discovered that I could not get strace to set PT_PTRACE_CAP even when
running strace as root with a full set of caps.

This change fixes the above issue with strace allowing stracing as
root a setuid executable without disabling setuid.  More fundamentaly
this change allows what is allowable at all times, by using the correct
information in it's decision.

Fixes: 4214e42f96d4 ("v2.4.9.11 -&gt; v2.4.9.12")
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags</title>
<updated>2017-10-12T14:28:22Z</updated>
<author>
<name>Zefan Li</name>
<email>lizefan@huawei.com</email>
</author>
<published>2014-09-25T01:41:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d1cadb6597bea8e6644328a98282643570cb8a13'/>
<id>urn:sha1:d1cadb6597bea8e6644328a98282643570cb8a13</id>
<content type='text'>
commit 2ad654bc5e2b211e92f66da1d819e47d79a866f0 upstream.

When we change cpuset.memory_spread_{page,slab}, cpuset will flip
PF_SPREAD_{PAGE,SLAB} bit of tsk-&gt;flags for each task in that cpuset.
This should be done using atomic bitops, but currently we don't,
which is broken.

Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened
when one thread tried to clear PF_USED_MATH while at the same time another
thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on
the same task.

Here's the full report:
https://lkml.org/lkml/2014/9/19/230

To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags.

v4:
- updated mm/slab.c. (Fengguang Wu)
- updated Documentation.

Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Fixes: 950592f7b991 ("cpusets: update tasks' page/slab spread flags in time")
Reported-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>sched: add macros to define bitops for task atomic flags</title>
<updated>2017-10-12T14:28:22Z</updated>
<author>
<name>Zefan Li</name>
<email>lizefan@huawei.com</email>
</author>
<published>2014-09-25T01:40:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ed7b1fae294e7517e60449a237cca4b6730aa4af'/>
<id>urn:sha1:ed7b1fae294e7517e60449a237cca4b6730aa4af</id>
<content type='text'>
commit e0e5070b20e01f0321f97db4e4e174f3f6b49e50 upstream.

This will simplify code when we add new flags.

v3:
- Kees pointed out that no_new_privs should never be cleared, so we
shouldn't define task_clear_no_new_privs(). we define 3 macros instead
of a single one.

v2:
- updated scripts/tags.sh, suggested by Peter

Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>sched: fix confusing PFA_NO_NEW_PRIVS constant</title>
<updated>2017-10-12T14:28:22Z</updated>
<author>
<name>Zefan Li</name>
<email>lizefan@huawei.com</email>
</author>
<published>2014-09-25T01:40:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d0999b1d09e6dc09bfcce638bbac6b582b85ad5d'/>
<id>urn:sha1:d0999b1d09e6dc09bfcce638bbac6b582b85ad5d</id>
<content type='text'>
commit a2b86f772227bcaf962c8b134f8d187046ac5f0e upstream.

Commit 1d4457f99928 ("sched: move no_new_privs into new atomic flags")
defined PFA_NO_NEW_PRIVS as hexadecimal value, but it is confusing
because it is used as bit number. Redefine it as decimal bit number.

Note this changes the bit position of PFA_NOW_NEW_PRIVS from 1 to 0.

Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Signed-off-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
[ lizf: slightly modified subject and changelog ]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>sched: move no_new_privs into new atomic flags</title>
<updated>2017-10-12T14:28:22Z</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2014-05-21T22:23:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5f3333219189a3fe4bbbd2d67e636ed1bb74d243'/>
<id>urn:sha1:5f3333219189a3fe4bbbd2d67e636ed1bb74d243</id>
<content type='text'>
commit 1d4457f99928a968767f6405b4a1f50845aa15fd upstream.

Since seccomp transitions between threads requires updates to the
no_new_privs flag to be atomic, the flag must be part of an atomic flag
set. This moves the nnp flag into a separate task field, and introduces
accessors.

Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reviewed-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
</feed>
