<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/sys.c, branch v4.9.280</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.9.280</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.9.280'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2020-10-01T18:40:04Z</updated>
<entry>
<title>kernel/sys.c: avoid copying possible padding bytes in copy_to_user</title>
<updated>2020-10-01T18:40:04Z</updated>
<author>
<name>Joe Perches</name>
<email>joe@perches.com</email>
</author>
<published>2019-12-05T00:50:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bab62c69778f4eaf79ca0beb62d3290c4e494464'/>
<id>urn:sha1:bab62c69778f4eaf79ca0beb62d3290c4e494464</id>
<content type='text'>
[ Upstream commit 5e1aada08cd19ea652b2d32a250501d09b02ff2e ]

Initialization is not guaranteed to zero padding bytes so use an
explicit memset instead to avoid leaking any kernel content in any
possible padding bytes.

Link: http://lkml.kernel.org/r/dfa331c00881d61c8ee51577a082d8bebd61805c.camel@perches.com
Signed-off-by: Joe Perches &lt;joe@perches.com&gt;
Cc: Dan Carpenter &lt;error27@gmail.com&gt;
Cc: Julia Lawall &lt;julia.lawall@lip6.fr&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>kernel/sys.c: prctl: fix false positive in validate_prctl_map()</title>
<updated>2019-06-22T06:17:13Z</updated>
<author>
<name>Cyrill Gorcunov</name>
<email>gorcunov@gmail.com</email>
</author>
<published>2019-05-14T00:15:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e74cb9e009cf37a39b767f341365eb9548d8ba86'/>
<id>urn:sha1:e74cb9e009cf37a39b767f341365eb9548d8ba86</id>
<content type='text'>
[ Upstream commit a9e73998f9d705c94a8dca9687633adc0f24a19a ]

While validating new map we require the @start_data to be strictly less
than @end_data, which is fine for regular applications (this is why this
nit didn't trigger for that long).  These members are set from executable
loaders such as elf handers, still it is pretty valid to have a loadable
data section with zero size in file, in such case the start_data is equal
to end_data once kernel loader finishes.

As a result when we're trying to restore such programs the procedure fails
and the kernel returns -EINVAL.  From the image dump of a program:

 | "mm_start_code": "0x400000",
 | "mm_end_code": "0x8f5fb4",
 | "mm_start_data": "0xf1bfb0",
 | "mm_end_data": "0xf1bfb0",

Thus we need to change validate_prctl_map from strictly less to less or
equal operator use.

Link: http://lkml.kernel.org/r/20190408143554.GY1421@uranus.lan
Fixes: f606b77f1a9e3 ("prctl: PR_SET_MM -- introduce PR_SET_MM_MAP operation")
Signed-off-by: Cyrill Gorcunov &lt;gorcunov@gmail.com&gt;
Cc: Andrey Vagin &lt;avagin@gmail.com&gt;
Cc: Dmitry Safonov &lt;0x7f454c46@gmail.com&gt;
Cc: Pavel Emelyanov &lt;xemul@virtuozzo.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sys: don't hold uts_sem while accessing userspace memory</title>
<updated>2018-09-09T18:01:24Z</updated>
<author>
<name>Jann Horn</name>
<email>jannh@google.com</email>
</author>
<published>2018-06-25T16:34:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=55463c60b7d56e98936abfd092b4983dd010df50'/>
<id>urn:sha1:55463c60b7d56e98936abfd092b4983dd010df50</id>
<content type='text'>
commit 42a0cc3478584d4d63f68f2f5af021ddbea771fa upstream.

Holding uts_sem as a writer while accessing userspace memory allows a
namespace admin to stall all processes that attempt to take uts_sem.
Instead, move data through stack buffers and don't access userspace memory
while uts_sem is held.

Cc: stable@vger.kernel.org
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jann Horn &lt;jannh@google.com&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>kernel/sys.c: fix potential Spectre v1 issue</title>
<updated>2018-05-30T05:50:18Z</updated>
<author>
<name>Gustavo A. R. Silva</name>
<email>gustavo@embeddedor.com</email>
</author>
<published>2018-05-25T21:47:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=960828aaa08f7a038621f3587cfbe33ccb244b07'/>
<id>urn:sha1:960828aaa08f7a038621f3587cfbe33ccb244b07</id>
<content type='text'>
commit 23d6aef74da86a33fa6bb75f79565e0a16ee97c2 upstream.

`resource' can be controlled by user-space, hence leading to a potential
exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

  kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential spectre issue 'get_current()-&gt;signal-&gt;rlim' (local cap)
  kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre issue 'get_current()-&gt;signal-&gt;rlim' (local cap)

Fix this by sanitizing *resource* before using it to index
current-&gt;signal-&gt;rlim

Notice that given that speculation windows are large, the policy is to
kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].

[1] https://marc.info/?l=linux-kernel&amp;m=152449131114778&amp;w=2

Link: http://lkml.kernel.org/r/20180515030038.GA11822@embeddedor.com
Signed-off-by: Gustavo A. R. Silva &lt;gustavo@embeddedor.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>nospec: Allow getting/setting on non-current task</title>
<updated>2018-05-22T14:58:01Z</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2018-05-01T22:19:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4272f528da381673a8e7845c93daa88b8aa4f4e9'/>
<id>urn:sha1:4272f528da381673a8e7845c93daa88b8aa4f4e9</id>
<content type='text'>
commit 7bbf1373e228840bb0295a2ca26d548ef37f448e upstream

Adjust arch_prctl_get/set_spec_ctrl() to operate on tasks other than
current.

This is needed both for /proc/$pid/status queries and for seccomp (since
thread-syncing can trigger seccomp in non-current threads).

Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>prctl: Add speculation control prctls</title>
<updated>2018-05-22T14:58:00Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2018-04-29T13:20:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4812ffbbfcac35270b82292e84e8e7187088c8b8'/>
<id>urn:sha1:4812ffbbfcac35270b82292e84e8e7187088c8b8</id>
<content type='text'>
commit b617cfc858161140d69cc0b5cc211996b557a1c7 upstream

Add two new prctls to control aspects of speculation related vulnerabilites
and their mitigations to provide finer grained control over performance
impacting mitigations.

PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature
which is selected with arg2 of prctl(2). The return value uses bit 0-2 with
the following meaning:

Bit  Define           Description
0    PR_SPEC_PRCTL    Mitigation can be controlled per task by
                      PR_SET_SPECULATION_CTRL
1    PR_SPEC_ENABLE   The speculation feature is enabled, mitigation is
                      disabled
2    PR_SPEC_DISABLE  The speculation feature is disabled, mitigation is
                      enabled

If all bits are 0 the CPU is not affected by the speculation misfeature.

If PR_SPEC_PRCTL is set, then the per task control of the mitigation is
available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation
misfeature will fail.

PR_SET_SPECULATION_CTRL allows to control the speculation misfeature, which
is selected by arg2 of prctl(2) per task. arg3 is used to hand in the
control value, i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE.

The common return values are:

EINVAL  prctl is not implemented by the architecture or the unused prctl()
        arguments are not 0
ENODEV  arg2 is selecting a not supported speculation misfeature

PR_SET_SPECULATION_CTRL has these additional return values:

ERANGE  arg3 is incorrect, i.e. it's not either PR_SPEC_ENABLE or PR_SPEC_DISABLE
ENXIO   prctl control of the selected speculation misfeature is disabled

The first supported controlable speculation misfeature is
PR_SPEC_STORE_BYPASS. Add the define so this can be shared between
architectures.

Based on an initial patch from Tim Chen and mostly rewritten.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable</title>
<updated>2016-05-24T00:04:14Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2016-05-23T23:26:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=17b0573d77fbd547cbf1d711b90d269bdcc1efd3'/>
<id>urn:sha1:17b0573d77fbd547cbf1d711b90d269bdcc1efd3</id>
<content type='text'>
PR_SET_THP_DISABLE requires mmap_sem for write.  If the waiting task
gets killed by the oom killer it would block oom_reaper from
asynchronous address space reclaim and reduce the chances of timely OOM
resolving.  Wait for the lock in the killable mode and return with EINTR
if the task got killed while waiting.

Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Alex Thorlton &lt;athorlton@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>timer: convert timer_slack_ns from unsigned long to u64</title>
<updated>2016-03-17T22:09:34Z</updated>
<author>
<name>John Stultz</name>
<email>john.stultz@linaro.org</email>
</author>
<published>2016-03-17T21:20:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=da8b44d5a9f8bf26da637b7336508ca534d6b319'/>
<id>urn:sha1:da8b44d5a9f8bf26da637b7336508ca534d6b319</id>
<content type='text'>
This patchset introduces a /proc/&lt;pid&gt;/timerslack_ns interface which
would allow controlling processes to be able to set the timerslack value
on other processes in order to save power by avoiding wakeups (Something
Android currently does via out-of-tree patches).

The first patch tries to fix the internal timer_slack_ns usage which was
defined as a long, which limits the slack range to ~4 seconds on 32bit
systems.  It converts it to a u64, which provides the same basically
unlimited slack (500 years) on both 32bit and 64bit machines.

The second patch introduces the /proc/&lt;pid&gt;/timerslack_ns interface
which allows the full 64bit slack range for a task to be read or set on
both 32bit and 64bit machines.

With these two patches, on a 32bit machine, after setting the slack on
bash to 10 seconds:

$ time sleep 1

real    0m10.747s
user    0m0.001s
sys     0m0.005s

The first patch is a little ugly, since I had to chase the slack delta
arguments through a number of functions converting them to u64s.  Let me
know if it makes sense to break that up more or not.

Other than that things are fairly straightforward.

This patch (of 2):

The timer_slack_ns value in the task struct is currently a unsigned
long.  This means that on 32bit applications, the maximum slack is just
over 4 seconds.  However, on 64bit machines, its much much larger (~500
years).

This disparity could make application development a little (as well as
the default_slack) to a u64.  This means both 32bit and 64bit systems
have the same effective internal slack range.

Now the existing ABI via PR_GET_TIMERSLACK and PR_SET_TIMERSLACK specify
the interface as a unsigned long, so we preserve that limitation on
32bit systems, where SET_TIMERSLACK can only set the slack to a unsigned
long value, and GET_TIMERSLACK will return ULONG_MAX if the slack is
actually larger then what can be stored by an unsigned long.

This patch also modifies hrtimer functions which specified the slack
delta as a unsigned long.

Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
Cc: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Oren Laadan &lt;orenl@cellrox.com&gt;
Cc: Ruchi Kandoi &lt;kandoiruchi@google.com&gt;
Cc: Rom Lemarchand &lt;romlem@android.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Android Kernel Team &lt;kernel-team@android.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>prctl: take mmap sem for writing to protect against others</title>
<updated>2016-01-21T01:09:18Z</updated>
<author>
<name>Mateusz Guzik</name>
<email>mguzik@redhat.com</email>
</author>
<published>2016-01-20T23:01:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ddf1d398e517e660207e2c807f76a90df543a217'/>
<id>urn:sha1:ddf1d398e517e660207e2c807f76a90df543a217</id>
<content type='text'>
An unprivileged user can trigger an oops on a kernel with
CONFIG_CHECKPOINT_RESTORE.

proc_pid_cmdline_read takes mmap_sem for reading and obtains args + env
start/end values. These get sanity checked as follows:
        BUG_ON(arg_start &gt; arg_end);
        BUG_ON(env_start &gt; env_end);

These can be changed by prctl_set_mm. Turns out also takes the semaphore for
reading, effectively rendering it useless. This results in:

  kernel BUG at fs/proc/base.c:240!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: virtio_net
  CPU: 0 PID: 925 Comm: a.out Not tainted 4.4.0-rc8-next-20160105dupa+ #71
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: ffff880077a68000 ti: ffff8800784d0000 task.ti: ffff8800784d0000
  RIP: proc_pid_cmdline_read+0x520/0x530
  RSP: 0018:ffff8800784d3db8  EFLAGS: 00010206
  RAX: ffff880077c5b6b0 RBX: ffff8800784d3f18 RCX: 0000000000000000
  RDX: 0000000000000002 RSI: 00007f78e8857000 RDI: 0000000000000246
  RBP: ffff8800784d3e40 R08: 0000000000000008 R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000050
  R13: 00007f78e8857800 R14: ffff88006fcef000 R15: ffff880077c5b600
  FS:  00007f78e884a740(0000) GS:ffff88007b200000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00007f78e8361770 CR3: 00000000790a5000 CR4: 00000000000006f0
  Call Trace:
    __vfs_read+0x37/0x100
    vfs_read+0x82/0x130
    SyS_read+0x58/0xd0
    entry_SYSCALL_64_fastpath+0x12/0x76
  Code: 4c 8b 7d a8 eb e9 48 8b 9d 78 ff ff ff 4c 8b 7d 90 48 8b 03 48 39 45 a8 0f 87 f0 fe ff ff e9 d1 fe ff ff 4c 8b 7d 90 eb c6 0f 0b &lt;0f&gt; 0b 0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
  RIP   proc_pid_cmdline_read+0x520/0x530
  ---[ end trace 97882617ae9c6818 ]---

Turns out there are instances where the code just reads aformentioned
values without locking whatsoever - namely environ_read and get_cmdline.

Interestingly these functions look quite resilient against bogus values,
but I don't believe this should be relied upon.

The first patch gets rid of the oops bug by grabbing mmap_sem for
writing.

The second patch is optional and puts locking around aformentioned
consumers for safety.  Consumers of other fields don't seem to benefit
from similar treatment and are left untouched.

This patch (of 2):

The code was taking the semaphore for reading, which does not protect
against readers nor concurrent modifications.

The problem could cause a sanity checks to fail in procfs's cmdline
reader, resulting in an OOPS.

Note that some functions perform an unlocked read of various mm fields,
but they seem to be fine despite possible modificaton.

Signed-off-by: Mateusz Guzik &lt;mguzik@redhat.com&gt;
Acked-by: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Jarod Wilson &lt;jarod@redhat.com&gt;
Cc: Jan Stancek &lt;jstancek@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Anshuman Khandual &lt;anshuman.linux@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode</title>
<updated>2015-11-07T01:50:42Z</updated>
<author>
<name>Ben Segall</name>
<email>bsegall@google.com</email>
</author>
<published>2015-11-07T00:32:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8639b46139b0e4ea3b1ab1c274e410ee327f1d89'/>
<id>urn:sha1:8639b46139b0e4ea3b1ab1c274e410ee327f1d89</id>
<content type='text'>
setpriority(PRIO_USER, 0, x) will change the priority of tasks outside of
the current pid namespace.  This is in contrast to both the other modes of
setpriority and the example of kill(-1).  Fix this.  getpriority and
ioprio have the same failure mode, fix them too.

Eric said:

: After some more thinking about it this patch sounds justifiable.
:
: My goal with namespaces is not to build perfect isolation mechanisms
: as that can get into ill defined territory, but to build well defined
: mechanisms.  And to handle the corner cases so you can use only
: a single namespace with well defined results.
:
: In this case you have found the two interfaces I am aware of that
: identify processes by uid instead of by pid.  Which quite frankly is
: weird.  Unfortunately the weird unexpected cases are hard to handle
: in the usual way.
:
: I was hoping for a little more information.  Changes like this one we
: have to be careful of because someone might be depending on the current
: behavior.  I don't think they are and I do think this make sense as part
: of the pid namespace.

Signed-off-by: Ben Segall &lt;bsegall@google.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Ambrose Feinstein &lt;ambrose@google.com&gt;
Acked-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
