<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/fs/exec.c, branch v4.9.15</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.9.15</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.9.15'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2017-01-06T09:40:13Z</updated>
<entry>
<title>ptrace: Capture the ptracer's creds not PT_PTRACE_CAP</title>
<updated>2017-01-06T09:40:13Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2016-11-15T00:48:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e747b4ae3b6bca205d82e86366e140cdcbfb7731'/>
<id>urn:sha1:e747b4ae3b6bca205d82e86366e140cdcbfb7731</id>
<content type='text'>
commit 64b875f7ac8a5d60a4e191479299e931ee949b67 upstream.

When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was
overlooked.  This can result in incorrect behavior when an application
like strace traces an exec of a setuid executable.

Further PT_PTRACE_CAP does not have enough information for making good
security decisions as it does not report which user namespace the
capability is in.  This has already allowed one mistake through
insufficient granulariy.

I found this issue when I was testing another corner case of exec and
discovered that I could not get strace to set PT_PTRACE_CAP even when
running strace as root with a full set of caps.

This change fixes the above issue with strace allowing stracing as
root a setuid executable without disabling setuid.  More fundamentaly
this change allows what is allowable at all times, by using the correct
information in it's decision.

Fixes: 4214e42f96d4 ("v2.4.9.11 -&gt; v2.4.9.12")
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>fs: exec: apply CLOEXEC before changing dumpable task flags</title>
<updated>2017-01-06T09:40:12Z</updated>
<author>
<name>Aleksa Sarai</name>
<email>asarai@suse.de</email>
</author>
<published>2016-12-21T05:26:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c1df5a63716b0f73e95b1b26d878691e6175ea34'/>
<id>urn:sha1:c1df5a63716b0f73e95b1b26d878691e6175ea34</id>
<content type='text'>
commit 613cc2b6f272c1a8ad33aefa21cad77af23139f7 upstream.

If you have a process that has set itself to be non-dumpable, and it
then undergoes exec(2), any CLOEXEC file descriptors it has open are
"exposed" during a race window between the dumpable flags of the process
being reset for exec(2) and CLOEXEC being applied to the file
descriptors. This can be exploited by a process by attempting to access
/proc/&lt;pid&gt;/fd/... during this window, without requiring CAP_SYS_PTRACE.

The race in question is after set_dumpable has been (for get_link,
though the trace is basically the same for readlink):

[vfs]
-&gt; proc_pid_link_inode_operations.get_link
   -&gt; proc_pid_get_link
      -&gt; proc_fd_access_allowed
         -&gt; ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);

Which will return 0, during the race window and CLOEXEC file descriptors
will still be open during this window because do_close_on_exec has not
been called yet. As a result, the ordering of these calls should be
reversed to avoid this race window.

This is of particular concern to container runtimes, where joining a
PID namespace with file descriptors referring to the host filesystem
can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect
against access of CLOEXEC file descriptors -- file descriptors which may
reference filesystem objects the container shouldn't have access to).

Cc: dev@opencontainers.org
Reported-by: Michael Crosby &lt;crosbymichael@gmail.com&gt;
Signed-off-by: Aleksa Sarai &lt;asarai@suse.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>exec: Ensure mm-&gt;user_ns contains the execed files</title>
<updated>2017-01-06T09:40:12Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2016-11-17T04:06:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=21245b8635e8636e9e01df06f33cf08f369322b7'/>
<id>urn:sha1:21245b8635e8636e9e01df06f33cf08f369322b7</id>
<content type='text'>
commit f84df2a6f268de584a201e8911384a2d244876e3 upstream.

When the user namespace support was merged the need to prevent
ptrace from revealing the contents of an unreadable executable
was overlooked.

Correct this oversight by ensuring that the executed file
or files are in mm-&gt;user_ns, by adjusting mm-&gt;user_ns.

Use the new function privileged_wrt_inode_uidgid to see if
the executable is a member of the user namespace, and as such
if having CAP_SYS_PTRACE in the user namespace should allow
tracing the executable.  If not update mm-&gt;user_ns to
the parent user namespace until an appropriate parent is found.

Reported-by: Jann Horn &lt;jann@thejh.net&gt;
Fixes: 9e4a36ece652 ("userns: Fail exec for suid and sgid binaries with ids outside our user namespace.")
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: replace get_user_pages_remote() write/force parameters with gup_flags</title>
<updated>2016-10-19T15:12:02Z</updated>
<author>
<name>Lorenzo Stoakes</name>
<email>lstoakes@gmail.com</email>
</author>
<published>2016-10-13T00:20:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9beae1ea89305a9667ceaab6d0bf46a045ad71e7'/>
<id>urn:sha1:9beae1ea89305a9667ceaab6d0bf46a045ad71e7</id>
<content type='text'>
This removes the 'write' and 'force' from get_user_pages_remote() and
replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu</title>
<updated>2016-08-04T22:04:44Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-08-04T22:04:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8e7106a60748e74f4c76b2204e83f14e4dc041cc'/>
<id>urn:sha1:8e7106a60748e74f4c76b2204e83f14e4dc041cc</id>
<content type='text'>
Pull m68knommu updates from Greg Ungerer:
 "This series is all about Nicolas flat format support for MMU systems.

  Traditional m68k no-MMU flat format binaries can now be run on m68k
  MMU enabled systems too.  The series includes some nice cleanups of
  the binfmt_flat code and converts it to using proper user space
  accessor functions.

  With all this in place you can boot and run a complete no-MMU flat
  format based user space on an MMU enabled system"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68k: enable binfmt_flat on systems with an MMU
  binfmt_flat: allow compressed flat binary format to work on MMU systems
  binfmt_flat: add MMU-specific support
  binfmt_flat: update libraries' data segment pointer with userspace accessors
  binfmt_flat: use clear_user() rather than memset() to clear .bss
  binfmt_flat: use proper user space accessors with old relocs code
  binfmt_flat: use proper user space accessors with relocs processing code
  binfmt_flat: clean up create_flat_tables() and stack accesses
  binfmt_flat: use generic transfer_args_to_stack()
  elf_fdpic_transfer_args_to_stack(): make it generic
  binfmt_flat: prevent kernel dammage from corrupted executable headers
  binfmt_flat: convert printk invocations to their modern form
  binfmt_flat: assorted cleanups
  m68k: use same start_thread() on MMU and no-MMU
  m68k: fix file path comment
  m68k: fix bFLT executable running on MMU enabled systems
</content>
</entry>
<entry>
<title>firmware: support loading into a pre-allocated buffer</title>
<updated>2016-08-02T23:35:10Z</updated>
<author>
<name>Stephen Boyd</name>
<email>stephen.boyd@linaro.org</email>
</author>
<published>2016-08-02T21:04:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a098ecd2fa7db8fa4fcc178a43627b29b226edb9'/>
<id>urn:sha1:a098ecd2fa7db8fa4fcc178a43627b29b226edb9</id>
<content type='text'>
Some systems are memory constrained but they need to load very large
firmwares.  The firmware subsystem allows drivers to request this
firmware be loaded from the filesystem, but this requires that the
entire firmware be loaded into kernel memory first before it's provided
to the driver.  This can lead to a situation where we map the firmware
twice, once to load the firmware into kernel memory and once to copy the
firmware into the final resting place.

This creates needless memory pressure and delays loading because we have
to copy from kernel memory to somewhere else.  Let's add a
request_firmware_into_buf() API that allows drivers to request firmware
be loaded directly into a pre-allocated buffer.  This skips the
intermediate step of allocating a buffer in kernel memory to hold the
firmware image while it's read from the filesystem.  It also requires
that drivers know how much memory they'll require before requesting the
firmware and negates any benefits of firmware caching because the
firmware layer doesn't manage the buffer lifetime.

For a 16MB buffer, about half the time is spent performing a memcpy from
the buffer to the final resting place.  I see loading times go from
0.081171 seconds to 0.047696 seconds after applying this patch.  Plus
the vmalloc pressure is reduced.

This is based on a patch from Vikram Mulukutla on codeaurora.org:
  https://www.codeaurora.org/cgit/quic/la/kernel/msm-3.18/commit/drivers/base/firmware_class.c?h=rel/msm-3.18&amp;id=0a328c5f6cd999f5c591f172216835636f39bcb5

Link: http://lkml.kernel.org/r/20160607164741.31849-4-stephen.boyd@linaro.org
Signed-off-by: Stephen Boyd &lt;stephen.boyd@linaro.org&gt;
Cc: Mimi Zohar &lt;zohar@linux.vnet.ibm.com&gt;
Cc: Vikram Mulukutla &lt;markivx@codeaurora.org&gt;
Cc: Mark Brown &lt;broonie@kernel.org&gt;
Cc: Ming Lei &lt;ming.lei@canonical.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>elf_fdpic_transfer_args_to_stack(): make it generic</title>
<updated>2016-07-25T06:51:49Z</updated>
<author>
<name>Nicolas Pitre</name>
<email>nicolas.pitre@linaro.org</email>
</author>
<published>2016-07-24T15:30:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7e7ec6a934349ef6983f06f7ac0da09cc8a42983'/>
<id>urn:sha1:7e7ec6a934349ef6983f06f7ac0da09cc8a42983</id>
<content type='text'>
This copying of arguments and environment is common to both NOMMU
binary formats we support. Let's make the elf_fdpic version available
to the flat format as well.

While at it, improve the code a bit not to copy below the actual
data area.

Signed-off-by: Nicolas Pitre &lt;nico@linaro.org&gt;
Reviewed-by: Greg Ungerer &lt;gerg@linux-m68k.org&gt;
Signed-off-by: Greg Ungerer &lt;gerg@linux-m68k.org&gt;
</content>
</entry>
<entry>
<title>fs: Treat foreign mounts as nosuid</title>
<updated>2016-06-24T15:40:41Z</updated>
<author>
<name>Andy Lutomirski</name>
<email>luto@amacapital.net</email>
</author>
<published>2016-06-23T21:41:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=380cf5ba6b0a0b307f4afb62b186ca801defb203'/>
<id>urn:sha1:380cf5ba6b0a0b307f4afb62b186ca801defb203</id>
<content type='text'>
If a process gets access to a mount from a different user
namespace, that process should not be able to take advantage of
setuid files or selinux entrypoints from that filesystem.  Prevent
this by treating mounts from other mount namespaces and those not
owned by current_user_ns() or an ancestor as nosuid.

This will make it safer to allow more complex filesystems to be
mounted in non-root user namespaces.

This does not remove the need for MNT_LOCK_NOSUID.  The setuid,
setgid, and file capability bits can no longer be abused if code in
a user namespace were to clear nosuid on an untrusted filesystem,
but this patch, by itself, is insufficient to protect the system
from abuse of files that, when execed, would increase MAC privilege.

As a more concrete explanation, any task that can manipulate a
vfsmount associated with a given user namespace already has
capabilities in that namespace and all of its descendents.  If they
can cause a malicious setuid, setgid, or file-caps executable to
appear in that mount, then that executable will only allow them to
elevate privileges in exactly the set of namespaces in which they
are already privileges.

On the other hand, if they can cause a malicious executable to
appear with a dangerous MAC label, running it could change the
caller's security context in a way that should not have been
possible, even inside the namespace in which the task is confined.

As a hardening measure, this would have made CVE-2014-5207 much
more difficult to exploit.

Signed-off-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Signed-off-by: Seth Forshee &lt;seth.forshee@canonical.com&gt;
Acked-by: James Morris &lt;james.l.morris@oracle.com&gt;
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>exec: make exec path waiting for mmap_sem killable</title>
<updated>2016-05-24T00:04:14Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2016-05-23T23:26:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f268dfe905d4682150d4acbb25f59adc04cd9398'/>
<id>urn:sha1:f268dfe905d4682150d4acbb25f59adc04cd9398</id>
<content type='text'>
setup_arg_pages requires mmap_sem for write.  If the waiting task gets
killed by the oom killer it would block oom_reaper from asynchronous
address space reclaim and reduce the chances of timely OOM resolving.
Wait for the lock in the killable mode and return with EINTR if the task
got killed while waiting.  All the callers are already handling error
path and the fatal signal doesn't need any additional treatment.

The same applies to __bprm_mm_init.

Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>exec: remove the no longer needed remove_arg_zero()-&gt;free_arg_page()</title>
<updated>2016-05-24T00:04:14Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2016-05-23T23:24:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9eb8a659dea694b0dcbd6287f6b1fbdc523b80bc'/>
<id>urn:sha1:9eb8a659dea694b0dcbd6287f6b1fbdc523b80bc</id>
<content type='text'>
remove_arg_zero() does free_arg_page() for no reason.  This was needed
before and only if CONFIG_MMU=y: see commit 4fc75ff4816c ("exec: fix
remove_arg_zero"), install_arg_page() was called for every page != NULL
in bprm-&gt;page[] array.  Today install_arg_page() has already gone and
free_arg_page() is nop after another commit b6a2fea39318 ("mm: variable
length argument support").

CONFIG_MMU=n does free_arg_pages() in free_bprm() and thus it doesn't
need remove_arg_zero()-&gt;free_arg_page() too; apart from get_arg_page()
it never checks if the page in bprm-&gt;page[] was allocated or not, so the
"extra" non-freed page is fine.  OTOH, this free_arg_page() can add the
minor pessimization, the caller is going to do copy_strings_kernel()
right after remove_arg_zero() which will likely need to re-allocate the
same page again.

And as Hujunjie pointed out, the "offset == PAGE_SIZE" check is wrong
because we are going to increment bprm-&gt;p once again before return, so
CONFIG_MMU=n "leaks" the page anyway if '0' is the final byte in this
page.

NOTE: remove_arg_zero() assumes that argv[0] is null-terminated but this
is not necessarily true.  copy_strings() does "len = strnlen_user(...)",
then copy_from_user(len) but another thread or debuger can overwrite the
trailing '0' in between.  Afaics nothing really bad can happen because
we must always have the null-terminated bprm-&gt;filename copied by the 1st
copy_strings_kernel(), but perhaps we should change this code to check
"bprm-&gt;p &lt; bprm-&gt;exec" anyway, and/or change copy_strings() to ensure
that the last byte in string is always zero.

Link: http://lkml.kernel.org/r/20160517155335.GA31435@redhat.com
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reported by: hujunjie &lt;jj.net@163.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
