<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/fs/proc, branch v5.7.10</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.7.10</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.7.10'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2020-06-24T15:49:24Z</updated>
<entry>
<title>proc/bootconfig: Fix to use correct quotes for value</title>
<updated>2020-06-24T15:49:24Z</updated>
<author>
<name>Masami Hiramatsu</name>
<email>mhiramat@kernel.org</email>
</author>
<published>2020-06-16T10:14:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4a9671a885769c8ec64afc5c6857ad8fa7ca5a2e'/>
<id>urn:sha1:4a9671a885769c8ec64afc5c6857ad8fa7ca5a2e</id>
<content type='text'>
commit 4e264ffd953463cd14c0720eaa9315ac052f5973 upstream.

Fix /proc/bootconfig to select double or single quotes
corrctly according to the value.

If a bootconfig value includes a double quote character,
we must use single-quotes to quote that value.

This modifies if() condition and blocks for avoiding
double-quote in value check in 2 places. Anyway, since
xbc_array_for_each_value() can handle the array which
has a single node correctly.
Thus,

if (vnode &amp;&amp; xbc_node_is_array(vnode)) {
	xbc_array_for_each_value(vnode)	/* vnode-&gt;next != NULL */
		...
} else {
	snprintf(val); /* val is an empty string if !vnode */
}

is equivalent to

if (vnode) {
	xbc_array_for_each_value(vnode)	/* vnode-&gt;next can be NULL */
		...
} else {
	snprintf("");	/* value is always empty */
}

Link: http://lkml.kernel.org/r/159230244786.65555.3763894451251622488.stgit@devnote2

Cc: stable@vger.kernel.org
Fixes: c1a3c36017d4 ("proc: bootconfig: Add /proc/bootconfig to show boot config list")
Signed-off-by: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>proc: Use new_inode not new_inode_pseudo</title>
<updated>2020-06-17T14:42:59Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2020-06-12T14:42:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=476bdd28059a987b860a2c3aaade706eeac971a8'/>
<id>urn:sha1:476bdd28059a987b860a2c3aaade706eeac971a8</id>
<content type='text'>
commit ef1548adada51a2f32ed7faef50aa465e1b4c5da upstream.

Recently syzbot reported that unmounting proc when there is an ongoing
inotify watch on the root directory of proc could result in a use
after free when the watch is removed after the unmount of proc
when the watcher exits.

Commit 69879c01a0c3 ("proc: Remove the now unnecessary internal mount
of proc") made it easier to unmount proc and allowed syzbot to see the
problem, but looking at the code it has been around for a long time.

Looking at the code the fsnotify watch should have been removed by
fsnotify_sb_delete in generic_shutdown_super.  Unfortunately the inode
was allocated with new_inode_pseudo instead of new_inode so the inode
was not on the sb-&gt;s_inodes list.  Which prevented
fsnotify_unmount_inodes from finding the inode and removing the watch
as well as made it so the "VFS: Busy inodes after unmount" warning
could not find the inodes to warn about them.

Make all of the inodes in proc visible to generic_shutdown_super,
and fsnotify_sb_delete by using new_inode instead of new_inode_pseudo.
The only functional difference is that new_inode places the inodes
on the sb-&gt;s_inodes list.

I wrote a small test program and I can verify that without changes it
can trigger this issue, and by replacing new_inode_pseudo with
new_inode the issues goes away.

Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/000000000000d788c905a7dfa3f4@google.com
Reported-by: syzbot+7d2debdcdb3cb93c1e5e@syzkaller.appspotmail.com
Fixes: 0097875bd415 ("proc: Implement /proc/thread-self to point at the directory of the current thread")
Fixes: 021ada7dff22 ("procfs: switch /proc/self away from proc_dir_entry")
Fixes: 51f0885e5415 ("vfs,proc: guarantee unique inodes in /proc")
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace</title>
<updated>2020-04-25T19:25:32Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-04-25T19:25:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b2768df24ec400dd4f7fa79542f797e904812053'/>
<id>urn:sha1:b2768df24ec400dd4f7fa79542f797e904812053</id>
<content type='text'>
Pull pid leak fix from Eric Biederman:
 "Oleg noticed that put_pid(thread_pid) was not getting called when proc
  was not compiled in.

  Let's get that fixed before 5.7 is released and causes problems for
  anyone"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc: Put thread_pid in release_task not proc_flush_pid
</content>
</entry>
<entry>
<title>proc: Put thread_pid in release_task not proc_flush_pid</title>
<updated>2020-04-24T20:49:00Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2020-04-24T20:41:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6ade99ec6175ab2b54c227521e181e1c3c2bfc8a'/>
<id>urn:sha1:6ade99ec6175ab2b54c227521e181e1c3c2bfc8a</id>
<content type='text'>
Oleg pointed out that in the unlikely event the kernel is compiled
with CONFIG_PROC_FS unset that release_task will now leak the pid.

Move the put_pid out of proc_flush_pid into release_task to fix this
and to guarantee I don't make that mistake again.

When possible it makes sense to keep get and put in the same function
so it can easily been seen how they pair up.

Fixes: 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc")
Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>vmalloc: fix remap_vmalloc_range() bounds checks</title>
<updated>2020-04-21T18:11:56Z</updated>
<author>
<name>Jann Horn</name>
<email>jannh@google.com</email>
</author>
<published>2020-04-21T01:14:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bdebd6a2831b6fab69eb85cee74a8ba77f1a1cc2'/>
<id>urn:sha1:bdebd6a2831b6fab69eb85cee74a8ba77f1a1cc2</id>
<content type='text'>
remap_vmalloc_range() has had various issues with the bounds checks it
promises to perform ("This function checks that addr is a valid
vmalloc'ed area, and that it is big enough to cover the vma") over time,
e.g.:

 - not detecting pgoff&lt;&lt;PAGE_SHIFT overflow

 - not detecting (pgoff&lt;&lt;PAGE_SHIFT)+usize overflow

 - not checking whether addr and addr+(pgoff&lt;&lt;PAGE_SHIFT) are the same
   vmalloc allocation

 - comparing a potentially wildly out-of-bounds pointer with the end of
   the vmalloc region

In particular, since commit fc9702273e2e ("bpf: Add mmap() support for
BPF_MAP_TYPE_ARRAY"), unprivileged users can cause kernel null pointer
dereferences by calling mmap() on a BPF map with a size that is bigger
than the distance from the start of the BPF map to the end of the
address space.

This could theoretically be used as a kernel ASLR bypass, by using
whether mmap() with a given offset oopses or returns an error code to
perform a binary search over the possible address range.

To allow remap_vmalloc_range_partial() to verify that addr and
addr+(pgoff&lt;&lt;PAGE_SHIFT) are in the same vmalloc region, pass the offset
to remap_vmalloc_range_partial() instead of adding it to the pointer in
remap_vmalloc_range().

In remap_vmalloc_range_partial(), fix the check against
get_vm_area_size() by using size comparisons instead of pointer
comparisons, and add checks for pgoff.

Fixes: 833423143c3a ("[PATCH] mm: introduce remap_vmalloc_range()")
Signed-off-by: Jann Horn &lt;jannh@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: stable@vger.kernel.org
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Martin KaFai Lau &lt;kafai@fb.com&gt;
Cc: Song Liu &lt;songliubraving@fb.com&gt;
Cc: Yonghong Song &lt;yhs@fb.com&gt;
Cc: Andrii Nakryiko &lt;andriin@fb.com&gt;
Cc: John Fastabend &lt;john.fastabend@gmail.com&gt;
Cc: KP Singh &lt;kpsingh@chromium.org&gt;
Link: http://lkml.kernel.org/r/20200415222312.236431-1-jannh@google.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'timers-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2020-04-19T18:46:21Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-04-19T18:46:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3e0dea57686d8a0cb8b870de0f5ccbd2e941d8b3'/>
<id>urn:sha1:3e0dea57686d8a0cb8b870de0f5ccbd2e941d8b3</id>
<content type='text'>
Pull time namespace fix from Thomas Gleixner:
 "An update for the proc interface of time namespaces: Use symbolic
  names instead of clockid numbers. The usability nuisance of numbers
  was noticed by Michael when polishing the man page"

* tag 'timers-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  proc, time/namespace: Show clock symbolic names in /proc/pid/timens_offsets
</content>
</entry>
<entry>
<title>proc, time/namespace: Show clock symbolic names in /proc/pid/timens_offsets</title>
<updated>2020-04-16T10:10:54Z</updated>
<author>
<name>Andrei Vagin</name>
<email>avagin@gmail.com</email>
</author>
<published>2020-04-11T15:40:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=94d440d618467806009c8edc70b094d64e12ee5a'/>
<id>urn:sha1:94d440d618467806009c8edc70b094d64e12ee5a</id>
<content type='text'>
Michael Kerrisk suggested to replace numeric clock IDs with symbolic names.

Now the content of these files looks like this:
$ cat /proc/774/timens_offsets
monotonic      864000         0
boottime      1728000         0

For setting offsets, both representations of clocks (numeric and symbolic)
can be used.

As for compatibility, it is acceptable to change things as long as
userspace doesn't care. The format of timens_offsets files is very new and
there are no userspace tools yet which rely on this format.

But three projects crun, util-linux and criu rely on the interface of
setting time offsets and this is why it's required to continue supporting
the numeric clock IDs on write.

Fixes: 04a8682a71be ("fs/proc: Introduce /proc/pid/timens_offsets")
Suggested-by: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Signed-off-by: Andrei Vagin &lt;avagin@gmail.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Acked-by: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: Dmitry Safonov &lt;0x7f454c46@gmail.com&gt;
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200411154031.642557-1-avagin@gmail.com
</content>
</entry>
<entry>
<title>proc: Handle umounts cleanly</title>
<updated>2020-04-16T04:52:29Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2020-04-15T17:37:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4fa3b1c417377c352208ee9f487e17cfcee32348'/>
<id>urn:sha1:4fa3b1c417377c352208ee9f487e17cfcee32348</id>
<content type='text'>
syzbot writes:
&gt; KASAN: use-after-free Read in dput (2)
&gt;
&gt; proc_fill_super: allocate dentry failed
&gt; ==================================================================
&gt; BUG: KASAN: use-after-free in fast_dput fs/dcache.c:727 [inline]
&gt; BUG: KASAN: use-after-free in dput+0x53e/0xdf0 fs/dcache.c:846
&gt; Read of size 4 at addr ffff88808a618cf0 by task syz-executor.0/8426
&gt;
&gt; CPU: 0 PID: 8426 Comm: syz-executor.0 Not tainted 5.6.0-next-20200412-syzkaller #0
&gt; Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
&gt; Call Trace:
&gt;  __dump_stack lib/dump_stack.c:77 [inline]
&gt;  dump_stack+0x188/0x20d lib/dump_stack.c:118
&gt;  print_address_description.constprop.0.cold+0xd3/0x315 mm/kasan/report.c:382
&gt;  __kasan_report.cold+0x35/0x4d mm/kasan/report.c:511
&gt;  kasan_report+0x33/0x50 mm/kasan/common.c:625
&gt;  fast_dput fs/dcache.c:727 [inline]
&gt;  dput+0x53e/0xdf0 fs/dcache.c:846
&gt;  proc_kill_sb+0x73/0xf0 fs/proc/root.c:195
&gt;  deactivate_locked_super+0x8c/0xf0 fs/super.c:335
&gt;  vfs_get_super+0x258/0x2d0 fs/super.c:1212
&gt;  vfs_get_tree+0x89/0x2f0 fs/super.c:1547
&gt;  do_new_mount fs/namespace.c:2813 [inline]
&gt;  do_mount+0x1306/0x1b30 fs/namespace.c:3138
&gt;  __do_sys_mount fs/namespace.c:3347 [inline]
&gt;  __se_sys_mount fs/namespace.c:3324 [inline]
&gt;  __x64_sys_mount+0x18f/0x230 fs/namespace.c:3324
&gt;  do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
&gt;  entry_SYSCALL_64_after_hwframe+0x49/0xb3
&gt; RIP: 0033:0x45c889
&gt; Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 &lt;48&gt; 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
&gt; RSP: 002b:00007ffc1930ec48 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
&gt; RAX: ffffffffffffffda RBX: 0000000001324914 RCX: 000000000045c889
&gt; RDX: 0000000020000140 RSI: 0000000020000040 RDI: 0000000000000000
&gt; RBP: 000000000076bf00 R08: 0000000000000000 R09: 0000000000000000
&gt; R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
&gt; R13: 0000000000000749 R14: 00000000004ca15a R15: 0000000000000013

Looking at the code now that it the internal mount of proc is no
longer used it is possible to unmount proc.   If proc is unmounted
the fields of the pid namespace that were used for filesystem
specific state are not reinitialized.

Which means that proc_self and proc_thread_self can be pointers to
already freed dentries.

The reported user after free appears to be from mounting and
unmounting proc followed by mounting proc again and using error
injection to cause the new root dentry allocation to fail.  This in
turn results in proc_kill_sb running with proc_self and
proc_thread_self still retaining their values from the previous mount
of proc.  Then calling dput on either proc_self of proc_thread_self
will result in double put.  Which KASAN sees as a use after free.

Solve this by always reinitializing the filesystem state stored
in the struct pid_namespace, when proc is unmounted.

Reported-by: syzbot+72868dd424eb66c6b95f@syzkaller.appspotmail.com
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Fixes: 69879c01a0c3 ("proc: Remove the now unnecessary internal mount of proc")
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace</title>
<updated>2020-04-10T19:59:56Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-04-10T19:59:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=87ad46e601340394cd75c1c79b19ca906f82c543'/>
<id>urn:sha1:87ad46e601340394cd75c1c79b19ca906f82c543</id>
<content type='text'>
Pull proc fix from Eric Biederman:
 "A brown paper bag slipped through my proc changes, and syzcaller
  caught it when the code ended up in your tree.

  I have opted to fix it the simplest cleanest way I know how, so there
  is no reasonable chance for the bug to repeat"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc: Use a dedicated lock in struct pid
</content>
</entry>
<entry>
<title>proc: Use a dedicated lock in struct pid</title>
<updated>2020-04-09T17:15:35Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2020-04-07T14:43:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=63f818f46af9f8b3f17b9695501e8d08959feb60'/>
<id>urn:sha1:63f818f46af9f8b3f17b9695501e8d08959feb60</id>
<content type='text'>
syzbot wrote:
&gt; ========================================================
&gt; WARNING: possible irq lock inversion dependency detected
&gt; 5.6.0-syzkaller #0 Not tainted
&gt; --------------------------------------------------------
&gt; swapper/1/0 just changed the state of lock:
&gt; ffffffff898090d8 (tasklist_lock){.+.?}-{2:2}, at: send_sigurg+0x9f/0x320 fs/fcntl.c:840
&gt; but this lock took another, SOFTIRQ-unsafe lock in the past:
&gt;  (&amp;pid-&gt;wait_pidfd){+.+.}-{2:2}
&gt;
&gt;
&gt; and interrupts could create inverse lock ordering between them.
&gt;
&gt;
&gt; other info that might help us debug this:
&gt;  Possible interrupt unsafe locking scenario:
&gt;
&gt;        CPU0                    CPU1
&gt;        ----                    ----
&gt;   lock(&amp;pid-&gt;wait_pidfd);
&gt;                                local_irq_disable();
&gt;                                lock(tasklist_lock);
&gt;                                lock(&amp;pid-&gt;wait_pidfd);
&gt;   &lt;Interrupt&gt;
&gt;     lock(tasklist_lock);
&gt;
&gt;  *** DEADLOCK ***
&gt;
&gt; 4 locks held by swapper/1/0:

The problem is that because wait_pidfd.lock is taken under the tasklist
lock.  It must always be taken with irqs disabled as tasklist_lock can be
taken from interrupt context and if wait_pidfd.lock was already taken this
would create a lock order inversion.

Oleg suggested just disabling irqs where I have added extra calls to
wait_pidfd.lock.  That should be safe and I think the code will eventually
do that.  It was rightly pointed out by Christian that sharing the
wait_pidfd.lock was a premature optimization.

It is also true that my pre-merge window testing was insufficient.  So
remove the premature optimization and give struct pid a dedicated lock of
it's own for struct pid things.  I have verified that lockdep sees all 3
paths where we take the new pid-&gt;lock and lockdep does not complain.

It is my current day dream that one day pid-&gt;lock can be used to guard the
task lists as well and then the tasklist_lock won't need to be held to
deliver signals.  That will require taking pid-&gt;lock with irqs disabled.

Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Link: https://lore.kernel.org/lkml/00000000000011d66805a25cd73f@google.com/
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Reported-by: syzbot+343f75cdeea091340956@syzkaller.appspotmail.com
Reported-by: syzbot+832aabf700bc3ec920b9@syzkaller.appspotmail.com
Reported-by: syzbot+f675f964019f884dbd0f@syzkaller.appspotmail.com
Reported-by: syzbot+a9fb1457d720a55d6dc5@syzkaller.appspotmail.com
Fixes: 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc")
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
</feed>
