<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/fs, branch v4.14.16</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.14.16</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.14.16'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-01-31T13:03:50Z</updated>
<entry>
<title>nfsd: auth: Fix gid sorting when rootsquash enabled</title>
<updated>2018-01-31T13:03:50Z</updated>
<author>
<name>Ben Hutchings</name>
<email>ben.hutchings@codethink.co.uk</email>
</author>
<published>2018-01-22T20:11:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=54e67ba7d20a5921cfe712cfe4bd773e75df10e0'/>
<id>urn:sha1:54e67ba7d20a5921cfe712cfe4bd773e75df10e0</id>
<content type='text'>
commit 1995266727fa8143897e89b55f5d3c79aa828420 upstream.

Commit bdcf0a423ea1 ("kernel: make groups_sort calling a responsibility
group_info allocators") appears to break nfsd rootsquash in a pretty
major way.

It adds a call to groups_sort() inside the loop that copies/squashes
gids, which means the valid gids are sorted along with the following
garbage.  The net result is that the highest numbered valid gids are
replaced with any lower-valued garbage gids, possibly including 0.

We should sort only once, after filling in all the gids.

Fixes: bdcf0a423ea1 ("kernel: make groups_sort calling a responsibility ...")
Signed-off-by: Ben Hutchings &lt;ben.hutchings@codethink.co.uk&gt;
Acked-by: J. Bruce Fields &lt;bfields@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Wolfgang Walter &lt;linux@stwm.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>orangefs: fix deadlock; do not write i_size in read_iter</title>
<updated>2018-01-31T13:03:43Z</updated>
<author>
<name>Martin Brandenburg</name>
<email>martin@omnibond.com</email>
</author>
<published>2018-01-26T00:39:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=326efb49e153fbb9ccacaed9858bd9da91204061'/>
<id>urn:sha1:326efb49e153fbb9ccacaed9858bd9da91204061</id>
<content type='text'>
commit 6793f1c450b1533a5e9c2493490de771d38b24f9 upstream.

After do_readv_writev, the inode cache is invalidated anyway, so i_size
will never be read.  It will be fetched from the server which will also
know about updates from other machines.

Fixes deadlock on 32-bit SMP.

See https://marc.info/?l=linux-fsdevel&amp;m=151268557427760&amp;w=2

Signed-off-by: Martin Brandenburg &lt;martin@omnibond.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Mike Marshall &lt;hubcap@omnibond.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Btrfs: fix stale entries in readdir</title>
<updated>2018-01-31T13:03:42Z</updated>
<author>
<name>Josef Bacik</name>
<email>jbacik@fb.com</email>
</author>
<published>2018-01-23T20:17:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5c7b881331f825fe97a47aed6bb4a32c542eca2d'/>
<id>urn:sha1:5c7b881331f825fe97a47aed6bb4a32c542eca2d</id>
<content type='text'>
commit e4fd493c0541d36953f7b9d3bfced67a1321792f upstream.

In fixing the readdir+pagefault deadlock I accidentally introduced a
stale entry regression in readdir.  If we get close to full for the
temporary buffer, and then skip a few delayed deletions, and then try to
add another entry that won't fit, we will emit the entries we found and
retry.  Unfortunately we delete entries from our del_list as we find
them, assuming we won't need them.  However our pos will be with
whatever our last entry was, which could be before the delayed deletions
we skipped, so the next search will add the deleted entries back into
our readdir buffer.  So instead don't delete entries we find in our
del_list so we can make sure we always find our delayed deletions.  This
is a slight perf hit for readdir with lots of pending deletions, but
hopefully this isn't a common occurrence.  If it is we can revist this
and optimize it.

Fixes: 23b5ec74943f ("btrfs: fix readdir deadlock with pagefault")
Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>orangefs: initialize op on loop restart in orangefs_devreq_read</title>
<updated>2018-01-31T13:03:40Z</updated>
<author>
<name>Martin Brandenburg</name>
<email>martin@omnibond.com</email>
</author>
<published>2018-01-22T20:44:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e1166d9491a0eb7b854d6868e62a9066c7dff277'/>
<id>urn:sha1:e1166d9491a0eb7b854d6868e62a9066c7dff277</id>
<content type='text'>
commit a0ec1ded22e6a6bc41981fae22406835b006a66e upstream.

In orangefs_devreq_read, there is a loop which picks an op off the list
of pending ops.  If the loop fails to find an op, there is nothing to
read, and it returns EAGAIN.  If the op has been given up on, the loop
is restarted via a goto.  The bug is that the variable which the found
op is written to is not reinitialized, so if there are no more eligible
ops on the list, the code runs again on the already handled op.

This is triggered by interrupting a process while the op is being copied
to the client-core.  It's a fairly small window, but it's there.

Signed-off-by: Martin Brandenburg &lt;martin@omnibond.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>orangefs: use list_for_each_entry_safe in purge_waiting_ops</title>
<updated>2018-01-31T13:03:40Z</updated>
<author>
<name>Martin Brandenburg</name>
<email>martin@omnibond.com</email>
</author>
<published>2018-01-22T20:44:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1d00dacda89dca83b9899fe3ed1588b08bdd4b88'/>
<id>urn:sha1:1d00dacda89dca83b9899fe3ed1588b08bdd4b88</id>
<content type='text'>
commit 0afc0decf247f65b7aba666a76a0a68adf4bc435 upstream.

set_op_state_purged can delete the op.

Signed-off-by: Martin Brandenburg &lt;martin@omnibond.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>proc: fix coredump vs read /proc/*/stat race</title>
<updated>2018-01-23T18:58:18Z</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2018-01-19T00:34:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ea5c2944328c46d17cc67879d6bee964c4de9be2'/>
<id>urn:sha1:ea5c2944328c46d17cc67879d6bee964c4de9be2</id>
<content type='text'>
commit 8bb2ee192e482c5d500df9f2b1b26a560bd3026f upstream.

do_task_stat() accesses IP and SP of a task without bumping reference
count of a stack (which became an entity with independent lifetime at
some point).

Steps to reproduce:

    #include &lt;stdio.h&gt;
    #include &lt;sys/types.h&gt;
    #include &lt;sys/stat.h&gt;
    #include &lt;fcntl.h&gt;
    #include &lt;sys/time.h&gt;
    #include &lt;sys/resource.h&gt;
    #include &lt;unistd.h&gt;
    #include &lt;sys/wait.h&gt;

    int main(void)
    {
    	setrlimit(RLIMIT_CORE, &amp;(struct rlimit){});

    	while (1) {
    		char buf[64];
    		char buf2[4096];
    		pid_t pid;
    		int fd;

    		pid = fork();
    		if (pid == 0) {
    			*(volatile int *)0 = 0;
    		}

    		snprintf(buf, sizeof(buf), "/proc/%u/stat", pid);
    		fd = open(buf, O_RDONLY);
    		read(fd, buf2, sizeof(buf2));
    		close(fd);

    		waitpid(pid, NULL, 0);
    	}
    	return 0;
    }

    BUG: unable to handle kernel paging request at 0000000000003fd8
    IP: do_task_stat+0x8b4/0xaf0
    PGD 800000003d73e067 P4D 800000003d73e067 PUD 3d558067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP PTI
    CPU: 0 PID: 1417 Comm: a.out Not tainted 4.15.0-rc8-dirty #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
    RIP: 0010:do_task_stat+0x8b4/0xaf0
    Call Trace:
     proc_single_show+0x43/0x70
     seq_read+0xe6/0x3b0
     __vfs_read+0x1e/0x120
     vfs_read+0x84/0x110
     SyS_read+0x3d/0xa0
     entry_SYSCALL_64_fastpath+0x13/0x6c
    RIP: 0033:0x7f4d7928cba0
    RSP: 002b:00007ffddb245158 EFLAGS: 00000246
    Code: 03 b7 a0 01 00 00 4c 8b 4c 24 70 4c 8b 44 24 78 4c 89 74 24 18 e9 91 f9 ff ff f6 45 4d 02 0f 84 fd f7 ff ff 48 8b 45 40 48 89 ef &lt;48&gt; 8b 80 d8 3f 00 00 48 89 44 24 20 e8 9b 97 eb ff 48 89 44 24
    RIP: do_task_stat+0x8b4/0xaf0 RSP: ffffc90000607cc8
    CR2: 0000000000003fd8

John Ogness said: for my tests I added an else case to verify that the
race is hit and correctly mitigated.

Link: http://lkml.kernel.org/r/20180116175054.GA11513@avx2
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Reported-by: "Kohli, Gaurav" &lt;gkohli@codeaurora.org&gt;
Tested-by: John Ogness &lt;john.ogness@linutronix.de&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>pipe: avoid round_pipe_size() nr_pages overflow on 32-bit</title>
<updated>2018-01-23T18:58:14Z</updated>
<author>
<name>Joe Lawrence</name>
<email>joe.lawrence@redhat.com</email>
</author>
<published>2017-11-17T23:29:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e109607e141be9587612346b7cc56e030942042e'/>
<id>urn:sha1:e109607e141be9587612346b7cc56e030942042e</id>
<content type='text'>
commit d3f14c485867cfb2e0c48aa88c41d0ef4bf5209c upstream.

round_pipe_size() contains a right-bit-shift expression which may
overflow, which would cause undefined results in a subsequent
roundup_pow_of_two() call.

  static inline unsigned int round_pipe_size(unsigned int size)
  {
          unsigned long nr_pages;

          nr_pages = (size + PAGE_SIZE - 1) &gt;&gt; PAGE_SHIFT;
          return roundup_pow_of_two(nr_pages) &lt;&lt; PAGE_SHIFT;
  }

PAGE_SIZE is defined as (1UL &lt;&lt; PAGE_SHIFT), so:
  - 4 bytes wide on 32-bit (0 to 0xffffffff)
  - 8 bytes wide on 64-bit (0 to 0xffffffffffffffff)

That means that 32-bit round_pipe_size(), nr_pages may overflow to 0:

  size=0x00000000    nr_pages=0x0
  size=0x00000001    nr_pages=0x1
  size=0xfffff000    nr_pages=0xfffff
  size=0xfffff001    nr_pages=0x0         &lt;&lt; !
  size=0xffffffff    nr_pages=0x0         &lt;&lt; !

This is bad because roundup_pow_of_two(n) is undefined when n == 0!

64-bit is not a problem as the unsigned int size is 4 bytes wide
(similar to 32-bit) and the larger, 8 byte wide unsigned long, is
sufficient to handle the largest value of the bit shift expression:

  size=0xffffffff    nr_pages=100000

Modify round_pipe_size() to return 0 if n == 0 and updates its callers to
handle accordingly.

Link: http://lkml.kernel.org/r/1507658689-11669-3-git-send-email-joe.lawrence@redhat.com
Signed-off-by: Joe Lawrence &lt;joe.lawrence@redhat.com&gt;
Reported-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Reviewed-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Dong Jinguang &lt;dongjinguang@huawei.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;

Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>x86 / CPU: Always show current CPU frequency in /proc/cpuinfo</title>
<updated>2018-01-10T08:31:20Z</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2017-11-15T01:13:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=22af48be826c4193bba2f11112330aabb2568594'/>
<id>urn:sha1:22af48be826c4193bba2f11112330aabb2568594</id>
<content type='text'>
commit 7d5905dc14a87805a59f3c5bf70173aac2bb18f8 upstream.

After commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get()
for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo
on x86 can be either the nominal CPU frequency (which is constant)
or the frequency most recently requested by a scaling governor in
cpufreq, depending on the cpufreq configuration.  That is somewhat
inconsistent and is different from what it was before 4.13, so in
order to restore the previous behavior, make it report the current
CPU frequency like the scaling_cur_freq sysfs file in cpufreq.

To that end, modify the /proc/cpuinfo implementation on x86 to use
aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback
registers, if available, and use their values to compute the CPU
frequency to be reported as "cpu MHz".

However, do that carefully enough to avoid accumulating delays that
lead to unacceptable access times for /proc/cpuinfo on systems with
many CPUs.  Run aperfmperf_snapshot_khz() once on all CPUs
asynchronously at the /proc/cpuinfo open time, add a single delay
upfront (if necessary) at that point and simply compute the current
frequency while running show_cpuinfo() for each individual CPU.

Also, to avoid slowing down /proc/cpuinfo accesses too much, reduce
the default delay between consecutive APERF and MPERF reads to 10 ms,
which should be sufficient to get large enough numbers for the
frequency computation in all cases.

Fixes: 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"")
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes</title>
<updated>2018-01-10T08:31:17Z</updated>
<author>
<name>Chris Mason</name>
<email>clm@fb.com</email>
</author>
<published>2017-12-15T19:58:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b9fa21f3daa6533ab2303caad72c996e3924cf3b'/>
<id>urn:sha1:b9fa21f3daa6533ab2303caad72c996e3924cf3b</id>
<content type='text'>
commit ec35e48b286959991cdbb886f1bdeda4575c80b4 upstream.

refcounts have a generic implementation and an asm optimized one.  The
generic version has extra debugging to make sure that once a refcount
goes to zero, refcount_inc won't increase it.

The btrfs delayed inode code wasn't expecting this, and we're tripping
over the warnings when the generic refcounts are used.  We ended up with
this race:

Process A                                         Process B
                                                  btrfs_get_delayed_node()
						  spin_lock(root-&gt;inode_lock)
						  radix_tree_lookup()
__btrfs_release_delayed_node()
refcount_dec_and_test(&amp;delayed_node-&gt;refs)
our refcount is now zero
						  refcount_add(2) &lt;---
						  warning here, refcount
                                                  unchanged

spin_lock(root-&gt;inode_lock)
radix_tree_delete()

With the generic refcounts, we actually warn again when process B above
tries to release his refcount because refcount_add() turned into a
no-op.

We saw this in production on older kernels without the asm optimized
refcounts.

The fix used here is to use refcount_inc_not_zero() to detect when the
object is in the middle of being freed and return NULL.  This is almost
always the right answer anyway, since we usually end up pitching the
delayed_node if it didn't have fresh data in it.

This also changes __btrfs_release_delayed_node() to remove the extra
check for zero refcounts before radix tree deletion.
btrfs_get_delayed_node() was the only path that was allowing refcounts
to go from zero to one.

Fixes: 6de5f18e7b0da ("btrfs: fix refcount_t usage when deleting btrfs_delayed_node")
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
Reviewed-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>userfaultfd: clear the vma-&gt;vm_userfaultfd_ctx if UFFD_EVENT_FORK fails</title>
<updated>2018-01-10T08:31:17Z</updated>
<author>
<name>Andrea Arcangeli</name>
<email>aarcange@redhat.com</email>
</author>
<published>2018-01-05T00:18:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=319122a71f306943a75217b3e78a9a0f8e9c1e95'/>
<id>urn:sha1:319122a71f306943a75217b3e78a9a0f8e9c1e95</id>
<content type='text'>
commit 0cbb4b4f4c44f54af268969b18d8deda63aded59 upstream.

The previous fix in commit 384632e67e08 ("userfaultfd: non-cooperative:
fix fork use after free") corrected the refcounting in case of
UFFD_EVENT_FORK failure for the fork userfault paths.

That still didn't clear the vma-&gt;vm_userfaultfd_ctx of the vmas that
were set to point to the aborted new uffd ctx earlier in
dup_userfaultfd.

Link: http://lkml.kernel.org/r/20171223002505.593-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Reported-by: syzbot &lt;syzkaller@googlegroups.com&gt;
Reviewed-by: Mike Rapoport &lt;rppt@linux.vnet.ibm.com&gt;
Cc: Eric Biggers &lt;ebiggers3@gmail.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
