<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include, branch v4.4.117</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.117</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.117'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-02-22T14:45:01Z</updated>
<entry>
<title>x86: fix build warning with 32-bit PAE</title>
<updated>2018-02-22T14:45:01Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2018-02-15T15:16:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4b35dcb5e048cde1a68603d5ad2d8ccaf3fb1e4e'/>
<id>urn:sha1:4b35dcb5e048cde1a68603d5ad2d8ccaf3fb1e4e</id>
<content type='text'>
I ran into a 4.9 build warning in randconfig testing, starting with the
KAISER patches:

arch/x86/kernel/ldt.c: In function 'alloc_ldt_struct':
arch/x86/include/asm/pgtable_types.h:208:24: error: large integer implicitly truncated to unsigned type [-Werror=overflow]
 #define __PAGE_KERNEL  (__PAGE_KERNEL_EXEC | _PAGE_NX)
                        ^
arch/x86/kernel/ldt.c:81:6: note: in expansion of macro '__PAGE_KERNEL'
      __PAGE_KERNEL);
      ^~~~~~~~~~~~~

I originally ran into this last year when the patches were part of linux-next,
and tried to work around it by using the proper 'pteval_t' types consistently,
but that caused additional problems.

This takes a much simpler approach, and makes the argument type of the dummy
helper always 64-bit, which is wide enough for any page table layout and
won't hurt since this call is just an empty stub anyway.
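
A minimal sketch of the idea (illustrative; the stub name follows the
KAISER patches, the exact declaration may differ):

  /* before: 'flags' truncates on 32-bit PAE, where pteval_t is 64-bit */
  static inline int kaiser_add_mapping(unsigned long addr,
                                       unsigned long size,
                                       unsigned long flags)
  {
          return 0;
  }

  /* after: a 64-bit argument is wide enough for any page table layout,
   * and costs nothing since the stub does no work anyway
   */
  static inline int kaiser_add_mapping(unsigned long addr,
                                       unsigned long size,
                                       u64 flags)
  {
          return 0;
  }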

Fixes: 8f0baadf2bea ("kaiser: merged update")
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>crypto: poly1305 - remove -&gt;setkey() method</title>
<updated>2018-02-16T19:09:43Z</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2018-01-03T19:16:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=de1ca9ef43de77e507752182c1b5050851a1aa58'/>
<id>urn:sha1:de1ca9ef43de77e507752182c1b5050851a1aa58</id>
<content type='text'>
commit a16e772e664b9a261424107784804cffc8894977 upstream.

Since Poly1305 requires a nonce per invocation, the Linux kernel
implementations of Poly1305 don't use the crypto API's keying mechanism
and instead expect the key and nonce as the first 32 bytes of the data.
But -&gt;setkey() is still defined as a stub returning an error code.  This
prevents Poly1305 from being used through AF_ALG and will also break it
completely once we start enforcing that all crypto API users (not just
AF_ALG) call -&gt;setkey() if present.

Fix it by removing crypto_poly1305_setkey(), leaving -&gt;setkey as NULL.
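
Roughly (a sketch, not the full diff):

  /* before: a stub that only ever returned an error */
  static int crypto_poly1305_setkey(struct crypto_shash *tfm,
                                    const u8 *key, unsigned int keylen)
  {
          return -ENOTSUPP;
  }

  /* after: the stub is gone and the shash_alg's .setkey member is left
   * NULL, so the API can tell that the algorithm takes no key
   */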

Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>crypto: hash - introduce crypto_hash_alg_has_setkey()</title>
<updated>2018-02-16T19:09:42Z</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2018-01-03T19:16:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c311cf44ca749f114dc0f28a7e288f358ac595c8'/>
<id>urn:sha1:c311cf44ca749f114dc0f28a7e288f358ac595c8</id>
<content type='text'>
commit cd6ed77ad5d223dc6299fb58f62e0f5267f7e2ba upstream.

Templates that use an shash spawn can use crypto_shash_alg_has_setkey()
to determine whether the underlying algorithm requires a key or not.
But there was no corresponding function for ahash spawns.  Add it.

Note that the new function actually has to support both shash and ahash
algorithms, since the ahash API can be used with either.
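
The new helper boils down to a type check plus the per-API test
(a sketch of the upstream implementation):

  bool crypto_hash_alg_has_setkey(struct hash_alg_common *halg)
  {
          struct crypto_alg *alg = &amp;halg-&gt;base;

          /* an shash algorithm wrapped for use through the ahash API */
          if (alg-&gt;cra_type != &amp;crypto_ahash_type)
                  return crypto_shash_alg_has_setkey(__crypto_shash_alg(alg));

          return __crypto_ahash_alg(alg)-&gt;setkey != NULL;
  }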

Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mtd: cfi: convert inline functions to macros</title>
<updated>2018-02-16T19:09:41Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2017-10-11T13:54:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b4b3ad0c8c90b4a9fe1dc9ecca9211a72e580976'/>
<id>urn:sha1:b4b3ad0c8c90b4a9fe1dc9ecca9211a72e580976</id>
<content type='text'>
commit 9e343e87d2c4c707ef8fae2844864d4dde3a2d13 upstream.

The map_word_() functions, dating back to linux-2.6.8, try to perform
bitwise operations on a 'map_word' structure. This may have worked with
compilers that were current then (gcc-3.4 or earlier), but the functions
end up being rather inefficient on any version I could try now (gcc-4.4
or higher). Specifically, we hit a problem analyzed in gcc PR81715 where
we fail to reuse the stack space for local variables.

This can be seen immediately in the stack consumption of
cfi_staa_erase_varsize() and other functions, which (with CONFIG_KASAN)
can be up to 2200 bytes. Changing the inline functions into macros brings
this down to 1280 bytes.  Without KASAN, the same problem exists, but
the stack consumption is lower to start with; my patch shrinks it from
920 to 496 bytes with arm-linux-gnueabi-gcc-5.4, and saves around
1KB in .text size for cfi_cmdset_0020.c, as it avoids copying map_word
structures for each call to one of these helpers.

With the latest gcc-8 snapshot, the problem is fixed in upstream gcc,
but nobody uses that yet, so we should still work around it in mainline
kernels and probably backport the workaround to stable kernels as well.
We had a couple of other functions that suffered from the same gcc bug,
and all of those had a simpler workaround involving dummy variables
in the inline function. Unfortunately that did not work here; the
macro hack was the best I could come up with.
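
As an illustration of the pattern (simplified from the actual patch),
an inline such as map_word_and() becomes a statement-expression macro:

  /* before: returning a map_word forces a stack copy per call */
  static inline map_word map_word_and(struct map_info *map,
                                      map_word val1, map_word val2)
  {
          map_word r;
          int i;

          for (i = 0; i &lt; map_words(map); i++)
                  r.x[i] = val1.x[i] &amp; val2.x[i];
          return r;
  }

  /* after: the result can live in the caller's frame, no copy */
  #define map_word_and(map, val1, val2)                    \
  ({                                                       \
          map_word r;                                      \
          int i;                                           \
                                                           \
          for (i = 0; i &lt; map_words(map); i++)             \
                  r.x[i] = (val1).x[i] &amp; (val2).x[i];      \
          r;                                               \
  })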

It would also be helpful to have someone do a little performance testing
on the patch, to see how much it helps in terms of CPU utilization.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Richard Weinberger &lt;richard@nod.at&gt;
Signed-off-by: Boris Brezillon &lt;boris.brezillon@free-electrons.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>netfilter: nf_queue: Make the queue_handler pernet</title>
<updated>2018-02-16T19:09:40Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2016-05-14T02:18:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cb92c8fb3bda8fe5c073603c7a5b5f5936cc4882'/>
<id>urn:sha1:cb92c8fb3bda8fe5c073603c7a5b5f5936cc4882</id>
<content type='text'>
commit dc3ee32e96d74dd6c80eed63af5065cb75899299 upstream.

Florian Westphal reported:
&gt; Under full load (unshare() in loop -&gt; OOM conditions) we can
&gt; get kernel panic:
&gt;
&gt; BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
&gt; IP: [&lt;ffffffff81476c85&gt;] nfqnl_nf_hook_drop+0x35/0x70
&gt; [..]
&gt; task: ffff88012dfa3840 ti: ffff88012dffc000 task.ti: ffff88012dffc000
&gt; RIP: 0010:[&lt;ffffffff81476c85&gt;]  [&lt;ffffffff81476c85&gt;] nfqnl_nf_hook_drop+0x35/0x70
&gt; RSP: 0000:ffff88012dfffd80  EFLAGS: 00010206
&gt; RAX: 0000000000000008 RBX: ffffffff81add0c0 RCX: ffff88013fd80000
&gt; [..]
&gt; Call Trace:
&gt;  [&lt;ffffffff81474d98&gt;] nf_queue_nf_hook_drop+0x18/0x20
&gt;  [&lt;ffffffff814738eb&gt;] nf_unregister_net_hook+0xdb/0x150
&gt;  [&lt;ffffffff8147398f&gt;] netfilter_net_exit+0x2f/0x60
&gt;  [&lt;ffffffff8141b088&gt;] ops_exit_list.isra.4+0x38/0x60
&gt;  [&lt;ffffffff8141b652&gt;] setup_net+0xc2/0x120
&gt;  [&lt;ffffffff8141bd09&gt;] copy_net_ns+0x79/0x120
&gt;  [&lt;ffffffff8106965b&gt;] create_new_namespaces+0x11b/0x1e0
&gt;  [&lt;ffffffff810698a7&gt;] unshare_nsproxy_namespaces+0x57/0xa0
&gt;  [&lt;ffffffff8104baa2&gt;] SyS_unshare+0x1b2/0x340
&gt;  [&lt;ffffffff81608276&gt;] entry_SYSCALL_64_fastpath+0x1e/0xa8
&gt; Code: 65 00 48 89 e5 41 56 41 55 41 54 53 83 e8 01 48 8b 97 70 12 00 00 48 98 49 89 f4 4c 8b 74 c2 18 4d 8d 6e 08 49 81 c6 88 00 00 00 &lt;49&gt; 8b 5d 00 48 85 db 74 1a 48 89 df 4c 89 e2 48 c7 c6 90 68 47
&gt;

The simple fix for this requires a new pernet variable for struct
nf_queue that indicates when it is safe to use the dynamically
allocated nf_queue state.

As we need a variable anyway, make nf_register_queue_handler and
nf_unregister_queue_handler pernet.  This allows the existing logic of
when it is safe to use the state from the nfnetlink_queue module to be
reused with no changes except for making it per net.
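
The registration side then looks roughly like this (a sketch; the
placement of the handler pointer in struct net is assumed):

  /* the handler pointer lives in struct net instead of one global */
  void nf_register_queue_handler(struct net *net,
                                 const struct nf_queue_handler *qh)
  {
          /* should never be called twice for the same net */
          rcu_assign_pointer(net-&gt;nf.queue_handler, qh);
  }

  void nf_unregister_queue_handler(struct net *net)
  {
          RCU_INIT_POINTER(net-&gt;nf.queue_handler, NULL);
  }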

The synchronize_rcu from nf_unregister_queue_handler is moved to a new
function nfnl_queue_net_exit_batch so that the worst case of having a
synchronize_rcu in the pernet exit path is not experienced in batch
mode.

Reported-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Acked-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Cc: Eric Biggers &lt;ebiggers3@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>don't put symlink bodies in pagecache into highmem</title>
<updated>2018-02-16T19:09:38Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2015-11-17T06:07:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=076e4ab3279eb3ddb206de44d04df7aeb2428e09'/>
<id>urn:sha1:076e4ab3279eb3ddb206de44d04df7aeb2428e09</id>
<content type='text'>
commit 21fc61c73c3903c4c312d0802da01ec2b323d174 upstream.

kmap() in page_follow_link_light() needed to go - allowing an arbitrary
number of kmaps to be held for long is a great way to deadlock the
system.

The new helper (inode_nohighmem(inode)) needs to be used for pagecache
symlink inodes; done for all in-tree cases.  page_follow_link_light()
is instrumented to yell about anything missed.
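
The helper amounts to clearing highmem from the mapping's allocation
mask (a sketch of what it boils down to):

  static inline void inode_nohighmem(struct inode *inode)
  {
          /* pagecache pages for this mapping stay in lowmem, so the
           * symlink body is always directly addressable, no kmap()
           */
          mapping_set_gfp_mask(inode-&gt;i_mapping, GFP_USER);
  }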

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Jin Qian &lt;jinqian@google.com&gt;
Signed-off-by: Jin Qian &lt;jinqian@android.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>bpf: avoid false sharing of map refcount with max_entries</title>
<updated>2018-02-03T16:04:24Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2018-01-30T02:37:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=96d9b2338bed553c37f759127d8d18c857449ceb'/>
<id>urn:sha1:96d9b2338bed553c37f759127d8d18c857449ceb</id>
<content type='text'>
[ upstream commit be95a845cc4402272994ce290e3ad928aff06cb9 ]

In addition to commit b2157399cc98 ("bpf: prevent out-of-bounds
speculation"), also change the layout of struct bpf_map such that
false sharing of fast-path members like max_entries is avoided
when the map's reference counter is altered. Therefore enforce
that they are placed in separate cachelines.

pahole dump after change:

  struct bpf_map {
        const struct bpf_map_ops  * ops;                 /*     0     8 */
        struct bpf_map *           inner_map_meta;       /*     8     8 */
        void *                     security;             /*    16     8 */
        enum bpf_map_type          map_type;             /*    24     4 */
        u32                        key_size;             /*    28     4 */
        u32                        value_size;           /*    32     4 */
        u32                        max_entries;          /*    36     4 */
        u32                        map_flags;            /*    40     4 */
        u32                        pages;                /*    44     4 */
        u32                        id;                   /*    48     4 */
        int                        numa_node;            /*    52     4 */
        bool                       unpriv_array;         /*    56     1 */

        /* XXX 7 bytes hole, try to pack */

        /* --- cacheline 1 boundary (64 bytes) --- */
        struct user_struct *       user;                 /*    64     8 */
        atomic_t                   refcnt;               /*    72     4 */
        atomic_t                   usercnt;              /*    76     4 */
        struct work_struct         work;                 /*    80    32 */
        char                       name[16];             /*   112    16 */
        /* --- cacheline 2 boundary (128 bytes) --- */

        /* size: 128, cachelines: 2, members: 17 */
        /* sum members: 121, holes: 1, sum holes: 7 */
  };

Now all entries in the first cacheline are read-only throughout
the lifetime of the map and are set up once during map creation.
Overall struct size and number of cachelines don't change from the
reordering. struct bpf_map is usually the first member embedded in
the map structs of specific map implementations, so we also avoid
letting these members sit at the end, where they could potentially
share a cacheline with the first map values, e.g. in the array map:
remote CPUs could trigger map updates just as well for those (easily
dirtying members like max_entries intentionally as well) while
having subsequent values in cache.
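
In code, the split is done by force-aligning the first member of each
group (a sketch matching the pahole layout above):

  struct bpf_map {
          /* 1st cacheline: read-mostly members, some of them also
           * accessed in the fast path (e.g. ops, max_entries)
           */
          const struct bpf_map_ops *ops ____cacheline_aligned;
          struct bpf_map *inner_map_meta;
          void *security;
          enum bpf_map_type map_type;
          u32 key_size;
          u32 value_size;
          u32 max_entries;
          u32 map_flags;
          u32 pages;
          u32 id;
          int numa_node;
          bool unpriv_array;
          /* 2nd cacheline: members dirtied on refcount changes */
          struct user_struct *user ____cacheline_aligned;
          atomic_t refcnt;
          atomic_t usercnt;
          struct work_struct work;
          char name[16];
  };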

Quoting from Google's Project Zero blog [1]:

  Additionally, at least on the Intel machine on which this was
  tested, bouncing modified cache lines between cores is slow,
  apparently because the MESI protocol is used for cache coherence
  [8]. Changing the reference counter of an eBPF array on one
  physical CPU core causes the cache line containing the reference
  counter to be bounced over to that CPU core, making reads of the
  reference counter on all other CPU cores slow until the changed
  reference counter has been written back to memory. Because the
  length and the reference counter of an eBPF array are stored in
  the same cache line, this also means that changing the reference
  counter on one physical CPU core causes reads of the eBPF array's
  length to be slow on other physical CPU cores (intentional false
  sharing).

While this doesn't 'control' the out-of-bounds speculation through
masking the index as in commit b2157399cc98, triggering a manipulation
of the map's reference counter is really trivial, so let's not allow
it to easily affect max_entries.

Splitting into separate cachelines also generally makes sense from
a performance perspective anyway, in that the fast path won't take a
cache miss if the map gets pinned, reused in other progs, etc. from
the control path; this also avoids unintentional false sharing.

  [1] https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with-side.html

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: tcp: close sock if net namespace is exiting</title>
<updated>2018-01-31T11:06:14Z</updated>
<author>
<name>Dan Streetman</name>
<email>ddstreet@ieee.org</email>
</author>
<published>2018-01-18T21:14:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=edaafa805e0f9d09560a4892790b8e19cab8bf09'/>
<id>urn:sha1:edaafa805e0f9d09560a4892790b8e19cab8bf09</id>
<content type='text'>
[ Upstream commit 4ee806d51176ba7b8ff1efd81f271d7252e03a1d ]

When a tcp socket is closed, if it detects that its net namespace is
exiting, close immediately and do not wait for the FIN sequence.

For normal sockets, a reference is taken to their net namespace, so it will
never exit while the socket is open.  However, kernel sockets do not take a
reference to their net namespace, so it may begin exiting while the kernel
socket is still open.  In this case if the kernel socket is a tcp socket,
it will stay open trying to complete its close sequence.  The sock's dst(s)
hold a reference to their interface; these references are all transferred
to the namespace's loopback interface when the real interfaces are taken
down. When the namespace tries to take down its loopback interface, it
hangs waiting for all references to the loopback interface to be released,
which results in messages like:

unregister_netdevice: waiting for lo to become free. Usage count = 1

These messages continue until the socket finally times out and closes.
Since the net namespace cleanup holds the net_mutex while calling its
registered pernet callbacks, any new net namespace initialization is
blocked until the current net namespace finishes exiting.

After this change, the tcp socket notices the exiting net namespace, and
closes immediately, releasing its dst(s) and their reference to the
loopback interface, which lets the net namespace continue exiting.
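
In outline (a sketch; the check_net() helper and its placement follow
the upstream patch, simplified here):

  /* include/net/net_namespace.h: false once the netns refcount
   * has dropped to zero, i.e. the namespace is exiting
   */
  static inline int check_net(const struct net *net)
  {
          return atomic_read(&amp;net-&gt;count) != 0;
  }

  /* e.g. in tcp_out_of_resources(): */
  if (!check_net(sock_net(sk))) {
          /* not possible to send a reset; just close */
          tcp_done(sk);
          return 1;
  }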

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman &lt;ddstreet@canonical.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY</title>
<updated>2018-01-31T11:06:14Z</updated>
<author>
<name>Jim Westfall</name>
<email>jwestfall@surrealistic.net</email>
</author>
<published>2018-01-14T12:18:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cab8451486a6fd9ba4ce798d41352f45abb44fae'/>
<id>urn:sha1:cab8451486a6fd9ba4ce798d41352f45abb44fae</id>
<content type='text'>
[ Upstream commit cd9ff4de0107c65d69d02253bb25d6db93c3dbc1 ]

Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
to avoid making an entry for every remote ip the device needs to talk to.

This used to be the old behavior, but it was broken by a263b3093641f
(ipv4: Make neigh lookups directly in output packet path) and the
mapping was later removed in 0bb4087cbec0 (ipv4: Fix neigh lookup keying
over loopback/point-to-point devices) because it was broken.
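
In outline, the lookup key is collapsed before the hash lookup
(a sketch of the keying change):

  static inline struct neighbour *__ipv4_neigh_lookup_noref(
                  struct net_device *dev, u32 key)
  {
          /* one shared neigh entry instead of one per remote ip */
          if (dev-&gt;flags &amp; (IFF_LOOPBACK | IFF_POINTOPOINT))
                  key = INADDR_ANY;

          return ___neigh_lookup_noref(&amp;arp_tbl, neigh_key_eq32,
                                       arp_hashfn, &amp;key, dev);
  }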

Signed-off-by: Jim Westfall &lt;jwestfall@surrealistic.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tcp: __tcp_hdrlen() helper</title>
<updated>2018-01-31T11:06:13Z</updated>
<author>
<name>Craig Gallek</name>
<email>kraig@google.com</email>
</author>
<published>2016-02-10T16:50:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bed42ef5ebbfc046e0440f7e0f941cc07f8ca1e8'/>
<id>urn:sha1:bed42ef5ebbfc046e0440f7e0f941cc07f8ca1e8</id>
<content type='text'>
commit d9b3fca27385eafe61c3ca6feab6cb1e7dc77482 upstream.

tcp_hdrlen is wasteful if you already have a pointer to struct tcphdr.
This splits the size calculation into a helper function that can be
used if a struct tcphdr is already available.
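
The split is tiny (a sketch following the description above):

  static inline unsigned int __tcp_hdrlen(const struct tcphdr *th)
  {
          /* doff counts 32-bit words of header, including options */
          return th-&gt;doff * 4;
  }

  static inline unsigned int tcp_hdrlen(const struct sk_buff *skb)
  {
          return __tcp_hdrlen(tcp_hdr(skb));
  }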

Signed-off-by: Craig Gallek &lt;kraig@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
