<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include, branch v4.14.16</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.14.16</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.14.16'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-01-31T13:03:50Z</updated>
<entry>
<title>bpf: avoid false sharing of map refcount with max_entries</title>
<updated>2018-01-31T13:03:50Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2018-01-28T23:36:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3ea4247ec1b7efc423cf4f75450ebf5cffab9ed8'/>
<id>urn:sha1:3ea4247ec1b7efc423cf4f75450ebf5cffab9ed8</id>
<content type='text'>
[ upstream commit be95a845cc4402272994ce290e3ad928aff06cb9 ]

In addition to commit b2157399cc98 ("bpf: prevent out-of-bounds
speculation") also change the layout of struct bpf_map such that
false sharing of fast-path members like max_entries is avoided
when the maps reference counter is altered. Therefore enforce
them to be placed into separate cachelines.

pahole dump after change:

  struct bpf_map {
        const struct bpf_map_ops  * ops;                 /*     0     8 */
        struct bpf_map *           inner_map_meta;       /*     8     8 */
        void *                     security;             /*    16     8 */
        enum bpf_map_type          map_type;             /*    24     4 */
        u32                        key_size;             /*    28     4 */
        u32                        value_size;           /*    32     4 */
        u32                        max_entries;          /*    36     4 */
        u32                        map_flags;            /*    40     4 */
        u32                        pages;                /*    44     4 */
        u32                        id;                   /*    48     4 */
        int                        numa_node;            /*    52     4 */
        bool                       unpriv_array;         /*    56     1 */

        /* XXX 7 bytes hole, try to pack */

        /* --- cacheline 1 boundary (64 bytes) --- */
        struct user_struct *       user;                 /*    64     8 */
        atomic_t                   refcnt;               /*    72     4 */
        atomic_t                   usercnt;              /*    76     4 */
        struct work_struct         work;                 /*    80    32 */
        char                       name[16];             /*   112    16 */
        /* --- cacheline 2 boundary (128 bytes) --- */

        /* size: 128, cachelines: 2, members: 17 */
        /* sum members: 121, holes: 1, sum holes: 7 */
  };

Now all entries in the first cacheline are read only throughout
the life time of the map, set up once during map creation. Overall
struct size and number of cachelines doesn't change from the
reordering. struct bpf_map is usually first member and embedded
in map structs in specific map implementations, so also avoid those
members to sit at the end where it could potentially share the
cacheline with first map values e.g. in the array since remote
CPUs could trigger map updates just as well for those (easily
dirtying members like max_entries intentionally as well) while
having subsequent values in cache.

Quoting from Google's Project Zero blog [1]:

  Additionally, at least on the Intel machine on which this was
  tested, bouncing modified cache lines between cores is slow,
  apparently because the MESI protocol is used for cache coherence
  [8]. Changing the reference counter of an eBPF array on one
  physical CPU core causes the cache line containing the reference
  counter to be bounced over to that CPU core, making reads of the
  reference counter on all other CPU cores slow until the changed
  reference counter has been written back to memory. Because the
  length and the reference counter of an eBPF array are stored in
  the same cache line, this also means that changing the reference
  counter on one physical CPU core causes reads of the eBPF array's
  length to be slow on other physical CPU cores (intentional false
  sharing).

While this doesn't 'control' the out-of-bounds speculation through
masking the index as in commit b2157399cc98, triggering a manipulation
of the map's reference counter is really trivial, so lets not allow
to easily affect max_entries from it.

Splitting to separate cachelines also generally makes sense from
a performance perspective anyway in that fast-path won't have a
cache miss if the map gets pinned, reused in other progs, etc out
of control path, thus also avoids unintentional false sharing.

  [1] https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with-side.html

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net/mlx5: Fix get vector affinity helper function</title>
<updated>2018-01-31T13:03:46Z</updated>
<author>
<name>Saeed Mahameed</name>
<email>saeedm@mellanox.com</email>
</author>
<published>2018-01-04T02:35:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ce1e51d842baa8cf55a05a26933402e55d9b6217'/>
<id>urn:sha1:ce1e51d842baa8cf55a05a26933402e55d9b6217</id>
<content type='text'>
[ Upstream commit 05e0cc84e00c54fb152d1f4b86bc211823a83d0c ]

mlx5_get_vector_affinity used to call pci_irq_get_affinity and after
reverting the patch that sets the device affinity via PCI_IRQ_AFFINITY
API, calling pci_irq_get_affinity becomes useless and it breaks RDMA
mlx5 users.  To fix this, this patch provides an alternative way to
retrieve IRQ vector affinity using legacy IRQ API, following
smp_affinity read procfs implementation.

Fixes: 231243c82793 ("Revert mlx5: move affinity hints assignments to generic code")
Fixes: a435393acafb ("mlx5: move affinity hints assignments to generic code")
Cc: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Saeed Mahameed &lt;saeedm@mellanox.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>{net,ib}/mlx5: Don't disable local loopback multicast traffic when needed</title>
<updated>2018-01-31T13:03:46Z</updated>
<author>
<name>Eran Ben Elisha</name>
<email>eranbe@mellanox.com</email>
</author>
<published>2018-01-09T09:41:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2ac1797da0f79013ef166df6a5124db267b9e1b6'/>
<id>urn:sha1:2ac1797da0f79013ef166df6a5124db267b9e1b6</id>
<content type='text'>
[ Upstream commit 8978cc921fc7fad3f4d6f91f1da01352aeeeff25 ]

There are systems platform information management interfaces (such as
HOST2BMC) for which we cannot disable local loopback multicast traffic.

Separate disable_local_lb_mc and disable_local_lb_uc capability bits so
driver will not disable multicast loopback traffic if not supported.
(It is expected that Firmware will not set disable_local_lb_mc if
HOST2BMC is running for example.)

Function mlx5_nic_vport_update_local_lb will do best effort to
disable/enable UC/MC loopback traffic and return success only in case it
succeeded to changed all allowed by Firmware.

Adapt mlx5_ib and mlx5e to support the new cap bits.

Fixes: 2c43c5a036be ("net/mlx5e: Enable local loopback in loopback selftest")
Fixes: c85023e153e3 ("IB/mlx5: Add raw ethernet local loopback support")
Fixes: bded747bb432 ("net/mlx5: Add raw ethernet local loopback firmware command")
Signed-off-by: Eran Ben Elisha &lt;eranbe@mellanox.com&gt;
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed &lt;saeedm@mellanox.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net/tls: Fix inverted error codes to avoid endless loop</title>
<updated>2018-01-31T13:03:45Z</updated>
<author>
<name>r.hering@avm.de</name>
<email>r.hering@avm.de</email>
</author>
<published>2018-01-12T14:42:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d3048a12f3eccc00d62db373df4cd50b1218f6f1'/>
<id>urn:sha1:d3048a12f3eccc00d62db373df4cd50b1218f6f1</id>
<content type='text'>
[ Upstream commit 30be8f8dba1bd2aff73e8447d59228471233a3d4 ]

sendfile() calls can hang endless with using Kernel TLS if a socket error occurs.
Socket error codes must be inverted by Kernel TLS before returning because
they are stored with positive sign. If returned non-inverted they are
interpreted as number of bytes sent, causing endless looping of the
splice mechanic behind sendfile().

Signed-off-by: Robert Hering &lt;r.hering@avm.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: tcp: close sock if net namespace is exiting</title>
<updated>2018-01-31T13:03:45Z</updated>
<author>
<name>Dan Streetman</name>
<email>ddstreet@ieee.org</email>
</author>
<published>2018-01-18T21:14:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=32e57f8c557faed0cd976abfb800a2f363e6972a'/>
<id>urn:sha1:32e57f8c557faed0cd976abfb800a2f363e6972a</id>
<content type='text'>
[ Upstream commit 4ee806d51176ba7b8ff1efd81f271d7252e03a1d ]

When a tcp socket is closed, if it detects that its net namespace is
exiting, close immediately and do not wait for FIN sequence.

For normal sockets, a reference is taken to their net namespace, so it will
never exit while the socket is open.  However, kernel sockets do not take a
reference to their net namespace, so it may begin exiting while the kernel
socket is still open.  In this case if the kernel socket is a tcp socket,
it will stay open trying to complete its close sequence.  The sock's dst(s)
hold a reference to their interface, which are all transferred to the
namespace's loopback interface when the real interfaces are taken down.
When the namespace tries to take down its loopback interface, it hangs
waiting for all references to the loopback interface to release, which
results in messages like:

unregister_netdevice: waiting for lo to become free. Usage count = 1

These messages continue until the socket finally times out and closes.
Since the net namespace cleanup holds the net_mutex while calling its
registered pernet callbacks, any new net namespace initialization is
blocked until the current net namespace finishes exiting.

After this change, the tcp socket notices the exiting net namespace, and
closes immediately, releasing its dst(s) and their reference to the
loopback interface, which lets the net namespace continue exiting.

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman &lt;ddstreet@canonical.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL</title>
<updated>2018-01-31T13:03:44Z</updated>
<author>
<name>Ben Hutchings</name>
<email>ben.hutchings@codethink.co.uk</email>
</author>
<published>2018-01-22T20:06:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2295b3dd543f9a5a1834e4265d506a5bc0819983'/>
<id>urn:sha1:2295b3dd543f9a5a1834e4265d506a5bc0819983</id>
<content type='text'>
[ Upstream commit e9191ffb65d8e159680ce0ad2224e1acbde6985c ]

Commit 513674b5a2c9 ("net: reevalulate autoflowlabel setting after
sysctl setting") removed the initialisation of
ipv6_pinfo::autoflowlabel and added a second flag to indicate
whether this field or the net namespace default should be used.

The getsockopt() handling for this case was not updated, so it
currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is
not explicitly enabled.  Fix it to return the effective value, whether
that has been set at the socket or net namespace level.

Fixes: 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl ...")
Signed-off-by: Ben Hutchings &lt;ben.hutchings@codethink.co.uk&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY</title>
<updated>2018-01-31T13:03:44Z</updated>
<author>
<name>Jim Westfall</name>
<email>jwestfall@surrealistic.net</email>
</author>
<published>2018-01-14T12:18:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=42d68bf2a42381642ea5ae460c6a5d86a56213f0'/>
<id>urn:sha1:42d68bf2a42381642ea5ae460c6a5d86a56213f0</id>
<content type='text'>
[ Upstream commit cd9ff4de0107c65d69d02253bb25d6db93c3dbc1 ]

Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
to avoid making an entry for every remote ip the device needs to talk to.

This used the be the old behavior but became broken in a263b3093641f
(ipv4: Make neigh lookups directly in output packet path) and later removed
in 0bb4087cbec0 (ipv4: Fix neigh lookup keying over loopback/point-to-point
devices) because it was broken.

Signed-off-by: Jim Westfall &lt;jwestfall@surrealistic.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "module: Add retpoline tag to VERMAGIC"</title>
<updated>2018-01-31T13:03:41Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2018-01-24T14:28:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5cc765d69c4faf7dabfa5816add84ee52bc7912b'/>
<id>urn:sha1:5cc765d69c4faf7dabfa5816add84ee52bc7912b</id>
<content type='text'>
commit 5132ede0fe8092b043dae09a7cc32b8ae7272baa upstream.

This reverts commit 6cfb521ac0d5b97470883ff9b7facae264b7ab12.

Turns out distros do not want to make retpoline as part of their "ABI",
so this patch should not have been merged.  Sorry Andi, this was my
fault, I suggested it when your original patch was the "correct" way of
doing this instead.

Reported-by: Jiri Kosina &lt;jikos@kernel.org&gt;
Fixes: 6cfb521ac0d5 ("module: Add retpoline tag to VERMAGIC")
Acked-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Cc: rusty@rustcorp.com.au
Cc: arjan.van.de.ven@intel.com
Cc: jeyu@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm, page_vma_mapped: Drop faulty pointer arithmetics in check_pte()</title>
<updated>2018-01-23T18:58:21Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2018-01-19T12:49:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3abb4c11038e03c4adc701eff30c45638816eb0f'/>
<id>urn:sha1:3abb4c11038e03c4adc701eff30c45638816eb0f</id>
<content type='text'>
commit 0d665e7b109d512b7cae3ccef6e8654714887844 upstream.

Tetsuo reported random crashes under memory pressure on 32-bit x86
system and tracked down to change that introduced
page_vma_mapped_walk().

The root cause of the issue is the faulty pointer math in check_pte().
As -&gt;pte may point to an arbitrary page we have to check that they are
belong to the section before doing math. Otherwise it may lead to weird
results.

It wasn't noticed until now as mem_map[] is virtually contiguous on
flatmem or vmemmap sparsemem. Pointer arithmetic just works against all
'struct page' pointers. But with classic sparsemem, it doesn't because
each section memap is allocated separately and so consecutive pfns
crossing two sections might have struct pages at completely unrelated
addresses.

Let's restructure code a bit and replace pointer arithmetic with
operations on pfns.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Reported-and-tested-by: Tetsuo Handa &lt;penguin-kernel@i-love.sakura.ne.jp&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>module: Add retpoline tag to VERMAGIC</title>
<updated>2018-01-23T18:58:14Z</updated>
<author>
<name>Andi Kleen</name>
<email>ak@linux.intel.com</email>
</author>
<published>2018-01-16T20:52:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8c556782901574e6a9d758fb1ef74cef54bf9a65'/>
<id>urn:sha1:8c556782901574e6a9d758fb1ef74cef54bf9a65</id>
<content type='text'>
commit 6cfb521ac0d5b97470883ff9b7facae264b7ab12 upstream.

Add a marker for retpoline to the module VERMAGIC. This catches the case
when a non RETPOLINE compiled module gets loaded into a retpoline kernel,
making it insecure.

It doesn't handle the case when retpoline has been runtime disabled.  Even
in this case the match of the retcompile status will be enforced.  This
implies that even with retpoline run time disabled all modules loaded need
to be recompiled.

Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Acked-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Cc: rusty@rustcorp.com.au
Cc: arjan.van.de.ven@intel.com
Cc: jeyu@kernel.org
Cc: torvalds@linux-foundation.org
Link: https://lkml.kernel.org/r/20180116205228.4890-1-andi@firstfloor.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
