<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux, branch v4.14.16</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.14.16</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.14.16'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-01-31T13:03:50Z</updated>
<entry>
<title>bpf: avoid false sharing of map refcount with max_entries</title>
<updated>2018-01-31T13:03:50Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2018-01-28T23:36:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3ea4247ec1b7efc423cf4f75450ebf5cffab9ed8'/>
<id>urn:sha1:3ea4247ec1b7efc423cf4f75450ebf5cffab9ed8</id>
<content type='text'>
[ upstream commit be95a845cc4402272994ce290e3ad928aff06cb9 ]

In addition to commit b2157399cc98 ("bpf: prevent out-of-bounds
speculation") also change the layout of struct bpf_map such that
false sharing of fast-path members like max_entries is avoided
when the maps reference counter is altered. Therefore enforce
them to be placed into separate cachelines.

pahole dump after change:

  struct bpf_map {
        const struct bpf_map_ops  * ops;                 /*     0     8 */
        struct bpf_map *           inner_map_meta;       /*     8     8 */
        void *                     security;             /*    16     8 */
        enum bpf_map_type          map_type;             /*    24     4 */
        u32                        key_size;             /*    28     4 */
        u32                        value_size;           /*    32     4 */
        u32                        max_entries;          /*    36     4 */
        u32                        map_flags;            /*    40     4 */
        u32                        pages;                /*    44     4 */
        u32                        id;                   /*    48     4 */
        int                        numa_node;            /*    52     4 */
        bool                       unpriv_array;         /*    56     1 */

        /* XXX 7 bytes hole, try to pack */

        /* --- cacheline 1 boundary (64 bytes) --- */
        struct user_struct *       user;                 /*    64     8 */
        atomic_t                   refcnt;               /*    72     4 */
        atomic_t                   usercnt;              /*    76     4 */
        struct work_struct         work;                 /*    80    32 */
        char                       name[16];             /*   112    16 */
        /* --- cacheline 2 boundary (128 bytes) --- */

        /* size: 128, cachelines: 2, members: 17 */
        /* sum members: 121, holes: 1, sum holes: 7 */
  };

Now all entries in the first cacheline are read only throughout
the life time of the map, set up once during map creation. Overall
struct size and number of cachelines doesn't change from the
reordering. struct bpf_map is usually first member and embedded
in map structs in specific map implementations, so also avoid those
members to sit at the end where it could potentially share the
cacheline with first map values e.g. in the array since remote
CPUs could trigger map updates just as well for those (easily
dirtying members like max_entries intentionally as well) while
having subsequent values in cache.

Quoting from Google's Project Zero blog [1]:

  Additionally, at least on the Intel machine on which this was
  tested, bouncing modified cache lines between cores is slow,
  apparently because the MESI protocol is used for cache coherence
  [8]. Changing the reference counter of an eBPF array on one
  physical CPU core causes the cache line containing the reference
  counter to be bounced over to that CPU core, making reads of the
  reference counter on all other CPU cores slow until the changed
  reference counter has been written back to memory. Because the
  length and the reference counter of an eBPF array are stored in
  the same cache line, this also means that changing the reference
  counter on one physical CPU core causes reads of the eBPF array's
  length to be slow on other physical CPU cores (intentional false
  sharing).

While this doesn't 'control' the out-of-bounds speculation through
masking the index as in commit b2157399cc98, triggering a manipulation
of the map's reference counter is really trivial, so lets not allow
to easily affect max_entries from it.

Splitting to separate cachelines also generally makes sense from
a performance perspective anyway in that fast-path won't have a
cache miss if the map gets pinned, reused in other progs, etc out
of control path, thus also avoids unintentional false sharing.

  [1] https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with-side.html

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net/mlx5: Fix get vector affinity helper function</title>
<updated>2018-01-31T13:03:46Z</updated>
<author>
<name>Saeed Mahameed</name>
<email>saeedm@mellanox.com</email>
</author>
<published>2018-01-04T02:35:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ce1e51d842baa8cf55a05a26933402e55d9b6217'/>
<id>urn:sha1:ce1e51d842baa8cf55a05a26933402e55d9b6217</id>
<content type='text'>
[ Upstream commit 05e0cc84e00c54fb152d1f4b86bc211823a83d0c ]

mlx5_get_vector_affinity used to call pci_irq_get_affinity and after
reverting the patch that sets the device affinity via PCI_IRQ_AFFINITY
API, calling pci_irq_get_affinity becomes useless and it breaks RDMA
mlx5 users.  To fix this, this patch provides an alternative way to
retrieve IRQ vector affinity using legacy IRQ API, following
smp_affinity read procfs implementation.

Fixes: 231243c82793 ("Revert mlx5: move affinity hints assignments to generic code")
Fixes: a435393acafb ("mlx5: move affinity hints assignments to generic code")
Cc: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Saeed Mahameed &lt;saeedm@mellanox.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>{net,ib}/mlx5: Don't disable local loopback multicast traffic when needed</title>
<updated>2018-01-31T13:03:46Z</updated>
<author>
<name>Eran Ben Elisha</name>
<email>eranbe@mellanox.com</email>
</author>
<published>2018-01-09T09:41:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2ac1797da0f79013ef166df6a5124db267b9e1b6'/>
<id>urn:sha1:2ac1797da0f79013ef166df6a5124db267b9e1b6</id>
<content type='text'>
[ Upstream commit 8978cc921fc7fad3f4d6f91f1da01352aeeeff25 ]

There are systems platform information management interfaces (such as
HOST2BMC) for which we cannot disable local loopback multicast traffic.

Separate disable_local_lb_mc and disable_local_lb_uc capability bits so
driver will not disable multicast loopback traffic if not supported.
(It is expected that Firmware will not set disable_local_lb_mc if
HOST2BMC is running for example.)

Function mlx5_nic_vport_update_local_lb will do best effort to
disable/enable UC/MC loopback traffic and return success only in case it
succeeded to changed all allowed by Firmware.

Adapt mlx5_ib and mlx5e to support the new cap bits.

Fixes: 2c43c5a036be ("net/mlx5e: Enable local loopback in loopback selftest")
Fixes: c85023e153e3 ("IB/mlx5: Add raw ethernet local loopback support")
Fixes: bded747bb432 ("net/mlx5: Add raw ethernet local loopback firmware command")
Signed-off-by: Eran Ben Elisha &lt;eranbe@mellanox.com&gt;
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed &lt;saeedm@mellanox.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "module: Add retpoline tag to VERMAGIC"</title>
<updated>2018-01-31T13:03:41Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2018-01-24T14:28:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5cc765d69c4faf7dabfa5816add84ee52bc7912b'/>
<id>urn:sha1:5cc765d69c4faf7dabfa5816add84ee52bc7912b</id>
<content type='text'>
commit 5132ede0fe8092b043dae09a7cc32b8ae7272baa upstream.

This reverts commit 6cfb521ac0d5b97470883ff9b7facae264b7ab12.

Turns out distros do not want to make retpoline as part of their "ABI",
so this patch should not have been merged.  Sorry Andi, this was my
fault, I suggested it when your original patch was the "correct" way of
doing this instead.

Reported-by: Jiri Kosina &lt;jikos@kernel.org&gt;
Fixes: 6cfb521ac0d5 ("module: Add retpoline tag to VERMAGIC")
Acked-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Cc: rusty@rustcorp.com.au
Cc: arjan.van.de.ven@intel.com
Cc: jeyu@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm, page_vma_mapped: Drop faulty pointer arithmetics in check_pte()</title>
<updated>2018-01-23T18:58:21Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2018-01-19T12:49:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3abb4c11038e03c4adc701eff30c45638816eb0f'/>
<id>urn:sha1:3abb4c11038e03c4adc701eff30c45638816eb0f</id>
<content type='text'>
commit 0d665e7b109d512b7cae3ccef6e8654714887844 upstream.

Tetsuo reported random crashes under memory pressure on 32-bit x86
system and tracked down to change that introduced
page_vma_mapped_walk().

The root cause of the issue is the faulty pointer math in check_pte().
As -&gt;pte may point to an arbitrary page we have to check that they are
belong to the section before doing math. Otherwise it may lead to weird
results.

It wasn't noticed until now as mem_map[] is virtually contiguous on
flatmem or vmemmap sparsemem. Pointer arithmetic just works against all
'struct page' pointers. But with classic sparsemem, it doesn't because
each section memap is allocated separately and so consecutive pfns
crossing two sections might have struct pages at completely unrelated
addresses.

Let's restructure code a bit and replace pointer arithmetic with
operations on pfns.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Reported-and-tested-by: Tetsuo Handa &lt;penguin-kernel@i-love.sakura.ne.jp&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>module: Add retpoline tag to VERMAGIC</title>
<updated>2018-01-23T18:58:14Z</updated>
<author>
<name>Andi Kleen</name>
<email>ak@linux.intel.com</email>
</author>
<published>2018-01-16T20:52:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8c556782901574e6a9d758fb1ef74cef54bf9a65'/>
<id>urn:sha1:8c556782901574e6a9d758fb1ef74cef54bf9a65</id>
<content type='text'>
commit 6cfb521ac0d5b97470883ff9b7facae264b7ab12 upstream.

Add a marker for retpoline to the module VERMAGIC. This catches the case
when a non RETPOLINE compiled module gets loaded into a retpoline kernel,
making it insecure.

It doesn't handle the case when retpoline has been runtime disabled.  Even
in this case the match of the retcompile status will be enforced.  This
implies that even with retpoline run time disabled all modules loaded need
to be recompiled.

Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Acked-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Cc: rusty@rustcorp.com.au
Cc: arjan.van.de.ven@intel.com
Cc: jeyu@kernel.org
Cc: torvalds@linux-foundation.org
Link: https://lkml.kernel.org/r/20180116205228.4890-1-andi@firstfloor.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>delayacct: Account blkio completion on the correct task</title>
<updated>2018-01-23T18:58:13Z</updated>
<author>
<name>Josh Snyder</name>
<email>joshs@netflix.com</email>
</author>
<published>2017-12-18T16:15:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=36ae2e6f5c01f2c01c3224969f676b1b3b547aaf'/>
<id>urn:sha1:36ae2e6f5c01f2c01c3224969f676b1b3b547aaf</id>
<content type='text'>
commit c96f5471ce7d2aefd0dda560cc23f08ab00bc65d upstream.

Before commit:

  e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")

delayacct_blkio_end() was called after context-switching into the task which
completed I/O.

This resulted in double counting: the task would account a delay both waiting
for I/O and for time spent in the runqueue.

With e33a9bba85a8, delayacct_blkio_end() is called by try_to_wake_up().
In ttwu, we have not yet context-switched. This is more correct, in that
the delay accounting ends when the I/O is complete.

But delayacct_blkio_end() relies on 'get_current()', and we have not yet
context-switched into the task whose I/O completed. This results in the
wrong task having its delay accounting statistics updated.

Instead of doing that, pass the task_struct being woken to delayacct_blkio_end(),
so that it can update the statistics of the correct task.

Signed-off-by: Josh Snyder &lt;joshs@netflix.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Brendan Gregg &lt;bgregg@netflix.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-block@vger.kernel.org
Fixes: e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")
Link: http://lkml.kernel.org/r/1513613712-571-1-git-send-email-joshs@netflix.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sysfs/cpu: Add vulnerability folder</title>
<updated>2018-01-17T08:45:27Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2018-01-07T21:48:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5a3e4b399ebc994516f7b6bd8ae0584027e72418'/>
<id>urn:sha1:5a3e4b399ebc994516f7b6bd8ae0584027e72418</id>
<content type='text'>
commit 87590ce6e373d1a5401f6539f0c59ef92dd924a9 upstream.

As the meltdown/spectre problem affects several CPU architectures, it makes
sense to have common way to express whether a system is affected by a
particular vulnerability or not. If affected the way to express the
mitigation should be common as well.

Create /sys/devices/system/cpu/vulnerabilities folder and files for
meltdown, spectre_v1 and spectre_v2.

Allow architectures to override the show function.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Reviewed-by: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linuxfoundation.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Link: https://lkml.kernel.org/r/20180107214913.096657732@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>kdump: write correct address of mem_section into vmcoreinfo</title>
<updated>2018-01-17T08:45:27Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2018-01-13T00:53:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a4ae05b7bfcdd72f135099b848e1777e486efb19'/>
<id>urn:sha1:a4ae05b7bfcdd72f135099b848e1777e486efb19</id>
<content type='text'>
commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 upstream.

Depending on configuration mem_section can now be an array or a pointer
to an array allocated dynamically.  In most cases, we can continue to
refer to it as 'mem_section' regardless of what it is.

But there's one exception: '&amp;mem_section' means "address of the array"
if mem_section is an array, but if mem_section is a pointer, it would
mean "address of the pointer".

We've stepped onto this in kdump code.  VMCOREINFO_SYMBOL(mem_section)
writes down address of pointer into vmcoreinfo, not array as we wanted.

Let's introduce VMCOREINFO_SYMBOL_ARRAY() that would handle the
situation correctly for both cases.

Link: http://lkml.kernel.org/r/20180112162532.35896-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Fixes: 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y")
Acked-by: Baoquan He &lt;bhe@redhat.com&gt;
Acked-by: Dave Young &lt;dyoung@redhat.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Dave Young &lt;dyoung@redhat.com&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>bpf: prevent out-of-bounds speculation</title>
<updated>2018-01-17T08:45:25Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2018-01-08T01:33:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a5dbaf87684c81693854a87b4a6c46460fc7a731'/>
<id>urn:sha1:a5dbaf87684c81693854a87b4a6c46460fc7a731</id>
<content type='text'>
commit b2157399cc9898260d6031c5bfe45fe137c1fbe7 upstream.

Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.

To avoid leaking kernel data round up array-based maps and mask the index
after bounds check, so speculated load with out of bounds index will load
either valid value from the array or zero from the padded area.

Unconditionally mask index for all array types even when max_entries
are not rounded to power of 2 for root user.
When map is created by unpriv user generate a sequence of bpf insns
that includes AND operation to make sure that JITed code includes
the same 'index &amp; index_mask' operation.

If prog_array map is created by unpriv user replace
  bpf_tail_call(ctx, map, index);
with
  if (index &gt;= max_entries) {
    index &amp;= map-&gt;index_mask;
    bpf_tail_call(ctx, map, index);
  }
(along with roundup to power 2) to prevent out-of-bounds speculation.
There is secondary redundant 'if (index &gt;= max_entries)' in the interpreter
and in all JITs, but they can be optimized later if necessary.

Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.

That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.

v2-&gt;v3:
Daniel noticed that attack potentially can be crafted via syscall commands
without loading the program, so add masking to those paths as well.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Jiri Slaby &lt;jslaby@suse.cz&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
