<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/mm, branch v4.4.159</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.159</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.159'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-09-29T10:08:52Z</updated>
<entry>
<title>mm: shmem.c: Correctly annotate new inodes for lockdep</title>
<updated>2018-09-29T10:08:52Z</updated>
<author>
<name>Joel Fernandes (Google)</name>
<email>joel@joelfernandes.org</email>
</author>
<published>2018-09-20T19:22:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4da7f35b06702b1bc011270f15084a574ac76e1f'/>
<id>urn:sha1:4da7f35b06702b1bc011270f15084a574ac76e1f</id>
<content type='text'>
commit b45d71fb89ab8adfe727b9d0ee188ed58582a647 upstream.

Directories and inodes don't necessarily need to be in the same lockdep
class.  For example, hugetlbfs splits them out too to prevent false
positives in lockdep.  Annotate correctly after new inode creation.  If
it's a directory inode, it will be put into a different class.

This should fix a lockdep splat reported by syzbot:

&gt; ======================================================
&gt; WARNING: possible circular locking dependency detected
&gt; 4.18.0-rc8-next-20180810+ #36 Not tainted
&gt; ------------------------------------------------------
&gt; syz-executor900/4483 is trying to acquire lock:
&gt; 00000000d2bfc8fe (&amp;sb-&gt;s_type-&gt;i_mutex_key#9){++++}, at: inode_lock
&gt; include/linux/fs.h:765 [inline]
&gt; 00000000d2bfc8fe (&amp;sb-&gt;s_type-&gt;i_mutex_key#9){++++}, at:
&gt; shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
&gt;
&gt; but task is already holding lock:
&gt; 0000000025208078 (ashmem_mutex){+.+.}, at: ashmem_shrink_scan+0xb4/0x630
&gt; drivers/staging/android/ashmem.c:448
&gt;
&gt; which lock already depends on the new lock.
&gt;
&gt; -&gt; #2 (ashmem_mutex){+.+.}:
&gt;        __mutex_lock_common kernel/locking/mutex.c:925 [inline]
&gt;        __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
&gt;        mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
&gt;        ashmem_mmap+0x55/0x520 drivers/staging/android/ashmem.c:361
&gt;        call_mmap include/linux/fs.h:1844 [inline]
&gt;        mmap_region+0xf27/0x1c50 mm/mmap.c:1762
&gt;        do_mmap+0xa10/0x1220 mm/mmap.c:1535
&gt;        do_mmap_pgoff include/linux/mm.h:2298 [inline]
&gt;        vm_mmap_pgoff+0x213/0x2c0 mm/util.c:357
&gt;        ksys_mmap_pgoff+0x4da/0x660 mm/mmap.c:1585
&gt;        __do_sys_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
&gt;        __se_sys_mmap arch/x86/kernel/sys_x86_64.c:91 [inline]
&gt;        __x64_sys_mmap+0xe9/0x1b0 arch/x86/kernel/sys_x86_64.c:91
&gt;        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
&gt;        entry_SYSCALL_64_after_hwframe+0x49/0xbe
&gt;
&gt; -&gt; #1 (&amp;mm-&gt;mmap_sem){++++}:
&gt;        __might_fault+0x155/0x1e0 mm/memory.c:4568
&gt;        _copy_to_user+0x30/0x110 lib/usercopy.c:25
&gt;        copy_to_user include/linux/uaccess.h:155 [inline]
&gt;        filldir+0x1ea/0x3a0 fs/readdir.c:196
&gt;        dir_emit_dot include/linux/fs.h:3464 [inline]
&gt;        dir_emit_dots include/linux/fs.h:3475 [inline]
&gt;        dcache_readdir+0x13a/0x620 fs/libfs.c:193
&gt;        iterate_dir+0x48b/0x5d0 fs/readdir.c:51
&gt;        __do_sys_getdents fs/readdir.c:231 [inline]
&gt;        __se_sys_getdents fs/readdir.c:212 [inline]
&gt;        __x64_sys_getdents+0x29f/0x510 fs/readdir.c:212
&gt;        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
&gt;        entry_SYSCALL_64_after_hwframe+0x49/0xbe
&gt;
&gt; -&gt; #0 (&amp;sb-&gt;s_type-&gt;i_mutex_key#9){++++}:
&gt;        lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
&gt;        down_write+0x8f/0x130 kernel/locking/rwsem.c:70
&gt;        inode_lock include/linux/fs.h:765 [inline]
&gt;        shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
&gt;        ashmem_shrink_scan+0x236/0x630 drivers/staging/android/ashmem.c:455
&gt;        ashmem_ioctl+0x3ae/0x13a0 drivers/staging/android/ashmem.c:797
&gt;        vfs_ioctl fs/ioctl.c:46 [inline]
&gt;        file_ioctl fs/ioctl.c:501 [inline]
&gt;        do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:685
&gt;        ksys_ioctl+0xa9/0xd0 fs/ioctl.c:702
&gt;        __do_sys_ioctl fs/ioctl.c:709 [inline]
&gt;        __se_sys_ioctl fs/ioctl.c:707 [inline]
&gt;        __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:707
&gt;        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
&gt;        entry_SYSCALL_64_after_hwframe+0x49/0xbe
&gt;
&gt; other info that might help us debug this:
&gt;
&gt; Chain exists of:
&gt;   &amp;sb-&gt;s_type-&gt;i_mutex_key#9 --&gt; &amp;mm-&gt;mmap_sem --&gt; ashmem_mutex
&gt;
&gt;  Possible unsafe locking scenario:
&gt;
&gt;        CPU0                    CPU1
&gt;        ----                    ----
&gt;   lock(ashmem_mutex);
&gt;                                lock(&amp;mm-&gt;mmap_sem);
&gt;                                lock(ashmem_mutex);
&gt;   lock(&amp;sb-&gt;s_type-&gt;i_mutex_key#9);
&gt;
&gt;  *** DEADLOCK ***
&gt;
&gt; 1 lock held by syz-executor900/4483:
&gt;  #0: 0000000025208078 (ashmem_mutex){+.+.}, at:
&gt; ashmem_shrink_scan+0xb4/0x630 drivers/staging/android/ashmem.c:448

Link: http://lkml.kernel.org/r/20180821231835.166639-1-joel@joelfernandes.org
Signed-off-by: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Reported-by: syzbot &lt;syzkaller@googlegroups.com&gt;
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Suggested-by: NeilBrown &lt;neilb@suse.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: get rid of vmacache_flush_all() entirely</title>
<updated>2018-09-19T20:49:00Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-09-13T09:57:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=88d6918401a4ecdc50fe77df3e1e77c1e49d8579'/>
<id>urn:sha1:88d6918401a4ecdc50fe77df3e1e77c1e49d8579</id>
<content type='text'>
commit 7a9cdebdcc17e426fb5287e4a82db1dfe86339b2 upstream.

Jann Horn points out that the vmacache_flush_all() function is not only
potentially expensive, it's buggy too.  It also happens to be entirely
unnecessary, because the sequence number overflow case can be avoided by
simply making the sequence number be 64-bit.  That doesn't even grow the
data structures in question, because the other adjacent fields are
already 64-bit.

So simplify the whole thing by just making the sequence number overflow
case go away entirely, which gets rid of all the complications and makes
the code faster too.  Win-win.

[ Oleg Nesterov points out that the VMACACHE_FULL_FLUSHES statistics
  also just goes away entirely with this ]
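The arithmetic behind the fix is easy to check outside the kernel: a 32-bit
sequence number wraps around (the case vmacache_flush_all() existed to
handle), while a 64-bit one effectively never does.  A hedged Python
illustration (the kernel's actual field layout is not modeled here):

```python
# Counter wraparound: a 32-bit sequence number overflows back to 0,
# which previously forced a full vmacache flush.  Widening the counter
# to 64 bits makes overflow unreachable in practice.

MASK32 = 2**32 - 1
MASK64 = 2**64 - 1

def bump(seq, mask):
    return (seq + 1) % (mask + 1)

# 32-bit: one increment past the maximum wraps to 0.
print(bump(MASK32, MASK32))   # 0, i.e. the overflow case

# 64-bit: at one billion increments per second, exhausting the counter
# takes roughly 2**64 / 1e9 seconds, i.e. centuries.
years = 2**64 / 1e9 / (3600 * 24 * 365)
print(round(years))           # 585
```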

Reported-by: Jann Horn &lt;jannh@google.com&gt;
Suggested-by: Will Deacon &lt;will.deacon@arm.com&gt;
Acked-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Fixes: Commit cdbf92675fad ("mm: numa: avoid waiting on freed migrated pages")</title>
<updated>2018-09-15T07:40:40Z</updated>
<author>
<name>Chas Williams</name>
<email>chas3@att.com</email>
</author>
<published>2018-09-06T15:11:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e72977e87482759dba7181e0ec210c2db00c6124'/>
<id>urn:sha1:e72977e87482759dba7181e0ec210c2db00c6124</id>
<content type='text'>
Commit cdbf92675fad ("mm: numa: avoid waiting on freed migrated pages")
was an incomplete backport of the upstream commit.  It is necessary to
always reset page_nid before attempting any early exit.

The original commit conflicted due to lack of commit 82b0f8c39a38
("mm: join struct fault_env and vm_fault") in 4.9 so it wasn't a clean
application, and the change must have just gotten lost in the noise.

Signed-off-by: Chas Williams &lt;chas3@att.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/fadvise.c: fix signed overflow UBSAN complaint</title>
<updated>2018-09-15T07:40:38Z</updated>
<author>
<name>Andrey Ryabinin</name>
<email>aryabinin@virtuozzo.com</email>
</author>
<published>2018-08-17T22:46:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4ca3b3df6d52aff8b42271a23d7f2218672cfbf8'/>
<id>urn:sha1:4ca3b3df6d52aff8b42271a23d7f2218672cfbf8</id>
<content type='text'>
[ Upstream commit a718e28f538441a3b6612da9ff226973376cdf0f ]

Signed integer overflow is undefined according to the C standard.  The
overflow in ksys_fadvise64_64() is deliberate, but since it is signed
overflow, UBSAN complains:

	UBSAN: Undefined behaviour in mm/fadvise.c:76:10
	signed integer overflow:
	4 + 9223372036854775805 cannot be represented in type 'long long int'

Use unsigned types to do math.  Unsigned overflow is defined so UBSAN
will not complain about it.  This patch doesn't change generated code.
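The failure mode is easy to reproduce outside the kernel.  This is an
illustrative Python model of the C semantics the fix relies on (the kernel
code itself is C; the operand values below are taken from the UBSAN report):

```python
import struct

# In C, 4 + 9223372036854775805 overflows a signed 64-bit value
# (undefined behaviour).  The same addition done on unsigned 64-bit
# values wraps around, which is well defined; the result can then be
# reinterpreted as signed, which is what the fix does with casts.

def add_u64_as_s64(a, b):
    wrapped = (a + b) % 2**64                     # defined unsigned wrap
    return struct.unpack("q", struct.pack("Q", wrapped))[0]

offset, length = 4, 9223372036854775805
endbyte = add_u64_as_s64(offset, length - 1)
print(endbyte)   # -9223372036854775808: wrapped into the negative range
```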

[akpm@linux-foundation.org: add comment explaining the casts]
Link: http://lkml.kernel.org/r/20180629184453.7614-1-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Reported-by: &lt;icytxw@gmail.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/tlb: Remove tlb_remove_table() non-concurrent condition</title>
<updated>2018-09-09T18:04:34Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2018-08-22T15:30:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=70201a4e368833c15625d8dc32fd9c0286a12b58'/>
<id>urn:sha1:70201a4e368833c15625d8dc32fd9c0286a12b58</id>
<content type='text'>
commit a6f572084fbee8b30f91465f4a085d7a90901c57 upstream.

Will noted that only checking mm_users is incorrect; we should also
check mm_count in order to cover CPUs that have a lazy reference to
this mm (and could do speculative TLB operations).

If removing this turns out to be a performance issue, we can
re-instate a more complete check, eliding the call_rcu_sched() in
tlb_table_flush().

Fixes: 267239116987 ("mm, powerpc: move the RCU page-table freeing into generic code")
Reported-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Rik van Riel &lt;riel@surriel.com&gt;
Acked-by: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm/memory.c: check return value of ioremap_prot</title>
<updated>2018-09-05T07:18:36Z</updated>
<author>
<name>chen jie</name>
<email>chenjie6@huawei.com</email>
</author>
<published>2018-08-11T00:23:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fbee7b5b8c28ed02b6d6603eef27730c148a4481'/>
<id>urn:sha1:fbee7b5b8c28ed02b6d6603eef27730c148a4481</id>
<content type='text'>
[ Upstream commit 24eee1e4c47977bdfb71d6f15f6011e7b6188d04 ]

ioremap_prot() can return NULL which could lead to an oops.

Link: http://lkml.kernel.org/r/1533195441-58594-1-git-send-email-chenjie6@huawei.com
Signed-off-by: chen jie &lt;chenjie6@huawei.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: chenjie &lt;chenjie6@huawei.com&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>zswap: re-check zswap_is_full() after do zswap_shrink()</title>
<updated>2018-09-05T07:18:36Z</updated>
<author>
<name>Li Wang</name>
<email>liwang@redhat.com</email>
</author>
<published>2018-07-26T23:37:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0d35e0188a97a89cf92d16250e505d04707d56b3'/>
<id>urn:sha1:0d35e0188a97a89cf92d16250e505d04707d56b3</id>
<content type='text'>
[ Upstream commit 16e536ef47f567289a5699abee9ff7bb304bc12d ]

/sys/../zswap/stored_pages keeps rising in a zswap test with
"zswap.max_pool_percent=0" parameter.  But it should not compress or
store pages any more since there is no space in the compressed pool.

Reproduce steps:
  1. Boot kernel with "zswap.enabled=1"
  2. Set the max_pool_percent to 0
      # echo 0 &gt; /sys/module/zswap/parameters/max_pool_percent
  3. Run a memory stress test to see if some pages get compressed
      # stress --vm 1 --vm-bytes $mem_available"M" --timeout 60s
  4. Watch whether the 'stored_pages' number keeps increasing

The root cause is:

  When zswap_max_pool_percent is set to 0 via kernel parameter,
  zswap_is_full() will always return true.  But if zswap_shrink() is
  able to reclaim a page successfully, the code then proceeds to
  compressing/storing another page, so the value of stored_pages will
  keep changing.

To solve the issue, this patch adds a zswap_is_full() check again after
zswap_shrink() to make sure it's now under the max_pool_percent, and to
not compress/store if we reached the limit.
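The control flow of the fix can be sketched as a toy simulation (hedged:
these are invented Python stand-ins for zswap's C internals, with a fake
one-page shrink; real zswap writeback is far more involved):

```python
# Toy zswap model: with max_pool_percent=0 the pool is always "full",
# so without the re-check a successful shrink is always followed by a
# store, and pages end up stored despite a zero limit.

class Zswap:
    def __init__(self, max_pool_percent, total_pages=100):
        self.max_pool_percent = max_pool_percent
        self.total_pages = total_pages
        self.stored_pages = 0

    def is_full(self):
        limit = self.total_pages * self.max_pool_percent // 100
        return self.stored_pages >= limit

    def shrink(self):
        # Pretend writeback always succeeds, freeing one page if any.
        if self.stored_pages > 0:
            self.stored_pages -= 1
        return True

    def store(self, recheck):
        if self.is_full():
            if not self.shrink():
                return False
            if recheck and self.is_full():   # the fix: check again
                return False
        self.stored_pages += 1
        return True

buggy, fixed = Zswap(0), Zswap(0)
for _ in range(10):
    buggy.store(recheck=False)
    fixed.store(recheck=True)
print(buggy.stored_pages, fixed.stored_pages)  # 1 0
```

Without the re-check, a page is stored on every attempt even though the
limit is zero; with the fix, nothing is ever stored.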

Link: http://lkml.kernel.org/r/20180530103936.17812-1-liwang@redhat.com
Signed-off-by: Li Wang &lt;liwang@redhat.com&gt;
Acked-by: Dan Streetman &lt;ddstreet@ieee.org&gt;
Cc: Seth Jennings &lt;sjenning@redhat.com&gt;
Cc: Huang Ying &lt;huang.ying.caritas@gmail.com&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kasan: fix shadow_size calculation error in kasan_module_alloc</title>
<updated>2018-08-24T11:26:58Z</updated>
<author>
<name>Zhen Lei</name>
<email>thunder.leizhen@huawei.com</email>
</author>
<published>2018-07-04T00:02:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1acb2ad5d9d0fc66f18c74e22af3c07e41a5dbca'/>
<id>urn:sha1:1acb2ad5d9d0fc66f18c74e22af3c07e41a5dbca</id>
<content type='text'>
[ Upstream commit 1e8e18f694a52d703665012ca486826f64bac29d ]

There is a special case where the size is "(N &lt;&lt; KASAN_SHADOW_SCALE_SHIFT)
pages plus X", with X in [1, KASAN_SHADOW_SCALE_SIZE-1].  The
operation "size &gt;&gt; KASAN_SHADOW_SCALE_SHIFT" drops X, and the
roundup operation cannot recover the missing page.  For example:
size=0x28006, PAGE_SIZE=0x1000, KASAN_SHADOW_SCALE_SHIFT=3: we get
shadow_size=0x5000, but actually we need 6 pages.

  shadow_size = round_up(size &gt;&gt; KASAN_SHADOW_SCALE_SHIFT, PAGE_SIZE);

This can lead to a kernel crash when KASAN is enabled and the value of
mod-&gt;core_layout.size or mod-&gt;init_layout.size is as in the example
above, because the shadow memory for X has not been allocated and mapped.

move_module:
  ptr = module_alloc(mod-&gt;core_layout.size);
  ...
  memset(ptr, 0, mod-&gt;core_layout.size);		//crashed

  Unable to handle kernel paging request at virtual address ffff0fffff97b000
  ......
  Call trace:
    __asan_storeN+0x174/0x1a8
    memset+0x24/0x48
    layout_and_allocate+0xcd8/0x1800
    load_module+0x190/0x23e8
    SyS_finit_module+0x148/0x180
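The arithmetic from the example above can be re-checked in a few lines of
Python (a hypothetical re-calculation, with shifts written as divisions and
constants as given in this commit message):

```python
# Reproduce the shadow-size arithmetic from the commit message:
# size=0x28006, PAGE_SIZE=0x1000, KASAN_SHADOW_SCALE_SHIFT=3 (scale 8).

PAGE_SIZE = 0x1000
SCALE = 8            # 1 shifted left by KASAN_SHADOW_SCALE_SHIFT

def round_up(x, step):
    return (x + step - 1) // step * step

size = 0x28006

# Buggy order: shift first (drops the trailing 6 bytes), round up after.
buggy = round_up(size // SCALE, PAGE_SIZE)

# Fixed order: round size up to a multiple of the scale first, so the
# partial granule still contributes to the shadow, then round to pages.
fixed = round_up(round_up(size, SCALE) // SCALE, PAGE_SIZE)

print(hex(buggy), hex(fixed))  # 0x5000 0x6000 (5 pages vs the needed 6)
```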

Link: http://lkml.kernel.org/r/1529659626-12660-1-git-send-email-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei &lt;thunder.leizhen@huawei.com&gt;
Reviewed-by: Dmitriy Vyukov &lt;dvyukov@google.com&gt;
Acked-by: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Hanjun Guo &lt;guohanjun@huawei.com&gt;
Cc: Libin &lt;huawei.libin@huawei.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>x86/speculation/l1tf: Limit swap file size to MAX_PA/2</title>
<updated>2018-08-15T15:42:10Z</updated>
<author>
<name>Andi Kleen</name>
<email>ak@linux.intel.com</email>
</author>
<published>2018-06-13T22:48:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=685b44483f077c949bd5016fdfe734b662b74aba'/>
<id>urn:sha1:685b44483f077c949bd5016fdfe734b662b74aba</id>
<content type='text'>
commit 377eeaa8e11fe815b1d07c81c4a0e2843a8c15eb upstream

For the L1TF workaround it's necessary to limit the swap file size to below
MAX_PA/2, so that the inverted high bits of the swap offset never point
to valid memory.

Add a mechanism for the architecture to override the swap file size check
in swapfile.c and add an x86-specific max swapfile check function that
enforces that limit.

The check is only enabled if the CPU is vulnerable to L1TF.

In VMs with 42-bit MAX_PA the typical limit is now 2TB; on a native system
with 46-bit PA it is 32TB.  The limit is only per individual swap file, so
it's always possible to exceed these limits with multiple swap files or
partitions.
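The limits quoted above follow directly from MAX_PA/2; a quick illustrative
computation (the function name here is invented, the bit widths are the
ones from the text):

```python
# L1TF swap limit: swap offsets must stay below MAX_PA/2 so that the
# inverted offset bits always land in non-existent physical memory.

TIB = 2**40

def max_swapfile_bytes(max_pa_bits):
    # Half of the physical address space, enforced per swap file.
    return 2**max_pa_bits // 2

print(max_swapfile_bytes(42) // TIB)  # 2:  typical VM, 42-bit MAX_PA
print(max_swapfile_bytes(46) // TIB)  # 32: native system, 46-bit PA
```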

Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Dave Hansen &lt;dave.hansen@intel.com&gt;
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings</title>
<updated>2018-08-15T15:42:10Z</updated>
<author>
<name>Andi Kleen</name>
<email>ak@linux.intel.com</email>
</author>
<published>2018-06-13T22:48:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d71af2dbacb5611c1dcdc16fd1d343821d61bd5e'/>
<id>urn:sha1:d71af2dbacb5611c1dcdc16fd1d343821d61bd5e</id>
<content type='text'>
commit 42e4089c7890725fcd329999252dc489b72f2921 upstream

For L1TF, PROT_NONE mappings are protected by inverting the PFN in the page
table entry.  This sets the high bits in the CPU's address space, thus
making sure not to point an unmapped entry at valid cached memory.

Some server system BIOSes put the MMIO mappings high up in the physical
address space.  If such a high mapping were exposed to unprivileged users
they could attack low memory by setting such a mapping to PROT_NONE.  This
could happen through a special device driver which is not access
protected.  Normal /dev/mem is of course access protected.

To avoid this, forbid PROT_NONE mappings or mprotect for high MMIO mappings.

Valid page mappings are allowed because the system is then unsafe anyway.

It's not expected that users commonly use PROT_NONE on MMIO.  But to
minimize any impact, this is only enforced if the mapping actually refers to
a high MMIO address (defined as the MAX_PA-1 bit being set), and the
check is also skipped for root.

For mmaps this is straightforward and can be handled in vm_insert_pfn and
in remap_pfn_range().

For mprotect it's a bit trickier.  At the point where the actual PTEs are
accessed a lot of state has been changed and it would be difficult to undo
on an error.  Since this is an uncommon case, use a separate early page
table walk pass for MMIO PROT_NONE mappings that checks for this condition
early.  For non-MMIO and non-PROT_NONE there are no changes.
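A condensed, hypothetical sketch of the policy described above (a Python
stand-in; the real check lives in C under mm/, and the names and the
4K-page assumption here are invented for illustration):

```python
# Policy from the commit message: refuse PROT_NONE on a pfn whose
# physical address has the MAX_PA-1 bit set (a "high" MMIO mapping),
# unless the caller is root.

PAGE_SHIFT = 12   # assume 4K pages

def pfn_is_high_mmio(pfn, max_pa_bits):
    paddr = pfn * 2**PAGE_SHIFT
    high_bit = 2**(max_pa_bits - 1)
    return paddr // high_bit % 2 == 1    # test the MAX_PA-1 bit

def may_map_prot_none(pfn, max_pa_bits, is_root):
    if is_root:
        return True                      # the check is skipped for root
    return not pfn_is_high_mmio(pfn, max_pa_bits)

max_pa = 46
low_pfn = 0x1000
high_pfn = 2**(max_pa - 1) // 2**PAGE_SHIFT   # paddr with bit 45 set

print(may_map_prot_none(low_pfn, max_pa, is_root=False))   # True
print(may_map_prot_none(high_pfn, max_pa, is_root=False))  # False
print(may_map_prot_none(high_pfn, max_pa, is_root=True))   # True
```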

[dwmw2: Backport to 4.9]
[groeck: Backport to 4.4]

Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Acked-by: Dave Hansen &lt;dave.hansen@intel.com&gt;
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
