<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/mm.h, branch v5.4.170</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.170</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.170'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2021-06-30T12:47:53Z</updated>
<entry>
<title>mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()</title>
<updated>2021-06-30T12:47:53Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2021-06-16T01:24:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bd43892152274593bfc6b42aba9c4e389e3b2506'/>
<id>urn:sha1:bd43892152274593bfc6b42aba9c4e389e3b2506</id>
<content type='text'>
[ Upstream commit 22061a1ffabdb9c3385de159c5db7aac3a4df1cc ]

There is a race between THP unmapping and truncation, when truncate sees
pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared
it, but before its page_remove_rmap() gets to decrement
compound_mapcount: generating false "BUG: Bad page cache" reports that
the page is still mapped when deleted.  This commit fixes that, but not
in the way I hoped.

The first attempt used try_to_unmap(page, TTU_SYNC|TTU_IGNORE_MLOCK)
instead of unmap_mapping_range() in truncate_cleanup_page(): it has
often been an annoyance that we usually call unmap_mapping_range() with
no pages locked, but there apply it to a single locked page.
try_to_unmap() looks more suitable for a single locked page.

However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte,page):
it is used to insert THP migration entries, but not used to unmap THPs.
Copy zap_huge_pmd() and add THP handling now? Perhaps, but their TLB
needs are different, I'm too ignorant of the DAX cases, and couldn't
decide how far to go for anon+swap.  Set that aside.

The second attempt took a different tack: make no change in truncate.c,
but modify zap_huge_pmd() to insert an invalidated huge pmd instead of
clearing it initially, then pmd_clear() between page_remove_rmap() and
unlocking at the end.  Nice.  But powerpc blows that approach out of the
water, with its serialize_against_pte_lookup(), and interesting pgtable
usage.  It would need serious help to get working on powerpc (with a
minor optimization issue on s390 too).  Set that aside.

Just add an "if (page_mapped(page)) synchronize_rcu();" or other such
delay, after unmapping in truncate_cleanup_page()? Perhaps, but though
that's likely to reduce or eliminate the number of incidents, it would
give less assurance of whether we had identified the problem correctly.

This successful iteration introduces "unmap_mapping_page(page)" instead
of try_to_unmap(), and goes the usual unmap_mapping_range_tree() route,
with an addition to details.  Then zap_pmd_range() watches for this
case, and does spin_unlock(pmd_lock) if so - just like
page_vma_mapped_walk() now does in the PVMW_SYNC case.  Not pretty, but
safe.

Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to
assert its interface; but currently that's only used to make sure that
page-&gt;mapping is stable, and zap_pmd_range() doesn't care if the page is
locked or not.  Along these lines, in invalidate_inode_pages2_range()
move the initial unmap_mapping_range() out from under page lock, before
then calling unmap_mapping_page() under page lock if still mapped.

Link: https://lkml.kernel.org/r/a2a4a148-cdd8-942c-4ef8-51b77f643dbe@google.com
Fixes: fc127da085c2 ("truncate: handle file thp")
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Reviewed-by: Yang Shi &lt;shy828301@gmail.com&gt;
Cc: Alistair Popple &lt;apopple@nvidia.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jue Wang &lt;juew@google.com&gt;
Cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Cc: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Ralph Campbell &lt;rcampbell@nvidia.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Wang Yugui &lt;wangyugui@e16-tech.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;

Note on stable backport: fixed up call to truncate_cleanup_page()
in truncate_inode_pages_range().  Use hpage_nr_pages() in
unmap_mapping_page().

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/hugetlb: fix F_SEAL_FUTURE_WRITE</title>
<updated>2021-05-19T08:08:29Z</updated>
<author>
<name>Peter Xu</name>
<email>peterx@redhat.com</email>
</author>
<published>2021-05-15T00:27:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f77aa56ad9894468a8eb43279d23c000aabb1cc5'/>
<id>urn:sha1:f77aa56ad9894468a8eb43279d23c000aabb1cc5</id>
<content type='text'>
commit 22247efd822e6d263f3c8bd327f3f769aea9b1d9 upstream.

Patch series "mm/hugetlb: Fix issues on file sealing and fork", v2.

Hugh reported issue with F_SEAL_FUTURE_WRITE not applied correctly to
hugetlbfs, which I can easily verify using the memfd_test program, which
seems that the program is hardly run with hugetlbfs pages (as by default
shmem).

Meanwhile I found another probably even more severe issue on that hugetlb
fork won't wr-protect child cow pages, so child can potentially write to
parent private pages.  Patch 2 addresses that.

After this series applied, "memfd_test hugetlbfs" should start to pass.

This patch (of 2):

F_SEAL_FUTURE_WRITE is missing for hugetlb starting from the first day.
There is a test program for that and it fails constantly.

$ ./memfd_test hugetlbfs
memfd-hugetlb: CREATE
memfd-hugetlb: BASIC
memfd-hugetlb: SEAL-WRITE
memfd-hugetlb: SEAL-FUTURE-WRITE
mmap() didn't fail as expected
Aborted (core dumped)

I think it's probably because no one is really running the hugetlbfs test.

Fix it by checking FUTURE_WRITE also in hugetlbfs_file_mmap() as what we
do in shmem_mmap().  Generalize a helper for that.

Link: https://lkml.kernel.org/r/20210503234356.9097-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210503234356.9097-2-peterx@redhat.com
Fixes: ab3948f58ff84 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd")
Signed-off-by: Peter Xu &lt;peterx@redhat.com&gt;
Reported-by: Hugh Dickins &lt;hughd@google.com&gt;
Reviewed-by: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kasan: fix per-page tags for non-page_alloc pages</title>
<updated>2021-03-30T12:35:23Z</updated>
<author>
<name>Andrey Konovalov</name>
<email>andreyknvl@google.com</email>
</author>
<published>2021-03-25T04:37:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4748b6d56efeab0f3347c698af40cd50db50e937'/>
<id>urn:sha1:4748b6d56efeab0f3347c698af40cd50db50e937</id>
<content type='text'>
commit cf10bd4c4aff8dd64d1aa7f2a529d0c672bc16af upstream.

To allow performing tag checks on page_alloc addresses obtained via
page_address(), tag-based KASAN modes store tags for page_alloc
allocations in page-&gt;flags.

Currently, the default tag value stored in page-&gt;flags is 0x00.
Therefore, page_address() returns a 0x00ffff...  address for pages that
were not allocated via page_alloc.

This might cause problems.  A particular case we encountered is a
conflict with KFENCE.  If a KFENCE-allocated slab object is being freed
via kfree(page_address(page) + offset), the address passed to kfree()
will get tagged with 0x00 (as slab pages keep the default per-page
tags).  This leads to is_kfence_address() check failing, and a KFENCE
object ending up in normal slab freelist, which causes memory
corruptions.

This patch changes the way KASAN stores tag in page-flags: they are now
stored xor'ed with 0xff.  This way, KASAN doesn't need to initialize
per-page flags for every created page, which might be slow.

With this change, page_address() returns natively-tagged (with 0xff)
pointers for pages that didn't have tags set explicitly.

This patch fixes the encountered conflict with KFENCE and prevents more
similar issues that can occur in the future.

Link: https://lkml.kernel.org/r/1a41abb11c51b264511d9e71c303bb16d5cb367b.1615475452.git.andreyknvl@google.com
Fixes: 2813b9c02962 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
Signed-off-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Reviewed-by: Marco Elver &lt;elver@google.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Peter Collingbourne &lt;pcc@google.com&gt;
Cc: Evgenii Stepanov &lt;eugenis@google.com&gt;
Cc: Branislav Rankov &lt;Branislav.Rankov@arm.com&gt;
Cc: Kevin Brodsky &lt;kevin.brodsky@arm.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: provide a saner PTE walking API for modules</title>
<updated>2021-02-26T09:10:28Z</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2021-02-05T10:07:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3f9fbe70316407a6f7322e2bb6ac5fb272dbbe79'/>
<id>urn:sha1:3f9fbe70316407a6f7322e2bb6ac5fb272dbbe79</id>
<content type='text'>
commit 9fd6dad1261a541b3f5fa7dc5b152222306e6702 upstream.

Currently, the follow_pfn function is exported for modules but
follow_pte is not.  However, follow_pfn is very easy to misuse,
because it does not provide protections (so most of its callers
assume the page is writable!) and because it returns after having
already unlocked the page table lock.

Provide instead a simplified version of follow_pte that does
not have the pmdpp and range arguments.  The older version
survives as follow_invalidate_pte() for use by fs/dax.c.

Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: simplify follow_pte{,pmd}</title>
<updated>2021-02-26T09:10:27Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2020-12-16T04:47:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=78c7b242573a9099cb273c9a4093fdd3cea4759d'/>
<id>urn:sha1:78c7b242573a9099cb273c9a4093fdd3cea4759d</id>
<content type='text'>
commit ff5c19ed4b087073cea38ff0edc80c23d7256943 upstream.

Merge __follow_pte_pmd, follow_pte_pmd and follow_pte into a single
follow_pte function and just pass two additional NULL arguments for the
two previous follow_pte callers.

[sfr@canb.auug.org.au: merge fix for "s390/pci: remove races against pte updates"]
  Link: https://lkml.kernel.org/r/20201111221254.7f6a3658@canb.auug.org.au

Link: https://lkml.kernel.org/r/20201029101432.47011-3-hch@lst.de
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Daniel Vetter &lt;daniel@ffwll.ch&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: always have io_remap_pfn_range() set pgprot_decrypted()</title>
<updated>2020-11-10T11:37:27Z</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2020-11-02T01:08:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d2286457bd838e78f6e12a6f4a0d99aa64dc2cc0'/>
<id>urn:sha1:d2286457bd838e78f6e12a6f4a0d99aa64dc2cc0</id>
<content type='text'>
commit f8f6ae5d077a9bdaf5cbf2ac960a5d1a04b47482 upstream.

The purpose of io_remap_pfn_range() is to map IO memory, such as a
memory mapped IO exposed through a PCI BAR.  IO devices do not
understand encryption, so this memory must always be decrypted.
Automatically call pgprot_decrypted() as part of the generic
implementation.

This fixes a bug where enabling AMD SME causes subsystems, such as RDMA,
using io_remap_pfn_range() to expose BAR pages to user space to fail.
The CPU will encrypt access to those BAR pages instead of passing
unencrypted IO directly to the device.

Places not mapping IO should use remap_pfn_range().

Fixes: aca20d546214 ("x86/mm: Add support to make use of Secure Memory Encryption")
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Tom Lendacky &lt;thomas.lendacky@amd.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Brijesh Singh &lt;brijesh.singh@amd.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: "Dave Young" &lt;dyoung@redhat.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Larry Woodman &lt;lwoodman@redhat.com&gt;
Cc: Matt Fleming &lt;matt@codeblueprint.co.uk&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: "Michael S. Tsirkin" &lt;mst@redhat.com&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Toshimitsu Kani &lt;toshi.kani@hpe.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: https://lkml.kernel.org/r/0-v1-025d64bdf6c4+e-amd_sme_fix_jgg@nvidia.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: replace memmap_context by meminit_context</title>
<updated>2020-10-07T06:01:29Z</updated>
<author>
<name>Laurent Dufour</name>
<email>ldufour@linux.ibm.com</email>
</author>
<published>2020-09-26T04:19:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=42b7153dd6a64b61141771f4ae0b1b1131321720'/>
<id>urn:sha1:42b7153dd6a64b61141771f4ae0b1b1131321720</id>
<content type='text'>
commit c1d0da83358a2316d9be7f229f26126dbaa07468 upstream.

Patch series "mm: fix memory to node bad links in sysfs", v3.

Sometimes, firmware may expose interleaved memory layout like this:

 Early memory node ranges
   node   1: [mem 0x0000000000000000-0x000000011fffffff]
   node   2: [mem 0x0000000120000000-0x000000014fffffff]
   node   1: [mem 0x0000000150000000-0x00000001ffffffff]
   node   0: [mem 0x0000000200000000-0x000000048fffffff]
   node   2: [mem 0x0000000490000000-0x00000007ffffffff]

In that case, we can see memory blocks assigned to multiple nodes in
sysfs:

  $ ls -l /sys/devices/system/memory/memory21
  total 0
  lrwxrwxrwx 1 root root     0 Aug 24 05:27 node1 -&gt; ../../node/node1
  lrwxrwxrwx 1 root root     0 Aug 24 05:27 node2 -&gt; ../../node/node2
  -rw-r--r-- 1 root root 65536 Aug 24 05:27 online
  -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_device
  -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_index
  drwxr-xr-x 2 root root     0 Aug 24 05:27 power
  -r--r--r-- 1 root root 65536 Aug 24 05:27 removable
  -rw-r--r-- 1 root root 65536 Aug 24 05:27 state
  lrwxrwxrwx 1 root root     0 Aug 24 05:25 subsystem -&gt; ../../../../bus/memory
  -rw-r--r-- 1 root root 65536 Aug 24 05:25 uevent
  -r--r--r-- 1 root root 65536 Aug 24 05:27 valid_zones

The same applies in the node's directory with a memory21 link in both
the node1 and node2's directory.

This is wrong but doesn't prevent the system to run.  However when
later, one of these memory blocks is hot-unplugged and then hot-plugged,
the system is detecting an inconsistency in the sysfs layout and a
BUG_ON() is raised:

  kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084!
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4
  CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25
  Call Trace:
    add_memory_resource+0x23c/0x340 (unreliable)
    __add_memory+0x5c/0xf0
    dlpar_add_lmb+0x1b4/0x500
    dlpar_memory+0x1f8/0xb80
    handle_dlpar_errorlog+0xc0/0x190
    dlpar_store+0x198/0x4a0
    kobj_attr_store+0x30/0x50
    sysfs_kf_write+0x64/0x90
    kernfs_fop_write+0x1b0/0x290
    vfs_write+0xe8/0x290
    ksys_write+0xdc/0x130
    system_call_exception+0x160/0x270
    system_call_common+0xf0/0x27c

This has been seen on PowerPC LPAR.

The root cause of this issue is that when node's memory is registered,
the range used can overlap another node's range, thus the memory block
is registered to multiple nodes in sysfs.

There are two issues here:

 (a) The sysfs memory and node's layouts are broken due to these
     multiple links

 (b) The link errors in link_mem_sections() should not lead to a system
     panic.

To address (a) register_mem_sect_under_node should not rely on the
system state to detect whether the link operation is triggered by a hot
plug operation or not.  This is addressed by the patches 1 and 2 of this
series.

Issue (b) will be addressed separately.

This patch (of 2):

The memmap_context enum is used to detect whether a memory operation is
due to a hot-add operation or happening at boot time.

Make it general to the hotplug operation and rename it as
meminit_context.

There is no functional change introduced by this patch

Suggested-by: David Hildenbrand &lt;david@redhat.com&gt;
Signed-off-by: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J . Wysocki" &lt;rafael@kernel.org&gt;
Cc: Nathan Lynch &lt;nathanl@linux.ibm.com&gt;
Cc: Scott Cheloha &lt;cheloha@linux.ibm.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: https://lkml.kernel.org/r/20200915094143.79181-1-ldufour@linux.ibm.com
Link: https://lkml.kernel.org/r/20200915132624.9723-1-ldufour@linux.ibm.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: add kvfree_sensitive() for freeing sensitive data objects</title>
<updated>2020-06-17T14:40:23Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2020-06-04T23:48:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0b11ec4ae5573d49d2c6cf37796f82f409b9def8'/>
<id>urn:sha1:0b11ec4ae5573d49d2c6cf37796f82f409b9def8</id>
<content type='text'>
[ Upstream commit d4eaa2837851db2bfed572898bfc17f9a9f9151e ]

For kvmalloc'ed data object that contains sensitive information like
cryptographic keys, we need to make sure that the buffer is always cleared
before freeing it.  Using memset() alone for buffer clearing may not
provide certainty as the compiler may compile it away.  To be sure, the
special memzero_explicit() has to be used.

This patch introduces a new kvfree_sensitive() for freeing those sensitive
data objects allocated by kvmalloc().  The relevant places where
kvfree_sensitive() can be used are modified to use it.

Fixes: 4f0882491a14 ("KEYS: Avoid false positive ENOMEM error on key read")
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt;
Cc: Jarkko Sakkinen &lt;jarkko.sakkinen@linux.intel.com&gt;
Cc: James Morris &lt;jmorris@namei.org&gt;
Cc: "Serge E. Hallyn" &lt;serge@hallyn.com&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Uladzislau Rezki &lt;urezki@gmail.com&gt;
Link: http://lkml.kernel.org/r/20200407200318.11711-1-longman@redhat.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm: remove VM_BUG_ON(PageSlab()) from page_mapcount()</title>
<updated>2020-06-03T06:21:26Z</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@yandex-team.ru</email>
</author>
<published>2020-05-28T05:20:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=de482c445b0fa127901f497b4d68f0c9fbf449d5'/>
<id>urn:sha1:de482c445b0fa127901f497b4d68f0c9fbf449d5</id>
<content type='text'>
[ Upstream commit 6988f31d558aa8c744464a7f6d91d34ada48ad12 ]

Replace superfluous VM_BUG_ON() with comment about correct usage.

Technically reverts commit 1d148e218a0d ("mm: add VM_BUG_ON_PAGE() to
page_mapcount()"), but context lines have changed.

Function isolate_migratepages_block() runs some checks out of lru_lock
when choose pages for migration.  After checking PageLRU() it checks
extra page references by comparing page_count() and page_mapcount().
Between these two checks page could be removed from lru, freed and taken
by slab.

As a result this race triggers VM_BUG_ON(PageSlab()) in page_mapcount().
Race window is tiny.  For certain workload this happens around once a
year.

    page:ffffea0105ca9380 count:1 mapcount:0 mapping:ffff88ff7712c180 index:0x0 compound_mapcount: 0
    flags: 0x500000000008100(slab|head)
    raw: 0500000000008100 dead000000000100 dead000000000200 ffff88ff7712c180
    raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
    page dumped because: VM_BUG_ON_PAGE(PageSlab(page))
    ------------[ cut here ]------------
    kernel BUG at ./include/linux/mm.h:628!
    invalid opcode: 0000 [#1] SMP NOPTI
    CPU: 77 PID: 504 Comm: kcompactd1 Tainted: G        W         4.19.109-27 #1
    Hardware name: Yandex T175-N41-Y3N/MY81-EX0-Y3N, BIOS R05 06/20/2019
    RIP: 0010:isolate_migratepages_block+0x986/0x9b0

The code in isolate_migratepages_block() was added in commit
119d6d59dcc0 ("mm, compaction: avoid isolating pinned pages") before
adding VM_BUG_ON into page_mapcount().

This race has been predicted in 2015 by Vlastimil Babka (see link
below).

[akpm@linux-foundation.org: comment tweaks, per Hugh]
Fixes: 1d148e218a0d ("mm: add VM_BUG_ON_PAGE() to page_mapcount()")
Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: http://lkml.kernel.org/r/159032779896.957378.7852761411265662220.stgit@buzz
Link: https://lore.kernel.org/lkml/557710E1.6060103@suse.cz/
Link: https://lore.kernel.org/linux-mm/158937872515.474360.5066096871639561424.stgit@buzz/T/ (v1)
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm, hotplug: fix page online with DEBUG_PAGEALLOC compiled but not enabled</title>
<updated>2020-03-12T12:00:19Z</updated>
<author>
<name>Vlastimil Babka</name>
<email>vbabka@suse.cz</email>
</author>
<published>2020-03-06T06:28:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d467fbc93cdbb851427d7553876a67974e4a419a'/>
<id>urn:sha1:d467fbc93cdbb851427d7553876a67974e4a419a</id>
<content type='text'>
commit c87cbc1f007c4b46165f05ceca04e1973cda0b9c upstream.

Commit cd02cf1aceea ("mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC")
fixed memory hotplug with debug_pagealloc enabled, where onlining a page
goes through page freeing, which removes the direct mapping.  Some arches
don't like when the page is not mapped in the first place, so
generic_online_page() maps it first.  This is somewhat wasteful, but
better than special casing page freeing fast paths.

The commit however missed that DEBUG_PAGEALLOC configured doesn't mean
it's actually enabled.  One has to test debug_pagealloc_enabled() since
031bc5743f15 ("mm/debug-pagealloc: make debug-pagealloc boottime
configurable"), or alternatively debug_pagealloc_enabled_static() since
8e57f8acbbd1 ("mm, debug_pagealloc: don't rely on static keys too early"),
but this is not done.

As a result, a s390 kernel with DEBUG_PAGEALLOC configured but not enabled
will crash:

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 0000000000000483
Fault in home space mode while using kernel ASCE.
AS:0000001ece13400b R2:000003fff7fd000b R3:000003fff7fcc007 S:000003fff7fd7000 P:000000000000013d
Oops: 0004 ilc:2 [#1] SMP
CPU: 1 PID: 26015 Comm: chmem Kdump: loaded Tainted: GX 5.3.18-5-default #1 SLE15-SP2 (unreleased)
Krnl PSW : 0704e00180000000 0000001ecd281b9e (__kernel_map_pages+0x166/0x188)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000000 0000000000000800 0000400b00000000 0000000000000100
0000000000000001 0000000000000000 0000000000000002 0000000000000100
0000001ece139230 0000001ecdd98d40 0000400b00000100 0000000000000000
000003ffa17e4000 001fffe0114f7d08 0000001ecd4d93ea 001fffe0114f7b20
Krnl Code: 0000001ecd281b8e: ec17ffff00d8 ahik %r1,%r7,-1
0000001ecd281b94: ec111dbc0355 risbg %r1,%r1,29,188,3
&gt;0000001ecd281b9e: 94fb5006 ni 6(%r5),251
0000001ecd281ba2: 41505008 la %r5,8(%r5)
0000001ecd281ba6: ec51fffc6064 cgrj %r5,%r1,6,1ecd281b9e
0000001ecd281bac: 1a07 ar %r0,%r7
0000001ecd281bae: ec03ff584076 crj %r0,%r3,4,1ecd281a5e
Call Trace:
[&lt;0000001ecd281b9e&gt;] __kernel_map_pages+0x166/0x188
[&lt;0000001ecd4d9516&gt;] online_pages_range+0xf6/0x128
[&lt;0000001ecd2a8186&gt;] walk_system_ram_range+0x7e/0xd8
[&lt;0000001ecda28aae&gt;] online_pages+0x2fe/0x3f0
[&lt;0000001ecd7d02a6&gt;] memory_subsys_online+0x8e/0xc0
[&lt;0000001ecd7add42&gt;] device_online+0x5a/0xc8
[&lt;0000001ecd7d0430&gt;] state_store+0x88/0x118
[&lt;0000001ecd5b9f62&gt;] kernfs_fop_write+0xc2/0x200
[&lt;0000001ecd5064b6&gt;] vfs_write+0x176/0x1e0
[&lt;0000001ecd50676a&gt;] ksys_write+0xa2/0x100
[&lt;0000001ecda315d4&gt;] system_call+0xd8/0x2c8

Fix this by checking debug_pagealloc_enabled_static() before calling
kernel_map_pages(). Backports for kernel before 5.5 should use
debug_pagealloc_enabled() instead. Also add comments.

Fixes: cd02cf1aceea ("mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC")
Reported-by: Gerald Schaefer &lt;gerald.schaefer@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Link: http://lkml.kernel.org/r/20200224094651.18257-1-vbabka@suse.cz
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
