<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/mm, branch v5.15.76</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.15.76</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.15.76'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2022-10-29T08:12:54Z</updated>
<entry>
<title>mm,hugetlb: take hugetlb_lock before decrementing h-&gt;resv_huge_pages</title>
<updated>2022-10-29T08:12:54Z</updated>
<author>
<name>Rik van Riel</name>
<email>riel@surriel.com</email>
</author>
<published>2022-10-18T00:25:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=112a005d1ded04a4b41b6d01833cc0bda90625cc'/>
<id>urn:sha1:112a005d1ded04a4b41b6d01833cc0bda90625cc</id>
<content type='text'>
commit 12df140f0bdfae5dcfc81800970dd7f6f632e00c upstream.

The h-&gt;*_huge_pages counters are protected by the hugetlb_lock, but
alloc_huge_page has a corner case where it can decrement the counter
outside of the lock.

This could lead to a corrupted value of h-&gt;resv_huge_pages, which we have
observed on our systems.

Take the hugetlb_lock before decrementing h-&gt;resv_huge_pages to avoid a
potential race.
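
A minimal sketch of the idea, following the v5.15 locking style (an
illustrative sketch, not necessarily the exact hunk):

  spin_lock_irq(&amp;hugetlb_lock);
  /* now serialized against every other h-&gt;*_huge_pages update */
  h-&gt;resv_huge_pages--;
  spin_unlock_irq(&amp;hugetlb_lock);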

Link: https://lkml.kernel.org/r/20221017202505.0e6a4fcd@imladris.surriel.com
Fixes: a88c76954804 ("mm: hugetlb: fix hugepage memory leak caused by wrong reserve count")
Signed-off-by: Rik van Riel &lt;riel@surriel.com&gt;
Reviewed-by: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Glen McCready &lt;gkmccready@meta.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Muchun Song &lt;songmuchun@bytedance.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: hugetlb: fix UAF in hugetlb_handle_userfault</title>
<updated>2022-10-26T10:35:53Z</updated>
<author>
<name>Liu Shixin</name>
<email>liushixin2@huawei.com</email>
</author>
<published>2022-09-23T04:21:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0db2efb3bff879566f05341d94c3de00ac95c4cc'/>
<id>urn:sha1:0db2efb3bff879566f05341d94c3de00ac95c4cc</id>
<content type='text'>
commit 958f32ce832ba781ac20e11bb2d12a9352ea28fc upstream.

The vma_lock and hugetlb_fault_mutex are dropped before handling userfault
and reacquired again after handle_userfault(), but reacquiring the
vma_lock can lead to a UAF [1,2] due to the following race:

hugetlb_fault
  hugetlb_no_page
    /* unlock vma_lock */
    hugetlb_handle_userfault
      handle_userfault
        /* unlock mm-&gt;mmap_lock */
                                           vm_mmap_pgoff
                                             do_mmap
                                               mmap_region
                                                 munmap_vma_range
                                                   /* clean old vma */
        /* lock vma_lock again  &lt;--- UAF */
    /* unlock vma_lock */

Since the vma_lock will be unlocked immediately after
hugetlb_handle_userfault() returns, let's drop the unneeded lock and unlock
in hugetlb_handle_userfault() to fix the issue.
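
The shape of the fix, sketched against the v5.15 backport where the
i_mmap_rwsem plays the role of the vma_lock (illustrative only; context
elided):

  static vm_fault_t hugetlb_handle_userfault(...)
  {
          ...
          /* drop the locks for good; the caller no longer reacquires them */
          i_mmap_unlock_read(mapping);
          mutex_unlock(&amp;hugetlb_fault_mutex_table[hash]);
          return handle_userfault(&amp;vmf, reason);
  }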

[1] https://lore.kernel.org/linux-mm/000000000000d5e00a05e834962e@google.com/
[2] https://lore.kernel.org/linux-mm/20220921014457.1668-1-liuzixian4@huawei.com/
Link: https://lkml.kernel.org/r/20220923042113.137273-1-liushixin2@huawei.com
Fixes: 1a1aad8a9b7b ("userfaultfd: hugetlbfs: add userfaultfd hugetlb hook")
Signed-off-by: Liu Shixin &lt;liushixin2@huawei.com&gt;
Signed-off-by: Kefeng Wang &lt;wangkefeng.wang@huawei.com&gt;
Reported-by: syzbot+193f9cee8638750b23cf@syzkaller.appspotmail.com
Reported-by: Liu Zixian &lt;liuzixian4@huawei.com&gt;
Reviewed-by: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Muchun Song &lt;songmuchun@bytedance.com&gt;
Cc: Sidhartha Kumar &lt;sidhartha.kumar@oracle.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.14+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/mmap: undo -&gt;mmap() when arch_validate_flags() fails</title>
<updated>2022-10-26T10:34:24Z</updated>
<author>
<name>Carlos Llamas</name>
<email>cmllamas@google.com</email>
</author>
<published>2022-09-30T00:38:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=43bed0a13a5cdbb314d14f28c2bf2c60eb4e6e1e'/>
<id>urn:sha1:43bed0a13a5cdbb314d14f28c2bf2c60eb4e6e1e</id>
<content type='text'>
commit deb0f6562884b5b4beb883d73e66a7d3a1b96d99 upstream.

Commit c462ac288f2c ("mm: Introduce arch_validate_flags()") added a late
check in mmap_region() to let architectures validate vm_flags.  The check
needs to happen after calling -&gt;mmap() as the flags can potentially be
modified during this callback.

If the arch_validate_flags() check fails we unmap and free the vma.
However, the error path fails to undo the -&gt;mmap() call that previously
succeeded, and depending on the specific -&gt;mmap() implementation this
translates to reference increments, memory allocations and other operations
that will not be cleaned up.

There are several places (mainly device drivers) where this is an issue.
One specific example is bpf_map_mmap(), which keeps a count of the mappings
in map-&gt;writecnt.  The count is incremented on -&gt;mmap() and then
decremented on vm_ops-&gt;close().  When arch_validate_flags() fails, this
count is off, since bpf_map_mmap_close() is never called.

One can reproduce this issue on arm64 devices with MTE support.  Here the
vm_flags are checked to only allow VM_MTE if VM_MTE_ALLOWED has been set
previously.  From userspace it is then enough to pass the PROT_MTE flag to
the mmap() syscall to trigger the arch_validate_flags() failure.

The following program reproduces this issue:

  #include &lt;stdio.h&gt;
  #include &lt;unistd.h&gt;
  #include &lt;linux/unistd.h&gt;
  #include &lt;linux/bpf.h&gt;
  #include &lt;sys/mman.h&gt;

  int main(void)
  {
	union bpf_attr attr = {
		.map_type = BPF_MAP_TYPE_ARRAY,
		.key_size = sizeof(int),
		.value_size = sizeof(long long),
		.max_entries = 256,
		.map_flags = BPF_F_MMAPABLE,
	};
	int fd;

	fd = syscall(__NR_bpf, BPF_MAP_CREATE, &amp;attr, sizeof(attr));
	mmap(NULL, 4096, PROT_WRITE | PROT_MTE, MAP_SHARED, fd, 0);

	return 0;
  }

By manually adding some log statements to the vm_ops callbacks we can
confirm that when passing PROT_MTE to mmap() the map-&gt;writecnt is off upon
-&gt;release():

With PROT_MTE flag:
  root@debian:~# ./bpf-test
  [  111.263874] bpf_map_write_active_inc: map=9 writecnt=1
  [  111.288763] bpf_map_release: map=9 writecnt=1

Without PROT_MTE flag:
  root@debian:~# ./bpf-test
  [  157.816912] bpf_map_write_active_inc: map=10 writecnt=1
  [  157.830442] bpf_map_write_active_dec: map=10 writecnt=0
  [  157.832396] bpf_map_release: map=10 writecnt=0

This patch fixes the above issue by calling vm_ops-&gt;close() when the
arch_validate_flags() check fails; after this we can proceed to unmap and
free the vma on the error path.
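
A hedged sketch of the resulting error path in mmap_region() (the
close_and_free_vma label name is an assumption):

  if (!arch_validate_flags(vma-&gt;vm_flags)) {
          error = -EINVAL;
          goto close_and_free_vma;
  }
  ...
  close_and_free_vma:
          /* give the driver a chance to undo its -&gt;mmap() side effects */
          if (vma-&gt;vm_ops &amp;&amp; vma-&gt;vm_ops-&gt;close)
                  vma-&gt;vm_ops-&gt;close(vma);
  unmap_and_free_vma:
  ...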

Link: https://lkml.kernel.org/r/20220930003844.1210987-1-cmllamas@google.com
Fixes: c462ac288f2c ("mm: Introduce arch_validate_flags()")
Signed-off-by: Carlos Llamas &lt;cmllamas@google.com&gt;
Reviewed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Reviewed-by: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Christian Brauner (Microsoft) &lt;brauner@kernel.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[5.10+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/damon: validate if the pmd entry is present before accessing</title>
<updated>2022-10-26T10:34:24Z</updated>
<author>
<name>Baolin Wang</name>
<email>baolin.wang@linux.alibaba.com</email>
</author>
<published>2022-08-18T07:37:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2b0072d33eb69f950d1fac620db49259968346a6'/>
<id>urn:sha1:2b0072d33eb69f950d1fac620db49259968346a6</id>
<content type='text'>
commit c8b9aff419303e4d4219b5ff64b1c7e062dee48e upstream.

pmd_huge() is used to validate whether a pmd entry is mapped by a huge
page, which also includes the case of a non-present (migration or
hwpoisoned) pmd entry on the arm64 or x86 architectures.  This means that
pmd_pfn() cannot get the correct pfn number for a non-present pmd entry,
which will cause damon_get_page() to get an incorrect page struct (it may
also be NULL, via pfn_to_online_page()), making the access statistics
incorrect.

This means that DAMON may make incorrect decisions based on the incorrect
statistics; for example, when the DAMOS_PAGEOUT operation is specified,
DAMON may fail to reclaim a cold page in time because that cold page was
mistakenly regarded as accessed.

Moreover, it makes no sense to waste time getting the page of a non-present
entry.  Just treat it as not accessed and skip it, which keeps consistency
with the handling of non-present pte-level entries.

So add pmd entry present validation to fix the above issues.
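
A hedged sketch of the added validation in DAMON's PMD walker (function
context and surrounding code elided):

  if (pmd_huge(*pmd)) {
          ptl = pmd_lock(walk-&gt;mm, pmd);
          /* non-present (migration/hwpoisoned) entries have no valid pfn;
           * treat them as not accessed and skip */
          if (!pmd_present(*pmd)) {
                  spin_unlock(ptl);
                  return 0;
          }
          ...
  }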

Link: https://lkml.kernel.org/r/58b1d1f5fbda7db49ca886d9ef6783e3dcbbbc98.1660805030.git.baolin.wang@linux.alibaba.com
Fixes: 3f49584b262c ("mm/damon: implement primitives for the virtual memory address spaces")
Signed-off-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Reviewed-by: SeongJae Park &lt;sj@kernel.org&gt;
Reviewed-by: Muchun Song &lt;songmuchun@bytedance.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/huge_memory: use pfn_to_online_page() in split_huge_pages_all()</title>
<updated>2022-10-12T07:53:28Z</updated>
<author>
<name>Naoya Horiguchi</name>
<email>naoya.horiguchi@nec.com</email>
</author>
<published>2022-09-08T04:11:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0fa249414a6fa863383834624bd030fb0fc7fa23'/>
<id>urn:sha1:0fa249414a6fa863383834624bd030fb0fc7fa23</id>
<content type='text'>
[ Upstream commit 2b7aa91ba0e86b8643f5d3c83874c80599c731d7 ]

A NULL pointer dereference is triggered when calling THP split via debugfs
on a system with offlined memory blocks.  With the debug option enabled,
the following kernel messages are printed out:

  page:00000000467f4890 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x121c000
  flags: 0x17fffc00000000(node=0|zone=2|lastcpupid=0x1ffff)
  raw: 0017fffc00000000 0000000000000000 dead000000000122 0000000000000000
  raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
  page dumped because: unmovable page
  page:000000007d7ab72e is uninitialized and poisoned
  page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
  ------------[ cut here ]------------
  kernel BUG at include/linux/mm.h:1248!
  invalid opcode: 0000 [#1] PREEMPT SMP PTI
  CPU: 16 PID: 20964 Comm: bash Tainted: G          I        6.0.0-rc3-foll-numa+ #41
  ...
  RIP: 0010:split_huge_pages_write+0xcf4/0xe30

This shows that page_to_nid() in page_zone() is unexpectedly called for an
offlined memmap.

Use pfn_to_online_page() to get the struct page in the PFN walker.
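
A hedged sketch of the walk in split_huge_pages_all() after the change:

  for (pfn = start_pfn; pfn &lt; end_pfn; pfn++) {
          /* returns NULL for offline sections, so an offlined
           * memmap is never dereferenced */
          struct page *page = pfn_to_online_page(pfn);

          if (!page)
                  continue;
          ...
  }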

Link: https://lkml.kernel.org/r/20220908041150.3430269-1-naoya.horiguchi@linux.dev
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online")      [visible after d0dc12e86b319]
Signed-off-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Co-developed-by: David Hildenbrand &lt;david@redhat.com&gt;
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Yang Shi &lt;shy828301@gmail.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Muchun Song &lt;songmuchun@bytedance.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[5.10+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm/huge_memory: minor cleanup for split_huge_pages_all</title>
<updated>2022-10-12T07:53:28Z</updated>
<author>
<name>Miaohe Lin</name>
<email>linmiaohe@huawei.com</email>
</author>
<published>2022-07-04T13:21:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f1d6894159fc4164647f6b0e9a0a49b18ae64feb'/>
<id>urn:sha1:f1d6894159fc4164647f6b0e9a0a49b18ae64feb</id>
<content type='text'>
[ Upstream commit a17206dac7b262e7abed5a05e34a6bd6bd0a9b06 ]

There is nothing to do if a zone doesn't have any pages managed by the
buddy allocator, so we should check managed_zone() instead.  Also, if a THP
is found, there's no need to traverse its subpages again.
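
Roughly (an illustrative sketch):

  for_each_zone(zone) {
          /* nothing to do without buddy-managed pages */
          if (!managed_zone(zone))
                  continue;
          ...
  }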

Link: https://lkml.kernel.org/r/20220704132201.14611-13-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Muchun Song &lt;songmuchun@bytedance.com&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Cc: Zach O'Keefe &lt;zokeefe@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Stable-dep-of: 2b7aa91ba0e8 ("mm/huge_memory: use pfn_to_online_page() in split_huge_pages_all()")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm: gup: fix the fast GUP race against THP collapse</title>
<updated>2022-10-12T07:53:26Z</updated>
<author>
<name>Yang Shi</name>
<email>shy828301@gmail.com</email>
</author>
<published>2022-09-07T18:01:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=46c22e7b094f85858f5581bba0e3c59400f53dfa'/>
<id>urn:sha1:46c22e7b094f85858f5581bba0e3c59400f53dfa</id>
<content type='text'>
commit 70cbc3cc78a997d8247b50389d37c4e1736019da upstream.

Since general RCU GUP-fast was introduced in commit 2667f50e8b81 ("mm:
introduce a general RCU get_user_pages_fast()"), a TLB flush is no longer
sufficient to handle concurrent GUP-fast in all cases; it only handles
traditional IPI-based GUP-fast correctly.  On architectures that send an
IPI broadcast on TLB flush, it works as expected.  But on architectures
that do not use an IPI to broadcast the TLB flush, the race below may
occur:

   CPU A                                          CPU B
THP collapse                                     fast GUP
                                              gup_pmd_range() &lt;-- see valid pmd
                                                  gup_pte_range() &lt;-- work on pte
pmdp_collapse_flush() &lt;-- clear pmd and flush
__collapse_huge_page_isolate()
    check page pinned &lt;-- before GUP bump refcount
                                                      pin the page
                                                      check PTE &lt;-- no change
__collapse_huge_page_copy()
    copy data to huge page
    ptep_clear()
install huge pmd for the huge page
                                                      return the stale page
discard the stale page

The race can be fixed by checking whether the PMD has changed after taking
the page pin in fast GUP, just as is already done for the PTE.  If the PMD
has changed, a parallel THP collapse may be in progress, so GUP should back
off.
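
A hedged sketch of the recheck in gup_pte_range() (the unpin helper name
follows the v5.15 code and may differ in other versions):

  /* after the page refcount has been raised: */
  if (unlikely(pmd_val(pmd) != pmd_val(*pmdp))) {
          /* PMD changed under us: possible THP collapse, back off */
          put_compound_head(head, 1, flags);
          goto pte_unmap;
  }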

Also update the stale comment about serializing against fast GUP in
khugepaged.

Link: https://lkml.kernel.org/r/20220907180144.555485-1-shy828301@gmail.com
Fixes: 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()")
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Peter Xu &lt;peterx@redhat.com&gt;
Signed-off-by: Yang Shi &lt;shy828301@gmail.com&gt;
Reviewed-by: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm,hwpoison: check mm when killing accessing process</title>
<updated>2022-10-05T08:39:39Z</updated>
<author>
<name>Shuai Xue</name>
<email>xueshuai@linux.alibaba.com</email>
</author>
<published>2022-09-14T06:49:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6287c9e00595e19f6097309d80592980d933bd50'/>
<id>urn:sha1:6287c9e00595e19f6097309d80592980d933bd50</id>
<content type='text'>
commit 77677cdbc2aa4b5d5d839562793d3d126201d18d upstream.

The GHES code calls memory_failure_queue() from IRQ context to queue work
into a workqueue and schedule it on the current CPU.  The work is then
processed by a kworker in memory_failure_work_func(), which calls
memory_failure().

When a page is already poisoned, commit a3f5d80ea401 ("mm,hwpoison: send
SIGBUS with error virutal address") makes memory_failure() call
kill_accessing_process(), which:

    - holds the mmap lock of current-&gt;mm,
    - does a pagetable walk to find the error virtual address,
    - and sends SIGBUS to the current process with the error info.

However, the mm of a kworker is not valid, resulting in a null-pointer
dereference.  So check the mm when killing the accessing process.
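
A hedged sketch of the guard in memory_failure()'s already-poisoned path
(exact placement is approximate):

  /* kworkers have no mm; a pagetable walk makes no sense for them */
  if (flags &amp; MF_ACTION_REQUIRED &amp;&amp; current-&gt;mm)
          res = kill_accessing_process(current, pfn, flags);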

[akpm@linux-foundation.org: remove unrelated whitespace alteration]
Link: https://lkml.kernel.org/r/20220914064935.7851-1-xueshuai@linux.alibaba.com
Fixes: a3f5d80ea401 ("mm,hwpoison: send SIGBUS with error virutal address")
Signed-off-by: Shuai Xue &lt;xueshuai@linux.alibaba.com&gt;
Reviewed-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Acked-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: Huang Ying &lt;ying.huang@intel.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Bixuan Cui &lt;cuibixuan@linux.alibaba.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: fix madvise_pageout mishandling on non-LRU page</title>
<updated>2022-10-05T08:39:39Z</updated>
<author>
<name>Minchan Kim</name>
<email>minchan@kernel.org</email>
</author>
<published>2022-09-08T15:12:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f9aed3d8a0298a5391a1d6611a753c98d221f599'/>
<id>urn:sha1:f9aed3d8a0298a5391a1d6611a753c98d221f599</id>
<content type='text'>
commit 58d426a7ba92870d489686dfdb9d06b66815a2ab upstream.

MADV_PAGEOUT tries to isolate non-LRU pages and triggers the warning below
from isolate_lru_page().

Fix it by checking PageLRU in advance.

------------[ cut here ]------------
trying to isolate tail page
WARNING: CPU: 0 PID: 6175 at mm/folio-compat.c:158 isolate_lru_page+0x130/0x140
Modules linked in:
CPU: 0 PID: 6175 Comm: syz-executor.0 Not tainted 5.18.12 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:isolate_lru_page+0x130/0x140
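
A hedged sketch of the check, placed before isolation (the exact position
inside madvise_cold_or_pageout_pte_range() is approximate):

  /* e.g. THP tail pages are not on an LRU list; skip them */
  if (!PageLRU(page))
          continue;
  if (isolate_lru_page(page))
          continue;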

Link: https://lore.kernel.org/linux-mm/485f8c33.2471b.182d5726afb.Coremail.hantianshuo@iie.ac.cn/
Link: https://lkml.kernel.org/r/20220908151204.762596-1-minchan@kernel.org
Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT")
Signed-off-by: Minchan Kim &lt;minchan@kernel.org&gt;
Reported-by: 韩天硕 &lt;hantianshuo@iie.ac.cn&gt;
Suggested-by: Yang Shi &lt;shy828301@gmail.com&gt;
Acked-by: Yang Shi &lt;shy828301@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/migrate_device.c: flush TLB while holding PTL</title>
<updated>2022-10-05T08:39:39Z</updated>
<author>
<name>Alistair Popple</name>
<email>apopple@nvidia.com</email>
</author>
<published>2022-09-02T00:35:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1299c11988783d84fff70d6b8a2279ca2d070c8b'/>
<id>urn:sha1:1299c11988783d84fff70d6b8a2279ca2d070c8b</id>
<content type='text'>
commit 60bae73708963de4a17231077285bd9ff2f41c44 upstream.

When clearing a PTE the TLB should be flushed whilst still holding the PTL
to avoid a potential race with madvise/munmap/etc.  For example consider
the following sequence:

  CPU0                          CPU1
  ----                          ----

  migrate_vma_collect_pmd()
  pte_unmap_unlock()
                                madvise(MADV_DONTNEED)
                                -&gt; zap_pte_range()
                                pte_offset_map_lock()
                                [ PTE not present, TLB not flushed ]
                                pte_unmap_unlock()
                                [ page is still accessible via stale TLB ]
  flush_tlb_range()

In this case the page may still be accessed via the stale TLB entry after
madvise returns.  Fix this by flushing the TLB while holding the PTL.
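
The reordering, sketched from migrate_vma_collect_pmd() (illustrative):

  /* flush while the PTL is still held so the cleared PTEs cannot be
   * reached through a stale TLB entry */
  if (unmapped)
          flush_tlb_range(walk-&gt;vma, start, end);

  pte_unmap_unlock(ptep - 1, ptl);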

Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
Link: https://lkml.kernel.org/r/9f801e9d8d830408f2ca27821f606e09aa856899.1662078528.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple &lt;apopple@nvidia.com&gt;
Reported-by: Nadav Amit &lt;nadav.amit@gmail.com&gt;
Reviewed-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Alex Sierra &lt;alex.sierra@amd.com&gt;
Cc: Ben Skeggs &lt;bskeggs@redhat.com&gt;
Cc: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Cc: huang ying &lt;huang.ying.caritas@gmail.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Karol Herbst &lt;kherbst@redhat.com&gt;
Cc: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Cc: Lyude Paul &lt;lyude@redhat.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Paul Mackerras &lt;paulus@ozlabs.org&gt;
Cc: Ralph Campbell &lt;rcampbell@nvidia.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
