<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/drivers/xen/gntdev.c, branch v5.4.43</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.43</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.43'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2019-10-10T15:43:35Z</updated>
<entry>
<title>xen: Stop abusing DT of_dma_configure API</title>
<updated>2019-10-10T15:43:35Z</updated>
<author>
<name>Rob Herring</name>
<email>robh@kernel.org</email>
</author>
<published>2019-10-08T19:41:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ee7f5225dc3cc7c19df1603597532ff34571f895'/>
<id>urn:sha1:ee7f5225dc3cc7c19df1603597532ff34571f895</id>
<content type='text'>
As the removed comments say, these aren't DT based devices.
of_dma_configure() is going to stop allowing a NULL DT node and calling
it will no longer work.

The comment is also now out of date as of commit 9ab91e7c5c51 ("arm64:
default to the direct mapping in get_arch_dma_ops"). Direct mapping
is now the default rather than dma_dummy_ops.

According to Stefano and Oleksandr, the only other part needed is
setting the DMA masks and there's no reason to restrict the masks to
32-bits. So set the masks to 64 bits.

Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Julien Grall &lt;julien.grall@arm.com&gt;
Cc: Nicolas Saenz Julienne &lt;nsaenzjulienne@suse.de&gt;
Cc: Oleksandr Andrushchenko &lt;oleksandr_andrushchenko@epam.com&gt;
Cc: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: Stefano Stabellini &lt;sstabellini@kernel.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Rob Herring &lt;robh@kernel.org&gt;
Acked-by: Oleksandr Andrushchenko &lt;oleksandr_andrushchenko@epam.com&gt;
Reviewed-by: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Signed-off-by: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
</content>
</entry>
<entry>
<title>xen/gntdev.c: Replace vm_map_pages() with vm_map_pages_zero()</title>
<updated>2019-07-31T06:13:59Z</updated>
<author>
<name>Souptick Joarder</name>
<email>jrdr.linux@gmail.com</email>
</author>
<published>2019-07-30T18:34:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8d1502f629c9966743de45744f4c1ba93a57d105'/>
<id>urn:sha1:8d1502f629c9966743de45744f4c1ba93a57d105</id>
<content type='text'>
'commit df9bde015a72 ("xen/gntdev.c: convert to use vm_map_pages()")'
breaks gntdev driver. If vma-&gt;vm_pgoff &gt; 0, vm_map_pages()
will:
 - use map-&gt;pages starting at vma-&gt;vm_pgoff instead of 0
 - verify map-&gt;count against vma_pages()+vma-&gt;vm_pgoff instead of just
   vma_pages().

In practice, this breaks using a single gntdev FD for mapping multiple
grants.

relevant strace output:
[pid   857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0
[pid   857] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 7, 0) =
0x777f1211b000
[pid   857] ioctl(7, IOCTL_GNTDEV_SET_UNMAP_NOTIFY, 0x7ffd3407b710) = 0
[pid   857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0
[pid   857] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 7,
0x1000) = -1 ENXIO (No such device or address)

details here:
https://github.com/QubesOS/qubes-issues/issues/5199

The reason is -&gt; ( copying Marek's word from discussion)

vma-&gt;vm_pgoff is used as index passed to gntdev_find_map_index. It's
basically using this parameter for "which grant reference to map".
map struct returned by gntdev_find_map_index() describes just the pages
to be mapped. Specifically map-&gt;pages[0] should be mapped at
vma-&gt;vm_start, not vma-&gt;vm_start+vma-&gt;vm_pgoff*PAGE_SIZE.

When trying to map grant with index (aka vma-&gt;vm_pgoff) &gt; 1,
__vm_map_pages() will refuse to map it because it will expect map-&gt;count
to be at least vma_pages(vma)+vma-&gt;vm_pgoff, while it is exactly
vma_pages(vma).

Converting vm_map_pages() to use vm_map_pages_zero() will fix the
problem.

Marek has tested and confirmed the same.

Cc: stable@vger.kernel.org # v5.2+
Fixes: df9bde015a72 ("xen/gntdev.c: convert to use vm_map_pages()")

Reported-by: Marek Marczykowski-Górecki &lt;marmarek@invisiblethingslab.com&gt;
Signed-off-by: Souptick Joarder &lt;jrdr.linux@gmail.com&gt;
Tested-by: Marek Marczykowski-Górecki &lt;marmarek@invisiblethingslab.com&gt;
Reviewed-by: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Signed-off-by: Juergen Gross &lt;jgross@suse.com&gt;
</content>
</entry>
<entry>
<title>mm/pgtable: drop pgtable_t variable from pte_fn_t functions</title>
<updated>2019-07-12T18:05:46Z</updated>
<author>
<name>Anshuman Khandual</name>
<email>anshuman.khandual@arm.com</email>
</author>
<published>2019-07-12T03:58:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8b1e0f81fb6fcf3109465a168b2e2da3f711fa86'/>
<id>urn:sha1:8b1e0f81fb6fcf3109465a168b2e2da3f711fa86</id>
<content type='text'>
Drop the pgtable_t variable from all implementation for pte_fn_t as none
of them use it.  apply_to_pte_range() should stop computing it as well.
Should help us save some cycles.

Link: http://lkml.kernel.org/r/1556803126-26596-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Acked-by: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: &lt;jglisse@redhat.com&gt;
Cc: Mike Rapoport &lt;rppt@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>xen/gntdev.c: convert to use vm_map_pages()</title>
<updated>2019-05-14T16:47:50Z</updated>
<author>
<name>Souptick Joarder</name>
<email>jrdr.linux@gmail.com</email>
</author>
<published>2019-05-14T00:22:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=df9bde015a72ffd978e39a750662c7cf579b1715'/>
<id>urn:sha1:df9bde015a72ffd978e39a750662c7cf579b1715</id>
<content type='text'>
Convert to use vm_map_pages() to map range of kernel memory to user vma.

map-&gt;count is passed to vm_map_pages() and internal API verify map-&gt;count
against count ( count = vma_pages(vma)) for page array boundary overrun
condition.

Link: http://lkml.kernel.org/r/88e56e82d2db98705c2d842e9c9806c00b366d67.1552921225.git.jrdr.linux@gmail.com
Signed-off-by: Souptick Joarder &lt;jrdr.linux@gmail.com&gt;
Reviewed-by: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: David Airlie &lt;airlied@linux.ie&gt;
Cc: Heiko Stuebner &lt;heiko@sntech.de&gt;
Cc: Joerg Roedel &lt;joro@8bytes.org&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Kyungmin Park &lt;kyungmin.park@samsung.com&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Mauro Carvalho Chehab &lt;mchehab@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Oleksandr Andrushchenko &lt;oleksandr_andrushchenko@epam.com&gt;
Cc: Pawel Osciak &lt;pawel@osciak.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Sandy Huang &lt;hjc@rock-chips.com&gt;
Cc: Stefan Richter &lt;stefanr@s5r6.in-berlin.de&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Thierry Reding &lt;treding@nvidia.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/mmu_notifier: convert user range-&gt;blockable to helper function</title>
<updated>2019-05-14T16:47:49Z</updated>
<author>
<name>Jérôme Glisse</name>
<email>jglisse@redhat.com</email>
</author>
<published>2019-05-14T00:20:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=dfcd66604c1c116ffc7a94375becbed1d7ecbef1'/>
<id>urn:sha1:dfcd66604c1c116ffc7a94375becbed1d7ecbef1</id>
<content type='text'>
Use the mmu_notifier_range_blockable() helper function instead of directly
dereferencing the range-&gt;blockable field.  This is done to make it easier
to change the mmu_notifier range field.

This patch is the outcome of the following coccinelle patch:

%&lt;-------------------------------------------------------------------
@@
identifier I1, FN;
@@
FN(..., struct mmu_notifier_range *I1, ...) {
&lt;...
-I1-&gt;blockable
+mmu_notifier_range_blockable(I1)
...&gt;
}
-------------------------------------------------------------------&gt;%

spatch --in-place --sp-file blockable.spatch --dir .

Link: http://lkml.kernel.org/r/20190326164747.24405-3-jglisse@redhat.com
Signed-off-by: Jérôme Glisse &lt;jglisse@redhat.com&gt;
Reviewed-by: Ralph Campbell &lt;rcampbell@nvidia.com&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Joonas Lahtinen &lt;joonas.lahtinen@linux.intel.com&gt;
Cc: Jani Nikula &lt;jani.nikula@linux.intel.com&gt;
Cc: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Cc: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
Cc: Ross Zwisler &lt;zwisler@kernel.org&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Radim Krcmar &lt;rkrcmar@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Christian Koenig &lt;christian.koenig@amd.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/gup: change GUP fast to use flags rather than a write 'bool'</title>
<updated>2019-05-14T16:47:46Z</updated>
<author>
<name>Ira Weiny</name>
<email>ira.weiny@intel.com</email>
</author>
<published>2019-05-14T00:17:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=73b0140bf0fe9df90fb267c00673c4b9bf285430'/>
<id>urn:sha1:73b0140bf0fe9df90fb267c00673c4b9bf285430</id>
<content type='text'>
To facilitate additional options to get_user_pages_fast() change the
singular write parameter to be gup_flags.

This patch does not change any functionality.  New functionality will
follow in subsequent patches.

Some of the get_user_pages_fast() call sites were unchanged because they
already passed FOLL_WRITE or 0 for the write parameter.

NOTE: It was suggested to change the ordering of the get_user_pages_fast()
arguments to ensure that callers were converted.  This breaks the current
GUP call site convention of having the returned pages be the final
parameter.  So the suggestion was rejected.

Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
Signed-off-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Reviewed-by: Mike Marshall &lt;hubcap@omnibond.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Hogan &lt;jhogan@kernel.org&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>xen/gntdev: Do not destroy context while dma-bufs are in use</title>
<updated>2019-02-18T05:50:03Z</updated>
<author>
<name>Oleksandr Andrushchenko</name>
<email>oleksandr_andrushchenko@epam.com</email>
</author>
<published>2019-02-14T14:23:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fa13e665e02874c0a5f4d06d6967ae34a6cb3d6a'/>
<id>urn:sha1:fa13e665e02874c0a5f4d06d6967ae34a6cb3d6a</id>
<content type='text'>
If there are exported DMA buffers which are still in use and
grant device is closed by either normal user-space close or by
a signal this leads to the grant device context to be destroyed,
thus making it not possible to correctly destroy those exported
buffers when they are returned back to gntdev and makes the module
crash:

[  339.617540] [&lt;ffff00000854c0d8&gt;] dmabuf_exp_ops_release+0x40/0xa8
[  339.617560] [&lt;ffff00000867a6e8&gt;] dma_buf_release+0x60/0x190
[  339.617577] [&lt;ffff0000082211f0&gt;] __fput+0x88/0x1d0
[  339.617589] [&lt;ffff000008221394&gt;] ____fput+0xc/0x18
[  339.617607] [&lt;ffff0000080ed4e4&gt;] task_work_run+0x9c/0xc0
[  339.617622] [&lt;ffff000008089714&gt;] do_notify_resume+0xfc/0x108

Fix this by referencing gntdev on each DMA buffer export and
unreferencing on buffer release.

Signed-off-by: Oleksandr Andrushchenko &lt;oleksandr_andrushchenko@epam.com&gt;
Reviewed-by: Boris Ostrovsky@oracle.com&gt;
Signed-off-by: Juergen Gross &lt;jgross@suse.com&gt;
</content>
</entry>
<entry>
<title>mm/mmu_notifier: use structure for invalidate_range_start/end callback</title>
<updated>2018-12-28T20:11:50Z</updated>
<author>
<name>Jérôme Glisse</name>
<email>jglisse@redhat.com</email>
</author>
<published>2018-12-28T08:38:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5d6527a784f7a6d247961e046e830de8d71b47d1'/>
<id>urn:sha1:5d6527a784f7a6d247961e046e830de8d71b47d1</id>
<content type='text'>
Patch series "mmu notifier contextual informations", v2.

This patchset adds contextual information, why an invalidation is
happening, to mmu notifier callback.  This is necessary for user of mmu
notifier that wish to maintains their own data structure without having to
add new fields to struct vm_area_struct (vma).

For instance device can have they own page table that mirror the process
address space.  When a vma is unmap (munmap() syscall) the device driver
can free the device page table for the range.

Today we do not have any information on why a mmu notifier call back is
happening and thus device driver have to assume that it is always an
munmap().  This is inefficient at it means that it needs to re-allocate
device page table on next page fault and rebuild the whole device driver
data structure for the range.

Other use case beside munmap() also exist, for instance it is pointless
for device driver to invalidate the device page table when the
invalidation is for the soft dirtyness tracking.  Or device driver can
optimize away mprotect() that change the page table permission access for
the range.

This patchset enables all this optimizations for device drivers.  I do not
include any of those in this series but another patchset I am posting will
leverage this.

The patchset is pretty simple from a code point of view.  The first two
patches consolidate all mmu notifier arguments into a struct so that it is
easier to add/change arguments.  The last patch adds the contextual
information (munmap, protection, soft dirty, clear, ...).

This patch (of 3):

To avoid having to change many callback definition everytime we want to
add a parameter use a structure to group all parameters for the
mmu_notifier invalidate_range_start/end callback.  No functional changes
with this patch.

[akpm@linux-foundation.org: fix drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c kerneldoc]
Link: http://lkml.kernel.org/r/20181205053628.3210-2-jglisse@redhat.com
Signed-off-by: Jérôme Glisse &lt;jglisse@redhat.com&gt;
Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Acked-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;	[infiniband]
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Cc: Ross Zwisler &lt;zwisler@kernel.org&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Radim Krcmar &lt;rkrcmar@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Christian Koenig &lt;christian.koenig@amd.com&gt;
Cc: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Cc: Ralph Campbell &lt;rcampbell@nvidia.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>xen/gntdev: fix up blockable calls to mn_invl_range_start</title>
<updated>2018-09-14T12:52:30Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2018-09-04T23:21:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=58a57569904039d9ac38c0ff2a88396a43899689'/>
<id>urn:sha1:58a57569904039d9ac38c0ff2a88396a43899689</id>
<content type='text'>
Patch series "mmu_notifiers follow ups".

Tetsuo has noticed some fallouts from 93065ac753e4 ("mm, oom: distinguish
blockable mode for mmu notifiers").  One of them has been fixed and picked
up by AMD/DRM maintainer [1].  XEN issue is fixed by patch 1.  I have also
clarified expectations about blockable semantic of invalidate_range_end.
Finally the last patch removes MMU_INVALIDATE_DOES_NOT_BLOCK which is no
longer used nor needed.

[1] http://lkml.kernel.org/r/20180824135257.GU29735@dhcp22.suse.cz

This patch (of 3):

93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers") has
introduced blockable parameter to all mmu_notifiers and the notifier has
to back off when called in !blockable case and it could block down the
road.

The above commit implemented that for mn_invl_range_start but both
in_range checks are done unconditionally regardless of the blockable mode
and as such they would fail all the time for regular calls.  Fix this by
checking blockable parameter as well.

Once we are there we can remove the stale TODO.  The lock has to be
sleepable because we wait for completion down in gnttab_unmap_refs_sync.

Link: http://lkml.kernel.org/r/20180827112623.8992-2-mhocko@kernel.org
Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Reviewed-by: Juergen Gross &lt;jgross@suse.com&gt;
Signed-off-by: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
</content>
</entry>
<entry>
<title>mm, oom: distinguish blockable mode for mmu notifiers</title>
<updated>2018-08-22T17:52:44Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2018-08-22T04:52:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=93065ac753e4443840a057bfef4be71ec766fde9'/>
<id>urn:sha1:93065ac753e4443840a057bfef4be71ec766fde9</id>
<content type='text'>
There are several blockable mmu notifiers which might sleep in
mmu_notifier_invalidate_range_start and that is a problem for the
oom_reaper because it needs to guarantee a forward progress so it cannot
depend on any sleepable locks.

Currently we simply back off and mark an oom victim with blockable mmu
notifiers as done after a short sleep.  That can result in selecting a new
oom victim prematurely because the previous one still hasn't torn its
memory down yet.

We can do much better though.  Even if mmu notifiers use sleepable locks
there is no reason to automatically assume those locks are held.  Moreover
majority of notifiers only care about a portion of the address space and
there is absolutely zero reason to fail when we are unmapping an unrelated
range.  Many notifiers do really block and wait for HW which is harder to
handle and we have to bail out though.

This patch handles the low hanging fruit.
__mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
are not allowed to sleep if the flag is set to false.  This is achieved by
using trylock instead of the sleepable lock for most callbacks and
continue as long as we do not block down the call chain.

I think we can improve that even further because there is a common pattern
to do a range lookup first and then do something about that.  The first
part can be done without a sleeping lock in most cases AFAICS.

The oom_reaper end then simply retries if there is at least one notifier
which couldn't make any progress in !blockable mode.  A retry loop is
already implemented to wait for the mmap_sem and this is basically the
same thing.

The simplest way for driver developers to test this code path is to wrap
userspace code which uses these notifiers into a memcg and set the hard
limit to hit the oom.  This can be done e.g.  after the test faults in all
the mmu notifier managed memory and set the hard limit to something really
small.  Then we are looking for a proper process tear down.

[akpm@linux-foundation.org: coding style fixes]
[akpm@linux-foundation.org: minor code simplification]
Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt; # AMD notifiers
Acked-by: Leon Romanovsky &lt;leonro@mellanox.com&gt; # mlx and umem_odp
Reported-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: "David (ChunMing) Zhou" &lt;David1.Zhou@amd.com&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: David Airlie &lt;airlied@linux.ie&gt;
Cc: Jani Nikula &lt;jani.nikula@linux.intel.com&gt;
Cc: Joonas Lahtinen &lt;joonas.lahtinen@linux.intel.com&gt;
Cc: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
Cc: Doug Ledford &lt;dledford@redhat.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Mike Marciniszyn &lt;mike.marciniszyn@intel.com&gt;
Cc: Dennis Dalessandro &lt;dennis.dalessandro@intel.com&gt;
Cc: Sudeep Dutt &lt;sudeep.dutt@intel.com&gt;
Cc: Ashutosh Dixit &lt;ashutosh.dixit@intel.com&gt;
Cc: Dimitri Sivanich &lt;sivanich@sgi.com&gt;
Cc: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: "Jérôme Glisse" &lt;jglisse@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
