<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/dma-mapping.h, branch v6.10.6</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.10.6</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.10.6'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2024-05-09T17:00:29Z</updated>
<entry>
<title>dma: fix DMA sync for drivers not calling dma_set_mask*()</title>
<updated>2024-05-09T17:00:29Z</updated>
<author>
<name>Alexander Lobakin</name>
<email>aleksander.lobakin@intel.com</email>
</author>
<published>2024-05-09T14:46:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a6016aac5252da9d22a4dc0b98121b0acdf6d2f5'/>
<id>urn:sha1:a6016aac5252da9d22a4dc0b98121b0acdf6d2f5</id>
<content type='text'>
There are several reports that the DMA sync shortcut broke non-coherent
devices.
dev-&gt;dma_need_sync is false after the &amp;device allocation, and if a driver
doesn't call dma_set_mask*(), it stays false even when the device is not
DMA-coherent and thus needs synchronization. For historical reasons, many
drivers still don't call it.
Invert the boolean, so that the sync will be performed by default and
the shortcut will be enabled only when calling dma_set_mask*().
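As a rough illustration of the inverted default (not part of this patch;
the struct and helper names below are placeholders):

    #include &lt;stdbool.h&gt;

    /* Sketch: the shortcut flag defaults to "sync needed". */
    struct dev_sketch {
        bool dma_skip_sync;    /* false after allocation, so syncs happen */
    };

    static bool sketch_dev_need_sync(const struct dev_sketch *dev)
    {
        return !dev-&gt;dma_skip_sync;
    }

    /* Only a dma_set_mask*() call may enable the shortcut. */
    static void sketch_dma_set_mask(struct dev_sketch *dev, bool can_skip)
    {
        dev-&gt;dma_skip_sync = can_skip;
    }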

Reported-by: Steven Price &lt;steven.price@arm.com&gt;
Closes: https://lore.kernel.org/lkml/010686f5-3049-46a1-8230-7752a1b433ff@arm.com
Reported-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Closes: https://lore.kernel.org/lkml/46160534-5003-4809-a408-6b3a3f4921e9@samsung.com
Fixes: f406c8e4b770 ("dma: avoid redundant calls for sync operations")
Signed-off-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Tested-by: Steven Price &lt;steven.price@arm.com&gt;
Tested-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
</content>
</entry>
<entry>
<title>dma: avoid redundant calls for sync operations</title>
<updated>2024-05-07T11:29:53Z</updated>
<author>
<name>Alexander Lobakin</name>
<email>aleksander.lobakin@intel.com</email>
</author>
<published>2024-05-07T11:20:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f406c8e4b770ca3b0df84a17349e13f2b6b07d10'/>
<id>urn:sha1:f406c8e4b770ca3b0df84a17349e13f2b6b07d10</id>
<content type='text'>
Quite often, devices do not need dma_sync operations, at least on x86_64.
Indeed, when dev_is_dma_coherent(dev) is true and dev_use_swiotlb(dev) is
false, iommu_dma_sync_single_for_cpu() and friends do nothing.

However, indirectly calling them when CONFIG_RETPOLINE=y consumes about
10% of cycles on a CPU receiving packets from softirq at ~100Gbit rate.
Even when CONFIG_RETPOLINE is not set, there is a cost of about 3%.

Add a dev-&gt;dma_need_sync boolean and clear it during device
initialization (dma_set_mask()) depending on the setup:
dev_is_dma_coherent() for direct DMA, !(sync_single_for_device ||
sync_single_for_cpu), or the new dma_map_ops flag, %DMA_F_CAN_SKIP_SYNC,
advertised for non-NULL DMA ops.
Later, if/when swiotlb is used for the first time, the flag is set back
to on from swiotlb_tbl_map_single().
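Roughly, the decision made at dma_set_mask() time can be sketched as follows
(illustrative stand-in types; only %DMA_F_CAN_SKIP_SYNC is named by the text
above):

    #include &lt;stdbool.h&gt;

    #define DMA_F_CAN_SKIP_SYNC (1u &lt;&lt; 0)

    /* Cut-down stand-in for struct dma_map_ops, for illustration only. */
    struct ops_sketch {
        unsigned int flags;
        void (*sync_single_for_cpu)(void);
        void (*sync_single_for_device)(void);
    };

    static bool sketch_need_sync(const struct ops_sketch *ops, bool coherent)
    {
        if (!ops)                                /* direct DMA */
            return !coherent;
        if (ops-&gt;flags &amp; DMA_F_CAN_SKIP_SYNC)    /* ops opt out explicitly */
            return false;
        /* otherwise sync is needed only if the ops provide sync callbacks */
        return ops-&gt;sync_single_for_cpu || ops-&gt;sync_single_for_device;
    }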

On iavf, a UDP trafficgen test with XDP_DROP in skb mode shows a 3-5%
performance increase for direct DMA.

Suggested-by: Christoph Hellwig &lt;hch@lst.de&gt; # direct DMA shortcut
Co-developed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>dma: compile-out DMA sync op calls when not used</title>
<updated>2024-05-07T11:29:53Z</updated>
<author>
<name>Alexander Lobakin</name>
<email>aleksander.lobakin@intel.com</email>
</author>
<published>2024-05-07T11:20:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fe7514b149e0a8a6f3031d286e52d40163b0b11a'/>
<id>urn:sha1:fe7514b149e0a8a6f3031d286e52d40163b0b11a</id>
<content type='text'>
Some platforms do have DMA, but DMA there is always direct and coherent.
Currently, DMA sync operations are compiled and called even on such
platforms.
Add a new hidden Kconfig symbol, DMA_NEED_SYNC, and set it only when
sync operations are needed, DMA ops are present, or swiotlb or DMA debug
is enabled. Compile the global dma_sync_*() and dma_need_sync() only when
it is set; otherwise, provide empty inline stubs.
The change allows for future optimizations of DMA sync calls depending
on runtime conditions.
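The resulting header shape is the usual Kconfig stub idiom, roughly
(simplified sketch, not the exact dma-mapping.h code):

    #ifdef CONFIG_DMA_NEED_SYNC
    void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
            size_t size, enum dma_data_direction dir);
    bool dma_need_sync(struct device *dev, dma_addr_t dma_addr);
    #else
    static inline void dma_sync_single_for_cpu(struct device *dev,
            dma_addr_t addr, size_t size, enum dma_data_direction dir)
    {
        /* no-op: DMA on this platform is always direct and coherent */
    }
    static inline bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
    {
        return false;
    }
    #endif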

Signed-off-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>dma-mapping: move dma_addressing_limited() out of line</title>
<updated>2023-11-06T07:35:09Z</updated>
<author>
<name>Jia He</name>
<email>justin.he@arm.com</email>
</author>
<published>2023-10-28T10:20:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8ae0e970319ac0b516d285650a744bab4ed3dd37'/>
<id>urn:sha1:8ae0e970319ac0b516d285650a744bab4ed3dd37</id>
<content type='text'>
This patch moves dma_addressing_limited() out of line as a preliminary
step, so that checking whether all system RAM is mapped within the DMA
mapping range does not require introducing a new publicly accessible
low-level helper.

Suggested-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jia He &lt;justin.he@arm.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>swiotlb: if swiotlb is full, fall back to a transient memory pool</title>
<updated>2023-08-01T16:02:20Z</updated>
<author>
<name>Petr Tesarik</name>
<email>petr.tesarik.ext@huawei.com</email>
</author>
<published>2023-08-01T06:24:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=79636caad3618e2b38457f6e298c9b31ba82b489'/>
<id>urn:sha1:79636caad3618e2b38457f6e298c9b31ba82b489</id>
<content type='text'>
Try to allocate a transient memory pool if no suitable slots can be found
and the respective SWIOTLB is allowed to grow. The transient pool is just
big enough for this one bounce buffer. It is inserted into a per-device
list of transient memory pools, and it is freed again when the bounce
buffer is unmapped.

Transient memory pools are kept in an RCU list. A memory barrier is
required after adding a new entry, because any address within a transient
buffer must be immediately recognized as belonging to the SWIOTLB, even if
it is passed to another CPU.

Deletion does not require any synchronization beyond RCU ordering
guarantees. After a buffer is unmapped, its physical addresses may no
longer be passed to the DMA API, so the memory range of the corresponding
stale entry in the RCU list never matches. If the memory range gets
allocated again, that happens only after an RCU quiescent state.

Since bounce buffers can now be allocated from different pools, add a
parameter to swiotlb_alloc_pool() to let the caller know which memory pool
is used. Add swiotlb_find_pool() to find the memory pool corresponding to
an address. This function is now also used by is_swiotlb_buffer(), because
a simple boundary check is no longer sufficient.
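A rough sketch of such an RCU-protected lookup by address (the structure
and names are illustrative, not the swiotlb code):

    #include &lt;linux/list.h&gt;
    #include &lt;linux/rcupdate.h&gt;
    #include &lt;linux/types.h&gt;

    struct pool_sketch {
        struct list_head node;    /* linked into a per-device RCU list */
        phys_addr_t start, end;
    };

    static struct pool_sketch *sketch_find_pool(struct list_head *pools,
                                                phys_addr_t paddr)
    {
        struct pool_sketch *pool, *found = NULL;

        rcu_read_lock();
        list_for_each_entry_rcu(pool, pools, node) {
            if (paddr &gt;= pool-&gt;start &amp;&amp; paddr &lt; pool-&gt;end) {
                found = pool;
                break;
            }
        }
        rcu_read_unlock();
        /* pool lifetime past this point is the caller's concern here */
        return found;
    }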

The logic in swiotlb_alloc_tlb() is taken from __dma_direct_alloc_pages(),
simplified and enhanced to use coherent memory pools if needed.

Note that this is not the most efficient way to provide a bounce buffer,
but when a DMA buffer can't be mapped, something may (and will) actually
break. At that point it is better to make an allocation, even if it may be
an expensive operation.

Signed-off-by: Petr Tesarik &lt;petr.tesarik.ext@huawei.com&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>dma: allow dma_get_cache_alignment() to be overridden by the arch code</title>
<updated>2023-06-19T23:19:20Z</updated>
<author>
<name>Catalin Marinas</name>
<email>catalin.marinas@arm.com</email>
</author>
<published>2023-06-12T15:31:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8c57da28dc3df4e091474a004b5596c9b88a3be0'/>
<id>urn:sha1:8c57da28dc3df4e091474a004b5596c9b88a3be0</id>
<content type='text'>
On arm64, ARCH_DMA_MINALIGN is larger than the cache line size on most
deployed configurations.  Allow an architecture to override
dma_get_cache_alignment() in order to return a run-time probed value (e.g.
cache_line_size()).
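The enabling pattern is roughly the usual "arch may pre-define the symbol"
guard (a sketch of the idea; the exact guard used by the patch is not
quoted here):

    /* generic fallback, used only if the arch did not provide its own */
    #ifndef dma_get_cache_alignment
    static inline int dma_get_cache_alignment(void)
    {
    #ifdef ARCH_HAS_DMA_MINALIGN
        return ARCH_DMA_MINALIGN;
    #endif
        return 1;
    }
    #endif

    /* an architecture could then do something like: */
    /* #define dma_get_cache_alignment cache_line_size */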

Link: https://lkml.kernel.org/r/20230612153201.554742-3-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Tested-by: Isaac J. Manjarres &lt;isaacmanjarres@google.com&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Alasdair Kergon &lt;agk@redhat.com&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Daniel Vetter &lt;daniel@ffwll.ch&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Cc: Jerry Snitselaar &lt;jsnitsel@redhat.com&gt;
Cc: Joerg Roedel &lt;joro@8bytes.org&gt;
Cc: Jonathan Cameron &lt;jic23@kernel.org&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Lars-Peter Clausen &lt;lars@metafoo.de&gt;
Cc: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Cc: Marc Zyngier &lt;maz@kernel.org&gt;
Cc: Mark Brown &lt;broonie@kernel.org&gt;
Cc: Mike Snitzer &lt;snitzer@kernel.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Saravana Kannan &lt;saravanak@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/slab: decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN</title>
<updated>2023-06-19T23:19:20Z</updated>
<author>
<name>Catalin Marinas</name>
<email>catalin.marinas@arm.com</email>
</author>
<published>2023-06-12T15:31:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4ab5f8ec7d71aea5fe13a48248242130f84ac6bb'/>
<id>urn:sha1:4ab5f8ec7d71aea5fe13a48248242130f84ac6bb</id>
<content type='text'>
Patch series "mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8", v7.

A series reducing the kmalloc() minimum alignment on arm64 to 8 (from
128).  


This patch (of 17):

In preparation for supporting a kmalloc() minimum alignment smaller than
the arch DMA alignment, decouple the two definitions.  This requires that
either the kmalloc() caches are aligned to a (run-time) cache-line size or
the DMA API bounces unaligned kmalloc() allocations.  Subsequent patches
will implement both options.

After this patch, ARCH_DMA_MINALIGN is expected to be used in static
alignment annotations and defined by an architecture to be the maximum
alignment for all supported configurations/SoCs in a single Image. 
Architectures opting in to a smaller ARCH_KMALLOC_MINALIGN will need to
define its value in the arch headers.

Since ARCH_DMA_MINALIGN is now always defined, adjust the #ifdef in
dma_get_cache_alignment() so that there is no change for architectures not
requiring a minimum DMA alignment.
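For illustration, a static alignment annotation of the kind meant above
could look like this (the struct and the header choice are hypothetical):

    #include &lt;linux/cache.h&gt;    /* assumed source of ARCH_DMA_MINALIGN */
    #include &lt;linux/types.h&gt;

    struct sketch_dma_desc {
        u8 payload[64];
    } __aligned(ARCH_DMA_MINALIGN);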

Link: https://lkml.kernel.org/r/20230612153201.554742-1-catalin.marinas@arm.com
Link: https://lkml.kernel.org/r/20230612153201.554742-2-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Tested-by: Isaac J. Manjarres &lt;isaacmanjarres@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Alasdair Kergon &lt;agk@redhat.com&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Daniel Vetter &lt;daniel@ffwll.ch&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Cc: Joerg Roedel &lt;joro@8bytes.org&gt;
Cc: Jonathan Cameron &lt;jic23@kernel.org&gt;
Cc: Marc Zyngier &lt;maz@kernel.org&gt;
Cc: Mark Brown &lt;broonie@kernel.org&gt;
Cc: Mike Snitzer &lt;snitzer@kernel.org&gt;
Cc: Rafael J. Wysocki &lt;rafael@kernel.org&gt;
Cc: Saravana Kannan &lt;saravanak@google.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Jerry Snitselaar &lt;jsnitsel@redhat.com&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Lars-Peter Clausen &lt;lars@metafoo.de&gt;
Cc: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>dma-mapping: mark dma_supported static</title>
<updated>2022-09-07T08:38:28Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2022-08-21T14:06:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9fc18f6d56d5b79d527c17a8100a0965d18345cf'/>
<id>urn:sha1:9fc18f6d56d5b79d527c17a8100a0965d18345cf</id>
<content type='text'>
Now that the remaining users in drivers are gone, this function can be
marked static.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>dma-mapping: add flags to dma_map_ops to indicate PCI P2PDMA support</title>
<updated>2022-07-26T11:27:48Z</updated>
<author>
<name>Logan Gunthorpe</name>
<email>logang@deltatee.com</email>
</author>
<published>2022-07-08T16:50:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=159bf19270e80b5bc4b13aa88072dcb390b4d297'/>
<id>urn:sha1:159bf19270e80b5bc4b13aa88072dcb390b4d297</id>
<content type='text'>
Add a flags member to the dma_map_ops structure with one flag to
indicate support for PCI P2PDMA.

Also, add a helper to check if a device supports PCI P2PDMA.
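Sketched out, the flags member plus the check helper look something like
this (names here are placeholders, not necessarily the ones added by the
patch):

    #include &lt;stdbool.h&gt;

    #define SKETCH_F_PCI_P2PDMA_SUPPORTED (1u &lt;&lt; 0)

    struct sketch_dma_map_ops {
        unsigned int flags;
        /* mapping callbacks elided */
    };

    static bool sketch_pci_p2pdma_supported(const struct sketch_dma_map_ops *ops)
    {
        /* NULL ops means dma-direct, assumed here to handle P2PDMA */
        return !ops || (ops-&gt;flags &amp; SKETCH_F_PCI_P2PDMA_SUPPORTED);
    }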

Signed-off-by: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>dma-mapping: add dma_opt_mapping_size()</title>
<updated>2022-07-19T04:05:41Z</updated>
<author>
<name>John Garry</name>
<email>john.garry@huawei.com</email>
</author>
<published>2022-07-14T11:15:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a229cc14f3395311b899e5e582b71efa8dd01df0'/>
<id>urn:sha1:a229cc14f3395311b899e5e582b71efa8dd01df0</id>
<content type='text'>
Streaming DMA mappings involving an IOMMU may be much slower for larger
total mapping sizes. This is because every IOMMU DMA mapping requires an
IOVA to be allocated and freed. IOVA sizes above a certain limit are not
cached, which can have a big impact on DMA mapping performance.

Provide an API for device drivers to know this "optimal" limit, such that
they may try to produce mappings which don't exceed it.
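A hypothetical driver-side use of the new helper (the capping logic is an
illustration, not from this patch):

    #include &lt;linux/dma-mapping.h&gt;
    #include &lt;linux/minmax.h&gt;

    static size_t sketch_cap_mapping_len(struct device *dev, size_t wanted)
    {
        size_t opt = dma_opt_mapping_size(dev);

        /* stay within the size for which IOVA allocations are still cached */
        return min_t(size_t, wanted, opt);
    }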

Signed-off-by: John Garry &lt;john.garry@huawei.com&gt;
Reviewed-by: Damien Le Moal &lt;damien.lemoal@opensource.wdc.com&gt;
Acked-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
</feed>
