<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/dma-mapping.h, branch v6.9.4</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.9.4</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.9.4'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2023-11-06T07:35:09Z</updated>
<entry>
<title>dma-mapping: move dma_addressing_limited() out of line</title>
<updated>2023-11-06T07:35:09Z</updated>
<author>
<name>Jia He</name>
<email>justin.he@arm.com</email>
</author>
<published>2023-10-28T10:20:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8ae0e970319ac0b516d285650a744bab4ed3dd37'/>
<id>urn:sha1:8ae0e970319ac0b516d285650a744bab4ed3dd37</id>
<content type='text'>
This patch moves dma_addressing_limited() out of line, serving as a
preliminary step to prevent the introduction of a new publicly accessible
low-level helper when validating whether all system RAM is mapped within
the DMA mapping range.

Suggested-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jia He &lt;justin.he@arm.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>swiotlb: if swiotlb is full, fall back to a transient memory pool</title>
<updated>2023-08-01T16:02:20Z</updated>
<author>
<name>Petr Tesarik</name>
<email>petr.tesarik.ext@huawei.com</email>
</author>
<published>2023-08-01T06:24:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=79636caad3618e2b38457f6e298c9b31ba82b489'/>
<id>urn:sha1:79636caad3618e2b38457f6e298c9b31ba82b489</id>
<content type='text'>
Try to allocate a transient memory pool if no suitable slots can be found
and the respective SWIOTLB is allowed to grow. The transient pool is just
enough big for this one bounce buffer. It is inserted into a per-device
list of transient memory pools, and it is freed again when the bounce
buffer is unmapped.

Transient memory pools are kept in an RCU list. A memory barrier is
required after adding a new entry, because any address within a transient
buffer must be immediately recognized as belonging to the SWIOTLB, even if
it is passed to another CPU.

Deletion does not require any synchronization beyond RCU ordering
guarantees. After a buffer is unmapped, its physical addresses may no
longer be passed to the DMA API, so the memory range of the corresponding
stale entry in the RCU list never matches. If the memory range gets
allocated again, then it happens only after a RCU quiescent state.

Since bounce buffers can now be allocated from different pools, add a
parameter to swiotlb_alloc_pool() to let the caller know which memory pool
is used. Add swiotlb_find_pool() to find the memory pool corresponding to
an address. This function is now also used by is_swiotlb_buffer(), because
a simple boundary check is no longer sufficient.

The logic in swiotlb_alloc_tlb() is taken from __dma_direct_alloc_pages(),
simplified and enhanced to use coherent memory pools if needed.

Note that this is not the most efficient way to provide a bounce buffer,
but when a DMA buffer can't be mapped, something may (and will) actually
break. At that point it is better to make an allocation, even if it may be
an expensive operation.

Signed-off-by: Petr Tesarik &lt;petr.tesarik.ext@huawei.com&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>dma: allow dma_get_cache_alignment() to be overridden by the arch code</title>
<updated>2023-06-19T23:19:20Z</updated>
<author>
<name>Catalin Marinas</name>
<email>catalin.marinas@arm.com</email>
</author>
<published>2023-06-12T15:31:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8c57da28dc3df4e091474a004b5596c9b88a3be0'/>
<id>urn:sha1:8c57da28dc3df4e091474a004b5596c9b88a3be0</id>
<content type='text'>
On arm64, ARCH_DMA_MINALIGN is larger than most cache line size
configurations deployed.  Allow an architecture to override
dma_get_cache_alignment() in order to return a run-time probed value (e.g.
cache_line_size()).

Link: https://lkml.kernel.org/r/20230612153201.554742-3-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Tested-by: Isaac J. Manjarres &lt;isaacmanjarres@google.com&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Alasdair Kergon &lt;agk@redhat.com&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Daniel Vetter &lt;daniel@ffwll.ch&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Cc: Jerry Snitselaar &lt;jsnitsel@redhat.com&gt;
Cc: Joerg Roedel &lt;joro@8bytes.org&gt;
Cc: Jonathan Cameron &lt;jic23@kernel.org&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Lars-Peter Clausen &lt;lars@metafoo.de&gt;
Cc: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Cc: Marc Zyngier &lt;maz@kernel.org&gt;
Cc: Mark Brown &lt;broonie@kernel.org&gt;
Cc: Mike Snitzer &lt;snitzer@kernel.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Saravana Kannan &lt;saravanak@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/slab: decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN</title>
<updated>2023-06-19T23:19:20Z</updated>
<author>
<name>Catalin Marinas</name>
<email>catalin.marinas@arm.com</email>
</author>
<published>2023-06-12T15:31:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4ab5f8ec7d71aea5fe13a48248242130f84ac6bb'/>
<id>urn:sha1:4ab5f8ec7d71aea5fe13a48248242130f84ac6bb</id>
<content type='text'>
Patch series "mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8", v7.

A series reducing the kmalloc() minimum alignment on arm64 to 8 (from
128).  


This patch (of 17):

In preparation for supporting a kmalloc() minimum alignment smaller than
the arch DMA alignment, decouple the two definitions.  This requires that
either the kmalloc() caches are aligned to a (run-time) cache-line size or
the DMA API bounces unaligned kmalloc() allocations.  Subsequent patches
will implement both options.

After this patch, ARCH_DMA_MINALIGN is expected to be used in static
alignment annotations and defined by an architecture to be the maximum
alignment for all supported configurations/SoCs in a single Image. 
Architectures opting in to a smaller ARCH_KMALLOC_MINALIGN will need to
define its value in the arch headers.

Since ARCH_DMA_MINALIGN is now always defined, adjust the #ifdef in
dma_get_cache_alignment() so that there is no change for architectures not
requiring a minimum DMA alignment.

Link: https://lkml.kernel.org/r/20230612153201.554742-1-catalin.marinas@arm.com
Link: https://lkml.kernel.org/r/20230612153201.554742-2-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Tested-by: Isaac J. Manjarres &lt;isaacmanjarres@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Alasdair Kergon &lt;agk@redhat.com&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Daniel Vetter &lt;daniel@ffwll.ch&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Cc: Joerg Roedel &lt;joro@8bytes.org&gt;
Cc: Jonathan Cameron &lt;jic23@kernel.org&gt;
Cc: Marc Zyngier &lt;maz@kernel.org&gt;
Cc: Mark Brown &lt;broonie@kernel.org&gt;
Cc: Mike Snitzer &lt;snitzer@kernel.org&gt;
Cc: Rafael J. Wysocki &lt;rafael@kernel.org&gt;
Cc: Saravana Kannan &lt;saravanak@google.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Jerry Snitselaar &lt;jsnitsel@redhat.com&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Lars-Peter Clausen &lt;lars@metafoo.de&gt;
Cc: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>dma-mapping: mark dma_supported static</title>
<updated>2022-09-07T08:38:28Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2022-08-21T14:06:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9fc18f6d56d5b79d527c17a8100a0965d18345cf'/>
<id>urn:sha1:9fc18f6d56d5b79d527c17a8100a0965d18345cf</id>
<content type='text'>
Now that the remaining users in drivers are gone, this function can be
marked static.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>dma-mapping: add flags to dma_map_ops to indicate PCI P2PDMA support</title>
<updated>2022-07-26T11:27:48Z</updated>
<author>
<name>Logan Gunthorpe</name>
<email>logang@deltatee.com</email>
</author>
<published>2022-07-08T16:50:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=159bf19270e80b5bc4b13aa88072dcb390b4d297'/>
<id>urn:sha1:159bf19270e80b5bc4b13aa88072dcb390b4d297</id>
<content type='text'>
Add a flags member to the dma_map_ops structure with one flag to
indicate support for PCI P2PDMA.

Also, add a helper to check if a device supports PCI P2PDMA.

Signed-off-by: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>dma-mapping: add dma_opt_mapping_size()</title>
<updated>2022-07-19T04:05:41Z</updated>
<author>
<name>John Garry</name>
<email>john.garry@huawei.com</email>
</author>
<published>2022-07-14T11:15:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a229cc14f3395311b899e5e582b71efa8dd01df0'/>
<id>urn:sha1:a229cc14f3395311b899e5e582b71efa8dd01df0</id>
<content type='text'>
Streaming DMA mapping involving an IOMMU may be much slower for larger
total mapping size. This is because every IOMMU DMA mapping requires an
IOVA to be allocated and freed. IOVA sizes above a certain limit are not
cached, which can have a big impact on DMA mapping performance.

Provide an API for device drivers to know this "optimal" limit, such that
they may try to produce mapping which don't exceed it.

Signed-off-by: John Garry &lt;john.garry@huawei.com&gt;
Reviewed-by: Damien Le Moal &lt;damien.lemoal@opensource.wdc.com&gt;
Acked-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>Reinstate some of "swiotlb: rework "fix info leak with DMA_FROM_DEVICE""</title>
<updated>2022-03-28T18:37:05Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-03-28T18:37:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=901c7280ca0d5e2b4a8929fbe0bfb007ac2a6544'/>
<id>urn:sha1:901c7280ca0d5e2b4a8929fbe0bfb007ac2a6544</id>
<content type='text'>
Halil Pasic points out [1] that the full revert of that commit (revert
in bddac7c1e02b), and that a partial revert that only reverts the
problematic case, but still keeps some of the cleanups is probably
better.  ￼

And that partial revert [2] had already been verified by Oleksandr
Natalenko to also fix the issue, I had just missed that in the long
discussion.

So let's reinstate the cleanups from commit aa6f8dcbab47 ("swiotlb:
rework "fix info leak with DMA_FROM_DEVICE""), and effectively only
revert the part that caused problems.

Link: https://lore.kernel.org/all/20220328013731.017ae3e3.pasic@linux.ibm.com/ [1]
Link: https://lore.kernel.org/all/20220324055732.GB12078@lst.de/ [2]
Link: https://lore.kernel.org/all/4386660.LvFx2qVVIh@natalenko.name/ [3]
Suggested-by: Halil Pasic &lt;pasic@linux.ibm.com&gt;
Tested-by: Oleksandr Natalenko &lt;oleksandr@natalenko.name&gt;
Cc: Christoph Hellwig" &lt;hch@lst.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "swiotlb: rework "fix info leak with DMA_FROM_DEVICE""</title>
<updated>2022-03-26T17:42:04Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-03-26T17:42:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bddac7c1e02ba47f0570e494c9289acea3062cc1'/>
<id>urn:sha1:bddac7c1e02ba47f0570e494c9289acea3062cc1</id>
<content type='text'>
This reverts commit aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13.

It turns out this breaks at least the ath9k wireless driver, and
possibly others.

What the ath9k driver does on packet receive is to set up the DMA
transfer with:

  int ath_rx_init(..)
  ..
                bf-&gt;bf_buf_addr = dma_map_single(sc-&gt;dev, skb-&gt;data,
                                                 common-&gt;rx_bufsize,
                                                 DMA_FROM_DEVICE);

and then the receive logic (through ath_rx_tasklet()) will fetch
incoming packets

  static bool ath_edma_get_buffers(..)
  ..
        dma_sync_single_for_cpu(sc-&gt;dev, bf-&gt;bf_buf_addr,
                                common-&gt;rx_bufsize, DMA_FROM_DEVICE);

        ret = ath9k_hw_process_rxdesc_edma(ah, rs, skb-&gt;data);
        if (ret == -EINPROGRESS) {
                /*let device gain the buffer again*/
                dma_sync_single_for_device(sc-&gt;dev, bf-&gt;bf_buf_addr,
                                common-&gt;rx_bufsize, DMA_FROM_DEVICE);
                return false;
        }

and it's worth noting how that first DMA sync:

    dma_sync_single_for_cpu(..DMA_FROM_DEVICE);

is there to make sure the CPU can read the DMA buffer (possibly by
copying it from the bounce buffer area, or by doing some cache flush).
The iommu correctly turns that into a "copy from bounce bufer" so that
the driver can look at the state of the packets.

In the meantime, the device may continue to write to the DMA buffer, but
we at least have a snapshot of the state due to that first DMA sync.

But that _second_ DMA sync:

    dma_sync_single_for_device(..DMA_FROM_DEVICE);

is telling the DMA mapping that the CPU wasn't interested in the area
because the packet wasn't there.  In the case of a DMA bounce buffer,
that is a no-op.

Note how it's not a sync for the CPU (the "for_device()" part), and it's
not a sync for data written by the CPU (the "DMA_FROM_DEVICE" part).

Or rather, it _should_ be a no-op.  That's what commit aa6f8dcbab47
broke: it made the code bounce the buffer unconditionally, and changed
the DMA_FROM_DEVICE to just unconditionally and illogically be
DMA_TO_DEVICE.

[ Side note: purely within the confines of the swiotlb driver it wasn't
  entirely illogical: The reason it did that odd DMA_FROM_DEVICE -&gt;
  DMA_TO_DEVICE conversion thing is because inside the swiotlb driver,
  it uses just a swiotlb_bounce() helper that doesn't care about the
  whole distinction of who the sync is for - only which direction to
  bounce.

  So it took the "sync for device" to mean that the CPU must have been
  the one writing, and thought it meant DMA_TO_DEVICE. ]

Also note how the commentary in that commit was wrong, probably due to
that whole confusion, claiming that the commit makes the swiotlb code

                                  "bounce unconditionally (that is, also
    when dir == DMA_TO_DEVICE) in order do avoid synchronising back stale
    data from the swiotlb buffer"

which is nonsensical for two reasons:

 - that "also when dir == DMA_TO_DEVICE" is nonsensical, as that was
   exactly when it always did - and should do - the bounce.

 - since this is a sync for the device (not for the CPU), we're clearly
   fundamentally not coping back stale data from the bounce buffers at
   all, because we'd be copying *to* the bounce buffers.

So that commit was just very confused.  It confused the direction of the
synchronization (to the device, not the cpu) with the direction of the
DMA (from the device).

Reported-and-bisected-by: Oleksandr Natalenko &lt;oleksandr@natalenko.name&gt;
Reported-by: Olha Cherevyk &lt;olha.cherevyk@gmail.com&gt;
Cc: Halil Pasic &lt;pasic@linux.ibm.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Kalle Valo &lt;kvalo@kernel.org&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Toke Høiland-Jørgensen &lt;toke@toke.dk&gt;
Cc: Maxime Bizon &lt;mbizon@freebox.fr&gt;
Cc: Johannes Berg &lt;johannes@sipsolutions.net&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>swiotlb: rework "fix info leak with DMA_FROM_DEVICE"</title>
<updated>2022-03-07T19:26:02Z</updated>
<author>
<name>Halil Pasic</name>
<email>pasic@linux.ibm.com</email>
</author>
<published>2022-03-05T17:07:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13'/>
<id>urn:sha1:aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13</id>
<content type='text'>
Unfortunately, we ended up merging an old version of the patch "fix info
leak with DMA_FROM_DEVICE" instead of merging the latest one. Christoph
(the swiotlb maintainer), he asked me to create an incremental fix
(after I have pointed this out the mix up, and asked him for guidance).
So here we go.

The main differences between what we got and what was agreed are:
* swiotlb_sync_single_for_device is also required to do an extra bounce
* We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters
* The implantation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE
  must take precedence over DMA_ATTR_SKIP_CPU_SYNC

Thus this patch removes DMA_ATTR_OVERWRITE, and makes
swiotlb_sync_single_for_device() bounce unconditionally (that is, also
when dir == DMA_TO_DEVICE) in order do avoid synchronising back stale
data from the swiotlb buffer.

Let me note, that if the size used with dma_sync_* API is less than the
size used with dma_[un]map_*, under certain circumstances we may still
end up with swiotlb not being transparent. In that sense, this is no
perfect fix either.

To get this bullet proof, we would have to bounce the entire
mapping/bounce buffer. For that we would have to figure out the starting
address, and the size of the mapping in
swiotlb_sync_single_for_device(). While this does seem possible, there
seems to be no firm consensus on how things are supposed to work.

Signed-off-by: Halil Pasic &lt;pasic@linux.ibm.com&gt;
Fixes: ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
