<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/block/bdev.c, branch v6.17.11</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.17.11</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.17.11'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2025-05-26T19:56:01Z</updated>
<entry>
<title>Merge tag 'xfs-merge-6.16' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux</title>
<updated>2025-05-26T19:56:01Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-05-26T19:56:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f83fcb87f824b0bfbf1200590cc80f05e66488a7'/>
<id>urn:sha1:f83fcb87f824b0bfbf1200590cc80f05e66488a7</id>
<content type='text'>
Pull xfs updates from Carlos Maiolino:

 - Atomic writes for XFS

 - Remove experimental warnings for pNFS, scrub and parent pointers

* tag 'xfs-merge-6.16' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (26 commits)
  xfs: add inode to zone caching for data placement
  xfs: free the item in xfs_mru_cache_insert on failure
  xfs: remove the EXPERIMENTAL warning for pNFS
  xfs: remove some EXPERIMENTAL warnings
  xfs: Remove deprecated xfs_bufd sysctl parameters
  xfs: stop using set_blocksize
  xfs: allow sysadmins to specify a maximum atomic write limit at mount time
  xfs: update atomic write limits
  xfs: add xfs_calc_atomic_write_unit_max()
  xfs: add xfs_file_dio_write_atomic()
  xfs: commit CoW-based atomic writes atomically
  xfs: add large atomic writes checks in xfs_direct_write_iomap_begin()
  xfs: add xfs_atomic_write_cow_iomap_begin()
  xfs: refine atomic write size check in xfs_file_write_iter()
  xfs: refactor xfs_reflink_end_cow_extent()
  xfs: allow block allocator to take an alignment hint
  xfs: ignore HW which cannot atomic write a single block
  xfs: add helpers to compute transaction reservation for finishing intent items
  xfs: add helpers to compute log item overhead
  xfs: separate out setting buftarg atomic writes limits
  ...
</content>
</entry>
<entry>
<title>fs: add atomic write unit max opt to statx</title>
<updated>2025-05-07T21:25:30Z</updated>
<author>
<name>John Garry</name>
<email>john.g.garry@oracle.com</email>
</author>
<published>2025-05-07T21:18:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5d894321c49e61379189b0ff605f316e39cbd1e9'/>
<id>urn:sha1:5d894321c49e61379189b0ff605f316e39cbd1e9</id>
<content type='text'>
XFS will be able to support large atomic writes (atomic write &gt; 1x block)
in future. This will be achieved by using different operating methods,
depending on the size of the write.

Specifically a new method of operation based in FS atomic extent remapping
will be supported in addition to the current HW offload-based method.

The FS method will generally be appreciably slower performing than the
HW-offload method. However the FS method will be typically able to
contribute to achieving a larger atomic write unit max limit.

XFS will support a hybrid mode, where HW offload method will be used when
possible, i.e. HW offload is used when the length of the write is
supported, and for other times FS-based atomic writes will be used.

As such, there is an atomic write length at which the user may experience
appreciably slower performance.

Advertise this limit in a new statx field, stx_atomic_write_unit_max_opt.

When zero, it means that there is no such performance boundary.

Masks STATX{_ATTR}_WRITE_ATOMIC can be used to get this new field. This is
ok for older kernels which don't support this new field, as they would
report 0 in this field (from zeroing in cp_statx()) already. Furthermore
those older kernels don't support large atomic writes - apart from block
fops, but there would be consistent performance there for atomic writes
in range [unit min, unit max].

Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Acked-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Signed-off-by: John Garry &lt;john.g.garry@oracle.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'block-6.15-20250424' of git://git.kernel.dk/linux</title>
<updated>2025-04-25T18:34:39Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-04-25T18:34:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7deea5634a67700d04c2a0e6d2ffa0e2956fe8ad'/>
<id>urn:sha1:7deea5634a67700d04c2a0e6d2ffa0e2956fe8ad</id>
<content type='text'>
Pull block fixes from Jens Axboe:

 - Fix autoloading of drivers from stat*(2)

 - Fix losing read-ahead setting one suspend/resume, when a device is
   re-probed.

 - Fix race between setting the block size and page cache updates.
   Includes a helper that a coming XFS fix will use as well.

 - ublk cancelation fixes.

 - ublk selftest additions and fixes.

 - NVMe pull via Christoph:
      - fix an out-of-bounds access in nvmet_enable_port (Richard
        Weinberger)

* tag 'block-6.15-20250424' of git://git.kernel.dk/linux:
  ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd
  ublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA
  block: don't autoload drivers on blk-cgroup configuration
  block: don't autoload drivers on stat
  block: remove the backing_inode variable in bdev_statx
  block: move blkdev_{get,put} _no_open prototypes out of blkdev.h
  block: never reduce ra_pages in blk_apply_bdi_limits
  selftests: ublk: common: fix _get_disk_dev_t for pre-9.0 coreutils
  selftests: ublk: remove useless 'delay_us' from 'struct dev_ctx'
  selftests: ublk: fix recover test
  block: hoist block size validation code to a separate function
  block: fix race between set_blocksize and read paths
  nvmet: fix out-of-bounds access in nvmet_enable_port
</content>
</entry>
<entry>
<title>block: don't autoload drivers on stat</title>
<updated>2025-04-24T13:35:23Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2025-04-23T05:37:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5f33b5226c9d92359e58e91ad0bf0c1791da36a1'/>
<id>urn:sha1:5f33b5226c9d92359e58e91ad0bf0c1791da36a1</id>
<content type='text'>
blkdev_get_no_open can trigger the legacy autoload of block drivers.  A
simple stat of a block device has not historically done that, so disable
this behavior again.

Fixes: 9abcfbd235f5 ("block: Add atomic write support for statx")
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Christian Brauner &lt;brauner@kernel.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Link: https://lore.kernel.org/r/20250423053810.1683309-4-hch@lst.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: remove the backing_inode variable in bdev_statx</title>
<updated>2025-04-24T13:35:09Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2025-04-23T05:37:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d13b7090b2510abaa83a25717466decca23e8226'/>
<id>urn:sha1:d13b7090b2510abaa83a25717466decca23e8226</id>
<content type='text'>
backing_inode is only used once, so remove it and update the comment
describing the bdev lookup to be a bit more clear.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Christian Brauner &lt;brauner@kernel.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Link: https://lore.kernel.org/r/20250423053810.1683309-3-hch@lst.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: hoist block size validation code to a separate function</title>
<updated>2025-04-23T19:58:06Z</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@kernel.org</email>
</author>
<published>2025-04-23T19:53:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e03463d247ddac66e71143468373df3d74a3a6bd'/>
<id>urn:sha1:e03463d247ddac66e71143468373df3d74a3a6bd</id>
<content type='text'>
Hoist the block size validation code to bdev_validate_blocksize so that
we can call it from filesystems that don't care about the bdev pagecache
manipulations of set_blocksize.

Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Reviewed-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/174543795720.4139148.840349813093799165.stgit@frogsfrogsfrogs
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: fix race between set_blocksize and read paths</title>
<updated>2025-04-23T19:58:06Z</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@kernel.org</email>
</author>
<published>2025-04-23T19:53:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c0e473a0d226479e8e925d5ba93f751d8df628e9'/>
<id>urn:sha1:c0e473a0d226479e8e925d5ba93f751d8df628e9</id>
<content type='text'>
With the new large sector size support, it's now the case that
set_blocksize can change i_blksize and the folio order in a manner that
conflicts with a concurrent reader and causes a kernel crash.

Specifically, let's say that udev-worker calls libblkid to detect the
labels on a block device.  The read call can create an order-0 folio to
read the first 4096 bytes from the disk.  But then udev is preempted.

Next, someone tries to mount an 8k-sectorsize filesystem from the same
block device.  The filesystem calls set_blksize, which sets i_blksize to
8192 and the minimum folio order to 1.

Now udev resumes, still holding the order-0 folio it allocated.  It then
tries to schedule a read bio and do_mpage_readahead tries to create
bufferheads for the folio.  Unfortunately, blocks_per_folio == 0 because
the page size is 4096 but the blocksize is 8192 so no bufferheads are
attached and the bh walk never sets bdev.  We then submit the bio with a
NULL block device and crash.

Therefore, truncate the page cache after flushing but before updating
i_blksize.  However, that's not enough -- we also need to lock out file
IO and page faults during the update.  Take both the i_rwsem and the
invalidate_lock in exclusive mode for invalidations, and in shared mode
for read/write operations.

I don't know if this is the correct fix, but xfs/259 found it.

Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Tested-by: Shin'ichiro Kawasaki &lt;shinichiro.kawasaki@wdc.com&gt;
Link: https://lore.kernel.org/r/174543795699.4139148.2086129139322431423.stgit@frogsfrogsfrogs
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>fs: move the bdex_statx call to vfs_getattr_nosec</title>
<updated>2025-04-17T08:14:34Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2025-04-17T06:40:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=777d0961ff95b26d5887fdae69900374364976f3'/>
<id>urn:sha1:777d0961ff95b26d5887fdae69900374364976f3</id>
<content type='text'>
Currently bdex_statx is only called from the very high-level
vfs_statx_path function, and thus bypassing it for in-kernel calls
to vfs_getattr or vfs_getattr_nosec.

This breaks querying the block ѕize of the underlying device in the
loop driver and also is a pitfall for any other new kernel caller.

Move the call into the lowest level helper to ensure all callers get
the right results.

Fixes: 2d985f8c6b91 ("vfs: support STATX_DIOALIGN on block devices")
Fixes: f4774e92aab8 ("loop: take the file system minimum dio alignment into account")
Reported-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/20250417064042.712140-1-hch@lst.de
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>bdev: add back PAGE_SIZE block size validation for sb_set_blocksize()</title>
<updated>2025-03-07T11:56:05Z</updated>
<author>
<name>Luis Chamberlain</name>
<email>mcgrof@kernel.org</email>
</author>
<published>2025-03-07T02:04:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a64e5a596067bddba87fcc2ce37e56c3fca831b7'/>
<id>urn:sha1:a64e5a596067bddba87fcc2ce37e56c3fca831b7</id>
<content type='text'>
The commit titled "block/bdev: lift block size restrictions to 64k"
lifted the block layer's max supported block size to 64k inside the
helper blk_validate_block_size() now that we support large folios.
However in lifting the block size we also removed the silly use
cases many filesystems have to use sb_set_blocksize() to *verify*
that the block size &lt;= PAGE_SIZE. The call to sb_set_blocksize() was
used to check the block size &lt;= PAGE_SIZE since historically we've
always supported userspace to create for example 64k block size
filesystems even on 4k page size systems, but what we didn't allow
was mounting them. Older filesystems have been using the check with
sb_set_blocksize() for years.

While, we could argue that such checks should be filesystem specific,
there are much more users of sb_set_blocksize() than LBS enabled
filesystem on upstream, so just do the easier thing and bring back
the PAGE_SIZE check for sb_set_blocksize() users and only skip it
for LBS enabled filesystems.

This will ensure that tests such as generic/466 when run in a loop
against say, ext4, won't try to try to actually mount a filesystem with
a block size larger than your filesystem supports given your PAGE_SIZE
and in the worst case crash.

Cc: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
Signed-off-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Link: https://lore.kernel.org/r/20250307020403.3068567-1-mcgrof@kernel.org
Reviewed-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
Reviewed-by: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>bdev: use bdev_io_min() for statx block size</title>
<updated>2025-02-24T10:44:44Z</updated>
<author>
<name>Luis Chamberlain</name>
<email>mcgrof@kernel.org</email>
</author>
<published>2025-02-21T22:38:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=425fbcd62d2e1330e64d8d3bf89e554830ba997f'/>
<id>urn:sha1:425fbcd62d2e1330e64d8d3bf89e554830ba997f</id>
<content type='text'>
You can use lsblk to query for a block device block device block size:

lsblk -o MIN-IO /dev/nvme0n1
MIN-IO
 4096

The min-io is the minimum IO the block device prefers for optimal
performance. In turn we map this to the block device block size.
The current block size exposed even for block devices with an
LBA format of 16k is 4k. Likewise devices which support 4k LBA format
but have a larger Indirection Unit of 16k have an exposed block size
of 4k.

This incurs read-modify-writes on direct IO against devices with a
min-io larger than the page size. To fix this, use the block device
min io, which is the minimal optimal IO the device prefers.

With this we now get:

lsblk -o MIN-IO /dev/nvme0n1
MIN-IO
 16384

And so userspace gets the appropriate information it needs for optimal
performance. This is verified with blkalgn against mkfs against a
device with LBA format of 4k but an NPWG of 16k (min io size)

mkfs.xfs -f -b size=16k  /dev/nvme3n1
blkalgn -d nvme3n1 --ops Write

     Block size          : count     distribution
         0 -&gt; 1          : 0        |                                        |
         2 -&gt; 3          : 0        |                                        |
         4 -&gt; 7          : 0        |                                        |
         8 -&gt; 15         : 0        |                                        |
        16 -&gt; 31         : 0        |                                        |
        32 -&gt; 63         : 0        |                                        |
        64 -&gt; 127        : 0        |                                        |
       128 -&gt; 255        : 0        |                                        |
       256 -&gt; 511        : 0        |                                        |
       512 -&gt; 1023       : 0        |                                        |
      1024 -&gt; 2047       : 0        |                                        |
      2048 -&gt; 4095       : 0        |                                        |
      4096 -&gt; 8191       : 0        |                                        |
      8192 -&gt; 16383      : 0        |                                        |
     16384 -&gt; 32767      : 66       |****************************************|
     32768 -&gt; 65535      : 0        |                                        |
     65536 -&gt; 131071     : 0        |                                        |
    131072 -&gt; 262143     : 2        |*                                       |
Block size: 14 - 66
Block size: 17 - 2

     Algn size           : count     distribution
         0 -&gt; 1          : 0        |                                        |
         2 -&gt; 3          : 0        |                                        |
         4 -&gt; 7          : 0        |                                        |
         8 -&gt; 15         : 0        |                                        |
        16 -&gt; 31         : 0        |                                        |
        32 -&gt; 63         : 0        |                                        |
        64 -&gt; 127        : 0        |                                        |
       128 -&gt; 255        : 0        |                                        |
       256 -&gt; 511        : 0        |                                        |
       512 -&gt; 1023       : 0        |                                        |
      1024 -&gt; 2047       : 0        |                                        |
      2048 -&gt; 4095       : 0        |                                        |
      4096 -&gt; 8191       : 0        |                                        |
      8192 -&gt; 16383      : 0        |                                        |
     16384 -&gt; 32767      : 66       |****************************************|
     32768 -&gt; 65535      : 0        |                                        |
     65536 -&gt; 131071     : 0        |                                        |
    131072 -&gt; 262143     : 2        |*                                       |
Algn size: 14 - 66
Algn size: 17 - 2

Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Link: https://lore.kernel.org/r/20250221223823.1680616-9-mcgrof@kernel.org
Reviewed-by: John Garry &lt;john.g.garry@oracle.com&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
</feed>
