<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/block, branch v4.19.278</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.19.278</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.19.278'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2023-03-11T15:31:33Z</updated>
<entry>
<title>block: bio-integrity: Copy flags when bio_integrity_payload is cloned</title>
<updated>2023-03-11T15:31:33Z</updated>
<author>
<name>Martin K. Petersen</name>
<email>martin.petersen@oracle.com</email>
</author>
<published>2023-02-15T17:18:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c51759f10543db85db042690e9656bedc3a032bd'/>
<id>urn:sha1:c51759f10543db85db042690e9656bedc3a032bd</id>
<content type='text'>
[ Upstream commit b6a4bdcda430e3ca43bbb9cb1d4d4d34ebe15c40 ]

Make sure to copy the flags when a bio_integrity_payload is cloned.
Otherwise per-I/O properties such as IP checksum flag will not be
passed down to the HBA driver. Since the integrity buffer is owned by
the original bio, the BIP_BLOCK_INTEGRITY flag needs to be masked off
to avoid a double free in the completion path.

Fixes: aae7df50190a ("block: Integrity checksum flag")
Fixes: b1f01388574c ("block: Relocate bio integrity flags")
Reported-by: Saurav Kashyap &lt;skashyap@marvell.com&gt;
Tested-by: Saurav Kashyap &lt;skashyap@marvell.com&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Link: https://lore.kernel.org/r/20230215171801.21062-1-martin.petersen@oracle.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx</title>
<updated>2023-03-11T15:31:33Z</updated>
<author>
<name>Kemeng Shi</name>
<email>shikemeng@huaweicloud.com</email>
</author>
<published>2023-01-18T09:37:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1c13779dc00907c7ab57a0c8747c0d35a98f8a1a'/>
<id>urn:sha1:1c13779dc00907c7ab57a0c8747c0d35a98f8a1a</id>
<content type='text'>
[ Upstream commit c31e76bcc379182fe67a82c618493b7b8868c672 ]

Commit 97889f9ac24f8 ("blk-mq: remove synchronize_rcu() from
blk_mq_del_queue_tag_set()") remove handle of TAG_SHARED in restart,
then shared_hctx_restart counted for how many hardware queues are marked
for restart is removed too.
Remove the stale comment that we still count hardware queues need restart.

Fixes: 97889f9ac24f ("blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set()")
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: fix and cleanup bio_check_ro</title>
<updated>2023-02-06T06:49:41Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2022-03-04T18:00:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cc42c752fee620e43d34f000c23037d463a3f488'/>
<id>urn:sha1:cc42c752fee620e43d34f000c23037d463a3f488</id>
<content type='text'>
commit 57e95e4670d1126c103305bcf34a9442f49f6d6a upstream.

Don't use a WARN_ON when printing a potentially user triggered
condition.  Also don't print the partno when the block device name
already includes it, and use the %pg specifier to simplify printing
the block device name.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Link: https://lore.kernel.org/r/20220304180105.409765-2-hch@lst.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Fedor Pchelkin &lt;pchelkin@ispras.ru&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>blk-mq: fix possible memleak when register 'hctx' failed</title>
<updated>2023-01-18T10:30:37Z</updated>
<author>
<name>Ye Bin</name>
<email>yebin10@huawei.com</email>
</author>
<published>2022-11-17T02:29:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=02bc8bc6eab03c84373281b85cb6e98747172ff7'/>
<id>urn:sha1:02bc8bc6eab03c84373281b85cb6e98747172ff7</id>
<content type='text'>
[ Upstream commit 4b7a21c57b14fbcd0e1729150189e5933f5088e9 ]

There's issue as follows when do fault injection test:
unreferenced object 0xffff888132a9f400 (size 512):
  comm "insmod", pid 308021, jiffies 4324277909 (age 509.733s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 08 f4 a9 32 81 88 ff ff  ...........2....
    08 f4 a9 32 81 88 ff ff 00 00 00 00 00 00 00 00  ...2............
  backtrace:
    [&lt;00000000e8952bb4&gt;] kmalloc_node_trace+0x22/0xa0
    [&lt;00000000f9980e0f&gt;] blk_mq_alloc_and_init_hctx+0x3f1/0x7e0
    [&lt;000000002e719efa&gt;] blk_mq_realloc_hw_ctxs+0x1e6/0x230
    [&lt;000000004f1fda40&gt;] blk_mq_init_allocated_queue+0x27e/0x910
    [&lt;00000000287123ec&gt;] __blk_mq_alloc_disk+0x67/0xf0
    [&lt;00000000a2a34657&gt;] 0xffffffffa2ad310f
    [&lt;00000000b173f718&gt;] 0xffffffffa2af824a
    [&lt;0000000095a1dabb&gt;] do_one_initcall+0x87/0x2a0
    [&lt;00000000f32fdf93&gt;] do_init_module+0xdf/0x320
    [&lt;00000000cbe8541e&gt;] load_module+0x3006/0x3390
    [&lt;0000000069ed1bdb&gt;] __do_sys_finit_module+0x113/0x1b0
    [&lt;00000000a1a29ae8&gt;] do_syscall_64+0x35/0x80
    [&lt;000000009cd878b0&gt;] entry_SYSCALL_64_after_hwframe+0x46/0xb0

Fault injection context as follows:
 kobject_add
 blk_mq_register_hctx
 blk_mq_sysfs_register
 blk_register_queue
 device_add_disk
 null_add_dev.part.0 [null_blk]

As 'blk_mq_register_hctx' may already add some objects when failed halfway,
but there isn't do fallback, caller don't know which objects add failed.
To solve above issue just do fallback when add objects failed halfway in
'blk_mq_register_hctx'.

Signed-off-by: Ye Bin &lt;yebin10@huawei.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Link: https://lore.kernel.org/r/20221117022940.873959-1-yebin@huaweicloud.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: unhash blkdev part inode when the part is deleted</title>
<updated>2023-01-18T10:29:59Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2022-12-13T07:16:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cc019421d037864fe0a7d34ac091c24f48036f62'/>
<id>urn:sha1:cc019421d037864fe0a7d34ac091c24f48036f62</id>
<content type='text'>
v5.11 changes the blkdev lookup mechanism completely since commit
22ae8ce8b892 ("block: simplify bdev/disk lookup in blkdev_get"),
and small part of the change is to unhash part bdev inode when
deleting partition. Turns out this kind of change does fix one
nasty issue in case of BLOCK_EXT_MAJOR:

1) when one partition is deleted &amp; closed, disk_put_part() is always
called before bdput(bdev), see blkdev_put(); so the part's devt can
be freed &amp; re-used before the inode is dropped

2) then new partition with same devt can be created just before the
inode in 1) is dropped, then the old inode/bdev structurein 1) is
re-used for this new partition, this way causes use-after-free and
kernel panic.

It isn't possible to backport the whole big patchset of "merge struct
block_device and struct hd_struct v4" for addressing this issue.

https://lore.kernel.org/linux-block/20201128161510.347752-1-hch@lst.de/

So fixes it by unhashing part bdev in delete_partition(), and this way
is actually aligned with v5.11+'s behavior.

Backported from the following 5.10.y commit:

5f2f77560591 ("block: unhash blkdev part inode when the part is deleted")

Reported-by: Shiwei Cui &lt;cuishw@inspur.com&gt;
Tested-by: Shiwei Cui &lt;cuishw@inspur.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>block: sed-opal: kmalloc the cmd/resp buffers</title>
<updated>2022-11-25T16:40:23Z</updated>
<author>
<name>Serge Semin</name>
<email>Sergey.Semin@baikalelectronics.ru</email>
</author>
<published>2022-11-07T20:39:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=88c11cc2e96e93be5018d4678559bfe56c211420'/>
<id>urn:sha1:88c11cc2e96e93be5018d4678559bfe56c211420</id>
<content type='text'>
[ Upstream commit f829230dd51974c1f4478900ed30bb77ba530b40 ]

In accordance with [1] the DMA-able memory buffers must be
cacheline-aligned otherwise the cache writing-back and invalidation
performed during the mapping may cause the adjacent data being lost. It's
specifically required for the DMA-noncoherent platforms [2]. Seeing the
opal_dev.{cmd,resp} buffers are implicitly used for DMAs in the NVME and
SCSI/SD drivers in framework of the nvme_sec_submit() and sd_sec_submit()
methods respectively they must be cacheline-aligned to prevent the denoted
problem. One of the option to guarantee that is to kmalloc the buffers
[2]. Let's explicitly allocate them then instead of embedding into the
opal_dev structure instance.

Note this fix was inspired by the commit c94b7f9bab22 ("nvme-hwmon:
kmalloc the NVME SMART log buffer").

[1] Documentation/core-api/dma-api.rst
[2] Documentation/core-api/dma-api-howto.rst

Fixes: 455a7b238cd6 ("block: Add Sed-opal library")
Signed-off-by: Serge Semin &lt;Sergey.Semin@baikalelectronics.ru&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20221107203944.31686-1-Sergey.Semin@baikalelectronics.ru
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block, bfq: protect 'bfqd-&gt;queued' by 'bfqd-&gt;lock'</title>
<updated>2022-11-10T16:46:54Z</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2022-05-13T02:35:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5f1eb5f7374ce6c8c378c5d9bb510eb55447ff03'/>
<id>urn:sha1:5f1eb5f7374ce6c8c378c5d9bb510eb55447ff03</id>
<content type='text'>
commit 181490d5321806e537dc5386db5ea640b826bf78 upstream.

If bfq_schedule_dispatch() is called from bfq_idle_slice_timer_body(),
then 'bfqd-&gt;queued' is read without holding 'bfqd-&gt;lock'. This is
wrong since it can be wrote concurrently.

Fix the problem by holding 'bfqd-&gt;lock' in such case.

Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Link: https://lore.kernel.org/r/20220513023507.2625717-2-yukuai3@huawei.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Khazhy Kumykov &lt;khazhy@chromium.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>blk-iolatency: Fix inflight count imbalances and IO hangs on offline</title>
<updated>2022-06-14T14:59:30Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2022-05-14T06:55:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=515d077ee3085ae343b6bea7fd031f9906645f38'/>
<id>urn:sha1:515d077ee3085ae343b6bea7fd031f9906645f38</id>
<content type='text'>
commit 8a177a36da6c54c98b8685d4f914cb3637d53c0d upstream.

iolatency needs to track the number of inflight IOs per cgroup. As this
tracking can be expensive, it is disabled when no cgroup has iolatency
configured for the device. To ensure that the inflight counters stay
balanced, iolatency_set_limit() freezes the request_queue while manipulating
the enabled counter, which ensures that no IO is in flight and thus all
counters are zero.

Unfortunately, iolatency_set_limit() isn't the only place where the enabled
counter is manipulated. iolatency_pd_offline() can also dec the counter and
trigger disabling. As this disabling happens without freezing the q, this
can easily happen while some IOs are in flight and thus leak the counts.

This can be easily demonstrated by turning on iolatency on an one empty
cgroup while IOs are in flight in other cgroups and then removing the
cgroup. Note that iolatency shouldn't have been enabled elsewhere in the
system to ensure that removing the cgroup disables iolatency for the whole
device.

The following keeps flipping on and off iolatency on sda:

  echo +io &gt; /sys/fs/cgroup/cgroup.subtree_control
  while true; do
      mkdir -p /sys/fs/cgroup/test
      echo '8:0 target=100000' &gt; /sys/fs/cgroup/test/io.latency
      sleep 1
      rmdir /sys/fs/cgroup/test
      sleep 1
  done

and there's concurrent fio generating direct rand reads:

  fio --name test --filename=/dev/sda --direct=1 --rw=randread \
      --runtime=600 --time_based --iodepth=256 --numjobs=4 --bs=4k

while monitoring with the following drgn script:

  while True:
    for css in css_for_each_descendant_pre(prog['blkcg_root'].css.address_of_()):
        for pos in hlist_for_each(container_of(css, 'struct blkcg', 'css').blkg_list):
            blkg = container_of(pos, 'struct blkcg_gq', 'blkcg_node')
            pd = blkg.pd[prog['blkcg_policy_iolatency'].plid]
            if pd.value_() == 0:
                continue
            iolat = container_of(pd, 'struct iolatency_grp', 'pd')
            inflight = iolat.rq_wait.inflight.counter.value_()
            if inflight:
                print(f'inflight={inflight} {disk_name(blkg.q.disk).decode("utf-8")} '
                      f'{cgroup_path(css.cgroup).decode("utf-8")}')
    time.sleep(1)

The monitoring output looks like the following:

  inflight=1 sda /user.slice
  inflight=1 sda /user.slice
  ...
  inflight=14 sda /user.slice
  inflight=13 sda /user.slice
  inflight=17 sda /user.slice
  inflight=15 sda /user.slice
  inflight=18 sda /user.slice
  inflight=17 sda /user.slice
  inflight=20 sda /user.slice
  inflight=19 sda /user.slice &lt;- fio stopped, inflight stuck at 19
  inflight=19 sda /user.slice
  inflight=19 sda /user.slice

If a cgroup with stuck inflight ends up getting throttled, the throttled IOs
will never get issued as there's no completion event to wake it up leading
to an indefinite hang.

This patch fixes the bug by unifying enable handling into a work item which
is automatically kicked off from iolatency_set_min_lat_nsec() which is
called from both iolatency_set_limit() and iolatency_pd_offline() paths.
Punting to a work item is necessary as iolatency_pd_offline() is called
under spinlocks while freezing a request_queue requires a sleepable context.

This also simplifies the code reducing LOC sans the comments and avoids the
unnecessary freezes which were happening whenever a cgroup's latency target
is newly set or cleared.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Josef Bacik &lt;josef@toxicpanda.com&gt;
Cc: Liu Bo &lt;bo.liu@linux.alibaba.com&gt;
Fixes: 8c772a9bfc7c ("blk-iolatency: fix IO hang due to negative inflight counter")
Cc: stable@vger.kernel.org # v5.0+
Link: https://lore.kernel.org/r/Yn9ScX6Nx2qIiQQi@slm.duckdns.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern</title>
<updated>2022-06-06T06:24:21Z</updated>
<author>
<name>Haimin Zhang</name>
<email>tcs.kernel@gmail.com</email>
</author>
<published>2022-02-16T08:40:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=18243d8479fd77952bdb6340024169d30b173a40'/>
<id>urn:sha1:18243d8479fd77952bdb6340024169d30b173a40</id>
<content type='text'>
commit cc8f7fe1f5eab010191aa4570f27641876fa1267 upstream.

Add __GFP_ZERO flag for alloc_page in function bio_copy_kern to initialize
the buffer of a bio.

Signed-off-by: Haimin Zhang &lt;tcs.kernel@gmail.com&gt;
Reviewed-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20220216084038.15635-1-tcs.kernel@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
[DP: Backported to 4.19: Manually added __GFP_ZERO flag]
Signed-off-by: Dragos-Marian Panait &lt;dragos.panait@windriver.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>block/compat_ioctl: fix range check in BLKGETSIZE</title>
<updated>2022-04-27T11:39:45Z</updated>
<author>
<name>Khazhismel Kumykov</name>
<email>khazhy@google.com</email>
</author>
<published>2022-04-14T22:40:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ace51abe0f3a92d512440291f8a47e2cb094ca74'/>
<id>urn:sha1:ace51abe0f3a92d512440291f8a47e2cb094ca74</id>
<content type='text'>
commit ccf16413e520164eb718cf8b22a30438da80ff23 upstream.

kernel ulong and compat_ulong_t may not be same width. Use type directly
to eliminate mismatches.

This would result in truncation rather than EFBIG for 32bit mode for
large disks.

Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Khazhismel Kumykov &lt;khazhy@google.com&gt;
Reviewed-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Link: https://lore.kernel.org/r/20220414224056.2875681-1-khazhy@google.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
