<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/block, branch v5.4.224</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.224</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.224'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2022-11-10T16:57:54Z</updated>
<entry>
<title>block, bfq: protect 'bfqd-&gt;queued' by 'bfqd-&gt;lock'</title>
<updated>2022-11-10T16:57:54Z</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2022-05-13T02:35:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c494ae149858b07d6df7614445961d34c82e461f'/>
<id>urn:sha1:c494ae149858b07d6df7614445961d34c82e461f</id>
<content type='text'>
commit 181490d5321806e537dc5386db5ea640b826bf78 upstream.

If bfq_schedule_dispatch() is called from bfq_idle_slice_timer_body(),
then 'bfqd-&gt;queued' is read without holding 'bfqd-&gt;lock'. This is
wrong since it can be wrote concurrently.

Fix the problem by holding 'bfqd-&gt;lock' in such case.

Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Link: https://lore.kernel.org/r/20220513023507.2625717-2-yukuai3@huawei.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Khazhy Kumykov &lt;khazhy@chromium.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>blk-mq: don't create hctx debugfs dir until q-&gt;debugfs_dir is created</title>
<updated>2022-08-25T09:17:36Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2022-07-11T09:08:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=52e1f85bf7deb9b23a4d1d7536d8290da2b9fa31'/>
<id>urn:sha1:52e1f85bf7deb9b23a4d1d7536d8290da2b9fa31</id>
<content type='text'>
[ Upstream commit f3ec5d11554778c24ac8915e847223ed71d104fc ]

blk_mq_debugfs_register_hctx() can be called by blk_mq_update_nr_hw_queues
when gendisk isn't added yet, such as nvme tcp.

Fixes the warning of 'debugfs: Directory 'hctx0' with parent '/' already present!'
which can be observed reliably when running blktests nvme/005.

Fixes: 6cfc0081b046 ("blk-mq: no need to check return value of debugfs_create functions")
Reported-by: Yi Zhang &lt;yi.zhang@redhat.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Tested-by: Yi Zhang &lt;yi.zhang@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20220711090808.259682-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: fix bio_clone_blkg_association() to associate with proper blkcg_gq</title>
<updated>2022-06-14T16:11:50Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2022-06-07T09:15:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5f62b21b7c93ad0c0b77585685ed906c919c437f'/>
<id>urn:sha1:5f62b21b7c93ad0c0b77585685ed906c919c437f</id>
<content type='text'>
commit 22b106e5355d6e7a9c3b5cb5ed4ef22ae585ea94 upstream.

Commit d92c370a16cb ("block: really clone the block cgroup in
bio_clone_blkg_association") changed bio_clone_blkg_association() to
just clone bio-&gt;bi_blkg reference from source to destination bio. This
is however wrong if the source and destination bios are against
different block devices because struct blkcg_gq is different for each
bdev-blkcg pair. This will result in IOs being accounted (and throttled
as a result) multiple times against the same device (src bdev) while
throttling of the other device (dst bdev) is ignored. In case of BFQ the
inconsistency can even result in crashes in bfq_bic_update_cgroup().
Fix the problem by looking up correct blkcg_gq for the cloned bio.

Reported-by: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Reported-and-tested-by: Donald Buczek &lt;buczek@molgen.mpg.de&gt;
Fixes: d92c370a16cb ("block: really clone the block cgroup in bio_clone_blkg_association")
CC: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Link: https://lore.kernel.org/r/20220602081242.7731-1-jack@suse.cz
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bfq: Make sure bfqg for which we are queueing requests is online</title>
<updated>2022-06-14T16:11:50Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2022-06-07T09:15:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ccddf8cd411c1800863ed357064e56ceffd356bb'/>
<id>urn:sha1:ccddf8cd411c1800863ed357064e56ceffd356bb</id>
<content type='text'>
commit 075a53b78b815301f8d3dd1ee2cd99554e34f0dd upstream.

Bios queued into BFQ IO scheduler can be associated with a cgroup that
was already offlined. This may then cause insertion of this bfq_group
into a service tree. But this bfq_group will get freed as soon as last
bio associated with it is completed leading to use after free issues for
service tree users. Fix the problem by making sure we always operate on
online bfq_group. If the bfq_group associated with the bio is not
online, we pick the first online parent.

CC: stable@vger.kernel.org
Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
Tested-by: "yukuai (C)" &lt;yukuai3@huawei.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20220401102752.8599-9-jack@suse.cz
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bfq: Get rid of __bio_blkcg() usage</title>
<updated>2022-06-14T16:11:50Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2022-06-07T09:15:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8afc13b958bd194553cb8bee22bf7522e2f9e6ac'/>
<id>urn:sha1:8afc13b958bd194553cb8bee22bf7522e2f9e6ac</id>
<content type='text'>
commit 4e54a2493e582361adc3bfbf06c7d50d19d18837 upstream.

BFQ usage of __bio_blkcg() is a relict from the past. Furthermore if bio
would not be associated with any blkcg, the usage of __bio_blkcg() in
BFQ is prone to races with the task being migrated between cgroups as
__bio_blkcg() calls at different places could return different blkcgs.

Convert BFQ to the new situation where bio-&gt;bi_blkg is initialized in
bio_set_dev() and thus practically always valid. This allows us to save
blkcg_gq lookup and noticeably simplify the code.

CC: stable@vger.kernel.org
Fixes: 0fe061b9f03c ("blkcg: fix ref count issue with bio_blkcg() using task_css")
Tested-by: "yukuai (C)" &lt;yukuai3@huawei.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20220401102752.8599-8-jack@suse.cz
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bfq: Remove pointless bfq_init_rq() calls</title>
<updated>2022-06-14T16:11:50Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2022-06-07T09:15:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=be1b78f949922550ff05b51b57d5b30672ee086a'/>
<id>urn:sha1:be1b78f949922550ff05b51b57d5b30672ee086a</id>
<content type='text'>
commit 5f550ede5edf846ecc0067be1ba80514e6fe7f8e upstream.

We call bfq_init_rq() from request merging functions where requests we
get should have already gone through bfq_init_rq() during insert and
anyway we want to do anything only if the request is already tracked by
BFQ. So replace calls to bfq_init_rq() with RQ_BFQQ() instead to simply
skip requests untracked by BFQ. We move bfq_init_rq() call in
bfq_insert_request() a bit earlier to cover request merging and thus
can transfer FIFO position in case of a merge.

CC: stable@vger.kernel.org
Tested-by: "yukuai (C)" &lt;yukuai3@huawei.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20220401102752.8599-6-jack@suse.cz
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bfq: Drop pointless unlock-lock pair</title>
<updated>2022-06-14T16:11:50Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2022-06-07T09:15:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f885f55033a189a29ede6761a32aa7a2e6030cdb'/>
<id>urn:sha1:f885f55033a189a29ede6761a32aa7a2e6030cdb</id>
<content type='text'>
commit fc84e1f941b91221092da5b3102ec82da24c5673 upstream.

In bfq_insert_request() we unlock bfqd-&gt;lock only to call
trace_block_rq_insert() and then lock bfqd-&gt;lock again. This is really
pointless since tracing is disabled if we really care about performance
and even if the tracepoint is enabled, it is a quick call.

CC: stable@vger.kernel.org
Tested-by: "yukuai (C)" &lt;yukuai3@huawei.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20220401102752.8599-5-jack@suse.cz
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bfq: Avoid merging queues with different parents</title>
<updated>2022-06-14T16:11:50Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2022-06-07T09:15:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=97be7d13fbd4001eeab49b1be6399f23a8c66160'/>
<id>urn:sha1:97be7d13fbd4001eeab49b1be6399f23a8c66160</id>
<content type='text'>
commit c1cee4ab36acef271be9101590756ed0c0c374d9 upstream.

It can happen that the parent of a bfqq changes between the moment we
decide two queues are worth to merge (and set bic-&gt;stable_merge_bfqq)
and the moment bfq_setup_merge() is called. This can happen e.g. because
the process submitted IO for a different cgroup and thus bfqq got
reparented. It can even happen that the bfqq we are merging with has
parent cgroup that is already offline and going to be destroyed in which
case the merge can lead to use-after-free issues such as:

BUG: KASAN: use-after-free in __bfq_deactivate_entity+0x9cb/0xa50
Read of size 8 at addr ffff88800693c0c0 by task runc:[2:INIT]/10544

CPU: 0 PID: 10544 Comm: runc:[2:INIT] Tainted: G            E     5.15.2-0.g5fb85fd-default #1 openSUSE Tumbleweed (unreleased) f1f3b891c72369aebecd2e43e4641a6358867c70
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
Call Trace:
 &lt;IRQ&gt;
 dump_stack_lvl+0x46/0x5a
 print_address_description.constprop.0+0x1f/0x140
 ? __bfq_deactivate_entity+0x9cb/0xa50
 kasan_report.cold+0x7f/0x11b
 ? __bfq_deactivate_entity+0x9cb/0xa50
 __bfq_deactivate_entity+0x9cb/0xa50
 ? update_curr+0x32f/0x5d0
 bfq_deactivate_entity+0xa0/0x1d0
 bfq_del_bfqq_busy+0x28a/0x420
 ? resched_curr+0x116/0x1d0
 ? bfq_requeue_bfqq+0x70/0x70
 ? check_preempt_wakeup+0x52b/0xbc0
 __bfq_bfqq_expire+0x1a2/0x270
 bfq_bfqq_expire+0xd16/0x2160
 ? try_to_wake_up+0x4ee/0x1260
 ? bfq_end_wr_async_queues+0xe0/0xe0
 ? _raw_write_unlock_bh+0x60/0x60
 ? _raw_spin_lock_irq+0x81/0xe0
 bfq_idle_slice_timer+0x109/0x280
 ? bfq_dispatch_request+0x4870/0x4870
 __hrtimer_run_queues+0x37d/0x700
 ? enqueue_hrtimer+0x1b0/0x1b0
 ? kvm_clock_get_cycles+0xd/0x10
 ? ktime_get_update_offsets_now+0x6f/0x280
 hrtimer_interrupt+0x2c8/0x740

Fix the problem by checking that the parent of the two bfqqs we are
merging in bfq_setup_merge() is the same.

Link: https://lore.kernel.org/linux-block/20211125172809.GC19572@quack2.suse.cz/
CC: stable@vger.kernel.org
Fixes: 430a67f9d616 ("block, bfq: merge bursts of newly-created queues")
Tested-by: "yukuai (C)" &lt;yukuai3@huawei.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20220401102752.8599-2-jack@suse.cz
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>blk-iolatency: Fix inflight count imbalances and IO hangs on offline</title>
<updated>2022-06-14T16:11:49Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2022-05-14T06:55:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d19fa8f252000d141f9199ca32959c50314e1f05'/>
<id>urn:sha1:d19fa8f252000d141f9199ca32959c50314e1f05</id>
<content type='text'>
commit 8a177a36da6c54c98b8685d4f914cb3637d53c0d upstream.

iolatency needs to track the number of inflight IOs per cgroup. As this
tracking can be expensive, it is disabled when no cgroup has iolatency
configured for the device. To ensure that the inflight counters stay
balanced, iolatency_set_limit() freezes the request_queue while manipulating
the enabled counter, which ensures that no IO is in flight and thus all
counters are zero.

Unfortunately, iolatency_set_limit() isn't the only place where the enabled
counter is manipulated. iolatency_pd_offline() can also dec the counter and
trigger disabling. As this disabling happens without freezing the q, this
can easily happen while some IOs are in flight and thus leak the counts.

This can be easily demonstrated by turning on iolatency on an one empty
cgroup while IOs are in flight in other cgroups and then removing the
cgroup. Note that iolatency shouldn't have been enabled elsewhere in the
system to ensure that removing the cgroup disables iolatency for the whole
device.

The following keeps flipping on and off iolatency on sda:

  echo +io &gt; /sys/fs/cgroup/cgroup.subtree_control
  while true; do
      mkdir -p /sys/fs/cgroup/test
      echo '8:0 target=100000' &gt; /sys/fs/cgroup/test/io.latency
      sleep 1
      rmdir /sys/fs/cgroup/test
      sleep 1
  done

and there's concurrent fio generating direct rand reads:

  fio --name test --filename=/dev/sda --direct=1 --rw=randread \
      --runtime=600 --time_based --iodepth=256 --numjobs=4 --bs=4k

while monitoring with the following drgn script:

  while True:
    for css in css_for_each_descendant_pre(prog['blkcg_root'].css.address_of_()):
        for pos in hlist_for_each(container_of(css, 'struct blkcg', 'css').blkg_list):
            blkg = container_of(pos, 'struct blkcg_gq', 'blkcg_node')
            pd = blkg.pd[prog['blkcg_policy_iolatency'].plid]
            if pd.value_() == 0:
                continue
            iolat = container_of(pd, 'struct iolatency_grp', 'pd')
            inflight = iolat.rq_wait.inflight.counter.value_()
            if inflight:
                print(f'inflight={inflight} {disk_name(blkg.q.disk).decode("utf-8")} '
                      f'{cgroup_path(css.cgroup).decode("utf-8")}')
    time.sleep(1)

The monitoring output looks like the following:

  inflight=1 sda /user.slice
  inflight=1 sda /user.slice
  ...
  inflight=14 sda /user.slice
  inflight=13 sda /user.slice
  inflight=17 sda /user.slice
  inflight=15 sda /user.slice
  inflight=18 sda /user.slice
  inflight=17 sda /user.slice
  inflight=20 sda /user.slice
  inflight=19 sda /user.slice &lt;- fio stopped, inflight stuck at 19
  inflight=19 sda /user.slice
  inflight=19 sda /user.slice

If a cgroup with stuck inflight ends up getting throttled, the throttled IOs
will never get issued as there's no completion event to wake it up leading
to an indefinite hang.

This patch fixes the bug by unifying enable handling into a work item which
is automatically kicked off from iolatency_set_min_lat_nsec() which is
called from both iolatency_set_limit() and iolatency_pd_offline() paths.
Punting to a work item is necessary as iolatency_pd_offline() is called
under spinlocks while freezing a request_queue requires a sleepable context.

This also simplifies the code reducing LOC sans the comments and avoids the
unnecessary freezes which were happening whenever a cgroup's latency target
is newly set or cleared.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Josef Bacik &lt;josef@toxicpanda.com&gt;
Cc: Liu Bo &lt;bo.liu@linux.alibaba.com&gt;
Fixes: 8c772a9bfc7c ("blk-iolatency: fix IO hang due to negative inflight counter")
Cc: stable@vger.kernel.org # v5.0+
Link: https://lore.kernel.org/r/Yn9ScX6Nx2qIiQQi@slm.duckdns.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bfq: Track whether bfq_group is still online</title>
<updated>2022-06-14T16:11:45Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2022-04-01T10:27:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=28a8060a0bd2c636788d248d722a67a306dd0189'/>
<id>urn:sha1:28a8060a0bd2c636788d248d722a67a306dd0189</id>
<content type='text'>
commit 09f871868080c33992cd6a9b72a5ca49582578fa upstream.

Track whether bfq_group is still online. We cannot rely on
blkcg_gq-&gt;online because that gets cleared only after all policies are
offlined and we need something that gets updated already under
bfqd-&gt;lock when we are cleaning up our bfq_group to be able to guarantee
that when we see online bfq_group, it will stay online while we are
holding bfqd-&gt;lock lock.

CC: stable@vger.kernel.org
Tested-by: "yukuai (C)" &lt;yukuai3@huawei.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20220401102752.8599-7-jack@suse.cz
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
