<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/lib/sbitmap.c, branch v6.2-rc8</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.2-rc8</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.2-rc8'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2022-12-13T18:43:59Z</updated>
<entry>
<title>Merge tag 'for-6.2/block-2022-12-08' of git://git.kernel.dk/linux</title>
<updated>2022-12-13T18:43:59Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-12-13T18:43:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ce8a79d5601aab94c02ed4539c48e8605422ac94'/>
<id>urn:sha1:ce8a79d5601aab94c02ed4539c48e8605422ac94</id>
<content type='text'>
Pull block updates from Jens Axboe:

 - NVMe pull requests via Christoph:
      - Support some passthrough commands without CAP_SYS_ADMIN (Kanchan
        Joshi)
      - Refactor PCIe probing and reset (Christoph Hellwig)
      - Various fabrics authentication fixes and improvements (Sagi
        Grimberg)
      - Avoid fallback to sequential scan due to transient issues (Uday
        Shankar)
      - Implement support for the DEAC bit in Write Zeroes (Christoph
        Hellwig)
      - Allow overriding the IEEE OUI and firmware revision in configfs
        for nvmet (Aleksandr Miloserdov)
      - Force reconnect when number of queue changes in nvmet (Daniel
        Wagner)
      - Minor fixes and improvements (Uros Bizjak, Joel Granados, Sagi
        Grimberg, Christoph Hellwig, Christophe JAILLET)
      - Fix and cleanup nvme-fc req allocation (Chaitanya Kulkarni)
      - Use the common tagset helpers in nvme-pci driver (Christoph
        Hellwig)
      - Cleanup the nvme-pci removal path (Christoph Hellwig)
      - Use kstrtobool() instead of strtobool (Christophe JAILLET)
      - Allow unprivileged passthrough of Identify Controller (Joel
        Granados)
      - Support io stats on the mpath device (Sagi Grimberg)
      - Minor nvmet cleanup (Sagi Grimberg)

 - MD pull requests via Song:
      - Code cleanups (Christoph)
      - Various fixes

 - Floppy pull request from Denis:
      - Fix a memory leak in the init error path (Yuan)

 - Series fixing some batch wakeup issues with sbitmap (Gabriel)

 - Removal of the pktcdvd driver that was deprecated more than 5 years
   ago, and subsequent removal of the devnode callback in struct
   block_device_operations as no users are now left (Greg)

 - Fix for partition read on an exclusively opened bdev (Jan)

 - Series of elevator API cleanups (Jinlong, Christoph)

 - Series of fixes and cleanups for blk-iocost (Kemeng)

 - Series of fixes and cleanups for blk-throttle (Kemeng)

 - Series adding concurrent support for sync queues in BFQ (Yu)

 - Series bringing drbd a bit closer to the out-of-tree maintained
   version (Christian, Joel, Lars, Philipp)

 - Misc drbd fixes (Wang)

 - blk-wbt fixes and tweaks for enable/disable (Yu)

 - Fixes for mq-deadline for zoned devices (Damien)

 - Add support for read-only and offline zones for null_blk
   (Shin'ichiro)

 - Series fixing the delayed holder tracking, as used by DM (Yu,
   Christoph)

 - Series enabling bio alloc caching for IRQ based IO (Pavel)

 - Series enabling userspace peer-to-peer DMA (Logan)

 - BFQ waker fixes (Khazhismel)

 - Series fixing elevator refcount issues (Christoph, Jinlong)

 - Series cleaning up references around queue destruction (Christoph)

 - Series doing quiesce by tagset, enabling cleanups in drivers
   (Christoph, Chao)

 - Series untangling the queue kobject and queue references (Christoph)

 - Misc fixes and cleanups (Bart, David, Dawei, Jinlong, Kemeng, Ye,
   Yang, Waiman, Shin'ichiro, Randy, Pankaj, Christoph)

* tag 'for-6.2/block-2022-12-08' of git://git.kernel.dk/linux: (247 commits)
  blktrace: Fix output non-blktrace event when blk_classic option enabled
  block: sed-opal: Don't include &lt;linux/kernel.h&gt;
  sed-opal: allow using IOC_OPAL_SAVE for locking too
  blk-cgroup: Fix typo in comment
  block: remove bio_set_op_attrs
  nvmet: don't open-code NVME_NS_ATTR_RO enumeration
  nvme-pci: use the tagset alloc/free helpers
  nvme: add the Apple shared tag workaround to nvme_alloc_io_tag_set
  nvme: only set reserved_tags in nvme_alloc_io_tag_set for fabrics controllers
  nvme: consolidate setting the tagset flags
  nvme: pass nr_maps explicitly to nvme_alloc_io_tag_set
  block: bio_copy_data_iter
  nvme-pci: split out a nvme_pci_ctrl_is_dead helper
  nvme-pci: return early on ctrl state mismatch in nvme_reset_work
  nvme-pci: rename nvme_disable_io_queues
  nvme-pci: cleanup nvme_suspend_queue
  nvme-pci: remove nvme_pci_disable
  nvme-pci: remove nvme_disable_admin_queue
  nvme: merge nvme_shutdown_ctrl into nvme_disable_ctrl
  nvme: use nvme_wait_ready in nvme_shutdown_ctrl
  ...
</content>
</entry>
<entry>
<title>treewide: use get_random_u32_below() instead of deprecated function</title>
<updated>2022-11-18T01:15:15Z</updated>
<author>
<name>Jason A. Donenfeld</name>
<email>Jason@zx2c4.com</email>
</author>
<published>2022-10-10T02:44:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8032bf1233a74627ce69b803608e650f3f35971c'/>
<id>urn:sha1:8032bf1233a74627ce69b803608e650f3f35971c</id>
<content type='text'>
This is a simple mechanical transformation done by:

@@
expression E;
@@
- prandom_u32_max
+ get_random_u32_below
  (E)

Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Acked-by: Darrick J. Wong &lt;djwong@kernel.org&gt; # for xfs
Reviewed-by: SeongJae Park &lt;sj@kernel.org&gt; # for damon
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt; # for infiniband
Reviewed-by: Russell King (Oracle) &lt;rmk+kernel@armlinux.org.uk&gt; # for arm
Acked-by: Ulf Hansson &lt;ulf.hansson@linaro.org&gt; # for mmc
Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
</content>
</entry>
<entry>
<title>sbitmap: Try each queue to wake up at least one waiter</title>
<updated>2022-11-16T18:33:03Z</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@suse.de</email>
</author>
<published>2022-11-15T22:45:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=26edb30dd1c0c9be11fa676b4f330ada7b794ba6'/>
<id>urn:sha1:26edb30dd1c0c9be11fa676b4f330ada7b794ba6</id>
<content type='text'>
Jan reported the new algorithm as merged might be problematic if the
queue being awaken becomes empty between the waitqueue_active inside
sbq_wake_ptr check and the wake up.  If that happens, wake_up_nr will
not wake up any waiter and we loose too many wake ups.  In order to
guarantee progress, we need to wake up at least one waiter here, if
there are any.  This now requires trying to wake up from every queue.

Instead of walking through all the queues with sbq_wake_ptr, this call
moves the wake up inside that function.  In a previous version of the
patch, I found that updating wake_index several times when walking
through queues had a measurable overhead.  This ensures we only update
it once, at the end.

Fixes: 4f8126bb2308 ("sbitmap: Use single per-bitmap counting to wake up queued tags")
Reported-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Link: https://lore.kernel.org/r/20221115224553.23594-4-krisman@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>sbitmap: Advance the queue index before waking up a queue</title>
<updated>2022-11-16T18:33:03Z</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@suse.de</email>
</author>
<published>2022-11-15T22:45:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=976570b4ecd30d3ec6e1b0910da8e5edc591f2b6'/>
<id>urn:sha1:976570b4ecd30d3ec6e1b0910da8e5edc591f2b6</id>
<content type='text'>
When a queue is awaken, the wake_index written by sbq_wake_ptr currently
keeps pointing to the same queue.  On the next wake up, it will thus
retry the same queue, which is unfair to other queues, and can lead to
starvation.  This patch, moves the index update to happen before the
queue is returned, such that it will now try a different queue first on
the next wake up, improving fairness.

Fixes: 4f8126bb2308 ("sbitmap: Use single per-bitmap counting to wake up queued tags")
Reported-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Link: https://lore.kernel.org/r/20221115224553.23594-2-krisman@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>sbitmap: Use single per-bitmap counting to wake up queued tags</title>
<updated>2022-11-11T15:38:29Z</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@suse.de</email>
</author>
<published>2022-11-05T23:10:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4f8126bb2308066b877859e4b5923ffb54143630'/>
<id>urn:sha1:4f8126bb2308066b877859e4b5923ffb54143630</id>
<content type='text'>
sbitmap suffers from code complexity, as demonstrated by recent fixes,
and eventual lost wake ups on nested I/O completion.  The later happens,
from what I understand, due to the non-atomic nature of the updates to
wait_cnt, which needs to be subtracted and eventually reset when equal
to zero.  This two step process can eventually miss an update when a
nested completion happens to interrupt the CPU in between the wait_cnt
updates.  This is very hard to fix, as shown by the recent changes to
this code.

The code complexity arises mostly from the corner cases to avoid missed
wakes in this scenario.  In addition, the handling of wake_batch
recalculation plus the synchronization with sbq_queue_wake_up is
non-trivial.

This patchset implements the idea originally proposed by Jan [1], which
removes the need for the two-step updates of wait_cnt.  This is done by
tracking the number of completions and wakeups in always increasing,
per-bitmap counters.  Instead of having to reset the wait_cnt when it
reaches zero, we simply keep counting, and attempt to wake up N threads
in a single wait queue whenever there is enough space for a batch.
Waking up less than batch_wake shouldn't be a problem, because we
haven't changed the conditions for wake up, and the existing batch
calculation guarantees at least enough remaining completions to wake up
a batch for each queue at any time.

Performance-wise, one should expect very similar performance to the
original algorithm for the case where there is no queueing.  In both the
old algorithm and this implementation, the first thing is to check
ws_active, which bails out if there is no queueing to be managed. In the
new code, we took care to avoid accounting completions and wakeups when
there is no queueing, to not pay the cost of atomic operations
unnecessarily, since it doesn't skew the numbers.

For more interesting cases, where there is queueing, we need to take
into account the cross-communication of the atomic operations.  I've
been benchmarking by running parallel fio jobs against a single hctx
nullb in different hardware queue depth scenarios, and verifying both
IOPS and queueing.

Each experiment was repeated 5 times on a 20-CPU box, with 20 parallel
jobs. fio was issuing fixed-size randwrites with qd=64 against nullb,
varying only the hardware queue length per test.

queue size 2                 4                 8                 16                 32                 64
6.1-rc2    1681.1K (1.6K)    2633.0K (12.7K)   6940.8K (16.3K)   8172.3K (617.5K)   8391.7K (367.1K)   8606.1K (351.2K)
patched    1721.8K (15.1K)   3016.7K (3.8K)    7543.0K (89.4K)   8132.5K (303.4K)   8324.2K (230.6K)   8401.8K (284.7K)

The following is a similar experiment, ran against a nullb with a single
bitmap shared by 20 hctx spread across 2 NUMA nodes. This has 40
parallel fio jobs operating on the same device

queue size 2 	             4                 8              	16             	    32		       64
6.1-rc2	   1081.0K (2.3K)    957.2K (1.5K)     1699.1K (5.7K) 	6178.2K (124.6K)    12227.9K (37.7K)   13286.6K (92.9K)
patched	   1081.8K (2.8K)    1316.5K (5.4K)    2364.4K (1.8K) 	6151.4K  (20.0K)    11893.6K (17.5K)   12385.6K (18.4K)

It has also survived blktests and a 12h-stress run against nullb. I also
ran the code against nvme and a scsi SSD, and I didn't observe
performance regression in those. If there are other tests you think I
should run, please let me know and I will follow up with results.

[1] https://lore.kernel.org/all/aef9de29-e9f5-259a-f8be-12d1b734e72@google.com/

Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Keith Busch &lt;kbusch@kernel.org&gt;
Cc: Liu Song &lt;liusong@linux.alibaba.com&gt;
Suggested-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Link: https://lore.kernel.org/r/20221105231055.25953-1-krisman@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>treewide: use prandom_u32_max() when possible, part 2</title>
<updated>2022-10-11T23:42:58Z</updated>
<author>
<name>Jason A. Donenfeld</name>
<email>Jason@zx2c4.com</email>
</author>
<published>2022-10-05T14:43:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8b3ccbc1f1f91847160951aa15dd27c22dddcb49'/>
<id>urn:sha1:8b3ccbc1f1f91847160951aa15dd27c22dddcb49</id>
<content type='text'>
Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done by hand, covering things that coccinelle could not do on its own.

Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt; # for ext2, ext4, and sbitmap
Acked-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
</content>
</entry>
<entry>
<title>treewide: use prandom_u32_max() when possible, part 1</title>
<updated>2022-10-11T23:42:55Z</updated>
<author>
<name>Jason A. Donenfeld</name>
<email>Jason@zx2c4.com</email>
</author>
<published>2022-10-05T14:43:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=81895a65ec63ee1daec3255dc1a06675d2fbe915'/>
<id>urn:sha1:81895a65ec63ee1daec3255dc1a06675d2fbe915</id>
<content type='text'>
Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with this coccinelle script:

@basic@
expression E;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u64;
@@
(
- ((T)get_random_u32() % (E))
+ prandom_u32_max(E)
|
- ((T)get_random_u32() &amp; ((E) - 1))
+ prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
|
- ((u64)(E) * get_random_u32() &gt;&gt; 32)
+ prandom_u32_max(E)
|
- ((T)get_random_u32() &amp; ~PAGE_MASK)
+ prandom_u32_max(PAGE_SIZE)
)

@multi_line@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@

-       RAND = get_random_u32();
        ... when != RAND
-       RAND %= (E);
+       RAND = prandom_u32_max(E);

// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@

        ((T)get_random_u32()@p &amp; (LITERAL))

// Add one to the literal.
@script:python add_one@
literal &lt;&lt; literal_mask.LITERAL;
RESULT;
@@

value = None
if literal.startswith('0x'):
        value = int(literal, 16)
elif literal[0] in '123456789':
        value = int(literal, 10)
if value is None:
        print("I don't know how to handle %s" % (literal))
        cocci.include_match(False)
elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
        print("Skipping 0x%x for cleanup elsewhere" % (value))
        cocci.include_match(False)
elif value &amp; (value + 1) != 0:
        print("Skipping 0x%x because it's not a power of two minus one" % (value))
        cocci.include_match(False)
elif literal.startswith('0x'):
        coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
else:
        coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))

// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@

-       (FUNC()@p &amp; (LITERAL))
+       prandom_u32_max(RESULT)

@collapse_ret@
type T;
identifier VAR;
expression E;
@@

 {
-       T VAR;
-       VAR = (E);
-       return VAR;
+       return E;
 }

@drop_var@
type T;
identifier VAR;
@@

 {
-       T VAR;
        ... when != VAR
 }

Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Reviewed-by: KP Singh &lt;kpsingh@kernel.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt; # for ext4 and sbitmap
Reviewed-by: Christoph Böhmwalder &lt;christoph.boehmwalder@linbit.com&gt; # for drbd
Acked-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Acked-by: Heiko Carstens &lt;hca@linux.ibm.com&gt; # for s390
Acked-by: Ulf Hansson &lt;ulf.hansson@linaro.org&gt; # for mmc
Acked-by: Darrick J. Wong &lt;djwong@kernel.org&gt; # for xfs
Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
</content>
</entry>
<entry>
<title>sbitmap: fix lockup while swapping</title>
<updated>2022-09-29T23:58:17Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2022-09-29T19:50:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=30514bd2dd4e86a3ecfd6a93a3eadf7b9ea164a0'/>
<id>urn:sha1:30514bd2dd4e86a3ecfd6a93a3eadf7b9ea164a0</id>
<content type='text'>
Commit 4acb83417cad ("sbitmap: fix batched wait_cnt accounting")
is a big improvement: without it, I had to revert to before commit
040b83fcecfb ("sbitmap: fix possible io hung due to lost wakeup")
to avoid the high system time and freezes which that had introduced.

Now okay on the NVME laptop, but 4acb83417cad is a disaster for heavy
swapping (kernel builds in low memory) on another: soon locking up in
sbitmap_queue_wake_up() (into which __sbq_wake_up() is inlined), cycling
around with waitqueue_active() but wait_cnt 0 .  Here is a backtrace,
showing the common pattern of outer sbitmap_queue_wake_up() interrupted
before setting wait_cnt 0 back to wake_batch (in some cases other CPUs
are idle, in other cases they're spinning for a lock in dd_bio_merge()):

sbitmap_queue_wake_up &lt; sbitmap_queue_clear &lt; blk_mq_put_tag &lt;
__blk_mq_free_request &lt; blk_mq_free_request &lt; __blk_mq_end_request &lt;
scsi_end_request &lt; scsi_io_completion &lt; scsi_finish_command &lt;
scsi_complete &lt; blk_complete_reqs &lt; blk_done_softirq &lt; __do_softirq &lt;
__irq_exit_rcu &lt; irq_exit_rcu &lt; common_interrupt &lt; asm_common_interrupt &lt;
_raw_spin_unlock_irqrestore &lt; __wake_up_common_lock &lt; __wake_up &lt;
sbitmap_queue_wake_up &lt; sbitmap_queue_clear &lt; blk_mq_put_tag &lt;
__blk_mq_free_request &lt; blk_mq_free_request &lt; dd_bio_merge &lt;
blk_mq_sched_bio_merge &lt; blk_mq_attempt_bio_merge &lt; blk_mq_submit_bio &lt;
__submit_bio &lt; submit_bio_noacct_nocheck &lt; submit_bio_noacct &lt;
submit_bio &lt; __swap_writepage &lt; swap_writepage &lt; pageout &lt;
shrink_folio_list &lt; evict_folios &lt; lru_gen_shrink_lruvec &lt;
shrink_lruvec &lt; shrink_node &lt; do_try_to_free_pages &lt; try_to_free_pages &lt;
__alloc_pages_slowpath &lt; __alloc_pages &lt; folio_alloc &lt; vma_alloc_folio &lt;
do_anonymous_page &lt; __handle_mm_fault &lt; handle_mm_fault &lt;
do_user_addr_fault &lt; exc_page_fault &lt; asm_exc_page_fault

See how the process-context sbitmap_queue_wake_up() has been interrupted,
after bringing wait_cnt down to 0 (and in this example, after doing its
wakeups), before advancing wake_index and refilling wake_cnt: an
interrupt-context sbitmap_queue_wake_up() of the same sbq gets stuck.

I have almost no grasp of all the possible sbitmap races, and their
consequences: but __sbq_wake_up() can do nothing useful while wait_cnt 0,
so it is better if sbq_wake_ptr() skips on to the next ws in that case:
which fixes the lockup and shows no adverse consequence for me.

The check for wait_cnt being 0 is obviously racy, and ultimately can lead
to lost wakeups: for example, when there is only a single waitqueue with
waiters.  However, lost wakeups are unlikely to matter in these cases,
and a proper fix requires redesign (and benchmarking) of the batched
wakeup code: so let's plug the hole with this bandaid for now.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Keith Busch &lt;kbusch@kernel.org&gt;
Link: https://lore.kernel.org/r/9c2038a7-cdc5-5ee-854c-fbc6168bf16@google.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>sbitmap: fix batched wait_cnt accounting</title>
<updated>2022-09-12T06:10:34Z</updated>
<author>
<name>Keith Busch</name>
<email>kbusch@kernel.org</email>
</author>
<published>2022-09-09T18:40:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4acb83417cadfdcbe64215f9d0ddcf3132af808e'/>
<id>urn:sha1:4acb83417cadfdcbe64215f9d0ddcf3132af808e</id>
<content type='text'>
Batched completions can clear multiple bits, but we're only decrementing
the wait_cnt by one each time. This can cause waiters to never be woken,
stalling IO. Use the batched count instead.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Link: https://lore.kernel.org/r/20220909184022.1709476-1-kbusch@fb.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>sbitmap: Use atomic_long_try_cmpxchg in __sbitmap_queue_get_batch</title>
<updated>2022-09-08T15:22:42Z</updated>
<author>
<name>Uros Bizjak</name>
<email>ubizjak@gmail.com</email>
</author>
<published>2022-09-08T15:12:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c35227d4e8cbc70a6622cc7cc5f8c3bff513f1fa'/>
<id>urn:sha1:c35227d4e8cbc70a6622cc7cc5f8c3bff513f1fa</id>
<content type='text'>
Use atomic_long_try_cmpxchg instead of
atomic_long_cmpxchg (*ptr, old, new) == old in __sbitmap_queue_get_batch.
x86 CMPXCHG instruction returns success in ZF flag, so this change
saves a compare after cmpxchg (and related move instruction in front
of cmpxchg).

Also, atomic_long_cmpxchg implicitly assigns old *ptr value to "old"
when cmpxchg fails, enabling further code simplifications, e.g.
an extra memory read can be avoided in the loop.

No functional change intended.

Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Uros Bizjak &lt;ubizjak@gmail.com&gt;
Link: https://lore.kernel.org/r/20220908151200.9993-1-ubizjak@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
