<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/blkdev.h, branch v4.16.14</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.16.14</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.16.14'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-05-01T19:47:24Z</updated>
<entry>
<title>scsi: sd_zbc: Avoid that resetting a zone fails sporadically</title>
<updated>2018-05-01T19:47:24Z</updated>
<author>
<name>Bart Van Assche</name>
<email>bart.vanassche@wdc.com</email>
</author>
<published>2018-04-17T01:04:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=254b8fd89053489a74cc1f30d324c2be10205cfc'/>
<id>urn:sha1:254b8fd89053489a74cc1f30d324c2be10205cfc</id>
<content type='text'>
commit ccce20fc7968d546fb1e8e147bf5cdc8afc4278a upstream.

Since SCSI scanning occurs asynchronously, since sd_revalidate_disk() is
called from sd_probe_async() and since sd_revalidate_disk() calls
sd_zbc_read_zones() it can happen that sd_zbc_read_zones() is called
concurrently with blkdev_report_zones() and/or blkdev_reset_zones().  That can
cause these functions to fail with -EIO because sd_zbc_read_zones() e.g. sets
q-&gt;nr_zones to zero before restoring it to the actual value, even if no drive
characteristics have changed.  Avoid that this can happen by making the
following changes:

- Protect the code that updates zone information with blk_queue_enter()
  and blk_queue_exit().
- Modify sd_zbc_setup_seq_zones_bitmap() and sd_zbc_setup() such that
  these functions do not modify struct scsi_disk before all zone
  information has been obtained.

Note: since commit 055f6e18e08f ("block: Make q_usage_counter also track
legacy requests"; kernel v4.15) the request queue freezing mechanism also
affects legacy request queues.

Fixes: 89d947561077 ("sd: Implement support for ZBC devices")
Signed-off-by: Bart Van Assche &lt;bart.vanassche@wdc.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Damien Le Moal &lt;damien.lemoal@wdc.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Hannes Reinecke &lt;hare@suse.com&gt;
Cc: stable@vger.kernel.org # v4.16
Reviewed-by: Damien Le Moal &lt;damien.lemoal@wdc.com&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>block: fix a typo in comment of BLK_MQ_POLL_STATS_BKTS</title>
<updated>2018-02-15T15:27:06Z</updated>
<author>
<name>Minwoo Im</name>
<email>minwoo.im.dev@gmail.com</email>
</author>
<published>2018-02-15T14:53:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=096392e0714d3a520366ba467e215edf7280acff'/>
<id>urn:sha1:096392e0714d3a520366ba467e215edf7280acff</id>
<content type='text'>
Update comment typo _consisitent_ to _consistent_ from following commit.
commit 0206319fdfee ("blk-mq: Fix poll_stat for new size-based bucketing.")

Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Minwoo Im &lt;minwoo.im.dev@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block</title>
<updated>2018-01-29T19:51:49Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-01-29T19:51:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0a4b6e2f80aad46fb55a5cf7b1664c0aef030ee0'/>
<id>urn:sha1:0a4b6e2f80aad46fb55a5cf7b1664c0aef030ee0</id>
<content type='text'>
Pull block updates from Jens Axboe:
 "This is the main pull request for block IO related changes for the
  4.16 kernel. Nothing major in this pull request, but a good amount of
  improvements and fixes all over the map. This contains:

   - BFQ improvements, fixes, and cleanups from Angelo, Chiara, and
     Paolo.

   - Support for SMR zones for deadline and mq-deadline from Damien and
     Christoph.

   - Set of fixes for bcache by way of Michael Lyle, including fixes
     from himself, Kent, Rui, Tang, and Coly.

   - Series from Matias for lightnvm with fixes from Hans Holmberg,
     Javier, and Matias. Mostly centered around pblk, and the removing
     rrpc 1.2 in preparation for supporting 2.0.

   - A couple of NVMe pull requests from Christoph. Nothing major in
     here, just fixes and cleanups, and support for command tracing from
     Johannes.

   - Support for blk-throttle for tracking reads and writes separately.
     From Joseph Qi. A few cleanups/fixes also for blk-throttle from
     Weiping.

   - Series from Mike Snitzer that enables dm to register its queue more
     logically, something that's alwways been problematic on dm since
     it's a stacked device.

   - Series from Ming cleaning up some of the bio accessor use, in
     preparation for supporting multipage bvecs.

   - Various fixes from Ming closing up holes around queue mapping and
     quiescing.

   - BSD partition fix from Richard Narron, fixing a problem where we
     can't mount newer (10/11) FreeBSD partitions.

   - Series from Tejun reworking blk-mq timeout handling. The previous
     scheme relied on atomic bits, but it had races where we would think
     a request had timed out if it to reused at the wrong time.

   - null_blk now supports faking timeouts, to enable us to better
     exercise and test that functionality separately. From me.

   - Kill the separate atomic poll bit in the request struct. After
     this, we don't use the atomic bits on blk-mq anymore at all. From
     me.

   - sgl_alloc/free helpers from Bart.

   - Heavily contended tag case scalability improvement from me.

   - Various little fixes and cleanups from Arnd, Bart, Corentin,
     Douglas, Eryu, Goldwyn, and myself"

* 'for-4.16/block' of git://git.kernel.dk/linux-block: (186 commits)
  block: remove smart1,2.h
  nvme: add tracepoint for nvme_complete_rq
  nvme: add tracepoint for nvme_setup_cmd
  nvme-pci: introduce RECONNECTING state to mark initializing procedure
  nvme-rdma: remove redundant boolean for inline_data
  nvme: don't free uuid pointer before printing it
  nvme-pci: Suspend queues after deleting them
  bsg: use pr_debug instead of hand crafted macros
  blk-mq-debugfs: don't allow write on attributes with seq_operations set
  nvme-pci: Fix queue double allocations
  block: Set BIO_TRACE_COMPLETION on new bio during split
  blk-throttle: use queue_is_rq_based
  block: Remove kblockd_schedule_delayed_work{,_on}()
  blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays
  blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly()
  lib/scatterlist: Fix chaining support in sgl_alloc_order()
  blk-throttle: track read and write request individually
  block: add bdev_read_only() checks to common helpers
  block: fail op_is_write() requests to read-only partitions
  blk-throttle: export io_serviced_recursive, io_service_bytes_recursive
  ...
</content>
</entry>
<entry>
<title>block: Remove kblockd_schedule_delayed_work{,_on}()</title>
<updated>2018-01-19T19:52:03Z</updated>
<author>
<name>Bart Van Assche</name>
<email>bart.vanassche@wdc.com</email>
</author>
<published>2018-01-19T16:58:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f5ced52aaa5494c1feb9f80252cb2a2cde0dace8'/>
<id>urn:sha1:f5ced52aaa5494c1feb9f80252cb2a2cde0dace8</id>
<content type='text'>
The previous patch removed all users of these two functions. Hence
also remove the functions themselves.

Reviewed-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Bart Van Assche &lt;bart.vanassche@wdc.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: rearrange a few request fields for better cache layout</title>
<updated>2018-01-10T18:47:58Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2018-01-10T18:46:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7c3fb70f0341f9d924818e648906774921f4bcb3'/>
<id>urn:sha1:7c3fb70f0341f9d924818e648906774921f4bcb3</id>
<content type='text'>
Move completion related items (like the call single data) near the
end of the struct, instead of mixing them in with the initial
queueing related fields.

Move queuelist below the bio structures. Then we have all
queueing related bits in the first cache line.

This yields a 1.5-2% increase in IOPS for a null_blk test, both for
sync and for high thread count access. Sync test goes form 975K to
992K, 32-thread case from 20.8M to 21.2M IOPS.

Reviewed-by: Bart Van Assche &lt;bart.vanassche@wdc.com&gt;
Reviewed-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: convert REQ_ATOM_COMPLETE to stealing rq-&gt;__deadline bit</title>
<updated>2018-01-10T18:47:53Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2018-01-10T18:34:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e14575b3d457f5806d79b85886ef94d9c29e3b2a'/>
<id>urn:sha1:e14575b3d457f5806d79b85886ef94d9c29e3b2a</id>
<content type='text'>
We only have one atomic flag left. Instead of using an entire
unsigned long for that, steal the bottom bit of the deadline
field that we already reserved.

Remove -&gt;atomic_flags, since it's now unused.

Reviewed-by: Bart Van Assche &lt;bart.vanassche@wdc.com&gt;
Reviewed-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: add accessors for setting/querying request deadline</title>
<updated>2018-01-10T18:47:47Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2018-01-09T21:23:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0a72e7f44964b9ada3e5c15820372e9cb119bf80'/>
<id>urn:sha1:0a72e7f44964b9ada3e5c15820372e9cb119bf80</id>
<content type='text'>
We reduce the resolution of request expiry, but since we're already
using jiffies for this where resolution depends on the kernel
configuration and since the timeout resolution is coarse anyway,
that should be fine.

Reviewed-by: Bart Van Assche &lt;bart.vanassche@wdc.com&gt;
Reviewed-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: remove REQ_ATOM_POLL_SLEPT</title>
<updated>2018-01-10T18:47:43Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2018-01-10T18:30:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=76a86f9d027b342b8759a4b2f9f7fe046e284220'/>
<id>urn:sha1:76a86f9d027b342b8759a4b2f9f7fe046e284220</id>
<content type='text'>
We don't need this to be an atomic flag, it can be a regular
flag. We either end up on the same CPU for the polling, in which
case the state is sane, or we did the sleep which would imply
the needed barrier to ensure we see the right state.

Reviewed-by: Bart Van Assche &lt;bart.vanassche@wdc.com&gt;
Reviewed-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq</title>
<updated>2018-01-09T16:31:15Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2018-01-09T16:29:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=634f9e4631a88025d3b90c1884e9a1b6a13d01d2'/>
<id>urn:sha1:634f9e4631a88025d3b90c1884e9a1b6a13d01d2</id>
<content type='text'>
After the recent updates to use generation number and state based
synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE except
to avoid firing the same timeout multiple times.

Remove all REQ_ATOM_COMPLETE usages and use a new rq_flags flag
RQF_MQ_TIMEOUT_EXPIRED to avoid firing the same timeout multiple
times.  This removes atomic bitops from hot paths too.

v2: Removed blk_clear_rq_complete() from blk_mq_rq_timed_out().

v3: Added RQF_MQ_TIMEOUT_EXPIRED flag.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: "jianchao.wang" &lt;jianchao.w.wang@oracle.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blk-mq: replace timeout synchronization with a RCU and generation based scheme</title>
<updated>2018-01-09T16:31:15Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2018-01-09T16:29:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1d9bd5161ba32db5665a617edc8b0723880f543e'/>
<id>urn:sha1:1d9bd5161ba32db5665a617edc8b0723880f543e</id>
<content type='text'>
Currently, blk-mq timeout path synchronizes against the usual
issue/completion path using a complex scheme involving atomic
bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
rules.  Unfortunately, it contains quite a few holes.

There's a complex dancing around REQ_ATOM_STARTED and
REQ_ATOM_COMPLETE between issue/completion and timeout paths; however,
they don't have a synchronization point across request recycle
instances and it isn't clear what the barriers add.
blk_mq_check_expired() can easily read STARTED from N-2'th iteration,
deadline from N-1'th, blk_mark_rq_complete() against Nth instance.

In fact, it's pretty easy to make blk_mq_check_expired() terminate a
later instance of a request.  If we induce 5 sec delay before
time_after_eq() test in blk_mq_check_expired(), shorten the timeout to
2s, and issue back-to-back large IOs, blk-mq starts timing out
requests spuriously pretty quickly.  Nothing actually timed out.  It
just made the call on a recycle instance of a request and then
terminated a later instance long after the original instance finished.
The scenario isn't theoretical either.

This patch replaces the broken synchronization mechanism with a RCU
and generation number based one.

1. Each request has a u64 generation + state value, which can be
   updated only by the request owner.  Whenever a request becomes
   in-flight, the generation number gets bumped up too.  This provides
   the basis for the timeout path to distinguish different recycle
   instances of the request.

   Also, marking a request in-flight and setting its deadline are
   protected with a seqcount so that the timeout path can fetch both
   values coherently.

2. The timeout path fetches the generation, state and deadline.  If
   the verdict is timeout, it records the generation into a dedicated
   request abortion field and does RCU wait.

3. The completion path is also protected by RCU (from the previous
   patch) and checks whether the current generation number and state
   match the abortion field.  If so, it skips completion.

4. The timeout path, after RCU wait, scans requests again and
   terminates the ones whose generation and state still match the ones
   requested for abortion.

   By now, the timeout path knows that either the generation number
   and state changed if it lost the race or the completion will yield
   to it and can safely timeout the request.

While it's more lines of code, it's conceptually simpler, doesn't
depend on direct use of subtle memory ordering or coherence, and
hopefully doesn't terminate the wrong instance.

While this change makes REQ_ATOM_COMPLETE synchronization unnecessary
between issue/complete and timeout paths, REQ_ATOM_COMPLETE isn't
removed yet as it's still used in other places.  Future patches will
move all state tracking to the new mechanism and remove all bitops in
the hot paths.

Note that this patch adds a comment explaining a race condition in
BLK_EH_RESET_TIMER path.  The race has always been there and this
patch doesn't change it.  It's just documenting the existing race.

v2: - Fixed BLK_EH_RESET_TIMER handling as pointed out by Jianchao.
    - s/request-&gt;gstate_seqc/request-&gt;gstate_seq/ as suggested by Peter.
    - READ_ONCE() added in blk_mq_rq_update_state() as suggested by Peter.

v3: - Fixed possible extended seqcount / u64_stats_sync read looping
      spotted by Peter.
    - MQ_RQ_IDLE was incorrectly being set in complete_request instead
      of free_request.  Fixed.

v4: - Rebased on top of hctx_lock() refactoring patch.
    - Added comment explaining the use of hctx_lock() in completion path.

v5: - Added comments requested by Bart.
    - Note the addition of BLK_EH_RESET_TIMER race condition in the
      commit message.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: "jianchao.wang" &lt;jianchao.w.wang@oracle.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Bart Van Assche &lt;Bart.VanAssche@wdc.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
