<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/block/blk.h, branch v3.2.34</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2.34</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2.34'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2012-08-02T13:37:54Z</updated>
<entry>
<title>block: add blk_queue_dead()</title>
<updated>2012-08-02T13:37:54Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-12-13T23:33:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=68e9e9fee2bbec4853a993e98b0df8479292f572'/>
<id>urn:sha1:68e9e9fee2bbec4853a993e98b0df8479292f572</id>
<content type='text'>
commit 34f6055c80285e4efb3f602a9119db75239744dc upstream.

There are a number of QUEUE_FLAG_DEAD tests.  Add a blk_queue_dead()
macro and use it.
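
For reference, a minimal sketch of the new helper, in the style of the
other queue-flag tests (the exact definition is in the patch):

#define blk_queue_dead(q)  test_bit(QUEUE_FLAG_DEAD, &amp;(q)-&gt;queue_flags)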

This patch doesn't introduce any functional difference.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>block: fix request_queue lifetime handling by making blk_cleanup_queue() properly shut down</title>
<updated>2011-10-19T12:42:16Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-10-19T12:42:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c9a929dde3913780b5c416f4bb9d9ed804f509ce'/>
<id>urn:sha1:c9a929dde3913780b5c416f4bb9d9ed804f509ce</id>
<content type='text'>
request_queue is refcounted but actually depends on lifetime
management from the queue owner - on blk_cleanup_queue(), the block
layer expects that no request is passing through the request_queue
and that no new one will.

This is fundamentally broken.  The queue owner (e.g. the SCSI layer)
doesn't have a way to know whether there are other active users before
calling blk_cleanup_queue(), and other users (e.g. bsg) have no
guarantee that the queue is and will stay valid while they hold a
reference.

With a delay added in blk_queue_bio() before queue_lock is grabbed,
the following oops can easily be triggered when a device is removed
with in-flight IOs.

 sd 0:0:1:0: [sdb] Stopping disk
 ata1.01: disabled
 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 648, comm: test_rawio Not tainted 3.1.0-rc3-work+ #56 Bochs Bochs
 RIP: 0010:[&lt;ffffffff8137d651&gt;]  [&lt;ffffffff8137d651&gt;] elv_rqhash_find+0x61/0x100
 ...
 Process test_rawio (pid: 648, threadinfo ffff880019efa000, task ffff880019ef8a80)
 ...
 Call Trace:
  [&lt;ffffffff8137d774&gt;] elv_merge+0x84/0xe0
  [&lt;ffffffff81385b54&gt;] blk_queue_bio+0xf4/0x400
  [&lt;ffffffff813838ea&gt;] generic_make_request+0xca/0x100
  [&lt;ffffffff81383994&gt;] submit_bio+0x74/0x100
  [&lt;ffffffff811c53ec&gt;] dio_bio_submit+0xbc/0xc0
  [&lt;ffffffff811c610e&gt;] __blockdev_direct_IO+0x92e/0xb40
  [&lt;ffffffff811c39f7&gt;] blkdev_direct_IO+0x57/0x60
  [&lt;ffffffff8113b1c5&gt;] generic_file_aio_read+0x6d5/0x760
  [&lt;ffffffff8118c1ca&gt;] do_sync_read+0xda/0x120
  [&lt;ffffffff8118ce55&gt;] vfs_read+0xc5/0x180
  [&lt;ffffffff8118cfaa&gt;] sys_pread64+0x9a/0xb0
  [&lt;ffffffff81afaf6b&gt;] system_call_fastpath+0x16/0x1b

This happens because blk_cleanup_queue() destroys the queue and
elevator whether IOs are in progress or not, and DEAD tests are
sprinkled in the request processing path without proper
synchronization.

A similar problem exists for blk-throtl.  On queue cleanup, blk-throtl
is shut down whether it has requests in it or not.  Depending on
timing, it either oopses or throttled bios are lost, putting tasks
which are waiting for bio completion into an eternal D state.

The way it should work is with the usual clear distinction between
shutdown and release.  Shutdown drains all currently pending requests,
marks the queue dead, and performs partial teardown of the now
unnecessary parts of the queue.  Even after shutdown is complete,
reference holders are still allowed to issue requests to the queue,
although they will be immediately failed.  The rest of the teardown
happens on release.
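
A condensed sketch of the shutdown side, as itemized below (names such
as q-&gt;exit_mutex follow this log; the exact sequence is in the patch):

void blk_cleanup_queue(struct request_queue *q)
{
        /* mark @q DEAD; reference holders now get immediate failures */
        mutex_lock(&amp;q-&gt;exit_mutex);
        spin_lock_irq(q-&gt;queue_lock);
        queue_flag_set(QUEUE_FLAG_DEAD, q);
        spin_unlock_irq(q-&gt;queue_lock);
        mutex_unlock(&amp;q-&gt;exit_mutex);

        /* drain everything queued before the DEAD marking */
        blk_drain_queue(q, true);

        /* the rest of the teardown happens on release */
        blk_put_queue(q);
}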

This patch makes the following changes to make blk_cleanup_queue()
behave as a proper shutdown.

* QUEUE_FLAG_DEAD is now set while holding both q-&gt;exit_mutex and
  queue_lock.

* Unsynchronized DEAD check in generic_make_request_checks() removed.
  This couldn't make any meaningful difference as the queue could die
  after the check.

* blk_drain_queue() updated such that it can drain all requests and is
  now called during cleanup.

* blk_throtl updated such that it checks DEAD on grabbing queue_lock,
  drains all throttled bios during cleanup and frees td when the queue
  is released.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: reorganize throtl_get_tg() and blk_throtl_bio()</title>
<updated>2011-10-19T12:33:01Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-10-19T12:33:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bc16a4f933bc5ed50826b20561e4c3515061998b'/>
<id>urn:sha1:bc16a4f933bc5ed50826b20561e4c3515061998b</id>
<content type='text'>
blk_throtl_bio() and throtl_get_tg() have a rather unusual interface.

* throtl_get_tg() returns a pointer to a valid tg or ERR_PTR(-ENODEV),
  and drops queue_lock in the latter case.  A different locking context
  depending on the return value is error-prone, and the DEAD state is
  scheduled to be protected by queue_lock anyway.  Move the DEAD check
  inside queue_lock and return a valid tg or NULL.

* blk_throtl_bio() indicates its status both with its return value
  and the in/out param **@bio.  The former indicates whether the queue
  was found to be dead during throtl processing; the latter, whether
  the bio is throttled.

  There's no point in returning the DEAD check result from
  blk_throtl_bio().  The queue can die after blk_throtl_bio() is
  finished but before make_request_fn() grabs the queue lock.

  Make it take *@bio instead and return a boolean result indicating
  whether the request is throttled or not.
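
The resulting interface, in sketch form:

/* returns true if @bio is throttled (consumed), false otherwise */
bool blk_throtl_bio(struct request_queue *q, struct bio *bio);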

This patch doesn't cause any visible functional difference.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: reorganize queue draining</title>
<updated>2011-10-19T12:32:38Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-10-19T12:32:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e3c78ca524d230bc145e902625e88c392a58ddf3'/>
<id>urn:sha1:e3c78ca524d230bc145e902625e88c392a58ddf3</id>
<content type='text'>
Reorganize queue draining related code in preparation of queue exit
changes.

* Factor out the actual draining from elv_quiesce_start() into
  blk_drain_queue() (sketched below).

* Make elv_quiesce_start/end() responsible for their own locking.

* Replace open-coded ELVSWITCH clearing in elevator_switch() with
  elv_quiesce_end().
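
A condensed sketch of the factored-out helper (the real version also
drains the elevator under queue_lock):

void blk_drain_queue(struct request_queue *q)
{
        while (true) {
                int nr_rqs;

                spin_lock_irq(q-&gt;queue_lock);
                __blk_run_queue(q);
                nr_rqs = q-&gt;rq.count[BLK_RW_SYNC] + q-&gt;rq.count[BLK_RW_ASYNC];
                spin_unlock_irq(q-&gt;queue_lock);

                if (!nr_rqs)
                        break;
                msleep(10);
        }
}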

This patch doesn't cause any visible functional difference.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: move blk_throtl prototypes to block/blk.h</title>
<updated>2011-10-19T12:31:18Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-10-19T12:31:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bc9fcbf9cb8ec76d340da16fbf48a9a316e14c52'/>
<id>urn:sha1:bc9fcbf9cb8ec76d340da16fbf48a9a316e14c52</id>
<content type='text'>
The blk_throtl interface is block internal and there's no reason to
have its prototypes in linux/blkdev.h.  Move them to block/blk.h.
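
The moved block follows the usual internal-header pattern, roughly
(stub variants keep !CONFIG_BLK_DEV_THROTTLING builds working):

#ifdef CONFIG_BLK_DEV_THROTTLING
extern int blk_throtl_bio(struct request_queue *q, struct bio **bio);
extern int blk_throtl_init(struct request_queue *q);
extern void blk_throtl_exit(struct request_queue *q);
#else /* CONFIG_BLK_DEV_THROTTLING */
static inline int blk_throtl_bio(struct request_queue *q, struct bio **bio)
{
        return 0;
}
static inline int blk_throtl_init(struct request_queue *q) { return 0; }
static inline void blk_throtl_exit(struct request_queue *q) { }
#endif /* CONFIG_BLK_DEV_THROTTLING */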

This patch doesn't introduce any functional change.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: fix flush machinery for stacking drivers with differing flush flags</title>
<updated>2011-08-15T19:37:25Z</updated>
<author>
<name>Jeff Moyer</name>
<email>jmoyer@redhat.com</email>
</author>
<published>2011-08-15T19:37:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4853abaae7e4a2af938115ce9071ef8684fb7af4'/>
<id>urn:sha1:4853abaae7e4a2af938115ce9071ef8684fb7af4</id>
<content type='text'>
Commit ae1b1539622fb46e51b4d13b3f9e5f4c713f86ae, block: reimplement
FLUSH/FUA to support merge, introduced a performance regression when
running any sort of fsyncing workload using dm-multipath and certain
storage (in our case, an HP EVA).  The test I ran was fs_mark, and it
dropped from ~800 files/sec on ext4 to ~100 files/sec.  It turns out
that dm-multipath always advertised flush+fua support, and passed
commands on down the stack, where those flags used to get stripped off.
The above commit changed that behavior:

static inline struct request *__elv_next_request(struct request_queue *q)
{
        struct request *rq;

        while (1) {
-               while (!list_empty(&amp;q-&gt;queue_head)) {
+               if (!list_empty(&amp;q-&gt;queue_head)) {
                        rq = list_entry_rq(q-&gt;queue_head.next);
-                       if (!(rq-&gt;cmd_flags &amp; (REQ_FLUSH | REQ_FUA)) ||
-                           (rq-&gt;cmd_flags &amp; REQ_FLUSH_SEQ))
-                               return rq;
-                       rq = blk_do_flush(q, rq);
-                       if (rq)
-                               return rq;
+                       return rq;
                }

Note that previously, a command would come in here, have
REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush:

struct request *blk_do_flush(struct request_queue *q, struct request *rq)
{
        unsigned int fflags = q-&gt;flush_flags; /* may change, cache it */
        bool has_flush = fflags &amp; REQ_FLUSH, has_fua = fflags &amp; REQ_FUA;
        bool do_preflush = has_flush &amp;&amp; (rq-&gt;cmd_flags &amp; REQ_FLUSH);
        bool do_postflush = has_flush &amp;&amp; !has_fua &amp;&amp;
                            (rq-&gt;cmd_flags &amp; REQ_FUA);
        unsigned skip = 0;
...
        if (blk_rq_sectors(rq) &amp;&amp; !do_preflush &amp;&amp; !do_postflush) {
                rq-&gt;cmd_flags &amp;= ~REQ_FLUSH;
                if (!has_fua)
                        rq-&gt;cmd_flags &amp;= ~REQ_FUA;
                return rq;
        }

So, the flush machinery was bypassed in such cases (q-&gt;flush_flags == 0
&amp;&amp; rq-&gt;cmd_flags &amp; (REQ_FLUSH|REQ_FUA)).

Now, however, we don't get into the flush machinery at all.  Instead,
__elv_next_request just hands a request with flush and fua bits set to
the scsi_request_fn, even if the underlying request_queue does not
support flush or fua.

The agreed-upon approach is to fix the flush machinery to allow
stacking.  While this isn't used in practice (since there is only one
request-based dm target, and that target will now reflect the flush
flags of the underlying device), it does future-proof the solution and
makes it function as designed.
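
In sketch form, blk_insert_flush() now strips flags the queue can't
use and completes requests that reduce to nothing (exact logic is in
the patch):

        unsigned int fflags = q-&gt;flush_flags;   /* may change, cache it */
        unsigned int policy = blk_flush_policy(fflags, rq);

        /* adjust REQ_FLUSH/REQ_FUA for what the queue supports */
        rq-&gt;cmd_flags &amp;= ~REQ_FLUSH;
        if (!(fflags &amp; REQ_FUA))
                rq-&gt;cmd_flags &amp;= ~REQ_FUA;

        /* an empty flush on a cache-less queue becomes a no-op */
        if (!policy) {
                __blk_end_bidi_request(rq, 0, 0, 0);
                return;
        }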

In order to make this work, I had to add a field to the struct request,
inside the flush structure (to store the original req-&gt;end_io).  Shaohua
had suggested overloading the union with rb_node and completion_data,
but the completion data is used by device mapper and can also be used by
other drivers.  So, I didn't see a way around the additional field.
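
The addition, in sketch form:

struct request {
        ...
        struct {
                unsigned int            seq;
                struct list_head        list;
                rq_end_io_fn            *saved_end_io;  /* new */
        } flush;
        ...
};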

I tested this patch on an HP EVA with both ext4 and xfs, and it recovers
the lost performance.  Comments and other testers, as always, are
appreciated.

Cheers,
Jeff

Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' into for-2.6.40/core</title>
<updated>2011-05-20T18:36:16Z</updated>
<author>
<name>Jens Axboe</name>
<email>jaxboe@fusionio.com</email>
</author>
<published>2011-05-20T18:36:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0eb8e885726a3a93206510092bbc7e39e272f6ef'/>
<id>urn:sha1:0eb8e885726a3a93206510092bbc7e39e272f6ef</id>
<content type='text'>
This patch merges in a fix that missed 2.6.39 final.

Conflicts:
	block/blk.h
</content>
</entry>
<entry>
<title>Merge commit 'v2.6.39' into for-2.6.40/core</title>
<updated>2011-05-20T18:33:15Z</updated>
<author>
<name>Jens Axboe</name>
<email>jaxboe@fusionio.com</email>
</author>
<published>2011-05-20T18:33:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=698567f3fa790fea37509a54dea855302dd88331'/>
<id>urn:sha1:698567f3fa790fea37509a54dea855302dd88331</id>
<content type='text'>
Since for-2.6.40/core was forked off the 2.6.39 devel tree, we've
had churn in the core area that makes it difficult to handle
patches for e.g. cfq or blk-throttle.  Instead of requiring that they
be based on older versions with bugs that were fixed later in the rc
cycle, merge in 2.6.39 final.

Also fixes up conflicts in the files below.

Conflicts:
	drivers/block/paride/pcd.c
	drivers/cdrom/viocd.c
	drivers/ide/ide-cd.c

Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
<entry>
<title>block: add proper state guards to __elv_next_request</title>
<updated>2011-05-18T17:30:32Z</updated>
<author>
<name>James Bottomley</name>
<email>James.Bottomley@suse.de</email>
</author>
<published>2011-05-18T14:20:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0a58e077eb600d1efd7e54ad9926a75a39d7f8ae'/>
<id>urn:sha1:0a58e077eb600d1efd7e54ad9926a75a39d7f8ae</id>
<content type='text'>
blk_cleanup_queue() calls elevator_exit(), and after this we can't
touch the elevator without oopsing.  __elv_next_request() must check
for this state because, in the refcounted queue model, we can still
call it after blk_cleanup_queue() has been called.
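
The guard amounts to something like this (a sketch; the exact
placement is in the patch):

        /* don't touch the elevator once blk_cleanup_queue() has run */
        if (unlikely(test_bit(QUEUE_FLAG_DEAD, &amp;q-&gt;queue_flags)) ||
            !q-&gt;elevator-&gt;ops-&gt;elevator_dispatch_fn(q, 0))
                return NULL;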

This was reported as causing an oops attributable to scsi.

Signed-off-by: James Bottomley &lt;James.Bottomley@suse.de&gt;
Cc: stable@kernel.org
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
<entry>
<title>block: hold queue if flush is running for non-queueable flush drive</title>
<updated>2011-05-06T17:36:25Z</updated>
<author>
<name>shaohua.li@intel.com</name>
<email>shaohua.li@intel.com</email>
</author>
<published>2011-05-06T17:34:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3ac0cc4508709d42ec9aa351086c7d38bfc0660c'/>
<id>urn:sha1:3ac0cc4508709d42ec9aa351086c7d38bfc0660c</id>
<content type='text'>
In some drives, flush requests are non-queueable: while a flush
request is running, normal read/write requests can't run.  If the
block layer dispatches such a request, the driver can't handle it and
must requeue it.  Tejun suggested we hold the queue while a flush is
running.  This avoids the unnecessary requeue and can also improve
performance.  For example, take requests flush1, write1, flush2:
flush1 is dispatched, then the queue is held and write1 isn't
dispatched.  After flush1 finishes, flush2 is dispatched.  Since the
disk cache is already clean, flush2 finishes very soon, so it looks
as if flush2 were folded into flush1.
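
The hold lives in __elv_next_request(), roughly (a sketch;
queue_flush_queueable() and flush_queue_delayed are introduced by
this patch):

        /* hold the queue while a non-queueable flush is in flight */
        if (q-&gt;flush_pending_idx != q-&gt;flush_running_idx &amp;&amp;
            !queue_flush_queueable(q)) {
                q-&gt;flush_queue_delayed = 1;
                return NULL;
        }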

In my test, the queue holding completely solves a regression introduced by
commit 53d63e6b0dfb95882ec0219ba6bbd50cde423794:

    block: make the flush insertion use the tail of the dispatch list

    It's not a preempt type request, in fact we have to insert it
    behind requests that do specify INSERT_FRONT.

which caused about a 20% regression when running a sysbench fileio
workload.

Stable: 2.6.39 only

Cc: stable@kernel.org
Signed-off-by: Shaohua Li &lt;shaohua.li@intel.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
</feed>
