<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/block/blk.h, branch ipvs/experimental</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=ipvs%2Fexperimental</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=ipvs%2Fexperimental'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2009-09-11T12:33:30Z</updated>
<entry>
<title>block: implement mixed merge of different failfast requests</title>
<updated>2009-09-11T12:33:30Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2009-07-03T08:48:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=80a761fd33cf812f771e212139157bf8f58d4b3f'/>
<id>urn:sha1:80a761fd33cf812f771e212139157bf8f58d4b3f</id>
<content type='text'>
Failfast has characteristics from other attributes.  When issuing,
executing and successuflly completing requests, failfast doesn't make
any difference.  It only affects how a request is handled on failure.
Allowing requests with different failfast settings to be merged cause
normal IOs to fail prematurely while not allowing has performance
penalties as failfast is used for read aheads which are likely to be
located near in-flight or to-be-issued normal IOs.

This patch introduces the concept of 'mixed merge'.  A request is a
mixed merge if it is merge of segments which require different
handling on failure.  Currently the only mixable attributes are
failfast ones (or lack thereof).

When a bio with different failfast settings is added to an existing
request or requests of different failfast settings are merged, the
merged request is marked mixed.  Each bio carries failfast settings
and the request always tracks failfast state of the first bio.  When
the request fails, blk_rq_err_bytes() can be used to determine how
many bytes can be safely failed without crossing into an area which
requires further retrials.

This allows request merging regardless of failfast settings while
keeping the failure handling correct.

This patch only implements mixed merge but doesn't enable it.  The
next one will update SCSI to make use of mixed merge.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Niel Lambrechts &lt;niel.lambrechts@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
<entry>
<title>block: fix no diskstat problem</title>
<updated>2009-05-27T12:50:02Z</updated>
<author>
<name>Kiyoshi Ueda</name>
<email>k-ueda@ct.jp.nec.com</email>
</author>
<published>2009-05-27T12:50:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3c4198e874cde694f5ea1463706873e7907bdb18'/>
<id>urn:sha1:3c4198e874cde694f5ea1463706873e7907bdb18</id>
<content type='text'>
The commit below in 2.6-block/for-2.6.31 causes no diskstat problem
because the blk_discard_rq() check was added with '&amp;&amp;'.
It should be 'blk_fs_request() || blk_discard_rq()'.
This patch does it and fixes the no diskstat problem.
Please review and apply.

------ /proc/diskstat without this patch -------------------------------------
   8       0 sda 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------

----- /proc/diskstat with this patch applied ---------------------------------
   8       0 sda 4186 303 373621 61600 9578 3859 107468 169479 2 89755 231059
------------------------------------------------------------------------------

--------------------------------------------------------------------------
commit c69d48540c201394d08cb4d48b905e001313d9b8
Author: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Date:   Fri Apr 24 08:12:19 2009 +0200

    block: include discard requests in IO accounting

    We currently don't do merging on discard requests, but we potentially
    could. If we do, then we need to include discard requests in the IO
    accounting, or merging would end up decrementing in_flight IO counters
    for an IO which never incremented them.

    So enable accounting for discard requests.

&lt;snip&gt;

 static inline int blk_do_io_stat(struct request *rq)
 {
-       return rq-&gt;rq_disk &amp;&amp; blk_rq_io_stat(rq) &amp;&amp; blk_fs_request(rq);
+       return rq-&gt;rq_disk &amp;&amp; blk_rq_io_stat(rq) &amp;&amp; blk_fs_request(rq) &amp;&amp;
+               blk_discard_rq(rq);
 }
--------------------------------------------------------------------------

Signed-off-by: Kiyoshi Ueda &lt;k-ueda@ct.jp.nec.com&gt;
Signed-off-by: Jun'ichi Nomura &lt;j-nomura@ce.jp.nec.com&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
<entry>
<title>block: Un-export blk_rq_append_bio</title>
<updated>2009-05-19T10:14:56Z</updated>
<author>
<name>Boaz Harrosh</name>
<email>bharrosh@panasas.com</email>
</author>
<published>2009-05-17T16:00:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a411f4bbb89f1f08687b344064d6775bce1e4658'/>
<id>urn:sha1:a411f4bbb89f1f08687b344064d6775bce1e4658</id>
<content type='text'>
OSD was the last in-tree user of blk_rq_append_bio(). Now
that it is fixed blk_rq_append_bio is un-exported and
is only used internally by block layer.

Signed-off-by: Boaz Harrosh &lt;bharrosh@panasas.com&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
<entry>
<title>block: implement and enforce request peek/start/fetch</title>
<updated>2009-05-11T07:52:18Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2009-05-08T02:54:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9934c8c04561413609d2bc38c6b9f268cba774a4'/>
<id>urn:sha1:9934c8c04561413609d2bc38c6b9f268cba774a4</id>
<content type='text'>
Till now block layer allowed two separate modes of request execution.
A request is always acquired from the request queue via
elv_next_request().  After that, drivers are free to either dequeue it
or process it without dequeueing.  Dequeue allows elv_next_request()
to return the next request so that multiple requests can be in flight.

Executing requests without dequeueing has its merits mostly in
allowing drivers for simpler devices which can't do sg to deal with
segments only without considering request boundary.  However, the
benefit this brings is dubious and declining while the cost of the API
ambiguity is increasing.  Segment based drivers are usually for very
old or limited devices and as converting to dequeueing model isn't
difficult, it doesn't justify the API overhead it puts on block layer
and its more modern users.

Previous patches converted all block low level drivers to dequeueing
model.  This patch completes the API transition by...

* renaming elv_next_request() to blk_peek_request()

* renaming blkdev_dequeue_request() to blk_start_request()

* adding blk_fetch_request() which is combination of peek and start

* disallowing completion of queued (not started) requests

* applying new API to all LLDs

Renamings are for consistency and to break out of tree code so that
it's apparent that out of tree drivers need updating.

[ Impact: block request issue API cleanup, no functional change ]

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Cc: James Bottomley &lt;James.Bottomley@HansenPartnership.com&gt;
Cc: Mike Miller &lt;mike.miller@hp.com&gt;
Cc: unsik Kim &lt;donari75@gmail.com&gt;
Cc: Paul Clements &lt;paul.clements@steeleye.com&gt;
Cc: Tim Waugh &lt;tim@cyberelk.net&gt;
Cc: Geert Uytterhoeven &lt;Geert.Uytterhoeven@sonycom.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Laurent Vivier &lt;Laurent@lvivier.info&gt;
Cc: Jeff Garzik &lt;jgarzik@pobox.com&gt;
Cc: Jeremy Fitzhardinge &lt;jeremy@xensource.com&gt;
Cc: Grant Likely &lt;grant.likely@secretlab.ca&gt;
Cc: Adrian McMenamin &lt;adrian@mcmen.demon.co.uk&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Bartlomiej Zolnierkiewicz &lt;bzolnier@gmail.com&gt;
Cc: Borislav Petkov &lt;petkovbb@googlemail.com&gt;
Cc: Sergei Shtylyov &lt;sshtylyov@ru.mvista.com&gt;
Cc: Alex Dubov &lt;oakad@yahoo.com&gt;
Cc: Pierre Ossman &lt;drzeus@drzeus.cx&gt;
Cc: David Woodhouse &lt;dwmw2@infradead.org&gt;
Cc: Markus Lidel &lt;Markus.Lidel@shadowconnect.com&gt;
Cc: Stefan Weinhuber &lt;wein@de.ibm.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Pete Zaitcev &lt;zaitcev@redhat.com&gt;
Cc: FUJITA Tomonori &lt;fujita.tomonori@lab.ntt.co.jp&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
<entry>
<title>block: drop request-&gt;hard_* and *nr_sectors</title>
<updated>2009-05-11T07:50:54Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2009-05-07T13:24:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2e46e8b27aa57c6bd34b3102b40ee4d0144b4fab'/>
<id>urn:sha1:2e46e8b27aa57c6bd34b3102b40ee4d0144b4fab</id>
<content type='text'>
struct request has had a few different ways to represent some
properties of a request.  -&gt;hard_* represent block layer's view of the
request progress (completion cursor) and the ones without the prefix
are supposed to represent the issue cursor and allowed to be updated
as necessary by the low level drivers.  The thing is that as block
layer supports partial completion, the two cursors really aren't
necessary and only cause confusion.  In addition, manual management of
request detail from low level drivers is cumbersome and error-prone at
the very least.

Another interesting duplicate fields are rq-&gt;[hard_]nr_sectors and
rq-&gt;{hard_cur|current}_nr_sectors against rq-&gt;data_len and
rq-&gt;bio-&gt;bi_size.  This is more convoluted than the hard_ case.

rq-&gt;[hard_]nr_sectors are initialized for requests with bio but
blk_rq_bytes() uses it only for !pc requests.  rq-&gt;data_len is
initialized for all request but blk_rq_bytes() uses it only for pc
requests.  This causes good amount of confusion throughout block layer
and its drivers and determining the request length has been a bit of
black magic which may or may not work depending on circumstances and
what the specific LLD is actually doing.

rq-&gt;{hard_cur|current}_nr_sectors represent the number of sectors in
the contiguous data area at the front.  This is mainly used by drivers
which transfers data by walking request segment-by-segment.  This
value always equals rq-&gt;bio-&gt;bi_size &gt;&gt; 9.  However, data length for
pc requests may not be multiple of 512 bytes and using this field
becomes a bit confusing.

In general, having multiple fields to represent the same property
leads only to confusion and subtle bugs.  With recent block low level
driver cleanups, no driver is accessing or manipulating these
duplicate fields directly.  Drop all the duplicates.  Now rq-&gt;sector
means the current sector, rq-&gt;data_len the current total length and
rq-&gt;bio-&gt;bi_size the current segment length.  Everything else is
defined in terms of these three and available only through accessors.

* blk_recalc_rq_sectors() is collapsed into blk_update_request() and
  now handles pc and fs requests equally other than rq-&gt;sector update.
  This means that now pc requests can use partial completion too (no
  in-kernel user yet tho).

* bio_cur_sectors() is replaced with bio_cur_bytes() as block layer
  now uses byte count as the primary data length.

* blk_rq_pos() is now guranteed to be always correct.  In-block users
  converted.

* blk_rq_bytes() is now guaranteed to be always valid as is
  blk_rq_sectors().  In-block users converted.

* blk_rq_sectors() is now guaranteed to equal blk_rq_bytes() &gt;&gt; 9.
  More convenient one is used.

* blk_rq_bytes() and blk_rq_cur_bytes() are now inlined and take const
  pointer to request.

[ Impact: API cleanup, single way to represent one property of a request ]

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Boaz Harrosh &lt;bharrosh@panasas.com&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
<entry>
<title>block: include discard requests in IO accounting</title>
<updated>2009-04-28T05:37:37Z</updated>
<author>
<name>Jens Axboe</name>
<email>jens.axboe@oracle.com</email>
</author>
<published>2009-04-24T06:12:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c69d48540c201394d08cb4d48b905e001313d9b8'/>
<id>urn:sha1:c69d48540c201394d08cb4d48b905e001313d9b8</id>
<content type='text'>
We currently don't do merging on discard requests, but we potentially
could. If we do, then we need to include discard requests in the IO
accounting, or merging would end up decrementing in_flight IO counters
for an IO which never incremented them.

So enable accounting for discard requests.

Problem found by Nikanth Karthikesan &lt;knikanth@suse.de&gt;

Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
<entry>
<title>block: make blk_do_io_stat() do the full "is this rq accountable" checks</title>
<updated>2009-04-28T05:37:37Z</updated>
<author>
<name>Jens Axboe</name>
<email>jens.axboe@oracle.com</email>
</author>
<published>2009-04-24T06:10:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c2553b5844b06910435e40cfab9e6f384840cb97'/>
<id>urn:sha1:c2553b5844b06910435e40cfab9e6f384840cb97</id>
<content type='text'>
We currently check for file system requests outside of blk_do_io_stat(rq),
but we may as well just include it.

Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
<entry>
<title>block: reorganize request fetching functions</title>
<updated>2009-04-28T05:37:34Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2009-04-23T02:05:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=158dbda0068e63c7cce7bd47c123bd1dfa5a902c'/>
<id>urn:sha1:158dbda0068e63c7cce7bd47c123bd1dfa5a902c</id>
<content type='text'>
Impact: code reorganization

elv_next_request() and elv_dequeue_request() are public block layer
interface than actual elevator implementation.  They mostly deal with
how requests interact with block layer and low level drivers at the
beginning of rqeuest processing whereas __elv_next_request() is the
actual eleveator request fetching interface.

Move the two functions to blk-core.c.  This prepares for further
interface cleanup.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: simplify I/O stat accounting</title>
<updated>2009-04-24T06:54:21Z</updated>
<author>
<name>Jerome Marchand</name>
<email>jmarchan@redhat.com</email>
</author>
<published>2009-04-22T12:01:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=42dad7647aec49b3ad20dd0cb832b232a6ae514f'/>
<id>urn:sha1:42dad7647aec49b3ad20dd0cb832b232a6ae514f</id>
<content type='text'>
This simplifies I/O stat accounting switching code and separates it
completely from I/O scheduler switch code.

Requests are accounted according to the state of their request queue
at the time of the request allocation. There is no need anymore to
flush the request queue when switching I/O accounting state.

Signed-off-by: Jerome Marchand &lt;jmarchan@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
<entry>
<title>block: fix bad spelling of quiesce</title>
<updated>2009-04-15T06:28:09Z</updated>
<author>
<name>Jens Axboe</name>
<email>jens.axboe@oracle.com</email>
</author>
<published>2009-04-08T12:22:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f600abe2de81628c40effbb3f8eaf5af0d291e57'/>
<id>urn:sha1:f600abe2de81628c40effbb3f8eaf5af0d291e57</id>
<content type='text'>
Credit goes to Andrew Morton for spotting this one.

Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
</feed>
