<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/block/blk-sysfs.c, branch v3.13.2</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.13.2</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.13.2'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2013-11-15T00:32:22Z</updated>
<entry>
<title>kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS</title>
<updated>2013-11-15T00:32:22Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@infradead.org</email>
</author>
<published>2013-11-14T22:32:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0a06ff068f1255bcd7965ab07bc0f4adc3eb639a'/>
<id>urn:sha1:0a06ff068f1255bcd7965ab07bc0f4adc3eb639a</id>
<content type='text'>
We've switched over every architecture that supports SMP to it, so
remove the new useless config variable.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>blk-mq: new multi-queue block IO queueing mechanism</title>
<updated>2013-10-25T10:56:00Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2013-10-24T08:20:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=320ae51feed5c2f13664aa05a76bec198967e04d'/>
<id>urn:sha1:320ae51feed5c2f13664aa05a76bec198967e04d</id>
<content type='text'>
Linux currently has two models for block devices:

- The classic request_fn based approach, where drivers use struct
  request units for IO. The block layer provides various helper
  functionalities to let drivers share code, things like tag
  management, timeout handling, queueing, etc.

- The "stacked" approach, where a driver squeezes in between the
  block layer and IO submitter. Since this bypasses the IO stack,
  driver generally have to manage everything themselves.

With drivers being written for new high IOPS devices, the classic
request_fn based driver doesn't work well enough. The design dates
back to when both SMP and high IOPS was rare. It has problems with
scaling to bigger machines, and runs into scaling issues even on
smaller machines when you have IOPS in the hundreds of thousands
per device.

The stacked approach is then most often selected as the model
for the driver. But this means that everybody has to re-invent
everything, and along with that we get all the problems again
that the shared approach solved.

This commit introduces blk-mq, block multi queue support. The
design is centered around per-cpu queues for queueing IO, which
then funnel down into x number of hardware submission queues.
We might have a 1:1 mapping between the two, or it might be
an N:M mapping. That all depends on what the hardware supports.

blk-mq provides various helper functions, which include:

- Scalable support for request tagging. Most devices need to
  be able to uniquely identify a request both in the driver and
  to the hardware. The tagging uses per-cpu caches for freed
  tags, to enable cache hot reuse.

- Timeout handling without tracking request on a per-device
  basis. Basically the driver should be able to get a notification,
  if a request happens to fail.

- Optional support for non 1:1 mappings between issue and
  submission queues. blk-mq can redirect IO completions to the
  desired location.

- Support for per-request payloads. Drivers almost always need
  to associate a request structure with some driver private
  command structure. Drivers can tell blk-mq this at init time,
  and then any request handed to the driver will have the
  required size of memory associated with it.

- Support for merging of IO, and plugging. The stacked model
  gets neither of these. Even for high IOPS devices, merging
  sequential IO reduces per-command overhead and thus
  increases bandwidth.

For now, this is provided as a potential 3rd queueing model, with
the hope being that, as it matures, it can replace both the classic
and stacked model. That would get us back to having just 1 real
model for block devices, leaving the stacked approach to dm/md
devices (as it was originally intended).

Contributions in this patch from the following people:

Shaohua Li &lt;shli@fusionio.com&gt;
Alexander Gordeev &lt;agordeev@redhat.com&gt;
Christoph Hellwig &lt;hch@infradead.org&gt;
Mike Christie &lt;michaelc@cs.wisc.edu&gt;
Matias Bjorling &lt;m@bjorling.me&gt;
Jeff Moyer &lt;jmoyer@redhat.com&gt;

Acked-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block/blk-sysfs.c: replace strict_strtoul() with kstrtoul()</title>
<updated>2013-09-11T22:56:56Z</updated>
<author>
<name>Jingoo Han</name>
<email>jg1.han@samsung.com</email>
</author>
<published>2013-09-11T21:20:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ed751e683c563be64322b9bfa0f0f7e5da9bd37c'/>
<id>urn:sha1:ed751e683c563be64322b9bfa0f0f7e5da9bd37c</id>
<content type='text'>
The usage of strict_strtoul() is not preferred, because strict_strtoul()
is obsolete.  Thus, kstrtoul() should be used.

Signed-off-by: Jingoo Han &lt;jg1.han@samsung.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>block: avoid using uninitialized value in from queue_var_store</title>
<updated>2013-04-03T19:53:57Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2013-04-03T19:53:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c678ef5286ddb5cf70384ad5af286b0afc9b73e1'/>
<id>urn:sha1:c678ef5286ddb5cf70384ad5af286b0afc9b73e1</id>
<content type='text'>
As found by gcc-4.8, the QUEUE_SYSFS_BIT_FNS macro creates functions
that use a value generated by queue_var_store independent of whether
that value was set or not.

block/blk-sysfs.c: In function 'queue_store_nonrot':
block/blk-sysfs.c:244:385: warning: 'val' may be used uninitialized in this function [-Wmaybe-uninitialized]

Unlike most other such warnings, this one is not a false positive,
writing any non-number string into the sysfs files indeed has
an undefined result, rather than returning an error.

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: RCU free request_queue</title>
<updated>2013-01-09T16:05:13Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-01-09T16:05:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=548bc8e1b38e48653a90f48f636f8d253504f8a2'/>
<id>urn:sha1:548bc8e1b38e48653a90f48f636f8d253504f8a2</id>
<content type='text'>
RCU free request_queue so that blkcg_gq-&gt;q can be dereferenced under
RCU lock.  This will be used to implement hierarchical stats.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
</content>
</entry>
<entry>
<title>block: Rename queue dead flag</title>
<updated>2012-12-06T13:30:58Z</updated>
<author>
<name>Bart Van Assche</name>
<email>bvanassche@acm.org</email>
</author>
<published>2012-11-28T12:42:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3f3299d5c0268d6cc3f47b446e8aca436e4a5651'/>
<id>urn:sha1:3f3299d5c0268d6cc3f47b446e8aca436e4a5651</id>
<content type='text'>
QUEUE_FLAG_DEAD is used to indicate that queuing new requests must
stop. After this flag has been set queue draining starts. However,
during the queue draining phase it is still safe to invoke the
queue's request_fn, so QUEUE_FLAG_DYING is a better name for this
flag.

This patch has been generated by running the following command
over the kernel source tree:

git grep -lEw 'blk_queue_dead|QUEUE_FLAG_DEAD' |
    xargs sed -i.tmp -e 's/blk_queue_dead/blk_queue_dying/g'      \
        -e 's/QUEUE_FLAG_DEAD/QUEUE_FLAG_DYING/g';                \
sed -i.tmp -e "s/QUEUE_FLAG_DYING$(printf \\t)*5/QUEUE_FLAG_DYING$(printf \\t)5/g" \
    include/linux/blkdev.h;                                       \
sed -i.tmp -e 's/ DEAD/ DYING/g' -e 's/dead queue/a dying queue/' \
    -e 's/Dead queue/A dying queue/' block/blk-core.c

Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: James Bottomley &lt;JBottomley@Parallels.com&gt;
Cc: Mike Christie &lt;michaelc@cs.wisc.edu&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Chanho Min &lt;chanho.min@lge.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()</title>
<updated>2012-09-21T13:32:57Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-09-20T21:08:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=749fefe6778e98dfefe3b8bb72a93875196ec554'/>
<id>urn:sha1:749fefe6778e98dfefe3b8bb72a93875196ec554</id>
<content type='text'>
b82d4b197c ("blkcg: make request_queue bypassing on allocation") made
request_queues bypassed on allocation to avoid switching on and off
bypass mode on a queue being initialized.  Some drivers allocate and
then destroy a lot of queues without fully initializing them and
incurring bypass latency overhead on each of them could add upto
significant overhead.

Unfortunately, blk_init_allocated_queue() is never used by queues of
bio-based drivers, which means that all bio-based driver queues are in
bypass mode even after initialization and registration complete
successfully.

Due to the limited way request_queues are used by bio drivers, this
problem is hidden pretty well but it shows up when blk-throttle is
used in combination with a bio-based driver.  Trying to configure
(echoing to cgroupfs file) blk-throttle for a bio-based driver hangs
indefinitely in blkg_conf_prep() waiting for bypass mode to end.

This patch moves the initial blk_queue_bypass_end() call from
blk_init_allocated_queue() to blk_register_queue() which is called for
any userland-visible queues regardless of its type.

I believe this is correct because I don't think there is any block
driver which needs or wants working elevator and blk-cgroup on a queue
which isn't visible to userland.  If there are such users, we need a
different solution.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Joseph Glanville &lt;joseph.glanville@orionvm.com.au&gt;
Cc: stable@vger.kernel.org
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: Implement support for WRITE SAME</title>
<updated>2012-09-20T12:31:45Z</updated>
<author>
<name>Martin K. Petersen</name>
<email>martin.petersen@oracle.com</email>
</author>
<published>2012-09-18T16:19:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4363ac7c13a9a4b763c6e8d9fdbfc2468f3b8ca4'/>
<id>urn:sha1:4363ac7c13a9a4b763c6e8d9fdbfc2468f3b8ca4</id>
<content type='text'>
The WRITE SAME command supported on some SCSI devices allows the same
block to be efficiently replicated throughout a block range. Only a
single logical block is transferred from the host and the storage device
writes the same data to all blocks described by the I/O.

This patch implements support for WRITE SAME in the block layer. The
blkdev_issue_write_same() function can be used by filesystems and block
drivers to replicate a buffer across a block range. This can be used to
efficiently initialize software RAID devices, etc.

Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Acked-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: reject invalid queue attribute values</title>
<updated>2012-09-09T08:39:18Z</updated>
<author>
<name>Dave Reisner</name>
<email>dreisner@archlinux.org</email>
</author>
<published>2012-09-08T15:55:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b1f3b64d76cf88cc250e5cdd1de783ba9737078e'/>
<id>urn:sha1:b1f3b64d76cf88cc250e5cdd1de783ba9737078e</id>
<content type='text'>
Instead of using simple_strtoul which "converts" invalid numbers to 0,
use strict_strtoul and perform error checking to ensure that userspace
passes us a valid unsigned long. This addresses problems with functions
such as writev, which might want to write a trailing newline -- the
newline should rightfully be rejected, but the value preceeding it
should be preserved.

Fixes BZ#46981.

Signed-off-by: Dave Reisner &lt;dreisner@archlinux.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blkcg: implement per-blkg request allocation</title>
<updated>2012-06-26T22:42:49Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-06-26T22:05:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a051661ca6d134c18599498b185b667859d4339b'/>
<id>urn:sha1:a051661ca6d134c18599498b185b667859d4339b</id>
<content type='text'>
Currently, request_queue has one request_list to allocate requests
from regardless of blkcg of the IO being issued.  When the unified
request pool is used up, cfq proportional IO limits become meaningless
- whoever grabs the next request being freed wins the race regardless
of the configured weights.

This can be easily demonstrated by creating a blkio cgroup w/ very low
weight, put a program which can issue a lot of random direct IOs there
and running a sequential IO from a different cgroup.  As soon as the
request pool is used up, the sequential IO bandwidth crashes.

This patch implements per-blkg request_list.  Each blkg has its own
request_list and any IO allocates its request from the matching blkg
making blkcgs completely isolated in terms of request allocation.

* Root blkcg uses the request_list embedded in each request_queue,
  which was renamed to @q-&gt;root_rl from @q-&gt;rq.  While making blkcg rl
  handling a bit harier, this enables avoiding most overhead for root
  blkcg.

* Queue fullness is properly per request_list but bdi isn't blkcg
  aware yet, so congestion state currently just follows the root
  blkcg.  As writeback isn't aware of blkcg yet, this works okay for
  async congestion but readahead may get the wrong signals.  It's
  better than blkcg completely collapsing with shared request_list but
  needs to be improved with future changes.

* After this change, each block cgroup gets a full request pool making
  resource consumption of each cgroup higher.  This makes allowing
  non-root users to create cgroups less desirable; however, note that
  allowing non-root users to directly manage cgroups is already
  severely broken regardless of this patch - each block cgroup
  consumes kernel memory and skews IO weight (IO weights are not
  hierarchical).

v2: queue-sysfs.txt updated and patch description udpated as suggested
    by Vivek.

v3: blk_get_rl() wasn't checking error return from
    blkg_lookup_create() and may cause oops on lookup failure.  Fix it
    by falling back to root_rl on blkg lookup failures.  This problem
    was spotted by Rakesh Iyer &lt;rni@google.com&gt;.

v4: Updated to accomodate 458f27a982 "block: Avoid missed wakeup in
    request waitqueue".  blk_drain_queue() now wakes up waiters on all
    blkg-&gt;rl on the target queue.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
