<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/blkdev.h, branch v3.17</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.17</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.17'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2014-07-01T16:34:38Z</updated>
<entry>
<title>blk-mq: use percpu_ref for mq usage count</title>
<updated>2014-07-01T16:34:38Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2014-07-01T16:34:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=add703fda981b9719d37f371498b9f129acbd997'/>
<id>urn:sha1:add703fda981b9719d37f371498b9f129acbd997</id>
<content type='text'>
Currently, blk-mq uses a percpu_counter to keep track of how many
usages are in flight.  The percpu_counter is drained while freezing to
ensure that no usage is left in-flight after freezing is complete.
blk_mq_queue_enter/exit() and blk_mq_[un]freeze_queue() implement this
per-cpu gating mechanism.

This type of code has a relatively high chance of subtle bugs which
are extremely difficult to trigger, and it's way too hairy to be open
coded in blk-mq.  percpu_ref can serve the same purpose after the recent
changes.  This patch replaces the open-coded per-cpu usage counting
and draining mechanism with percpu_ref.

blk_mq_queue_enter() performs tryget_live on the ref and exit()
performs put.  blk_mq_freeze_queue() kills the ref and waits until the
reference count reaches zero.  blk_mq_unfreeze_queue() revives the ref
and wakes up the waiters.
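
A minimal sketch of the resulting pattern (field names like
mq_usage_counter and mq_freeze_wq, and the release-callback wiring,
are assumptions here, not the literal diff):

	/* enter/exit become a plain percpu_ref get/put pair */
	static int blk_mq_queue_enter(struct request_queue *q)
	{
		if (percpu_ref_tryget_live(&amp;q-&gt;mq_usage_counter))
			return 0;
		return -EBUSY;	/* frozen or dying: caller waits or bails */
	}

	static void blk_mq_queue_exit(struct request_queue *q)
	{
		percpu_ref_put(&amp;q-&gt;mq_usage_counter);
	}

	/* freezing kills the ref and waits for it to drain; the ref's
	 * release callback is assumed to wake mq_freeze_wq */
	void blk_mq_freeze_queue(struct request_queue *q)
	{
		percpu_ref_kill(&amp;q-&gt;mq_usage_counter);
		wait_event(q-&gt;mq_freeze_wq,
			   percpu_ref_is_zero(&amp;q-&gt;mq_usage_counter));
	}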

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Nicholas A. Bellinger &lt;nab@linux-iscsi.org&gt;
Cc: Kent Overstreet &lt;kmo@daterainc.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>blk-mq: decouple blk-mq freezing from generic bypassing</title>
<updated>2014-07-01T16:31:13Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2014-07-01T16:31:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=780db2071ac4d167ee4154ad9c96088f1bba044b'/>
<id>urn:sha1:780db2071ac4d167ee4154ad9c96088f1bba044b</id>
<content type='text'>
blk-mq freezing is entangled with generic bypassing, which bypasses
blkcg and the io scheduler and lets IO requests fall through the block
layer to the drivers in FIFO order.  This allows forward progress on
IOs with the advanced features disabled so that those features can be
configured or altered without worrying about stalling IO which may
lead to deadlock through memory allocation.

However, generic bypassing doesn't quite fit blk-mq.  blk-mq currently
doesn't make use of blkcg or ioscheds, and it maps bypassing to
freezing, which blocks request processing and drains all the in-flight
ones.  This causes problems as bypassing assumes that request
processing is online.  blk-mq works around this by conditionally
allowing request processing for the problem case - during queue
initialization.

Another oddity is that, except during queue cleanup, bypassing
started on the generic side prevents blk-mq from processing new
requests but doesn't drain the in-flight ones.  This shouldn't break
anything but again highlights that something isn't quite right here.

The root cause is conflating blk-mq freezing and generic bypassing
which are two different mechanisms.  The only intersecting purpose
that they serve is during queue cleanup.  Let's properly separate
blk-mq freezing from generic bypassing and simply use freezing where
necessary.

* request_queue-&gt;mq_freeze_depth is added and
  blk_mq_[un]freeze_queue() now operate on this counter instead of
  -&gt;bypass_depth.  The replacement for QUEUE_FLAG_BYPASS isn't added
  but the counter is tested directly.  This will be further updated by
  later changes.

* blk_mq_drain_queue() is dropped and "__" prefix is dropped from
  blk_mq_freeze_queue().  Queue cleanup path now calls
  blk_mq_freeze_queue() directly.

* blk_queue_enter()'s fast path condition is simplified to simply
  check @q-&gt;mq_freeze_depth.  Previously, the condition was

	!blk_queue_dying(q) &amp;&amp;
	    (!blk_queue_bypass(q) || !blk_queue_init_done(q))

  mq_freeze_depth is incremented right after dying is set, and the
  blk_queue_init_done() exception isn't necessary as blk-mq doesn't
  start frozen.  That leaves only the blk_queue_bypass() test, which
  can be replaced by a @q-&gt;mq_freeze_depth test (see the sketch
  below).
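
  A sketch of the simplified fast path (treating mq_freeze_depth as a
  plain counter; synchronization details omitted):

	/* fast path: the old compound test collapses to one check */
	if (!q-&gt;mq_freeze_depth)
		return 0;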

This change simplifies the code and reduces confusion in the area.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Nicholas A. Bellinger &lt;nab@linux-iscsi.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>block: add support for limiting gaps in SG lists</title>
<updated>2014-06-24T22:22:24Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@fb.com</email>
</author>
<published>2014-06-24T22:22:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=66cb45aa41315d1d9972cada354fbdf7870d7714'/>
<id>urn:sha1:66cb45aa41315d1d9972cada354fbdf7870d7714</id>
<content type='text'>
Another restriction inherited from NVMe - those devices don't support
SG lists that have "gaps" in them.  A gap refers to a case where the
previous SG entry doesn't end on a page boundary.  For NVMe, all SG
entries must start at offset 0 (except the first) and end on a page
boundary (except the last).
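
A sketch of the gap test this implies (the helper name and its exact
placement are assumptions): a gap exists when the next segment doesn't
start at offset 0 or the previous one doesn't end on a page boundary.

	static inline bool bvec_gap_to_prev(struct bio_vec *bprv,
					    unsigned int offset)
	{
		return offset ||
		       ((bprv-&gt;bv_offset + bprv-&gt;bv_len) &amp; (PAGE_SIZE - 1));
	}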

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>block: blk_max_size_offset() should check -&gt;max_sectors</title>
<updated>2014-06-18T05:12:02Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@fb.com</email>
</author>
<published>2014-06-18T05:09:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=736ed4de766d4f0e8e6142dd4f9d73ef61835ed9'/>
<id>urn:sha1:736ed4de766d4f0e8e6142dd4f9d73ef61835ed9</id>
<content type='text'>
Commit 762380ad9322 inadvertently changed a check for max_sectors
to max_hw_sectors. Revert that part, so we still compare against
max_sectors.
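
A sketch of the helper with the check restored (field names assumed
from the queue limits structure):

	static inline unsigned int blk_max_size_offset(struct request_queue *q,
						       sector_t offset)
	{
		if (!q-&gt;limits.chunk_sectors)
			return q-&gt;limits.max_sectors;	/* was max_hw_sectors */

		return q-&gt;limits.chunk_sectors -
			(offset &amp; (q-&gt;limits.chunk_sectors - 1));
	}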

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>block: add blk_rq_set_block_pc()</title>
<updated>2014-06-06T13:57:37Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@fb.com</email>
</author>
<published>2014-06-06T13:57:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f27b087b81b70513b8c61ec20596c868f7b93474'/>
<id>urn:sha1:f27b087b81b70513b8c61ec20596c868f7b93474</id>
<content type='text'>
With the optimizations around not clearing the full request at alloc
time, we are leaving some of the needed init for REQ_TYPE_BLOCK_PC
up to the user allocating the request.

Add a blk_rq_set_block_pc() that sets the command type to
REQ_TYPE_BLOCK_PC, and properly initializes the members associated
with this type of request. Update callers to use this function instead
of manipulating rq-&gt;cmd_type directly.
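
A sketch of what such a helper initializes (the member list is an
assumption based on what REQ_TYPE_BLOCK_PC users previously set up by
hand):

	void blk_rq_set_block_pc(struct request *rq)
	{
		rq-&gt;cmd_type = REQ_TYPE_BLOCK_PC;
		rq-&gt;__data_len = 0;
		rq-&gt;__sector = (sector_t) -1;
		rq-&gt;bio = rq-&gt;biotail = NULL;
		memset(rq-&gt;__cmd, 0, sizeof(rq-&gt;__cmd));
		rq-&gt;cmd = rq-&gt;__cmd;
	}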

Includes fixes from Christoph Hellwig &lt;hch@lst.de&gt; for my half-assed
attempt.

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>block: add notion of a chunk size for request merging</title>
<updated>2014-06-05T19:38:39Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@fb.com</email>
</author>
<published>2014-06-05T19:38:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=762380ad9322951cea4ce9d24864265f9c66a916'/>
<id>urn:sha1:762380ad9322951cea4ce9d24864265f9c66a916</id>
<content type='text'>
Some drivers have different limits on what size a request should
optimally be, depending on the offset of the request - similar to
dividing a device into chunks.  Add a setting that allows the driver
to inform the block layer of such a chunk size. The block layer will
then prevent merging across the chunks.

This is needed to optimally support NVMe with a non-zero stripe size.
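
A sketch of a driver opting in (the setter name follows the usual
blk_queue_* convention; the stripe size here is made up):

	/* a 128KB stripe means a 256-sector chunk, so the block layer
	 * won't merge a request across a 128KB boundary */
	blk_queue_chunk_sectors(ns-&gt;queue, 256);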

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'akpm' (patchbomb from Andrew) into next</title>
<updated>2014-06-04T23:55:13Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-06-04T23:55:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=00170fdd0846df7cdb5ad421d3a340440f930b8f'/>
<id>urn:sha1:00170fdd0846df7cdb5ad421d3a340440f930b8f</id>
<content type='text'>
Merge misc updates from Andrew Morton:

 - a few fixes for 3.16.  Cc'ed to stable so they'll get there somehow.

 - various misc fixes and cleanups

 - most of the ocfs2 queue.  Review is slow...

 - most of MM.  The MM queue is pretty huge this time, but not much in
   the way of feature work.

 - some tweaks under kernel/

 - printk maintenance work

 - updates to lib/

 - checkpatch updates

 - tweaks to init/

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;: (276 commits)
  fs/autofs4/dev-ioctl.c: add __init to autofs_dev_ioctl_init
  fs/ncpfs/getopt.c: replace simple_strtoul by kstrtoul
  init/main.c: remove an ifdef
  kthreads: kill CLONE_KERNEL, change kernel_thread(kernel_init) to avoid CLONE_SIGHAND
  init/main.c: add initcall_blacklist kernel parameter
  init/main.c: don't use pr_debug()
  fs/binfmt_flat.c: make old_reloc() static
  fs/binfmt_elf.c: fix bool assignements
  fs/efs: convert printk(KERN_DEBUG to pr_debug
  fs/efs: add pr_fmt / use __func__
  fs/efs: convert printk to pr_foo()
  scripts/checkpatch.pl: device_initcall is not the only __initcall substitute
  checkpatch: check stable email address
  checkpatch: warn on unnecessary void function return statements
  checkpatch: prefer kstrto&lt;foo&gt; to sscanf(buf, "%&lt;lhuidx&gt;", &amp;bar);
  checkpatch: add warning for kmalloc/kzalloc with multiply
  checkpatch: warn on #defines ending in semicolon
  checkpatch: make --strict a default for files in drivers/net and net/
  checkpatch: always warn on missing blank line after variable declaration block
  checkpatch: fix wildcard DT compatible string checking
  ...
</content>
</entry>
<entry>
<title>fs/block_dev.c: add bdev_read_page() and bdev_write_page()</title>
<updated>2014-06-04T23:54:02Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>matthew.r.wilcox@intel.com</email>
</author>
<published>2014-06-04T23:07:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=47a191fd38ebddb1bd1510ec2bc1085c578c8868'/>
<id>urn:sha1:47a191fd38ebddb1bd1510ec2bc1085c578c8868</id>
<content type='text'>
A block device driver may choose to provide a rw_page operation.  It
will be called when the filesystem attempts page-sized I/O to page
cache pages (i.e. not for direct I/O).  This does preclude I/Os that
are larger than page size, so this may only be a performance gain for
some devices.
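
A sketch of a driver opting in (the driver names are hypothetical; the
rw_page signature follows this era's block_device_operations):

	static int mydev_rw_page(struct block_device *bdev, sector_t sector,
				 struct page *page, int rw)
	{
		/* perform one page of I/O; mydev_do_page_io() is a
		 * hypothetical driver helper, 0 means success */
		return mydev_do_page_io(bdev-&gt;bd_disk-&gt;private_data,
					sector, page, rw == WRITE);
	}

	static const struct block_device_operations mydev_fops = {
		.owner	 = THIS_MODULE,
		.rw_page = mydev_rw_page,
	};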

Signed-off-by: Matthew Wilcox &lt;matthew.r.wilcox@intel.com&gt;
Tested-by: Dheeraj Reddy &lt;dheeraj.reddy@intel.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs/libfs.c: add generic data flush to fsync</title>
<updated>2014-06-04T23:53:55Z</updated>
<author>
<name>Fabian Frederick</name>
<email>fabf@skynet.be</email>
</author>
<published>2014-06-04T23:06:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ac13a829f6adb674015ab399594c089990104af7'/>
<id>urn:sha1:ac13a829f6adb674015ab399594c089990104af7</id>
<content type='text'>
Description by Jan Kara:
 "A lot of older filesystems don't properly flush volatile disk caches
  on fsync(2) which can lead to loss of fsynced data after power failure.

This patch makes generic_file_fsync() issue proper cache flush to fix the
problem.  Sysadmins can use /sys/devices/.../cache_type to tell the system
it should not send the cache flush."
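
The shape of the fix, as a sketch (assuming the pre-flush body is
split out as __generic_file_fsync; error handling condensed):

	int generic_file_fsync(struct file *file, loff_t start, loff_t end,
			       int datasync)
	{
		struct inode *inode = file-&gt;f_mapping-&gt;host;
		int err;

		err = __generic_file_fsync(file, start, end, datasync);
		if (err)
			return err;
		/* flush the device's volatile write cache */
		return blkdev_issue_flush(inode-&gt;i_sb-&gt;s_bdev, GFP_KERNEL, NULL);
	}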

[akpm@linux-foundation.org: nuke ifdef]
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Fabian Frederick &lt;fabf@skynet.be&gt;
Suggested-by: Jan Kara &lt;jack@suse.cz&gt;
Suggested-by: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>blk-mq: fix sparse warning on missed __percpu annotation</title>
<updated>2014-06-04T03:04:39Z</updated>
<author>
<name>Ming Lei</name>
<email>tom.leiming@gmail.com</email>
</author>
<published>2014-06-03T03:24:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e6cdb0929fe6726ba5203fc5529b74564d98a9e9'/>
<id>urn:sha1:e6cdb0929fe6726ba5203fc5529b74564d98a9e9</id>
<content type='text'>
'struct blk_mq_ctx' is __percpu, so add the annotation
and fix the sparse warning reported by Fengguang:

	[block:for-linus 2/3] block/blk-mq.h:75:16: sparse: incorrect
	type in initializer (different address spaces)
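
The fix is just the address-space annotation on the per-cpu pointer; a
sketch (the queue_ctx field name is assumed from request_queue):

	/* __percpu tells sparse this pointer lives in the per-cpu
	 * address space */
	struct blk_mq_ctx __percpu	*queue_ctx;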

Reported-by: kbuild test robot &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
</feed>
