<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/block/elevator.c, branch v3.8</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.8</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.8'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2012-11-09T07:44:27Z</updated>
<entry>
<title>block: recursive merge requests</title>
<updated>2012-11-09T07:44:27Z</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fusionio.com</email>
</author>
<published>2012-11-09T07:44:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bee0393cc12b6d8f10e884e555a095e050e0b2b9'/>
<id>urn:sha1:bee0393cc12b6d8f10e884e555a095e050e0b2b9</id>
<content type='text'>
In a workload, thread 1 accesses a, a+2, ... and thread 2 accesses a+1, a+3, ....
When the requests are flushed to the queue, a and a+1 are merged into (a, a+1),
and a+2 and a+3 into (a+2, a+3), but (a, a+1) and (a+2, a+3) aren't merged.

If we merge recursively for such interleaved access, throughput improves for
some workloads. A recent workload I'm checking is swap; the change below
boosts throughput by around 5% ~ 10%.
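
As a rough illustration only (a toy Python model, not the kernel code), the
interleaved case can be sketched as below. Requests are modelled as half-open
(start, end) sector ranges; insert_merge() and flush() are hypothetical names,
and the flush order is an assumption chosen to reproduce the case above.

```python
def insert_merge(queue, req, recursive):
    """Insert req into queue, back-merging into a request that ends at its start.

    With recursive=True the grown request is tried again as a merge candidate
    until no queued request ends where it starts.
    """
    start, end = req
    while True:
        # Look for a queued request that ends exactly where the candidate starts.
        idx = next((i for i, q in enumerate(queue) if q[1] == start), None)
        if idx is None:
            break
        # Absorb it: the candidate now begins where that request began.
        start = queue.pop(idx)[0]
        if not recursive:
            break
    queue.append((start, end))


def flush(recursive):
    """Thread 1 flushes a and a+2 first, then thread 2 flushes a+1 and a+3 (a = 0)."""
    queue = []
    for req in [(0, 1), (2, 3), (1, 2), (3, 4)]:
        insert_merge(queue, req, recursive)
    return sorted(queue)
```

In this model flush(False) leaves two requests, (0, 2) and (2, 4), while
flush(True) collapses them into the single request (0, 4).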

Signed-off-by: Shaohua Li &lt;shli@fusionio.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: Clean up special command handling logic</title>
<updated>2012-09-20T12:31:38Z</updated>
<author>
<name>Martin K. Petersen</name>
<email>martin.petersen@oracle.com</email>
</author>
<published>2012-09-18T16:19:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e2a60da74fc8215c68509a89e9a69c66363153db'/>
<id>urn:sha1:e2a60da74fc8215c68509a89e9a69c66363153db</id>
<content type='text'>
Remove special-casing of non-rw fs style requests (discard). The nomerge
flags are consolidated in blk_types.h, and rq_mergeable() and
bio_mergeable() have been modified to use them.

bio_is_rw() is used in place of bio_has_data() in a few places. This is
done to distinguish true reads and writes from other fs type requests
that carry a payload (e.g. write same).

Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Acked-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blkcg: implement per-queue policy activation</title>
<updated>2012-04-20T08:06:06Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-04-13T20:11:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a2b1693bac45ea3fe3ba612fd22c45f17449f610'/>
<id>urn:sha1:a2b1693bac45ea3fe3ba612fd22c45f17449f610</id>
<content type='text'>
All blkcg policies were assumed to be enabled on all request_queues.
Due to various implementation obstacles, during the recent blkcg core
updates, this was temporarily implemented as shooting down all !root
blkgs on elevator switch and policy [de]registration combined with
half-broken in-place root blkg updates.  In addition to being buggy
and racy, this meant losing all blkcg configurations across those
events.

Now that blkcg is cleaned up enough, this patch replaces the temporary
implementation with proper per-queue policy activation.  Each blkcg
policy should call the new blkcg_[de]activate_policy() to enable and
disable the policy on a specific queue.  blkcg_activate_policy()
allocates and installs policy data for the policy for all existing
blkgs.  blkcg_deactivate_policy() does the reverse.  If a policy is
not enabled for a given queue, blkg printing / config functions skip
the respective blkg for the queue.

blkcg_activate_policy() also takes care of root blkg creation, and
cfq_init_queue() and blk_throtl_init() are updated accordingly.

This makes blkcg_bypass_{start|end}() and update_root_blkg_pd()
unnecessary.  Dropped.

v2: cfq_init_queue() was returning uninitialized @ret on root_group
    alloc failure if !CONFIG_CFQ_GROUP_IOSCHED.  Fixed.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: implement bio_associate_current()</title>
<updated>2012-03-06T20:27:24Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-03-05T21:15:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=852c788f8365062c8a383c5a93f7f7289977cb50'/>
<id>urn:sha1:852c788f8365062c8a383c5a93f7f7289977cb50</id>
<content type='text'>
IO scheduling and cgroup are tied to the issuing task via io_context
and cgroup of %current.  Unfortunately, there are cases where IOs need
to be routed via a different task, which causes scheduling and cgroup
limit enforcement to be applied completely incorrectly.

For example, all bios delayed by blk-throttle end up being issued by a
delayed work item, get assigned the io_context of the worker task
which happens to serve the work item, and are dumped to the default
block cgroup.  This is doubly confusing as bios which aren't delayed
end up in the correct cgroup, and it makes using blk-throttle and cfq
propio together impossible.

Any code which punts IO issuing to another task is affected which is
getting more and more common (e.g. btrfs).  As both io_context and
cgroup are firmly tied to task including userland visible APIs to
manipulate them, it makes a lot of sense to match up tasks to bios.

This patch implements bio_associate_current() which associates the
specified bio with %current.  The bio will record the associated ioc
and blkcg at that point and block layer will use the recorded ones
regardless of which task actually ends up issuing the bio.  bio
release puts the associated ioc and blkcg.

It grabs and remembers ioc and blkcg instead of the task itself
because task may already be dead by the time the bio is issued making
ioc and blkcg inaccessible and those are all block layer cares about.
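
A minimal sketch of the idea in Python, not the kernel implementation: Task,
Bio, associate_current() and contexts_for_issue() are hypothetical stand-ins,
and reference counting on ioc and blkcg is left out.

```python
class Task:
    """Stand-in for a task with an io_context and a block cgroup."""

    def __init__(self, ioc, blkcg):
        self.ioc = ioc
        self.blkcg = blkcg


class Bio:
    """Stand-in for a bio; both fields stay None until it is associated."""

    def __init__(self):
        self.ioc = None
        self.blkcg = None


def associate_current(bio, current):
    # Record the submitter's contexts rather than the task itself, since the
    # task may be gone by the time the bio is issued.
    bio.ioc = current.ioc
    bio.blkcg = current.blkcg


def contexts_for_issue(bio, issuer):
    # Prefer what was recorded on the bio; fall back to the issuing task.
    ioc = bio.ioc if bio.ioc is not None else issuer.ioc
    blkcg = bio.blkcg if bio.blkcg is not None else issuer.blkcg
    return ioc, blkcg
```

In this model, a bio associated with its submitter keeps the submitter's ioc
and blkcg even when a worker task issues it later; an unassociated bio picks
up whatever the worker has.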

elevator_set_req_fn() is updated such that the bio for which elvdata is
being allocated is available to the elevator.

This doesn't update block cgroup policies yet.  Further patches will
implement the support.

-v2: #ifdef CONFIG_BLK_CGROUP added around bio-&gt;bi_ioc dereference in
     rq_ioc() to fix build breakage.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Kent Overstreet &lt;koverstreet@google.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blkcg: unify blkg's for blkcg policies</title>
<updated>2012-03-06T20:27:23Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-03-05T21:15:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e8989fae38d9831c72b20375a206a919ca468c52'/>
<id>urn:sha1:e8989fae38d9831c72b20375a206a919ca468c52</id>
<content type='text'>
Currently, blkg is per cgroup-queue-policy combination.  This is
unnatural and leads to various convolutions in partially used
duplicate fields in blkg, config / stat access, and general management
of blkgs.

This patch makes blkg's per cgroup-queue and lets them serve all
policies.  blkgs are now created and destroyed by blkcg core proper.
This will allow further consolidation of common management logic into
blkcg core and API with better defined semantics and layering.

As a transitional step to untangle blkg management, elvswitch and
policy [de]registration, all blkgs except the root blkg are being shot
down during elvswitch and bypass.  This patch adds blkg_root_update()
to update root blkg in place on policy change.  This is hacky and racy
but should be good enough as interim step until we get locking
simplified and switch over to proper in-place update for all blkgs.

-v2: Root blkgs need to be updated on elvswitch too and blkg_alloc()
     comment wasn't updated according to the function change.  Fixed.
     Both pointed out by Vivek.

-v3: v2 updated blkg_destroy_all() to invoke update_root_blkg_pd() for
     all policies.  This freed root pd during elvswitch before the
     last queue finished exiting and led to oops.  Directly invoke
     update_root_blkg_pd() only on BLKIO_POLICY_PROP from
     cfq_exit_queue().  This also is closer to what will be done with
     proper in-place blkg update.  Reported by Vivek.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blkcg: let blkcg core manage per-queue blkg list and counter</title>
<updated>2012-03-06T20:27:23Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-03-05T21:15:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=03aa264ac15637b6f98374270bcdf31400965505'/>
<id>urn:sha1:03aa264ac15637b6f98374270bcdf31400965505</id>
<content type='text'>
With the previous patch to move blkg list heads and counters to
request_queue and blkg, the logic to manage them in both policies is
almost identical and can be moved to blkcg core.

This patch moves blkg link logic into blkg_lookup_create(), implements
common blkg unlink code in blkg_destroy(), and updates
blkg_destroy_all() so that it's policy specific and can skip root
group.  The updated blkg_destroy_all() is now used to both clear queue
for bypassing and elv switching, and release all blkgs on q exit.

This patch introduces a race window where policy [de]registration may
race against queue blkg clearing.  This can only be a problem on cfq
unload and shouldn't be a real problem in practice (and we have many
other places where this race already exists).  Future patches will
remove these unlikely races.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blkcg: shoot down blkio_groups on elevator switch</title>
<updated>2012-03-06T20:27:22Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-03-05T21:15:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=72e06c255181537d0b3e1f657a9ed81655d745b1'/>
<id>urn:sha1:72e06c255181537d0b3e1f657a9ed81655d745b1</id>
<content type='text'>
Elevator switch may involve changes to blkcg policies.  Implement
shoot down of blkio_groups.

Combined with the previous bypass updates, the end goal is updating
blkcg core such that it can ensure that blkcg's being affected become
quiescent and don't have any per-blkg data hanging around before
commencing any policy updates.  Until queues are made aware of the
policies that apply to them, as an interim step, all per-policy blkg
data will be shot down.

* blk-throtl doesn't need this change as it can't be disabled for a
  live queue; however, update it anyway as the scheduled blkg
  unification requires this behavior change.  This means that
  blk-throtl configuration will be unnecessarily lost over elevator
  switch.  This oddity will be removed after blkcg learns to associate
  individual policies with request_queues.

* blk-throtl doesn't shoot down root_tg.  This is to ease transition.
  Unified blkg will always have persistent root group and not shooting
  down root_tg for now eases transition to that point by avoiding
  having to update td-&gt;root_tg and is safe as blk-throtl can never be
  disabled.

-v2: Vivek pointed out that group list is not guaranteed to be empty
     on return from clear function if it raced cgroup removal and
     lost.  Fix it by waiting a bit and retrying.  This kludge will
     soon be removed once locking is updated such that blkg is never
     in limbo state between blkcg and request_queue locks.

     blk-throtl no longer shoots down root_tg to avoid breaking
     td-&gt;root_tg.

     Also, nest queue_lock inside blkio_list_lock, not the other way
     around, to avoid introducing a possible deadlock via blkcg lock.

-v3: blkcg_clear_queue() repositioned and renamed to
     blkg_destroy_all() to increase consistency with later changes.
     cfq_clear_queue() updated to check q-&gt;elevator before
     dereferencing it to avoid NULL dereference on not fully
     initialized queues (used by later change).

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: implement blk_queue_bypass_start/end()</title>
<updated>2012-03-06T20:27:21Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-03-05T21:14:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d732580b4eb31553c63744a47d590f770cafb8f0'/>
<id>urn:sha1:d732580b4eb31553c63744a47d590f770cafb8f0</id>
<content type='text'>
Rename and extend elv_quiesce_start/end() to
blk_queue_bypass_start/end() which are exported and support nesting
via @q-&gt;bypass_depth.  Also add blk_queue_bypass() to test bypass
state.
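
The nesting can be pictured with a small Python model. This is a sketch under
assumptions, not the kernel code: RequestQueue and its methods are
hypothetical, and only the depth counter is modelled, none of the other work
the real functions do.

```python
class RequestQueue:
    """Toy queue that tracks only the bypass nesting depth."""

    def __init__(self):
        self.bypass_depth = 0

    def bypass_start(self):
        # Each start nests one level deeper.
        self.bypass_depth += 1

    def bypass_end(self):
        # Each end must pair with an earlier start.
        if self.bypass_depth == 0:
            raise RuntimeError("bypass_end() without matching bypass_start()")
        self.bypass_depth -= 1

    def bypass(self):
        # The queue is in bypass state while any start is still outstanding.
        return self.bypass_depth != 0
```

With two nested starts, the model stays in bypass state after the first end
and leaves it only after the second.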

This will be further extended and used for blkio_group management.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>elevator: make elevator_init_fn() return 0/-errno</title>
<updated>2012-03-06T20:27:21Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-03-05T21:14:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b2fab5acd28ead6f0dd6c3996ba23f0ef1772f15'/>
<id>urn:sha1:b2fab5acd28ead6f0dd6c3996ba23f0ef1772f15</id>
<content type='text'>
elevator_ops-&gt;elevator_init_fn() has a weird return value.  It returns
a void * which the caller should assign to q-&gt;elevator-&gt;elevator_data
and %NULL return denotes init failure.

Update such that it returns integer 0/-errno and sets elevator_data
directly as necessary.

This makes the interface more conventional and eases further cleanup.
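
The two conventions can be contrasted with a short Python sketch. All names
here are hypothetical stand-ins rather than kernel symbols, and the choice of
ENOMEM as the failure code is an assumption made for illustration.

```python
import errno


class Elevator:
    def __init__(self):
        self.elevator_data = None


class Queue:
    def __init__(self):
        self.elevator = Elevator()


def old_style_init(q, fail=False):
    # Old convention: hand back the private data, or None on failure.
    return None if fail else {"owner": "sched"}


def new_style_init(q, fail=False):
    # New convention: install the data directly and return 0 or -errno.
    if fail:
        return -errno.ENOMEM
    q.elevator.elevator_data = {"owner": "sched"}
    return 0


def switch_old(q, fail=False):
    data = old_style_init(q, fail)
    if data is None:
        # The caller has to pick an error code on the init function's behalf.
        return -errno.ENOMEM
    q.elevator.elevator_data = data
    return 0


def switch_new(q, fail=False):
    # The init function's own return value passes straight through.
    return new_style_init(q, fail)
```

In the second form the caller no longer assigns elevator_data or guesses at
the reason for a failure.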

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>elevator: clear auxiliary data earlier during elevator switch</title>
<updated>2012-03-06T20:27:21Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-03-05T21:14:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5a5bafdc396b1da7570f84fb96a0f8a288970c5e'/>
<id>urn:sha1:5a5bafdc396b1da7570f84fb96a0f8a288970c5e</id>
<content type='text'>
Elevator switch tries hard to keep as much context as possible until the
new elevator is ready so that it can revert to the original state if
initializing the new elevator fails for some reason.  Unfortunately,
with more auxiliary contexts to manage, this makes elevator init and
exit paths too complex and fragile.

This patch makes elevator_switch() unregister the current elevator and
flush icq's before it starts initializing the new one.  As we still keep
the old elevator itself, the only difference is that we lose icq's on
the rare occasion of a switch failure, which isn't critical at all.

Note that this makes explicit elevator parameter to
elevator_init_queue() and __elv_register_queue() unnecessary as they
always can use the current elevator.

This patch enables block cgroup cleanups.

-v2: blk_add_trace_msg() prints elevator name from @new_e instead of
     @e-&gt;type as the local variable no longer exists.  This caused
     build failure on CONFIG_BLK_DEV_IO_TRACE.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
