<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/ceph/osd_client.h, branch v3.10.65</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.10.65</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.10.65'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2014-03-31T16:58:12Z</updated>
<entry>
<title>libceph: block I/O when PAUSE or FULL osd map flags are set</title>
<updated>2014-03-31T16:58:12Z</updated>
<author>
<name>Josh Durgin</name>
<email>josh.durgin@inktank.com</email>
</author>
<published>2013-12-03T03:11:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4892ed8deb6989cd9c831156920bae490d6ad4d1'/>
<id>urn:sha1:4892ed8deb6989cd9c831156920bae490d6ad4d1</id>
<content type='text'>
commit d29adb34a94715174c88ca93e8aba955850c9bde upstream.

The PAUSEWR and PAUSERD flags are meant to stop the cluster from
processing writes and reads, respectively. The FULL flag is set when
the cluster determines that it is out of space, and will no longer
process writes.  PAUSEWR and PAUSERD are purely client-side settings
already implemented in userspace clients. The osd does nothing special
with these flags.

When the FULL flag is set, however, the osd responds to all writes
with -ENOSPC. For cephfs, this makes sense, but for rbd the block
layer translates this into EIO.  If a cluster goes from full to
non-full quickly, a filesystem on top of rbd will not behave well,
since some writes succeed while others get EIO.

Fix this by blocking any writes when the FULL flag is set in the osd
client. This is the same strategy used by userspace, so apply it by
default.  A follow-on patch makes this configurable.

__map_request() is called to re-target osd requests in case the
available osds changed.  Add a paused field to a ceph_osd_request, and
set it whenever an appropriate osd map flag is set.  Avoid queueing
paused requests in __map_request(), but force them to be resent if
they become unpaused.
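<![CDATA[
As a rough illustration of the pausing rule described above, here is a
minimal userspace sketch. It is not the kernel code: every sketch_*
name, the flag values, and the force_resend out-parameter are
assumptions made for this example only.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative osd map flag bits; names and values are assumptions. */
enum sketch_map_flag {
	SKETCH_MAP_FULL    = 1 << 0,
	SKETCH_MAP_PAUSERD = 1 << 1,
	SKETCH_MAP_PAUSEWR = 1 << 2,
};

/* Illustrative request direction bits; also assumptions. */
enum sketch_req_flag {
	SKETCH_REQ_READ  = 1 << 0,
	SKETCH_REQ_WRITE = 1 << 1,
};

struct sketch_request {
	unsigned int flags;	/* SKETCH_REQ_* bits */
	bool paused;		/* set while a map flag blocks this request */
};

/* Reads pause on PAUSERD; writes pause on PAUSEWR or FULL. */
static bool sketch_should_pause(unsigned int map_flags,
				const struct sketch_request *req)
{
	bool pauserd = (map_flags & SKETCH_MAP_PAUSERD) != 0;
	bool pausewr = (map_flags & (SKETCH_MAP_PAUSEWR | SKETCH_MAP_FULL)) != 0;

	return ((req->flags & SKETCH_REQ_READ) && pauserd) ||
	       ((req->flags & SKETCH_REQ_WRITE) && pausewr);
}

/*
 * Re-evaluate a request against the current map flags.  Returns true
 * if the request may be queued for sending.  *force_resend is set
 * when the request was paused before and is no longer paused.
 */
static bool sketch_map_request(unsigned int map_flags,
			       struct sketch_request *req,
			       bool *force_resend)
{
	bool was_paused = req->paused;

	req->paused = sketch_should_pause(map_flags, req);
	*force_resend = was_paused && !req->paused;

	return !req->paused;	/* paused requests are not queued */
}
```

In this sketch a write seen under FULL is held back, and the same
write is flagged for resend once a later map arrives without the flag.
]]>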

Also subscribe to the next osd map from the monitor if any of these
flags are set, so paused requests can be unblocked as soon as
possible.

Fixes: http://tracker.ceph.com/issues/6079

Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
Signed-off-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>libceph: add function to ensure notifies are complete</title>
<updated>2014-01-09T20:24:26Z</updated>
<author>
<name>Josh Durgin</name>
<email>josh.durgin@inktank.com</email>
</author>
<published>2013-08-29T04:43:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a2e5951b11b406a83f84c1eb3b5d722491f4d883'/>
<id>urn:sha1:a2e5951b11b406a83f84c1eb3b5d722491f4d883</id>
<content type='text'>
commit dd935f44a40f8fb02aff2cc0df2269c92422df1c upstream.

Without a way to flush the osd client's notify workqueue, a watch
event that is unregistered could continue receiving callbacks
indefinitely.

Unregistering the event simply means no new notifies are added to the
queue, but there may still be events in the queue that will call the
watch callback for the event. If the queue is flushed after the event
is unregistered, the caller can be sure no more watch callbacks will
occur for the canceled watch.

Signed-off-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>libceph: fix safe completion</title>
<updated>2014-01-09T20:24:25Z</updated>
<author>
<name>Yan, Zheng</name>
<email>zheng.z.yan@intel.com</email>
</author>
<published>2013-05-31T07:54:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=aede2cb5c95588e703e358239a4f3842e21f103e'/>
<id>urn:sha1:aede2cb5c95588e703e358239a4f3842e21f103e</id>
<content type='text'>
commit eb845ff13a44477f8a411baedbf11d678b9daf0a upstream.

handle_reply() calls complete_request() only if the first OSD reply
has the ONDISK flag.

Signed-off-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>libceph: use slab cache for osd client requests</title>
<updated>2013-05-02T16:58:41Z</updated>
<author>
<name>Alex Elder</name>
<email>elder@inktank.com</email>
</author>
<published>2013-05-01T17:43:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5522ae0b68421e2645303ff010e27afc5292e0ab'/>
<id>urn:sha1:5522ae0b68421e2645303ff010e27afc5292e0ab</id>
<content type='text'>
Create a slab cache to manage allocation of ceph_osd_request
structures.

This resolves:
    http://tracker.ceph.com/issues/3926

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</content>
</entry>
<entry>
<title>libceph: support pages for class request data</title>
<updated>2013-05-02T04:19:06Z</updated>
<author>
<name>Alex Elder</name>
<email>elder@inktank.com</email>
</author>
<published>2013-04-19T20:34:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6c57b5545d46e276381a15a59283c984cf3f94e3'/>
<id>urn:sha1:6c57b5545d46e276381a15a59283c984cf3f94e3</id>
<content type='text'>
Add the ability to provide an array of pages as outbound request
data for object class method calls.

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</content>
</entry>
<entry>
<title>libceph: support raw data requests</title>
<updated>2013-05-02T04:19:00Z</updated>
<author>
<name>Alex Elder</name>
<email>elder@inktank.com</email>
</author>
<published>2013-02-11T18:33:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=49719778bfa5371ec9b5a7d989bb29000e3ac5df'/>
<id>urn:sha1:49719778bfa5371ec9b5a7d989bb29000e3ac5df</id>
<content type='text'>
Allow osd request ops that aren't otherwise structured (not class,
extent, or watch ops) to specify "raw" data to be used to hold
incoming data for the op.  Make use of this capability for the osd
STAT op.

Prefix the name of the private function osd_req_op_init() with "_",
and expose a new function by that (earlier) name whose purpose is to
initialize osd ops with (only) implied data.

For now we'll just support the use of a page array for an osd op
with incoming raw data.

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</content>
</entry>
<entry>
<title>libceph: kill off osd data write_request parameters</title>
<updated>2013-05-02T04:18:58Z</updated>
<author>
<name>Alex Elder</name>
<email>elder@inktank.com</email>
</author>
<published>2013-04-15T19:50:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=406e2c9f9286fc93ae2191a7abf477dea05aadc9'/>
<id>urn:sha1:406e2c9f9286fc93ae2191a7abf477dea05aadc9</id>
<content type='text'>
In the incremental move toward supporting distinct data items in an
osd request some of the functions had "write_request" parameters to
indicate, basically, whether the data belonged to in_data or the
out_data.  Now that we maintain the data fields in the op structure
there is no need to indicate the direction, so get rid of the
"write_request" parameters.

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</content>
</entry>
<entry>
<title>libceph: change how "safe" callback is used</title>
<updated>2013-05-02T04:18:52Z</updated>
<author>
<name>Alex Elder</name>
<email>elder@inktank.com</email>
</author>
<published>2013-04-15T16:20:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=26be88087ae8a04a5b576aa2f490597b649fc132'/>
<id>urn:sha1:26be88087ae8a04a5b576aa2f490597b649fc132</id>
<content type='text'>
An osd request currently has two callbacks.  They inform the
initiator of the request when we've received confirmation from the
target osd that a request was received, and when the osd indicates
all changes described by the request are durable.

The only time the second callback is used is in the ceph file system
for a synchronous write.  There's a race that makes some handling of
this case unsafe.  This patch addresses this problem.  The error
handling for this callback is also kind of gross, and this patch
changes that as well.

In ceph_sync_write(), if a safe callback is requested we want to add
the request on the ceph inode's unsafe items list.  Because items on
this list must have their tid set (by ceph_osd_start_request()), the
request is added *after* the call to that function returns.  The
problem with this is that there's a race between starting the
request and adding it to the unsafe items list; the request may
already be complete before ceph_sync_write() even begins to put it
on the list.

To address this, we change the way the "safe" callback is used.
Rather than just calling it when the request is "safe", we use it to
notify the initiator of the bounds (start and end) of the period during
which the request is *unsafe*.  So the initiator gets notified just
before the request gets sent to the osd (when it is "unsafe"), and
again when it's known the results are durable (it's no longer
unsafe).  The first call will get made in __send_request(), just
before the request message gets sent to the messenger for the first
time.  That function is only called by __send_queued(), which is
always called with the osd client's request mutex held.

We then have this callback function insert the request on the ceph
inode's unsafe list when we're told the request is unsafe.  This
will avoid the race because this call will be made under protection
of the osd client's request mutex.  It also nicely groups the setup
and cleanup of the state associated with managing unsafe requests.

The name of the "safe" callback field is changed to "unsafe" to
better reflect its new purpose.  It has a Boolean "unsafe" parameter
to indicate whether the request is becoming unsafe or is now safe.
Because the "msg" parameter wasn't used, we drop that.
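<![CDATA[
The two-phase callback described above might be sketched in userspace
roughly as follows. This is only an illustration: the sketch_* types
and functions are hypothetical stand-ins, there is no locking here
(the message above says the first call is made with the osd client's
request mutex held), and a plain singly linked list stands in for the
inode's unsafe items list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_inode;

struct sketch_request {
	struct sketch_request *next;	/* link in the owner's unsafe list */
	void (*unsafe_cb)(struct sketch_request *req, bool unsafe);
	struct sketch_inode *owner;
};

struct sketch_inode {
	struct sketch_request *unsafe_list;
	int unsafe_count;
};

/* Called with unsafe=true before sending, unsafe=false once durable. */
static void sketch_unsafe_cb(struct sketch_request *req, bool unsafe)
{
	struct sketch_inode *ino = req->owner;

	if (unsafe) {
		/* The request is about to be sent: track it. */
		req->next = ino->unsafe_list;
		ino->unsafe_list = req;
		ino->unsafe_count++;
	} else {
		/* The results are durable: stop tracking it. */
		struct sketch_request **p = &ino->unsafe_list;

		while (*p && *p != req)
			p = &(*p)->next;
		if (*p) {
			*p = req->next;
			req->next = NULL;
			ino->unsafe_count--;
		}
	}
}

/* Stand-in for the send path: notify first, then hand off. */
static void sketch_send_request(struct sketch_request *req)
{
	if (req->unsafe_cb)
		req->unsafe_cb(req, true);
	/* ... the message would be handed to the messenger here ... */
}

/* Stand-in for handling a reply that says the changes are durable. */
static void sketch_handle_durable_reply(struct sketch_request *req)
{
	if (req->unsafe_cb)
		req->unsafe_cb(req, false);
}
```

Because the list insertion happens inside the callback, before the
request is handed off, the sketch has no window in which a request
could complete without ever having been placed on the list.
]]>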

This resolves the original problem reported in:
    http://tracker.ceph.com/issues/4706

Reported-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
</entry>
<entry>
<title>libceph: make method call data be a separate data item</title>
<updated>2013-05-02T04:18:35Z</updated>
<author>
<name>Alex Elder</name>
<email>elder@inktank.com</email>
</author>
<published>2013-04-05T19:46:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=04017e29bbcf0673d8a6af616c56e395d05f5971'/>
<id>urn:sha1:04017e29bbcf0673d8a6af616c56e395d05f5971</id>
<content type='text'>
Right now the data for a method call is specified via a pointer and
length, and it's copied--along with the class and method name--into
a pagelist data item to be sent to the osd.  Instead, encode the
data in a data item separate from the class and method names.

This will allow large amounts of data to be supplied to methods
without copying.  Only rbd uses the class functionality right now,
and when it really needs this it will probably need to use a page
array rather than a page list.  But this simple implementation
demonstrates the functionality on the osd client, and that's enough
for now.

This resolves:
    http://tracker.ceph.com/issues/4104

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</content>
</entry>
<entry>
<title>libceph: kill off osd request r_data_in and r_data_out</title>
<updated>2013-05-02T04:18:25Z</updated>
<author>
<name>Alex Elder</name>
<email>elder@inktank.com</email>
</author>
<published>2013-04-05T06:27:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5476492fba9fd0b4118aacf5b924dd29b8cca56c'/>
<id>urn:sha1:5476492fba9fd0b4118aacf5b924dd29b8cca56c</id>
<content type='text'>
Finally!  Convert the osd op data pointers into real structures, and
make the switch over to using them instead of having all ops share
the in and/or out data structures in the osd request.

Set up a new function to traverse the set of ops and release any
data associated with them (pages).

This and the patches leading up to it resolve:
    http://tracker.ceph.com/issues/4657

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</content>
</entry>
</feed>
