<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/ceph/osd_client.h, branch v3.16.1</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.16.1</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.16.1'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2014-04-03T02:33:52Z</updated>
<entry>
<title>libceph: bump CEPH_OSD_MAX_OP to 3</title>
<updated>2014-04-03T02:33:52Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2014-02-25T14:22:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7cc69d42e6950404587bef9489a5ed6f9f6bab4e'/>
<id>urn:sha1:7cc69d42e6950404587bef9489a5ed6f9f6bab4e</id>
<content type='text'>
Our longest osd request now contains 3 ops: copyup+hint+write.

Also, CEPH_OSD_MAX_OP value in a BUG_ON in rbd_osd_req_callback() was
hard-coded to 2.  Fix it, and switch to rbd_assert while at it.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@linaro.org&gt;
</content>
</entry>
<entry>
<title>libceph: add support for CEPH_OSD_OP_SETALLOCHINT osd op</title>
<updated>2014-04-03T02:33:51Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2014-02-25T14:22:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c647b8a8c6366f849c2a237bfe525cb1d316d5f4'/>
<id>urn:sha1:c647b8a8c6366f849c2a237bfe525cb1d316d5f4</id>
<content type='text'>
This is primarily for rbd's benefit and is supposed to combat
fragmentation:

"... knowing that rbd images have a 4m size, librbd can pass a hint
that will let the osd do the xfs allocation size ioctl on new files so
that they are allocated in 1m or 4m chunks.  We've seen cases where
users with rbd workloads have very high levels of fragmentation in xfs
and this would mitigate that and probably have a pretty nice
performance benefit."

SETALLOCHINT is considered advisory, so our backwards compatibility
mechanism here is to set FAILOK flag for all SETALLOCHINT ops.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@linaro.org&gt;
</content>
</entry>
<entry>
<title>libceph: encode CEPH_OSD_OP_FLAG_* op flags</title>
<updated>2014-04-03T02:33:51Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2014-02-25T14:22:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7b25bf5f02c5c80adf96120e031dc3a1756ce54d'/>
<id>urn:sha1:7b25bf5f02c5c80adf96120e031dc3a1756ce54d</id>
<content type='text'>
Encode ceph_osd_op::flags field so that it gets sent over the wire.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@linaro.org&gt;
</content>
</entry>
<entry>
<title>libceph: follow redirect replies from osds</title>
<updated>2014-01-27T21:57:53Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2014-01-27T15:40:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=205ee1187a671c3b067d7f1e974903b44036f270'/>
<id>urn:sha1:205ee1187a671c3b067d7f1e974903b44036f270</id>
<content type='text'>
Follow redirect replies from osds, for details see ceph.git commit
fbbe3ad1220799b7bb00ea30fce581c5eadaf034.

v1 (current) version of redirect reply consists of oloc and oid, which
expands to pool, key, nspace, hash and oid.  However, server-side code
that would populate anything other than pool doesn't exist yet, and
hence this commit adds support for pool redirects only.  To make sure
that future server-side updates don't break us, we decode all fields
and, if any of key, nspace, hash or oid have a non-default value, error
out with "corrupt osd_op_reply ..." message.
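
The check described above can be sketched as follows.  This is only an
illustration in Python, not the kernel code: the Redirect class, the
CorruptReply exception, pool_from_redirect() and the default values
(empty strings, hash of -1) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Redirect:
    """Hypothetical stand-in for a decoded v1 redirect reply."""
    pool: int
    key: bytes = b""
    nspace: bytes = b""
    hash: int = -1  # assumed "unset" marker, not taken from the commit
    oid: bytes = b""


class CorruptReply(ValueError):
    """Raised when a redirect carries a field we cannot honor."""


def pool_from_redirect(redirect):
    """Return the target pool, rejecting anything but a pool redirect."""
    # A pure pool redirect equals a Redirect that has the same pool
    # and default values for every other field.
    pool_only = Redirect(pool=redirect.pool)
    if redirect != pool_only:
        raise CorruptReply("corrupt osd_op_reply: unsupported redirect")
    return redirect.pool
```

All fields are decoded first and only then validated, so a reply that
starts populating key, nspace, hash or oid is rejected instead of being
silently treated as a pool redirect.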

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
</entry>
<entry>
<title>libceph: rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid}</title>
<updated>2014-01-27T21:57:49Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2014-01-27T15:40:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3c972c95c68f455d80ff185aa440857be046bbe0'/>
<id>urn:sha1:3c972c95c68f455d80ff185aa440857be046bbe0</id>
<content type='text'>
Rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid} before
introducing r_target_{oloc,oid} needed for redirects.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
</entry>
<entry>
<title>libceph: introduce and start using oid abstraction</title>
<updated>2014-01-27T21:57:28Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2014-01-27T15:40:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4295f2217a5aa8ef2738e3a368db3c1ceab41212'/>
<id>urn:sha1:4295f2217a5aa8ef2738e3a368db3c1ceab41212</id>
<content type='text'>
In preparation for tiering support, which would require having two
(base and target) object names for each osd request and also copying
those names around, introduce struct ceph_object_id (oid) and a couple
of helpers to facilitate those copies and encapsulate the fact that an
object name is not necessarily a NUL-terminated string.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
</entry>
<entry>
<title>libceph: rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN</title>
<updated>2014-01-27T21:57:24Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2014-01-27T15:40:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2d0ebc5d591f49131bf8f93b54c5424162c3fb7f'/>
<id>urn:sha1:2d0ebc5d591f49131bf8f93b54c5424162c3fb7f</id>
<content type='text'>
In preparation for adding oid abstraction, rename MAX_OBJ_NAME_SIZE to
CEPH_MAX_OID_NAME_LEN.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
</entry>
<entry>
<title>libceph: start using oloc abstraction</title>
<updated>2014-01-27T21:57:03Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2014-01-27T15:40:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=22116525baec1d63f4878eaa92f0b57946a78819'/>
<id>urn:sha1:22116525baec1d63f4878eaa92f0b57946a78819</id>
<content type='text'>
Instead of relying on pool fields in ceph_file_layout (for mapping) and
ceph_pg (for encoding), start using ceph_object_locator (oloc)
abstraction.  Note that userspace oloc currently consists of pool, key,
nspace and hash fields, while this one contains only a pool.  This is
OK, because at this point we only send (i.e. encode) olocs and never
have to receive (i.e. decode) them.

This makes keeping a copy of ceph_file_layout in every osd request
unnecessary, so ceph_osd_request::r_file_layout field is nuked.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
</entry>
<entry>
<title>libceph: block I/O when PAUSE or FULL osd map flags are set</title>
<updated>2013-12-13T17:13:29Z</updated>
<author>
<name>Josh Durgin</name>
<email>josh.durgin@inktank.com</email>
</author>
<published>2013-12-03T03:11:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d29adb34a94715174c88ca93e8aba955850c9bde'/>
<id>urn:sha1:d29adb34a94715174c88ca93e8aba955850c9bde</id>
<content type='text'>
The PAUSEWR and PAUSERD flags are meant to stop the cluster from
processing writes and reads, respectively. The FULL flag is set when
the cluster determines that it is out of space, and will no longer
process writes.  PAUSEWR and PAUSERD are purely client-side settings
already implemented in userspace clients. The osd does nothing special
with these flags.

When the FULL flag is set, however, the osd responds to all writes
with -ENOSPC. For cephfs, this makes sense, but for rbd the block
layer translates this into EIO.  If a cluster goes from full to
non-full quickly, a filesystem on top of rbd will not behave well,
since some writes succeed while others get EIO.

Fix this by blocking any writes when the FULL flag is set in the osd
client. This is the same strategy used by userspace, so apply it by
default.  A follow-on patch makes this configurable.

__map_request() is called to re-target osd requests in case the
available osds changed.  Add a paused field to a ceph_osd_request, and
set it whenever an appropriate osd map flag is set.  Avoid queueing
paused requests in __map_request(), but force them to be resent if
they become unpaused.
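
A rough sketch of the pause decision, assuming the rules stated above
(PAUSEWR and FULL hold writes, PAUSERD holds reads).  It is written in
Python for illustration only; OsdMapFlag, its numeric values,
request_is_paused() and needs_resend() are hypothetical names and do
not mirror the kernel's definitions.

```python
from enum import Flag


class OsdMapFlag(Flag):
    """Illustrative flag set; the values are not the kernel's bit values."""
    PAUSERD = 1
    PAUSEWR = 2
    FULL = 4


def request_is_paused(is_write, map_flags):
    """Decide whether a request must be held back under these map flags."""
    if is_write:
        # Writes are held when writes are paused or the cluster is full.
        return OsdMapFlag.PAUSEWR in map_flags or OsdMapFlag.FULL in map_flags
    # Reads are held only when reads are paused.
    return OsdMapFlag.PAUSERD in map_flags


def needs_resend(was_paused, is_write, map_flags):
    """A request that was paused and no longer is must be sent again."""
    return was_paused and not request_is_paused(is_write, map_flags)
```

For example, a write that was held under FULL satisfies needs_resend()
once a new osd map arrives with the flag cleared, while a read is never
held by FULL in the first place.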

Also subscribe to the next osd map from the monitor if any of these
flags are set, so paused requests can be unblocked as soon as
possible.

Fixes: http://tracker.ceph.com/issues/6079

Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
Signed-off-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</content>
</entry>
<entry>
<title>libceph: add function to ensure notifies are complete</title>
<updated>2013-09-09T18:15:49Z</updated>
<author>
<name>Josh Durgin</name>
<email>josh.durgin@inktank.com</email>
</author>
<published>2013-08-29T04:43:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=dd935f44a40f8fb02aff2cc0df2269c92422df1c'/>
<id>urn:sha1:dd935f44a40f8fb02aff2cc0df2269c92422df1c</id>
<content type='text'>
Without a way to flush the osd client's notify workqueue, a watch
event that is unregistered could continue receiving callbacks
indefinitely.

Unregistering the event simply means no new notifies are added to the
queue, but there may still be events in the queue that will call the
watch callback for the event. If the queue is flushed after the event
is unregistered, the caller can be sure no more watch callbacks will
occur for the canceled watch.

Signed-off-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@linaro.org&gt;
</content>
</entry>
</feed>
