user/sven/linux.git/include/linux/ceph/osd_client.h, branch v3.16.1

libceph: bump CEPH_OSD_MAX_OP to 3

2014-04-03T02:33:52Z

Our longest osd request now contains 3 ops: copyup+hint+write.

Also, the CEPH_OSD_MAX_OP value in a BUG_ON in rbd_osd_req_callback() was hard-coded to 2. Fix it, and switch to rbd_assert while at it.

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil
Reviewed-by: Alex Elder

libceph: add support for CEPH_OSD_OP_SETALLOCHINT osd op

2014-04-03T02:33:51Z

This is primarily for rbd's benefit and is supposed to combat fragmentation:

"... knowing that rbd images have a 4m size, librbd can pass a hint that will let the osd do the xfs allocation size ioctl on new files so that they are allocated in 1m or 4m chunks. We've seen cases where users with rbd workloads have very high levels of fragmentation in xfs and this would mitigate that and probably have a pretty nice performance benefit."

SETALLOCHINT is considered advisory, so our backwards compatibility mechanism here is to set the FAILOK flag for all SETALLOCHINT ops.

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil
Reviewed-by: Alex Elder
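
The advisory-hint idea above can be sketched in plain userspace C. This is only an illustration, not the kernel's code: the struct, the function name and the numeric flag value are all hypothetical stand-ins, and the 4 MiB sizes in the usage note simply mirror the rbd object size mentioned in the quote.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative value only; the real constant lives in the kernel's ceph headers. */
#define SKETCH_OP_FLAG_FAILOK 2u

/* Hypothetical, trimmed-down op carrying only what an allocation hint needs. */
struct sketch_alloc_hint_op {
	uint32_t flags;
	uint64_t expected_object_size;
	uint64_t expected_write_size;
};

/* Fill in a hint op and mark it FAILOK, since the hint is advisory. */
void sketch_alloc_hint_init(struct sketch_alloc_hint_op *op,
			    uint64_t expected_object_size,
			    uint64_t expected_write_size)
{
	memset(op, 0, sizeof(*op));
	op->expected_object_size = expected_object_size;
	op->expected_write_size = expected_write_size;
	op->flags = SKETCH_OP_FLAG_FAILOK;
}
```

A caller would pass the image's object size for both arguments, for example `sketch_alloc_hint_init(&op, 4u << 20, 4u << 20)`. Setting the flag inside the initializer means no call site can forget it.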

libceph: encode CEPH_OSD_OP_FLAG_* op flags

2014-04-03T02:33:51Z

Encode ceph_osd_op::flags field so that it gets sent over the wire.

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil
Reviewed-by: Alex Elder
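
As a rough illustration of what "sent over the wire" involves, the sketch below serializes a per-op flags word in little-endian byte order right after the opcode. The struct, the helper names and the six-byte layout are simplifying assumptions made for this example; the actual wire structure carries more fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative value only; the kernel defines its own CEPH_OSD_OP_FLAG_* constants. */
#define SKETCH_OP_FLAG_FAILOK 2u

/* Hypothetical, trimmed-down op: just an opcode and per-op flags. */
struct sketch_op {
	uint16_t opcode;
	uint32_t flags;
};

/* Store a 16-bit value in little-endian byte order. */
static void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)(v >> 24);
}

/*
 * Encode the opcode followed by the flags. Returns the number of bytes
 * written (6), or 0 if the buffer is too small.
 */
size_t sketch_encode_op(const struct sketch_op *op, uint8_t *buf, size_t len)
{
	if (len < 6)
		return 0;
	put_le16(buf, op->opcode);
	put_le32(buf + 2, op->flags);
	return 6;
}
```

If the flags word were left out of the encoder, a flag set on the in-memory op would never reach the osd, which is the gap this commit closes.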

libceph: follow redirect replies from osds

2014-01-27T21:57:53Z

Follow redirect replies from osds, for details see ceph.git commit fbbe3ad1220799b7bb00ea30fce581c5eadaf034.

v1 (current) version of redirect reply consists of oloc and oid, which expands to pool, key, nspace, hash and oid. However, server-side code that would populate anything other than pool doesn't exist yet, and hence this commit adds support for pool redirects only. To make sure that future server-side updates don't break us, we decode all fields and, if any of key, nspace, hash or oid have a non-default value, error out with "corrupt osd_op_reply ..." message.

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil
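
The "decode everything, accept pool only" rule can be sketched as a small validation step. The struct and function names here are hypothetical, the input is assumed to be already decoded, and the choice of -1 as the default (unset) hash is an assumption of this sketch rather than something the commit message states.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed "not set" value for the hash field in this sketch. */
#define SKETCH_HASH_UNSET (-1)

/* Hypothetical already-decoded redirect: pool plus the fields that must stay default. */
struct sketch_redirect {
	int64_t pool;
	size_t key_len;
	size_t nspace_len;
	int64_t hash;
	size_t oid_len;
};

/*
 * Accept a pool-only redirect. Returns 0 and updates *target_pool on
 * success; returns -EINVAL and leaves *target_pool untouched if any of
 * key, nspace, hash or oid carries a non-default value.
 */
int sketch_apply_redirect(const struct sketch_redirect *r, int64_t *target_pool)
{
	if (r->key_len || r->nspace_len || r->oid_len ||
	    r->hash != SKETCH_HASH_UNSET)
		return -EINVAL;

	*target_pool = r->pool;
	return 0;
}
```

Rejecting unexpected fields, rather than silently ignoring them, makes a newer server that starts populating them produce a visible error instead of a request sent to the wrong place.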

libceph: rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid}

2014-01-27T21:57:49Z

Rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid} before introducing r_target_{oloc,oid} needed for redirects.

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

libceph: introduce and start using oid abstraction

2014-01-27T21:57:28Z

In preparation for tiering support, which would require having two (base and target) object names for each osd request and also copying those names around, introduce struct ceph_object_id (oid) and a couple helpers to facilitate those copies and encapsulate the fact that object name is not necessarily a NUL-terminated string.

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil
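
A minimal userspace sketch of such an abstraction follows: a fixed-size name buffer with an explicit length, so that copies never rely on NUL termination. The names, the buffer size and the helper signatures are hypothetical and are not the kernel's definitions.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative limit for this sketch. */
#define SKETCH_MAX_OID_NAME_LEN 100

/* Object name stored with an explicit length; no NUL terminator is assumed. */
struct sketch_object_id {
	char name[SKETCH_MAX_OID_NAME_LEN];
	size_t name_len;
};

/* Set the name from a length-delimited buffer. Returns 0, or -1 if too long. */
int sketch_oid_set_name(struct sketch_object_id *oid,
			const char *name, size_t len)
{
	if (len > sizeof(oid->name))
		return -1;
	memcpy(oid->name, name, len);
	oid->name_len = len;
	return 0;
}

/* Copy exactly name_len bytes, e.g. from a base oid to a target oid. */
void sketch_oid_copy(struct sketch_object_id *dest,
		     const struct sketch_object_id *src)
{
	memcpy(dest->name, src->name, src->name_len);
	dest->name_len = src->name_len;
}
```

Because every access goes through the length field, callers cannot accidentally use `strlen()` or `strcpy()` on a name that happens to lack a terminator.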

libceph: rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN

2014-01-27T21:57:24Z

In preparation for adding oid abstraction, rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN.

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

libceph: start using oloc abstraction

2014-01-27T21:57:03Z

Instead of relying on pool fields in ceph_file_layout (for mapping) and ceph_pg (for encoding), start using ceph_object_locator (oloc) abstraction. Note that userspace oloc currently consists of pool, key, nspace and hash fields, while this one contains only a pool. This is OK, because at this point we only send (i.e. encode) olocs and never have to receive (i.e. decode) them.

This makes keeping a copy of ceph_file_layout in every osd request unnecessary, so ceph_osd_request::r_file_layout field is nuked.

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

libceph: block I/O when PAUSE or FULL osd map flags are set

2013-12-13T17:13:29Z

The PAUSEWR and PAUSERD flags are meant to stop the cluster from processing writes and reads, respectively. The FULL flag is set when the cluster determines that it is out of space, and will no longer process writes. PAUSEWR and PAUSERD are purely client-side settings already implemented in userspace clients. The osd does nothing special with these flags.

When the FULL flag is set, however, the osd responds to all writes with -ENOSPC. For cephfs, this makes sense, but for rbd the block layer translates this into EIO. If a cluster goes from full to non-full quickly, a filesystem on top of rbd will not behave well, since some writes succeed while others get EIO.

Fix this by blocking any writes when the FULL flag is set in the osd client. This is the same strategy used by userspace, so apply it by default. A follow-on patch makes this configurable.

__map_request() is called to re-target osd requests in case the available osds changed. Add a paused field to a ceph_osd_request, and set it whenever an appropriate osd map flag is set. Avoid queueing paused requests in __map_request(), but force them to be resent if they become unpaused.

Also subscribe to the next osd map from the monitor if any of these flags are set, so paused requests can be unblocked as soon as possible.

Fixes: http://tracker.ceph.com/issues/6079
Reviewed-by: Sage Weil
Signed-off-by: Josh Durgin
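
The pausing rule described above reduces to a small predicate: reads pause on PAUSERD, writes pause on PAUSEWR or FULL. The sketch below expresses it in standalone C. The function name and all flag names and bit values are hypothetical placeholders chosen for this example, not the kernel's constants.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative osd map flag bits (values are placeholders). */
#define SKETCH_OSDMAP_FULL    (1u << 0)
#define SKETCH_OSDMAP_PAUSERD (1u << 1)
#define SKETCH_OSDMAP_PAUSEWR (1u << 2)

/* Illustrative request direction bits (values are placeholders). */
#define SKETCH_REQ_READ  (1u << 0)
#define SKETCH_REQ_WRITE (1u << 1)

/*
 * Decide whether a request should be held back given the current osd map
 * flags: reads are paused by PAUSERD, writes by PAUSEWR or FULL.
 */
bool sketch_req_should_be_paused(uint32_t map_flags, uint32_t req_flags)
{
	bool pauserd = map_flags & SKETCH_OSDMAP_PAUSERD;
	bool pausewr = map_flags & (SKETCH_OSDMAP_PAUSEWR | SKETCH_OSDMAP_FULL);

	return ((req_flags & SKETCH_REQ_READ) && pauserd) ||
	       ((req_flags & SKETCH_REQ_WRITE) && pausewr);
}
```

Re-evaluating this predicate on every new osd map is what allows a paused request to be noticed and resent once the flag clears.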

libceph: add function to ensure notifies are complete

2013-09-09T18:15:49Z

Without a way to flush the osd client's notify workqueue, a watch event that is unregistered could continue receiving callbacks indefinitely. Unregistering the event simply means no new notifies are added to the queue, but there may still be events in the queue that will call the watch callback for the event. If the queue is flushed after the event is unregistered, the caller can be sure no more watch callbacks will occur for the canceled watch.

Signed-off-by: Josh Durgin
Reviewed-by: Sage Weil
Reviewed-by: Alex Elder