<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/net/ceph, branch v3.4.48</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.4.48</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.4.48'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2013-01-17T16:51:20Z</updated>
<entry>
<title>rbd: remove linger unconditionally</title>
<updated>2013-01-17T16:51:20Z</updated>
<author>
<name>Alex Elder</name>
<email>elder@inktank.com</email>
</author>
<published>2012-12-06T15:37:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=50cda8f4439d715e080ced70d1e6d9ec07f95e6c'/>
<id>urn:sha1:50cda8f4439d715e080ced70d1e6d9ec07f95e6c</id>
<content type='text'>
In __unregister_linger_request(), the request is being removed
from the osd client's req_linger list only when the request
has a non-null osd pointer.  It should be done whether or not
the request currently has an osd.

This is most likely a non-issue because I believe the request
will always have an osd when this function is called.

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
(cherry picked from commit 61c74035626beb25a39b0273ccf7d75510bc36a1)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ceph: don't reference req after put</title>
<updated>2013-01-17T16:51:20Z</updated>
<author>
<name>Alex Elder</name>
<email>elder@inktank.com</email>
</author>
<published>2012-11-29T14:37:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=02585b8bdc988de82b45088a3f2092af2e6b2816'/>
<id>urn:sha1:02585b8bdc988de82b45088a3f2092af2e6b2816</id>
<content type='text'>
In __unregister_request(), there is a call to list_del_init()
referencing a request that was the subject of a call to
ceph_osdc_put_request() on the previous line.  This is not
safe, because the request structure could have been freed
by the time we reach the list_del_init().

Fix this by reversing the order of these lines.

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-off-by: Sage Weil &lt;sage@inktank.com&gt;
(cherry picked from commit 7d5f24812bd182a2471cb69c1c2baf0648332e1f)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>libceph: remove 'osdtimeout' option</title>
<updated>2013-01-17T16:51:20Z</updated>
<author>
<name>Sage Weil</name>
<email>sage@inktank.com</email>
</author>
<published>2012-11-28T20:28:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=31c46473d6a31ac1948c189624b472f26a6365e9'/>
<id>urn:sha1:31c46473d6a31ac1948c189624b472f26a6365e9</id>
<content type='text'>
This would reset a connection with any OSD that had an outstanding
request that was taking more than N seconds.  The idea was that if the
OSD was buggy, the client could compensate by resending the request.

In reality, this only served to hide server bugs, and we haven't
actually seen such a bug in quite a while.  Moreover, the userspace
client code never did this.

More importantly, often the request is taking a long time because the
OSD is trying to recover, or overloaded, and killing the connection
and retrying would only make the situation worse by giving the OSD
more work to do.

Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
(cherry picked from commit 83aff95eb9d60aff5497e9f44a2ae906b86d8e88)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>libceph: avoid using freed osd in __kick_osd_requests()</title>
<updated>2013-01-17T16:51:20Z</updated>
<author>
<name>Alex Elder</name>
<email>elder@inktank.com</email>
</author>
<published>2012-12-07T15:57:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3aa540b869b9bd734ec60246ae1dfe35ab3530e0'/>
<id>urn:sha1:3aa540b869b9bd734ec60246ae1dfe35ab3530e0</id>
<content type='text'>
If an osd has no requests and no linger requests, __reset_osd()
will just remove it with a call to __remove_osd().  That drops
a reference to the osd, and therefore the osd may have been free
by the time __reset_osd() returns.  That function offers no
indication this may have occurred, and as a result the osd will
continue to be used even when it's no longer valid.

Change__reset_osd() so it returns an error (ENODEV) when it
deletes the osd being reset.  And change __kick_osd_requests() so it
returns immediately (before referencing osd again) if __reset_osd()
returns *any* error.

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
(cherry picked from commit 685a7555ca69030739ddb57a47f0ea8ea80196a4)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>libceph: fix osdmap decode error paths</title>
<updated>2013-01-17T16:51:19Z</updated>
<author>
<name>Sage Weil</name>
<email>sage@inktank.com</email>
</author>
<published>2012-10-29T18:01:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9aaab33ec9b7b9cb745fc47dfe20b991ff468d72'/>
<id>urn:sha1:9aaab33ec9b7b9cb745fc47dfe20b991ff468d72</id>
<content type='text'>
Ensure that we set the err value correctly so that we do not pass a 0
value to ERR_PTR and confuse the calling code.  (In particular,
osd_client.c handle_map() will BUG(!newmap)).

Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
(cherry picked from commit 0ed7285e0001b960c888e5455ae982025210ed3d)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>libceph: fix protocol feature mismatch failure path</title>
<updated>2013-01-17T16:51:19Z</updated>
<author>
<name>Sage Weil</name>
<email>sage@inktank.com</email>
</author>
<published>2012-12-28T02:27:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7f5b160daddeaa55e441aca16010fab8299dfe9e'/>
<id>urn:sha1:7f5b160daddeaa55e441aca16010fab8299dfe9e</id>
<content type='text'>
We should not set con-&gt;state to CLOSED here; that happens in
ceph_fault() in the caller, where it first asserts that the state
is not yet CLOSED.  Avoids a BUG when the features don't match.

Since the fail_protocol() has become a trivial wrapper, replace
calls to it with direct calls to reset_connection().

Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
(cherry picked from commit 0fa6ebc600bc8e830551aee47a0e929e818a1868)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>libceph: WARN, don't BUG on unexpected connection states</title>
<updated>2013-01-17T16:51:19Z</updated>
<author>
<name>Alex Elder</name>
<email>elder@inktank.com</email>
</author>
<published>2012-12-26T16:43:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5bd9eb5a0891df0a806f265db504661fd558529c'/>
<id>urn:sha1:5bd9eb5a0891df0a806f265db504661fd558529c</id>
<content type='text'>
A number of assertions in the ceph messenger are implemented with
BUG_ON(), killing the system if connection's state doesn't match
what's expected.  At this point our state model is (evidently) not
well understood enough for these assertions to trigger a BUG().
Convert all BUG_ON(con-&gt;state...) calls to be WARN_ON(con-&gt;state...)
so we learn about these issues without killing the machine.

We now recognize that a connection fault can occur due to a socket
closure at any time, regardless of the state of the connection.  So
there is really nothing we can assert about the state of the
connection at that point so eliminate that assertion.

Reported-by: Ugis &lt;ugis22@gmail.com&gt;
Tested-by: Ugis &lt;ugis22@gmail.com&gt;
Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
(cherry picked from commit 122070a2ffc91f87fe8e8493eb0ac61986c5557c)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>libceph: always reset osds when kicking</title>
<updated>2013-01-17T16:51:19Z</updated>
<author>
<name>Alex Elder</name>
<email>elder@inktank.com</email>
</author>
<published>2012-12-26T20:31:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1cc023b2653ad66b58970460165d0859943f41e7'/>
<id>urn:sha1:1cc023b2653ad66b58970460165d0859943f41e7</id>
<content type='text'>
When ceph_osdc_handle_map() is called to process a new osd map,
kick_requests() is called to ensure all affected requests are
updated if necessary to reflect changes in the osd map.  This
happens in two cases:  whenever an incremental map update is
processed; and when a full map update (or the last one if there is
more than one) gets processed.

In the former case, the kick_requests() call is followed immediately
by a call to reset_changed_osds() to ensure any connections to osds
affected by the map change are reset.  But for full map updates
this isn't done.

Both cases should be doing this osd reset.

Rather than duplicating the reset_changed_osds() call, move it into
the end of kick_requests().

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
(cherry picked from commit e6d50f67a6b1a6252a616e6e629473b5c4277218)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>libceph: move linger requests sooner in kick_requests()</title>
<updated>2013-01-17T16:51:19Z</updated>
<author>
<name>Alex Elder</name>
<email>elder@inktank.com</email>
</author>
<published>2012-12-19T21:52:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a9ded438f7c08525df2ce19e8aeb112efd3090c9'/>
<id>urn:sha1:a9ded438f7c08525df2ce19e8aeb112efd3090c9</id>
<content type='text'>
The kick_requests() function is called by ceph_osdc_handle_map()
when an osd map change has been indicated.  Its purpose is to
re-queue any request whose target osd is different from what it
was when it was originally sent.

It is structured as two loops, one for incomplete but registered
requests, and a second for handling completed linger requests.
As a special case, in the first loop if a request marked to linger
has not yet completed, it is moved from the request list to the
linger list.  This is as a quick and dirty way to have the second
loop handle sending the request along with all the other linger
requests.

Because of the way it's done now, however, this quick and dirty
solution can result in these incomplete linger requests never
getting re-sent as desired.  The problem lies in the fact that
the second loop only arranges for a linger request to be sent
if it appears its target osd has changed.  This is the proper
handling for *completed* linger requests (it avoids issuing
the same linger request twice to the same osd).

But although the linger requests added to the list in the first loop
may have been sent, they have not yet completed, so they need to be
re-sent regardless of whether their target osd has changed.

The first required fix is we need to avoid calling __map_request()
on any incomplete linger request.  Otherwise the subsequent
__map_request() call in the second loop will find the target osd
has not changed and will therefore not re-send the request.

Second, we need to be sure that a sent but incomplete linger request
gets re-sent.  If the target osd is the same with the new osd map as
it was when the request was originally sent, this won't happen.
This can be fixed through careful handling when we move these
requests from the request list to the linger list, by unregistering
the request *before* it is registered as a linger request.  This
works because a side-effect of unregistering the request is to make
the request's r_osd pointer be NULL, and *that* will ensure the
second loop actually re-sends the linger request.

Processing of such a request is done at that point, so continue with
the next one once it's been moved.

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
(cherry picked from commit ab60b16d3c31b9bd9fd5b39f97dc42c52a50b67d)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>libceph: register request before unregister linger</title>
<updated>2013-01-17T16:51:19Z</updated>
<author>
<name>Alex Elder</name>
<email>elder@inktank.com</email>
</author>
<published>2012-12-06T13:22:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0d3b9fff8bc11bc7d745c6b34c098b1c548fec65'/>
<id>urn:sha1:0d3b9fff8bc11bc7d745c6b34c098b1c548fec65</id>
<content type='text'>
In kick_requests(), we need to register the request before we
unregister the linger request.  Otherwise the unregister will
reset the request's osd pointer to NULL.

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
(cherry picked from commit c89ce05e0c5a01a256100ac6a6019f276bdd1ca6)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
