<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/net/ceph, branch v4.4.97</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.97</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.97'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2017-04-08T07:53:30Z</updated>
<entry>
<title>libceph: force GFP_NOIO for socket allocations</title>
<updated>2017-04-08T07:53:30Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2017-03-21T12:44:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ba46d8fab00a8e1538df241681d9161c8ec85778'/>
<id>urn:sha1:ba46d8fab00a8e1538df241681d9161c8ec85778</id>
<content type='text'>
commit 633ee407b9d15a75ac9740ba9d3338815e1fcb95 upstream.

sock_alloc_inode() allocates socket+inode and socket_wq with
GFP_KERNEL, which is not allowed on the writeback path:

    Workqueue: ceph-msgr con_work [libceph]
    ffff8810871cb018 0000000000000046 0000000000000000 ffff881085d40000
    0000000000012b00 ffff881025cad428 ffff8810871cbfd8 0000000000012b00
    ffff880102fc1000 ffff881085d40000 ffff8810871cb038 ffff8810871cb148
    Call Trace:
    [&lt;ffffffff816dd629&gt;] schedule+0x29/0x70
    [&lt;ffffffff816e066d&gt;] schedule_timeout+0x1bd/0x200
    [&lt;ffffffff81093ffc&gt;] ? ttwu_do_wakeup+0x2c/0x120
    [&lt;ffffffff81094266&gt;] ? ttwu_do_activate.constprop.135+0x66/0x70
    [&lt;ffffffff816deb5f&gt;] wait_for_completion+0xbf/0x180
    [&lt;ffffffff81097cd0&gt;] ? try_to_wake_up+0x390/0x390
    [&lt;ffffffff81086335&gt;] flush_work+0x165/0x250
    [&lt;ffffffff81082940&gt;] ? worker_detach_from_pool+0xd0/0xd0
    [&lt;ffffffffa03b65b1&gt;] xlog_cil_force_lsn+0x81/0x200 [xfs]
    [&lt;ffffffff816d6b42&gt;] ? __slab_free+0xee/0x234
    [&lt;ffffffffa03b4b1d&gt;] _xfs_log_force_lsn+0x4d/0x2c0 [xfs]
    [&lt;ffffffff811adc1e&gt;] ? lookup_page_cgroup_used+0xe/0x30
    [&lt;ffffffffa039a723&gt;] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
    [&lt;ffffffffa03b4dcf&gt;] xfs_log_force_lsn+0x3f/0xf0 [xfs]
    [&lt;ffffffffa039a723&gt;] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
    [&lt;ffffffffa03a62c6&gt;] xfs_iunpin_wait+0xc6/0x1a0 [xfs]
    [&lt;ffffffff810aa250&gt;] ? wake_atomic_t_function+0x40/0x40
    [&lt;ffffffffa039a723&gt;] xfs_reclaim_inode+0xa3/0x330 [xfs]
    [&lt;ffffffffa039ac07&gt;] xfs_reclaim_inodes_ag+0x257/0x3d0 [xfs]
    [&lt;ffffffffa039bb13&gt;] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
    [&lt;ffffffffa03ab745&gt;] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
    [&lt;ffffffff811c0c18&gt;] super_cache_scan+0x178/0x180
    [&lt;ffffffff8115912e&gt;] shrink_slab_node+0x14e/0x340
    [&lt;ffffffff811afc3b&gt;] ? mem_cgroup_iter+0x16b/0x450
    [&lt;ffffffff8115af70&gt;] shrink_slab+0x100/0x140
    [&lt;ffffffff8115e425&gt;] do_try_to_free_pages+0x335/0x490
    [&lt;ffffffff8115e7f9&gt;] try_to_free_pages+0xb9/0x1f0
    [&lt;ffffffff816d56e4&gt;] ? __alloc_pages_direct_compact+0x69/0x1be
    [&lt;ffffffff81150cba&gt;] __alloc_pages_nodemask+0x69a/0xb40
    [&lt;ffffffff8119743e&gt;] alloc_pages_current+0x9e/0x110
    [&lt;ffffffff811a0ac5&gt;] new_slab+0x2c5/0x390
    [&lt;ffffffff816d71c4&gt;] __slab_alloc+0x33b/0x459
    [&lt;ffffffff815b906d&gt;] ? sock_alloc_inode+0x2d/0xd0
    [&lt;ffffffff8164bda1&gt;] ? inet_sendmsg+0x71/0xc0
    [&lt;ffffffff815b906d&gt;] ? sock_alloc_inode+0x2d/0xd0
    [&lt;ffffffff811a21f2&gt;] kmem_cache_alloc+0x1a2/0x1b0
    [&lt;ffffffff815b906d&gt;] sock_alloc_inode+0x2d/0xd0
    [&lt;ffffffff811d8566&gt;] alloc_inode+0x26/0xa0
    [&lt;ffffffff811da04a&gt;] new_inode_pseudo+0x1a/0x70
    [&lt;ffffffff815b933e&gt;] sock_alloc+0x1e/0x80
    [&lt;ffffffff815ba855&gt;] __sock_create+0x95/0x220
    [&lt;ffffffff815baa04&gt;] sock_create_kern+0x24/0x30
    [&lt;ffffffffa04794d9&gt;] con_work+0xef9/0x2050 [libceph]
    [&lt;ffffffffa04aa9ec&gt;] ? rbd_img_request_submit+0x4c/0x60 [rbd]
    [&lt;ffffffff81084c19&gt;] process_one_work+0x159/0x4f0
    [&lt;ffffffff8108561b&gt;] worker_thread+0x11b/0x530
    [&lt;ffffffff81085500&gt;] ? create_worker+0x1d0/0x1d0
    [&lt;ffffffff8108b6f9&gt;] kthread+0xc9/0xe0
    [&lt;ffffffff8108b630&gt;] ? flush_kthread_worker+0x90/0x90
    [&lt;ffffffff816e1b98&gt;] ret_from_fork+0x58/0x90
    [&lt;ffffffff8108b630&gt;] ? flush_kthread_worker+0x90/0x90

Use memalloc_noio_{save,restore}() to temporarily force GFP_NOIO here.

Link: http://tracker.ceph.com/issues/19309
Reported-by: Sergey Jerusalimov &lt;wintchester@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>libceph: don't set weight to IN when OSD is destroyed</title>
<updated>2017-03-30T07:35:18Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2017-03-01T16:33:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=48da8f817b9db7909e5758257bdc84a6c611d99a'/>
<id>urn:sha1:48da8f817b9db7909e5758257bdc84a6c611d99a</id>
<content type='text'>
commit b581a5854eee4b7851dedb0f8c2ceb54fb902c06 upstream.

Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info,
osd_xinfo on osd deletion"), weight is set to IN when OSD is deleted.
This changes the result of applying an incremental for clients, not
just OSDs.  Because CRUSH computations are obviously affected,
pre-4e28f9e63644 servers disagree with post-4e28f9e63644 clients on
object placement, resulting in misdirected requests.

Mirrors ceph.git commit a6009d1039a55e2c77f431662b3d6cc5a8e8e63f.

Fixes: 930c53286977 ("libceph: apply new_state before new_up_client on incrementals")
Link: http://tracker.ceph.com/issues/19122
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Sage Weil &lt;sage@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>libceph: verify authorize reply on connect</title>
<updated>2017-01-09T07:07:52Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2016-12-02T15:35:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b66e3126569e25e11bc3913e41f6f39445508338'/>
<id>urn:sha1:b66e3126569e25e11bc3913e41f6f39445508338</id>
<content type='text'>
commit 5c056fdc5b474329037f2aa18401bd73033e0ce0 upstream.

After sending an authorizer (ceph_x_authorize_a + ceph_x_authorize_b),
the client gets back a ceph_x_authorize_reply, which it is supposed to
verify to ensure the authenticity and protect against replay attacks.
The code for doing this is there (ceph_x_verify_authorizer_reply(),
ceph_auth_verify_authorizer_reply() + plumbing), but it is never
invoked by the the messenger.

AFAICT this goes back to 2009, when ceph authentication protocols
support was added to the kernel client in 4e7a5dcd1bba ("ceph:
negotiate authentication protocol; implement AUTH_NONE protocol").

The second param of ceph_connection_operations::verify_authorizer_reply
is unused all the way down.  Pass 0 to facilitate backporting, and kill
it in the next commit.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Sage Weil &lt;sage@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>libceph: apply new_state before new_up_client on incrementals</title>
<updated>2016-08-10T09:49:29Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2016-07-19T01:50:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=032951d32c13b7564dfba82758260cb7aa1149d2'/>
<id>urn:sha1:032951d32c13b7564dfba82758260cb7aa1149d2</id>
<content type='text'>
commit 930c532869774ebf8af9efe9484c597f896a7d46 upstream.

Currently, osd_weight and osd_state fields are updated in the encoding
order.  This is wrong, because an incremental map may look like e.g.

    new_up_client: { osd=6, addr=... } # set osd_state and addr
    new_state: { osd=6, xorstate=EXISTS } # clear osd_state

Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down).  After
applying new_up_client, osd_state is changed to EXISTS | UP.  Carrying
on with the new_state update, we flip EXISTS and leave osd6 in a weird
"!EXISTS but UP" state.  A non-existent OSD is considered down by the
mapping code

2087    for (i = 0; i &lt; pg-&gt;pg_temp.len; i++) {
2088            if (ceph_osd_is_down(osdmap, pg-&gt;pg_temp.osds[i])) {
2089                    if (ceph_can_shift_osds(pi))
2090                            continue;
2091
2092                    temp-&gt;osds[temp-&gt;size++] = CRUSH_ITEM_NONE;

and so requests get directed to the second OSD in the set instead of
the first, resulting in OSD-side errors like:

[WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680

and hung rbds on the client:

[  493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
[  493.566805] rbd: rbd0:   result -6 xferred 400000
[  493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688

The fix is to decouple application from the decoding and:
- apply new_weight first
- apply new_state before new_up_client
- twiddle osd_state flags if marking in
- clear out some of the state if osd is destroyed

Fixes: http://tracker.ceph.com/issues/14901

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Josh Durgin &lt;jdurgin@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>libceph: don't spam dmesg with stray reply warnings</title>
<updated>2016-03-03T23:07:26Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2016-02-19T10:38:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=01c3c0f921c8a2743e3d066108081c14618ee98c'/>
<id>urn:sha1:01c3c0f921c8a2743e3d066108081c14618ee98c</id>
<content type='text'>
commit cd8140c673d9ba9be3591220e1b2226d9e1e40d3 upstream.

Commit d15f9d694b77 ("libceph: check data_len in -&gt;alloc_msg()")
mistakenly bumped the log level on the "tid %llu unknown, skipping"
message.  Turn it back into a dout() - stray replies are perfectly
normal when OSDs flap, crash, get killed for testing purposes, etc.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Alex Elder &lt;elder@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>libceph: use the right footer size when skipping a message</title>
<updated>2016-03-03T23:07:26Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2016-02-19T10:38:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=10dada9dad8fbc36840ef5266419bb0fce5945a0'/>
<id>urn:sha1:10dada9dad8fbc36840ef5266419bb0fce5945a0</id>
<content type='text'>
commit dbc0d3caff5b7591e0cf8e34ca686ca6f4479ee1 upstream.

ceph_msg_footer is 21 bytes long, while ceph_msg_footer_old is only 13.
Don't skip too much when CEPH_FEATURE_MSG_AUTH isn't negotiated.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Alex Elder &lt;elder@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>libceph: don't bail early from try_read() when skipping a message</title>
<updated>2016-03-03T23:07:26Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2016-02-17T19:04:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=50c6a283a713c62e6430e6dcc27ecaa91c46ba80'/>
<id>urn:sha1:50c6a283a713c62e6430e6dcc27ecaa91c46ba80</id>
<content type='text'>
commit e7a88e82fe380459b864e05b372638aeacb0f52d upstream.

The contract between try_read() and try_write() is that when called
each processes as much data as possible.  When instructed by osd_client
to skip a message, try_read() is violating this contract by returning
after receiving and discarding a single message instead of checking for
more.  try_write() then gets a chance to write out more requests,
generating more replies/skips for try_read() to handle, forcing the
messenger into a starvation loop.

Reported-by: Varada Kari &lt;Varada.Kari@sandisk.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Tested-by: Varada Kari &lt;Varada.Kari@sandisk.com&gt;
Reviewed-by: Alex Elder &lt;elder@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>libceph: fix ceph_msg_revoke()</title>
<updated>2016-03-03T23:07:26Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2015-12-28T10:18:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ee834473805f5fd77a3d2625a40552159641029e'/>
<id>urn:sha1:ee834473805f5fd77a3d2625a40552159641029e</id>
<content type='text'>
commit 67645d7619738e51c668ca69f097cb90b5470422 upstream.

There are a number of problems with revoking a "was sending" message:

(1) We never make any attempt to revoke data - only kvecs contibute to
con-&gt;out_skip.  However, once the header (envelope) is written to the
socket, our peer learns data_len and sets itself to expect at least
data_len bytes to follow front or front+middle.  If ceph_msg_revoke()
is called while the messenger is sending message's data portion,
anything we send after that call is counted by the OSD towards the now
revoked message's data portion.  The effects vary, the most common one
is the eventual hang - higher layers get stuck waiting for the reply to
the message that was sent out after ceph_msg_revoke() returned and
treated by the OSD as a bunch of data bytes.  This is what Matt ran
into.

(2) Flat out zeroing con-&gt;out_kvec_bytes worth of bytes to handle kvecs
is wrong.  If ceph_msg_revoke() is called before the tag is sent out or
while the messenger is sending the header, we will get a connection
reset, either due to a bad tag (0 is not a valid tag) or a bad header
CRC, which kind of defeats the purpose of revoke.  Currently the kernel
client refuses to work with header CRCs disabled, but that will likely
change in the future, making this even worse.

(3) con-&gt;out_skip is not reset on connection reset, leading to one or
more spurious connection resets if we happen to get a real one between
con-&gt;out_skip is set in ceph_msg_revoke() and before it's cleared in
write_partial_skip().

Fixing (1) and (3) is trivial.  The idea behind fixing (2) is to never
zero the tag or the header, i.e. send out tag+header regardless of when
ceph_msg_revoke() is called.  That way the header is always correct, no
unnecessary resets are induced and revoke stands ready for disabled
CRCs.  Since ceph_msg_revoke() rips out con-&gt;out_msg, introduce a new
"message out temp" and copy the header into it before sending.

Reported-by: Matt Conner &lt;matt.conner@keepertech.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Tested-by: Matt Conner &lt;matt.conner@keepertech.com&gt;
Reviewed-by: Sage Weil &lt;sage@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client</title>
<updated>2015-11-13T17:24:40Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-11-13T17:24:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ca4ba96e02e932a0c9997a40fd51253b5b2d0f9d'/>
<id>urn:sha1:ca4ba96e02e932a0c9997a40fd51253b5b2d0f9d</id>
<content type='text'>
Pull Ceph updates from Sage Weil:
 "There are several patches from Ilya fixing RBD allocation lifecycle
  issues, a series adding a nocephx_sign_messages option (and associated
  bug fixes/cleanups), several patches from Zheng improving the
  (directory) fsync behavior, a big improvement in IO for direct-io
  requests when striping is enabled from Caifeng, and several other
  small fixes and cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: clear msg-&gt;con in ceph_msg_release() only
  libceph: add nocephx_sign_messages option
  libceph: stop duplicating client fields in messenger
  libceph: drop authorizer check from cephx msg signing routines
  libceph: msg signing callouts don't need con argument
  libceph: evaluate osd_req_op_data() arguments only once
  ceph: make fsync() wait unsafe requests that created/modified inode
  ceph: add request to i_unsafe_dirops when getting unsafe reply
  libceph: introduce ceph_x_authorizer_cleanup()
  ceph: don't invalidate page cache when inode is no longer used
  rbd: remove duplicate calls to rbd_dev_mapping_clear()
  rbd: set device_type::release instead of device::release
  rbd: don't free rbd_dev outside of the release callback
  rbd: return -ENOMEM instead of pool id if rbd_dev_create() fails
  libceph: use local variable cursor instead of &amp;msg-&gt;cursor
  libceph: remove con argument in handle_reply()
  ceph: combine as many iovec as possile into one OSD request
  ceph: fix message length computation
  ceph: fix a comment typo
  rbd: drop null test before destroy functions
</content>
</entry>
<entry>
<title>Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security</title>
<updated>2015-11-05T23:32:38Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-11-05T23:32:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1873499e13648a2dd01a394ed3217c9290921b3d'/>
<id>urn:sha1:1873499e13648a2dd01a394ed3217c9290921b3d</id>
<content type='text'>
Pull security subsystem update from James Morris:
 "This is mostly maintenance updates across the subsystem, with a
  notable update for TPM 2.0, and addition of Jarkko Sakkinen as a
  maintainer of that"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (40 commits)
  apparmor: clarify CRYPTO dependency
  selinux: Use a kmem_cache for allocation struct file_security_struct
  selinux: ioctl_has_perm should be static
  selinux: use sprintf return value
  selinux: use kstrdup() in security_get_bools()
  selinux: use kmemdup in security_sid_to_context_core()
  selinux: remove pointless cast in selinux_inode_setsecurity()
  selinux: introduce security_context_str_to_sid
  selinux: do not check open perm on ftruncate call
  selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default
  KEYS: Merge the type-specific data with the payload data
  KEYS: Provide a script to extract a module signature
  KEYS: Provide a script to extract the sys cert list from a vmlinux file
  keys: Be more consistent in selection of union members used
  certs: add .gitignore to stop git nagging about x509_certificate_list
  KEYS: use kvfree() in add_key
  Smack: limited capability for changing process label
  TPM: remove unnecessary little endian conversion
  vTPM: support little endian guests
  char: Drop owner assignment from i2c_driver
  ...
</content>
</entry>
</feed>
