<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/drivers/md, branch v4.9.80</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.9.80</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.9.80'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-02-03T16:05:37Z</updated>
<entry>
<title>bcache: check return value of register_shrinker</title>
<updated>2018-02-03T16:05:37Z</updated>
<author>
<name>Michael Lyle</name>
<email>mlyle@lyle.org</email>
</author>
<published>2017-11-24T23:14:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=409982cbb5eb814532953fb996ddc27bae5282a4'/>
<id>urn:sha1:409982cbb5eb814532953fb996ddc27bae5282a4</id>
<content type='text'>
[ Upstream commit 6c4ca1e36cdc1a0a7a84797804b87920ccbebf51 ]

register_shrinker is now __must_check, so check it to kill a warning.
Caller of bch_btree_cache_alloc in super.c appropriately checks return
value so this is fully plumbed through.

This V2 fixes checkpatch warnings and improves the commit description,
as I was too hasty getting the previous version out.

Signed-off-by: Michael Lyle &lt;mlyle@lyle.org&gt;
Reviewed-by: Vojtech Pavlik &lt;vojtech@suse.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6</title>
<updated>2018-01-23T18:57:09Z</updated>
<author>
<name>Dennis Yang</name>
<email>dennisyang@qnap.com</email>
</author>
<published>2017-12-12T10:21:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2904adc5b1c08894f954bff9e30eb5facb3b6591'/>
<id>urn:sha1:2904adc5b1c08894f954bff9e30eb5facb3b6591</id>
<content type='text'>
commit 490ae017f54e55bde382d45ea24bddfb6d1a0aaf upstream.

For btree removal, there is a corner case that a single thread
could takes 6 locks which is more than THIN_MAX_CONCURRENT_LOCKS(5)
and leads to deadlock.

A btree removal might eventually call
rebalance_children()-&gt;rebalance3() to rebalance entries of three
neighbor child nodes when shadow_spine has already acquired two
write locks. In rebalance3(), it tries to shadow and acquire the
write locks of all three child nodes. However, shadowing a child
node requires acquiring a read lock of the original child node and
a write lock of the new block. Although the read lock will be
released after block shadowing, shadowing the third child node
in rebalance3() could still take the sixth lock.
(2 write locks for shadow_spine +
 2 write locks for the first two child nodes's shadow +
 1 write lock for the last child node's shadow +
 1 read lock for the last child node)

Signed-off-by: Dennis Yang &lt;dennisyang@qnap.com&gt;
Acked-by: Joe Thornber &lt;thornber@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>dm btree: fix serious bug in btree_split_beneath()</title>
<updated>2018-01-23T18:57:08Z</updated>
<author>
<name>Joe Thornber</name>
<email>thornber@redhat.com</email>
</author>
<published>2017-12-20T09:56:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cabf6294a6dc615324f8269b5839de2020dbdaf3'/>
<id>urn:sha1:cabf6294a6dc615324f8269b5839de2020dbdaf3</id>
<content type='text'>
commit bc68d0a43560e950850fc69b58f0f8254b28f6d6 upstream.

When inserting a new key/value pair into a btree we walk down the spine of
btree nodes performing the following 2 operations:

  i) space for a new entry
  ii) adjusting the first key entry if the new key is lower than any in the node.

If the _root_ node is full, the function btree_split_beneath() allocates 2 new
nodes, and redistibutes the root nodes entries between them.  The root node is
left with 2 entries corresponding to the 2 new nodes.

btree_split_beneath() then adjusts the spine to point to one of the two new
children.  This means the first key is never adjusted if the new key was lower,
ie. operation (ii) gets missed out.  This can result in the new key being
'lost' for a period; until another low valued key is inserted that will uncover
it.

This is a serious bug, and quite hard to make trigger in normal use.  A
reproducing test case ("thin create devices-in-reverse-order") is
available as part of the thin-provision-tools project:
  https://github.com/jthornber/thin-provisioning-tools/blob/master/functional-tests/device-mapper/dm-tests.scm#L593

Fix the issue by changing btree_split_beneath() so it no longer adjusts
the spine.  Instead it unlocks both the new nodes, and lets the main
loop in btree_insert_raw() relock the appropriate one and make any
neccessary adjustments.

Reported-by: Monty Pavel &lt;monty_pavel@sina.com&gt;
Signed-off-by: Joe Thornber &lt;thornber@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>dm bufio: fix shrinker scans when (nr_to_scan &lt; retain_target)</title>
<updated>2018-01-17T08:38:49Z</updated>
<author>
<name>Suren Baghdasaryan</name>
<email>surenb@google.com</email>
</author>
<published>2017-12-06T17:27:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b58aa24edb621469040b0a8a47f25d91f5ee2c58'/>
<id>urn:sha1:b58aa24edb621469040b0a8a47f25d91f5ee2c58</id>
<content type='text'>
commit fbc7c07ec23c040179384a1f16b62b6030eb6bdd upstream.

When system is under memory pressure it is observed that dm bufio
shrinker often reclaims only one buffer per scan. This change fixes
the following two issues in dm bufio shrinker that cause this behavior:

1. ((nr_to_scan - freed) &lt;= retain_target) condition is used to
terminate slab scan process. This assumes that nr_to_scan is equal
to the LRU size, which might not be correct because do_shrink_slab()
in vmscan.c calculates nr_to_scan using multiple inputs.
As a result when nr_to_scan is less than retain_target (64) the scan
will terminate after the first iteration, effectively reclaiming one
buffer per scan and making scans very inefficient. This hurts vmscan
performance especially because mutex is acquired/released every time
dm_bufio_shrink_scan() is called.
New implementation uses ((LRU size - freed) &lt;= retain_target)
condition for scan termination. LRU size can be safely determined
inside __scan() because this function is called after dm_bufio_lock().

2. do_shrink_slab() uses value returned by dm_bufio_shrink_count() to
determine number of freeable objects in the slab. However dm_bufio
always retains retain_target buffers in its LRU and will terminate
a scan when this mark is reached. Therefore returning the entire LRU size
from dm_bufio_shrink_count() is misleading because that does not
represent the number of freeable objects that slab will reclaim during
a scan. Returning (LRU size - retain_target) better represents the
number of freeable objects in the slab. This way do_shrink_slab()
returns 0 when (LRU size &lt; retain_target) and vmscan will not try to
scan this shrinker avoiding scans that will not reclaim any memory.

Test: tested using Android device running
&lt;AOSP&gt;/system/extras/alloc-stress that generates memory pressure
and causes intensive shrinker scans

Signed-off-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</content>
</entry>
<entry>
<title>raid5: Set R5_Expanded on parity devices as well as data.</title>
<updated>2017-12-20T09:07:32Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2017-10-17T05:18:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f39486bd37ee3edd8658aff1226337c47e4ba32e'/>
<id>urn:sha1:f39486bd37ee3edd8658aff1226337c47e4ba32e</id>
<content type='text'>
[ Upstream commit 235b6003fb28f0dd8e7ed8fbdb088bb548291766 ]

When reshaping a fully degraded raid5/raid6 to a larger
nubmer of devices, the new device(s) are not in-sync
and so that can make the newly grown stripe appear to be
"failed".
To avoid this, we set the R5_Expanded flag to say "Even though
this device is not fully in-sync, this block is safe so
don't treat the device as failed for this stripe".
This flag is set for data devices, not not for parity devices.

Consequently, if you have a RAID6 with two devices that are partly
recovered and a spare, and start a reshape to include the spare,
then when the reshape gets past the point where the recovery was
up to, it will think the stripes are failed and will get into
an infinite loop, failing to make progress.

So when contructing parity on an EXPAND_READY stripe,
set R5_Expanded.

Reported-by: Curt &lt;lightspd@gmail.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bcache: fix wrong cache_misses statistics</title>
<updated>2017-12-20T09:07:30Z</updated>
<author>
<name>tang.junhui</name>
<email>tang.junhui@zte.com.cn</email>
</author>
<published>2017-10-30T21:46:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9102ed6a5f6a53434f69264ac8457994fde14a88'/>
<id>urn:sha1:9102ed6a5f6a53434f69264ac8457994fde14a88</id>
<content type='text'>
[ Upstream commit c157313791a999646901b3e3c6888514ebc36d62 ]

Currently, Cache missed IOs are identified by s-&gt;cache_miss, but actually,
there are many situations that missed IOs are not assigned a value for
s-&gt;cache_miss in cached_dev_cache_miss(), for example, a bypassed IO
(s-&gt;iop.bypass = 1), or the cache_bio allocate failed. In these situations,
it will go to out_put or out_submit, and s-&gt;cache_miss is null, which leads
bch_mark_cache_accounting() to treat this IO as a hit IO.

[ML: applied by 3-way merge]

Signed-off-by: tang.junhui &lt;tang.junhui@zte.com.cn&gt;
Reviewed-by: Michael Lyle &lt;mlyle@lyle.org&gt;
Reviewed-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bcache: explicitly destroy mutex while exiting</title>
<updated>2017-12-20T09:07:30Z</updated>
<author>
<name>Liang Chen</name>
<email>liangchen.linux@gmail.com</email>
</author>
<published>2017-10-30T21:46:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c2a0531f59c363fd5b50be46b97248091b13484b'/>
<id>urn:sha1:c2a0531f59c363fd5b50be46b97248091b13484b</id>
<content type='text'>
[ Upstream commit 330a4db89d39a6b43f36da16824eaa7a7509d34d ]

mutex_destroy does nothing most of time, but it's better to call
it to make the code future proof and it also has some meaning
for like mutex debug.

As Coly pointed out in a previous review, bcache_exit() may not be
able to handle all the references properly if userspace registers
cache and backing devices right before bch_debug_init runs and
bch_debug_init failes later. So not exposing userspace interface
until everything is ready to avoid that issue.

Signed-off-by: Liang Chen &lt;liangchen.linux@gmail.com&gt;
Reviewed-by: Michael Lyle &lt;mlyle@lyle.org&gt;
Reviewed-by: Coly Li &lt;colyli@suse.de&gt;
Reviewed-by: Eric Wheeler &lt;bcache@linux.ewheeler.net&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>md-cluster: free md_cluster_info if node leave cluster</title>
<updated>2017-12-20T09:07:18Z</updated>
<author>
<name>Guoqing Jiang</name>
<email>gqjiang@suse.com</email>
</author>
<published>2017-02-24T03:15:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9bcd15bdfb6151e0418fa9321d4dce2b2eb0cb16'/>
<id>urn:sha1:9bcd15bdfb6151e0418fa9321d4dce2b2eb0cb16</id>
<content type='text'>
[ Upstream commit 9c8043f337f14d1743006dfc59c03e80a42e3884 ]

To avoid memory leak, we need to free the cinfo which
is allocated when node join cluster.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>md: free unused memory after bitmap resize</title>
<updated>2017-12-16T15:25:47Z</updated>
<author>
<name>Zdenek Kabelac</name>
<email>zkabelac@redhat.com</email>
</author>
<published>2017-11-08T12:44:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b7d3f2b5dca97de98dddbd4992dbe49d5a7723fa'/>
<id>urn:sha1:b7d3f2b5dca97de98dddbd4992dbe49d5a7723fa</id>
<content type='text'>
[ Upstream commit 0868b99c214a3d55486c700de7c3f770b7243e7c ]

When bitmap is resized, the old kalloced chunks just are not released
once the resized bitmap starts to use new space.

This fixes in particular kmemleak reports like this one:

unreferenced object 0xffff8f4311e9c000 (size 4096):
  comm "lvm", pid 19333, jiffies 4295263268 (age 528.265s)
  hex dump (first 32 bytes):
    02 80 02 80 02 80 02 80 02 80 02 80 02 80 02 80  ................
    02 80 02 80 02 80 02 80 02 80 02 80 02 80 02 80  ................
  backtrace:
    [&lt;ffffffffa69471ca&gt;] kmemleak_alloc+0x4a/0xa0
    [&lt;ffffffffa628c10e&gt;] kmem_cache_alloc_trace+0x14e/0x2e0
    [&lt;ffffffffa676cfec&gt;] bitmap_checkpage+0x7c/0x110
    [&lt;ffffffffa676d0c5&gt;] bitmap_get_counter+0x45/0xd0
    [&lt;ffffffffa676d6b3&gt;] bitmap_set_memory_bits+0x43/0xe0
    [&lt;ffffffffa676e41c&gt;] bitmap_init_from_disk+0x23c/0x530
    [&lt;ffffffffa676f1ae&gt;] bitmap_load+0xbe/0x160
    [&lt;ffffffffc04c47d3&gt;] raid_preresume+0x203/0x2f0 [dm_raid]
    [&lt;ffffffffa677762f&gt;] dm_table_resume_targets+0x4f/0xe0
    [&lt;ffffffffa6774b52&gt;] dm_resume+0x122/0x140
    [&lt;ffffffffa6779b9f&gt;] dev_suspend+0x18f/0x290
    [&lt;ffffffffa677a3a7&gt;] ctl_ioctl+0x287/0x560
    [&lt;ffffffffa677a693&gt;] dm_ctl_ioctl+0x13/0x20
    [&lt;ffffffffa62d6b46&gt;] do_vfs_ioctl+0xa6/0x750
    [&lt;ffffffffa62d7269&gt;] SyS_ioctl+0x79/0x90
    [&lt;ffffffffa6956d41&gt;] entry_SYSCALL_64_fastpath+0x1f/0xc2

Signed-off-by: Zdenek Kabelac &lt;zkabelac@redhat.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bcache: recover data from backing when data is clean</title>
<updated>2017-12-09T21:01:46Z</updated>
<author>
<name>Rui Hua</name>
<email>huarui.dev@gmail.com</email>
</author>
<published>2017-11-24T23:14:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9db9b5f2b1b691c78b239a71a4b6a8d950323a78'/>
<id>urn:sha1:9db9b5f2b1b691c78b239a71a4b6a8d950323a78</id>
<content type='text'>
commit e393aa2446150536929140739f09c6ecbcbea7f0 upstream.

When we send a read request and hit the clean data in cache device, there
is a situation called cache read race in bcache(see the commit in the tail
of cache_look_up(), the following explaination just copy from there):
The bucket we're reading from might be reused while our bio is in flight,
and we could then end up reading the wrong data. We guard against this
by checking (in bch_cache_read_endio()) if the pointer is stale again;
if so, we treat it as an error (s-&gt;iop.error = -EINTR) and reread from
the backing device (but we don't pass that error up anywhere)

It should be noted that cache read race happened under normal
circumstances, not the circumstance when SSD failed, it was counted
and shown in  /sys/fs/bcache/XXX/internal/cache_read_races.

Without this patch, when we use writeback mode, we will never reread from
the backing device when cache read race happened, until the whole cache
device is clean, because the condition
(s-&gt;recoverable &amp;&amp; (dc &amp;&amp; !atomic_read(&amp;dc-&gt;has_dirty))) is false in
cached_dev_read_error(). In this situation, the s-&gt;iop.error(= -EINTR)
will be passed up, at last, user will receive -EINTR when it's bio end,
this is not suitable, and wield to up-application.

In this patch, we use s-&gt;read_dirty_data to judge whether the read
request hit dirty data in cache device, it is safe to reread data from
the backing device when the read request hit clean data. This can not
only handle cache read race, but also recover data when failed read
request from cache device.

[edited by mlyle to fix up whitespace, commit log title, comment
spelling]

Fixes: d59b23795933 ("bcache: only permit to recovery read error when cache device is clean")
Signed-off-by: Hua Rui &lt;huarui.dev@gmail.com&gt;
Reviewed-by: Michael Lyle &lt;mlyle@lyle.org&gt;
Reviewed-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Michael Lyle &lt;mlyle@lyle.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
