<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/drivers/md, branch v4.9.66</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.9.66</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.9.66'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2017-11-30T08:39:03Z</updated>
<entry>
<title>bcache: check that ca-&gt;alloc_thread is initialized before waking it up</title>
<updated>2017-11-30T08:39:03Z</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2017-10-13T23:35:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=770e10817e0980b61fa04f99432b1482242d65b4'/>
<id>urn:sha1:770e10817e0980b61fa04f99432b1482242d65b4</id>
<content type='text'>
commit 91af8300d9c1d7c6b6a2fd754109e08d4798b8d8 upstream.

In the bcache code, sysfs entries are created before all resources get
allocated, e.g. the allocation thread of a cache set.

There is a possibility of a NULL pointer dereference if a resource is
accessed before it is initialized. Indeed, Jorg Bornschein caught one on
the cache set allocation thread and got a kernel oops.

The reason for this bug is that when bch_bucket_alloc() is called during
cache set registration and attaching, ca-&gt;alloc_thread is not yet
allocated and initialized, so calling wake_up_process() on
ca-&gt;alloc_thread triggers a NULL pointer dereference. A simple and fast
fix is to check, before waking up ca-&gt;alloc_thread, whether it has been
allocated, and to call wake_up_process() only when it is not NULL.

Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Reported-by: Jorg Bornschein &lt;jb@capsec.org&gt;
Cc: Kent Overstreet &lt;kent.overstreet@gmail.com&gt;
Reviewed-by: Michael Lyle &lt;mlyle@lyle.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>dm: fix race between dm_get_from_kobject() and __dm_destroy()</title>
<updated>2017-11-30T08:39:02Z</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2017-11-01T07:42:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1cd9686e0a3b5b5a09a2025c21cd4d92e8db0e1f'/>
<id>urn:sha1:1cd9686e0a3b5b5a09a2025c21cd4d92e8db0e1f</id>
<content type='text'>
commit b9a41d21dceadf8104812626ef85dc56ee8a60ed upstream.

The following BUG_ON was hit when testing repeat creation and removal of
DM devices:

    kernel BUG at drivers/md/dm.c:2919!
    CPU: 7 PID: 750 Comm: systemd-udevd Not tainted 4.1.44
    Call Trace:
     [&lt;ffffffff81649e8b&gt;] dm_get_from_kobject+0x34/0x3a
     [&lt;ffffffff81650ef1&gt;] dm_attr_show+0x2b/0x5e
     [&lt;ffffffff817b46d1&gt;] ? mutex_lock+0x26/0x44
     [&lt;ffffffff811df7f5&gt;] sysfs_kf_seq_show+0x83/0xcf
     [&lt;ffffffff811de257&gt;] kernfs_seq_show+0x23/0x25
     [&lt;ffffffff81199118&gt;] seq_read+0x16f/0x325
     [&lt;ffffffff811de994&gt;] kernfs_fop_read+0x3a/0x13f
     [&lt;ffffffff8117b625&gt;] __vfs_read+0x26/0x9d
     [&lt;ffffffff8130eb59&gt;] ? security_file_permission+0x3c/0x44
     [&lt;ffffffff8117bdb8&gt;] ? rw_verify_area+0x83/0xd9
     [&lt;ffffffff8117be9d&gt;] vfs_read+0x8f/0xcf
     [&lt;ffffffff81193e34&gt;] ? __fdget_pos+0x12/0x41
     [&lt;ffffffff8117c686&gt;] SyS_read+0x4b/0x76
     [&lt;ffffffff817b606e&gt;] system_call_fastpath+0x12/0x71

The bug can be easily triggered if an extra delay (e.g. 10ms) is added
between the test of DMF_FREEING &amp; DMF_DELETING and dm_get() in
dm_get_from_kobject().

To fix it, we need to ensure the test of DMF_FREEING &amp; DMF_DELETING and
dm_get() are done in an atomic way, so _minor_lock is used.

The other callers of dm_get() have also been checked to be OK: some
callers invoke dm_get() under _minor_lock, some callers invoke it under
_hash_lock, and dm_start_request() invokes it after increasing
md-&gt;open_count.

Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>dm: allocate struct mapped_device with kvzalloc</title>
<updated>2017-11-30T08:39:02Z</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2017-10-31T23:33:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=67246fb9449a0cb060a7bcac72b5169fb4fb9f75'/>
<id>urn:sha1:67246fb9449a0cb060a7bcac72b5169fb4fb9f75</id>
<content type='text'>
commit 856eb0916d181da6d043cc33e03f54d5c5bbe54a upstream.

The structure srcu_struct can be very big; its size is proportional to
CONFIG_NR_CPUS. The Fedora kernel has CONFIG_NR_CPUS 8192, so the field
io_barrier in struct mapped_device occupies 84kB in the debugging kernel
and 50kB in the non-debugging kernel. The large size may cause the
function kzalloc_node to fail.

In order to avoid the allocation failure, we use the function
kvzalloc_node, which falls back to vmalloc if a large contiguous
chunk of memory is not available. This patch also moves the field
io_barrier to the last position of struct mapped_device - the reason is
that on many processor architectures, short memory offsets result in
smaller code than long memory offsets - on x86-64 it reduces code size by
320 bytes.

Note to stable kernel maintainers - the kernels 4.11 and older don't have
the function kvzalloc_node, you can use the function vzalloc_node instead.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>dm bufio: fix integer overflow when limiting maximum cache size</title>
<updated>2017-11-30T08:39:02Z</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2017-11-16T00:38:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6609a3cdfca457d4783b4e18fc91c462353927f4'/>
<id>urn:sha1:6609a3cdfca457d4783b4e18fc91c462353927f4</id>
<content type='text'>
commit 74d4108d9e681dbbe4a2940ed8fdff1f6868184c upstream.

The default max_cache_size_bytes for dm-bufio is meant to be the lesser
of 25% of the size of the vmalloc area and 2% of the size of lowmem.
However, on 32-bit systems the intermediate result in the expression

    (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100

overflows, causing the wrong result to be computed.  For example, on a
32-bit system where the vmalloc area is 520093696 bytes, the result is
1174405 rather than the expected 130023424, which makes the maximum
cache size much too small (far less than 2% of lowmem).  This causes
severe performance problems for dm-verity users on affected systems.

Fix this by using mult_frac() to correctly multiply by a percentage.  Do
this for all places in dm-bufio that multiply by a percentage.  Also
replace (VMALLOC_END - VMALLOC_START) with VMALLOC_TOTAL, which contrary
to the comment is now defined in include/linux/vmalloc.h.

Depends-on: 9993bc635 ("sched/x86: Fix overflow in cyc2ns_offset")
Fixes: 95d402f057f2 ("dm: add bufio")
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>md/linear: shut up lockdep warning</title>
<updated>2017-10-21T15:21:35Z</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2017-02-21T19:57:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cf368c29f5ac57c798eb99c1fba04314588e8566'/>
<id>urn:sha1:cf368c29f5ac57c798eb99c1fba04314588e8566</id>
<content type='text'>
[ Upstream commit d939cdfde34f50b95254b375f498447c82190b3e ]

Commit 03a9e24 ("md linear: fix a race between linear_add() and
linear_congested()") introduced the warning.

Acked-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>md/raid10: submit bio directly to replacement disk</title>
<updated>2017-10-08T08:26:11Z</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2017-02-23T20:26:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4131c889c27843199874d7f2ba3442190cce2b41'/>
<id>urn:sha1:4131c889c27843199874d7f2ba3442190cce2b41</id>
<content type='text'>
[ Upstream commit 6d399783e9d4e9bd44931501948059d24ad96ff8 ]

Commit 57c67df ("md/raid10: submit IO from originating thread instead of
md thread") submits bios directly for normal disks but not for replacement
disks. There is no reason not to do the same for replacement disks.

Cc: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list</title>
<updated>2017-10-05T07:43:59Z</updated>
<author>
<name>Dennis Yang</name>
<email>dennisyang@qnap.com</email>
</author>
<published>2017-09-06T03:02:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=49c2b839b743dfd8e3b6332494eba00ef47389a3'/>
<id>urn:sha1:49c2b839b743dfd8e3b6332494eba00ef47389a3</id>
<content type='text'>
commit 184a09eb9a2fe425e49c9538f1604b05ed33cfef upstream.

In release_stripe_plug(), if a stripe_head has its STRIPE_ON_UNPLUG_LIST
set, it indicates that this stripe_head is already in the raid5_plug_cb
list and release_stripe() would be called instead to drop a reference
count. Otherwise, the STRIPE_ON_UNPLUG_LIST bit would be set for this
stripe_head and it will get queued into the raid5_plug_cb list.

Since break_stripe_batch_list() did not preserve STRIPE_ON_UNPLUG_LIST,
a stripe could be re-added to the plug list while it was still on that
list, in the following situation. If stripe_head A is added to another
stripe_head B's batch list, A will have batch_head != NULL and be added
to the plug list. After that, stripe_head B gets handled and
break_stripe_batch_list() is called to reset the state of all the
batched stripe_heads (including A, which is still on the plug list) and
reset their batch_head to NULL. Before the plug list is processed, if
another write request comes in and gets stripe_head A, A will have
batch_head == NULL (cleared by calling break_stripe_batch_list() on B)
and be added to the plug list once again.

Signed-off-by: Dennis Yang &lt;dennisyang@qnap.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>md/raid5: fix a race condition in stripe batch</title>
<updated>2017-10-05T07:43:59Z</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2017-08-25T17:40:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=648798cc2fd7d748573ba760a66cfa7b561abe77'/>
<id>urn:sha1:648798cc2fd7d748573ba760a66cfa7b561abe77</id>
<content type='text'>
commit 3664847d95e60a9a943858b7800f8484669740fc upstream.

We have a race condition in the scenario below; say we have 3 consecutive
stripes, sh1, sh2 and sh3, where sh1 is the stripe_head of sh2 and sh3:

CPU1				CPU2				CPU3
handle_stripe(sh3)
				stripe_add_to_batch_list(sh3)
				-&gt; lock(sh2, sh3)
				-&gt; lock batch_lock(sh1)
				-&gt; add sh3 to batch_list of sh1
				-&gt; unlock batch_lock(sh1)
								clear_batch_ready(sh1)
								-&gt; lock(sh1) and batch_lock(sh1)
								-&gt; clear STRIPE_BATCH_READY for all stripes in batch_list
								-&gt; unlock(sh1) and batch_lock(sh1)
-&gt;clear_batch_ready(sh3)
--&gt;test_and_clear_bit(STRIPE_BATCH_READY, sh3)
---&gt;return 0 as sh-&gt;batch == NULL
				-&gt; sh3-&gt;batch_head = sh1
				-&gt; unlock (sh2, sh3)

On CPU1, handle_stripe() will continue handling sh3 even though it is in
the batch list of sh1. By moving the sh3-&gt;batch_head assignment inside
batch_lock, we make it impossible to clear STRIPE_BATCH_READY before
batch_head is set.

Thanks to Stephane for helping debug this tricky issue.

Reported-and-tested-by: Stephane Thiell &lt;sthiell@stanford.edu&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>bcache: fix bch_hprint crash and improve output</title>
<updated>2017-09-27T12:39:25Z</updated>
<author>
<name>Michael Lyle</name>
<email>mlyle@lyle.org</email>
</author>
<published>2017-09-06T06:26:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=08f75f2c525d786b348dca568761dfde8d6b0d5c'/>
<id>urn:sha1:08f75f2c525d786b348dca568761dfde8d6b0d5c</id>
<content type='text'>
commit 9276717b9e297a62d1151a43d1cd286213f68eb7 upstream.

Most importantly, solve a crash where %llu was used to format signed
numbers.  This would cause a buffer overflow when reading sysfs
writeback_rate_debug, as only 20 bytes were allocated for this and
%llu writes 20 characters plus a null.

For simplicity, always use the units mechanism rather than having
different output paths.

Also, correct problems with display output where 1.10 printed as a larger
number than 1.09, by multiplying by 10 and then dividing by 1024 instead
of dividing by 100.  (Remainders of &gt;= 1000 would print as .10).

Minor changes: Always display the decimal point instead of trying to
omit it based on number of digits shown.  Decide what units to use
based on 1000 as a threshold, not 1024 (in other words, always print
at most 3 digits before the decimal point).

Signed-off-by: Michael Lyle &lt;mlyle@lyle.org&gt;
Reported-by: Dmitry Yu Okunev &lt;dyokunev@ut.mephi.ru&gt;
Acked-by: Kent Overstreet &lt;kent.overstreet@gmail.com&gt;
Reviewed-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>bcache: fix for gc and write-back race</title>
<updated>2017-09-27T12:39:25Z</updated>
<author>
<name>Tang Junhui</name>
<email>tang.junhui@zte.com.cn</email>
</author>
<published>2017-09-06T06:25:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=57aa1a6967b28431d62beec299125a969a469f3f'/>
<id>urn:sha1:57aa1a6967b28431d62beec299125a969a469f3f</id>
<content type='text'>
commit 9baf30972b5568d8b5bc8b3c46a6ec5b58100463 upstream.

gc and write-back can race (see the email "bcache get stucked" I sent
earlier):
gc thread                               write-back thread
|                                       |bch_writeback_thread()
|bch_gc_thread()                        |
|                                       |==&gt;read_dirty()
|==&gt;bch_btree_gc()                      |
|==&gt;btree_root() //get btree root       |
|                //node write lock      |
|==&gt;bch_btree_gc_root()                 |
|                                       |==&gt;read_dirty_submit()
|                                       |==&gt;write_dirty()
|                                       |==&gt;continue_at(cl,
|                                       |               write_dirty_finish,
|                                       |               system_wq);
|                                       |==&gt;write_dirty_finish()//execute
|                                       |               //in system_wq
|                                       |==&gt;bch_btree_insert()
|                                       |==&gt;bch_btree_map_leaf_nodes()
|                                       |==&gt;__bch_btree_map_nodes()
|                                       |==&gt;btree_root //try to get btree
|                                       |              //root node read
|                                       |              //lock
|                                       |-----stuck here
|==&gt;bch_btree_set_root()
|==&gt;bch_journal_meta()
|==&gt;bch_journal()
|==&gt;journal_try_write()
|==&gt;journal_write_unlocked() //journal_full(&amp;c-&gt;journal)
|                            //condition satisfied
|==&gt;continue_at(cl, journal_write, system_wq); //try to execute
|                               //journal_write in system_wq
|                               //but work queue is executing
|                               //write_dirty_finish()
|==&gt;closure_sync(); //wait for journal_write to
|                   //finish and wake up gc,
|-------------stuck here
|==&gt;release root node write lock

This patch allocates a separate workqueue for the write-back thread to
avoid such a race.

(Commit log re-organized by Coly Li to pass checkpatch.pl checking)

Signed-off-by: Tang Junhui &lt;tang.junhui@zte.com.cn&gt;
Acked-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
