<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/drivers/md, branch v5.10.99</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.10.99</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.10.99'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2022-01-27T09:54:22Z</updated>
<entry>
<title>dm: fix alloc_dax error handling in alloc_dev</title>
<updated>2022-01-27T09:54:22Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2021-11-29T10:21:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=dfde7afed7116374074e531dfad9919348bef5ac'/>
<id>urn:sha1:dfde7afed7116374074e531dfad9919348bef5ac</id>
<content type='text'>
[ Upstream commit d751939235b9b7bc4af15f90a3e99288a8b844a7 ]

Make sure -&gt;dax_dev is NULL on error so that the cleanup path doesn't
trip over an ERR_PTR.

Reported-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20211129102203.2243509-2-hch@lst.de
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>dm space map common: add bounds check to sm_ll_lookup_bitmap()</title>
<updated>2022-01-27T09:54:20Z</updated>
<author>
<name>Joe Thornber</name>
<email>ejt@redhat.com</email>
</author>
<published>2021-12-10T13:49:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=eaf8cffcf5d555f739d34cb09b7b4862e7bcb5ff'/>
<id>urn:sha1:eaf8cffcf5d555f739d34cb09b7b4862e7bcb5ff</id>
<content type='text'>
[ Upstream commit cba23ac158db7f3cd48a923d6861bee2eb7a2978 ]

Corrupted metadata could warrant returning error from sm_ll_lookup_bitmap().

Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>dm btree: add a defensive bounds check to insert_at()</title>
<updated>2022-01-27T09:54:20Z</updated>
<author>
<name>Joe Thornber</name>
<email>ejt@redhat.com</email>
</author>
<published>2021-12-10T13:44:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5850bef8e955a0c4ce4653f858c8e37c585269eb'/>
<id>urn:sha1:5850bef8e955a0c4ce4653f858c8e37c585269eb</id>
<content type='text'>
[ Upstream commit 85bca3c05b6cca31625437eedf2060e846c4bbad ]

Corrupt metadata could trigger an out of bounds write.

Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>md: revert io stats accounting</title>
<updated>2022-01-16T08:14:21Z</updated>
<author>
<name>Guoqing Jiang</name>
<email>jgq516@gmail.com</email>
</author>
<published>2021-05-25T09:46:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c39d68ab38369265a73a34d8ffb189221a6e230f'/>
<id>urn:sha1:c39d68ab38369265a73a34d8ffb189221a6e230f</id>
<content type='text'>
commit ad3fc798800fb7ca04c1dfc439dba946818048d8 upstream.

The commit 41d2d848e5c0 ("md: improve io stats accounting") could cause
double fault problem per the report [1], and also it is not correct to
change -&gt;bi_end_io if md don't own it, so let's revert it.

And io stats accounting will be replemented in later commits.

[1]. https://lore.kernel.org/linux-raid/3bf04253-3fad-434a-63a7-20214e38cf26@gmail.com/T/#t

Fixes: 41d2d848e5c0 ("md: improve io stats accounting")
Signed-off-by: Guoqing Jiang &lt;jiangguoqing@kylinos.cn&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
[GM: backport to 5.10-stable]
Signed-off-by: Guillaume Morin &lt;guillaume@morinfr.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>dm btree remove: fix use after free in rebalance_children()</title>
<updated>2021-12-22T08:30:51Z</updated>
<author>
<name>Joe Thornber</name>
<email>ejt@redhat.com</email>
</author>
<published>2021-11-24T17:07:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0e21e6cd5eebfc929ac5fa3b97ca2d4ace3cb6a3'/>
<id>urn:sha1:0e21e6cd5eebfc929ac5fa3b97ca2d4ace3cb6a3</id>
<content type='text'>
commit 1b8d2789dad0005fd5e7d35dab26a8e1203fb6da upstream.

Move dm_tm_unlock() after dm_tm_dec().

Cc: stable@vger.kernel.org
Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>md: fix update super 1.0 on rdev size change</title>
<updated>2021-12-14T10:32:38Z</updated>
<author>
<name>Markus Hochholdinger</name>
<email>markus@hochholdinger.net</email>
</author>
<published>2021-11-16T10:21:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8b4264c27b821d6b3550fd67c0169cbc5549db8c'/>
<id>urn:sha1:8b4264c27b821d6b3550fd67c0169cbc5549db8c</id>
<content type='text'>
commit 55df1ce0d4e086e05a8ab20619c73c729350f965 upstream.

The superblock of version 1.0 doesn't get moved to the new position on a
device size change. This leads to a rdev without a superblock on a known
position, the raid can't be re-assembled.

The line was removed by mistake and is re-added by this patch.

Fixes: d9c0fa509eaf ("md: fix max sectors calculation for super 1.0")
Cc: stable@vger.kernel.org
Signed-off-by: Markus Hochholdinger &lt;markus@hochholdinger.net&gt;
Reviewed-by: Xiao Ni &lt;xni@redhat.com&gt;
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>md: update superblock after changing rdev flags in state_store</title>
<updated>2021-11-18T13:03:57Z</updated>
<author>
<name>Xiao Ni</name>
<email>xni@redhat.com</email>
</author>
<published>2021-10-13T14:59:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2338c3501726895c1657adda2308fcf9e6f17449'/>
<id>urn:sha1:2338c3501726895c1657adda2308fcf9e6f17449</id>
<content type='text'>
[ Upstream commit 8b9e2291e355a0eafdd5b1e21a94a6659f24b351 ]

When the in memory flag is changed, we need to persist the change in the
rdev superblock flags. This is needed for "writemostly" and "failfast".

Reviewed-by: Li Feng &lt;fengli@smartx.com&gt;
Signed-off-by: Xiao Ni &lt;xni@redhat.com&gt;
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>md: fix a lock order reversal in md_alloc</title>
<updated>2021-09-30T08:11:05Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2021-09-01T11:38:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b18ba3f477a2fdd12d2ca2e01d2bd874968714e2'/>
<id>urn:sha1:b18ba3f477a2fdd12d2ca2e01d2bd874968714e2</id>
<content type='text'>
[ Upstream commit 7df835a32a8bedf7ce88efcfa7c9b245b52ff139 ]

Commit b0140891a8cea3 ("md: Fix race when creating a new md device.")
not only moved assigning mddev-&gt;gendisk before calling add_disk, which
fixes the races described in the commit log, but also added a
mddev-&gt;open_mutex critical section over add_disk and creation of the
md kobj.  Adding a kobject after add_disk is racy vs deleting the gendisk
right after adding it, but md already prevents against that by holding
a mddev-&gt;active reference.

On the other hand taking this lock added a lock order reversal with what
is not disk-&gt;open_mutex (used to be bdev-&gt;bd_mutex when the commit was
added) for partition devices, which need that lock for the internal open
for the partition scan, and a recent commit also takes it for
non-partitioned devices, leading to further lockdep splatter.

Fixes: b0140891a8ce ("md: Fix race when creating a new md device.")
Fixes: d62633873590 ("block: support delayed holder registration")
Reported-by: syzbot+fadc0aaf497e6a493b9f@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Tested-by: syzbot+fadc0aaf497e6a493b9f@syzkaller.appspotmail.com
Reviewed-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>treewide: Change list_sort to use const pointers</title>
<updated>2021-09-30T08:11:04Z</updated>
<author>
<name>Sami Tolvanen</name>
<email>samitolvanen@google.com</email>
</author>
<published>2021-04-08T18:28:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=55e6f8b3c0f5cc600df12ddd0371d2703b910fd7'/>
<id>urn:sha1:55e6f8b3c0f5cc600df12ddd0371d2703b910fd7</id>
<content type='text'>
[ Upstream commit 4f0f586bf0c898233d8f316f471a21db2abd522d ]

list_sort() internally casts the comparison function passed to it
to a different type with constant struct list_head pointers, and
uses this pointer to call the functions, which trips indirect call
Control-Flow Integrity (CFI) checking.

Instead of removing the consts, this change defines the
list_cmp_func_t type and changes the comparison function types of
all list_sort() callers to use const pointers, thus avoiding type
mismatches.

Suggested-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Signed-off-by: Sami Tolvanen &lt;samitolvanen@google.com&gt;
Reviewed-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Tested-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Tested-by: Nathan Chancellor &lt;nathan@kernel.org&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>dm crypt: Avoid percpu_counter spinlock contention in crypt_page_alloc()</title>
<updated>2021-09-18T11:40:08Z</updated>
<author>
<name>Arne Welzel</name>
<email>arne.welzel@corelight.com</email>
</author>
<published>2021-08-13T22:40:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7509c4cb7c8050177da9ee5e053c0c3d55bb66b7'/>
<id>urn:sha1:7509c4cb7c8050177da9ee5e053c0c3d55bb66b7</id>
<content type='text'>
commit 528b16bfc3ae5f11638e71b3b63a81f9999df727 upstream.

On systems with many cores using dm-crypt, heavy spinlock contention in
percpu_counter_compare() can be observed when the page allocation limit
for a given device is reached or close to be reached. This is due
to percpu_counter_compare() taking a spinlock to compute an exact
result on potentially many CPUs at the same time.

Switch to non-exact comparison of allocated and allowed pages by using
the value returned by percpu_counter_read_positive() to avoid taking
the percpu_counter spinlock.

This may over/under estimate the actual number of allocated pages by at
most (batch-1) * num_online_cpus().

Currently, batch is bounded by 32. The system on which this issue was
first observed has 256 CPUs and 512GB of RAM. With a 4k page size, this
change may over/under estimate by 31MB. With ~10G (2%) allowed dm-crypt
allocations, this seems an acceptable error. Certainly preferred over
running into the spinlock contention.

This behavior was reproduced on an EC2 c5.24xlarge instance with 96 CPUs
and 192GB RAM as follows, but can be provoked on systems with less CPUs
as well.

 * Disable swap
 * Tune vm settings to promote regular writeback
     $ echo 50 &gt; /proc/sys/vm/dirty_expire_centisecs
     $ echo 25 &gt; /proc/sys/vm/dirty_writeback_centisecs
     $ echo $((128 * 1024 * 1024)) &gt; /proc/sys/vm/dirty_background_bytes

 * Create 8 dmcrypt devices based on files on a tmpfs
 * Create and mount an ext4 filesystem on each crypt devices
 * Run stress-ng --hdd 8 within one of above filesystems

Total %system usage collected from sysstat goes to ~35%. Write throughput
on the underlying loop device is ~2GB/s. perf profiling an individual
kworker kcryptd thread shows the following profile, indicating spinlock
contention in percpu_counter_compare():

    99.98%     0.00%  kworker/u193:46  [kernel.kallsyms]  [k] ret_from_fork
      |
      --ret_from_fork
        kthread
        worker_thread
        |
         --99.92%--process_one_work
            |
            |--80.52%--kcryptd_crypt
            |    |
            |    |--62.58%--mempool_alloc
            |    |  |
            |    |   --62.24%--crypt_page_alloc
            |    |     |
            |    |      --61.51%--__percpu_counter_compare
            |    |        |
            |    |         --61.34%--__percpu_counter_sum
            |    |           |
            |    |           |--58.68%--_raw_spin_lock_irqsave
            |    |           |  |
            |    |           |   --58.30%--native_queued_spin_lock_slowpath
            |    |           |
            |    |            --0.69%--cpumask_next
            |    |                |
            |    |                 --0.51%--_find_next_bit
            |    |
            |    |--10.61%--crypt_convert
            |    |          |
            |    |          |--6.05%--xts_crypt
            ...

After applying this patch and running the same test, %system usage is
lowered to ~7% and write throughput on the loop device increases
to ~2.7GB/s. perf report shows mempool_alloc() as ~8% rather than ~62%
in the profile and not hitting the percpu_counter() spinlock anymore.

    |--8.15%--mempool_alloc
    |    |
    |    |--3.93%--crypt_page_alloc
    |    |    |
    |    |     --3.75%--__alloc_pages
    |    |         |
    |    |          --3.62%--get_page_from_freelist
    |    |              |
    |    |               --3.22%--rmqueue_bulk
    |    |                   |
    |    |                    --2.59%--_raw_spin_lock
    |    |                      |
    |    |                       --2.57%--native_queued_spin_lock_slowpath
    |    |
    |     --3.05%--_raw_spin_lock_irqsave
    |               |
    |                --2.49%--native_queued_spin_lock_slowpath

Suggested-by: DJ Gregor &lt;dj@corelight.com&gt;
Reviewed-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Arne Welzel &lt;arne.welzel@corelight.com&gt;
Fixes: 5059353df86e ("dm crypt: limit the number of allocated pages")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
