<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/drivers/block/loop.h, branch v5.15.27</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.15.27</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.15.27'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2021-09-04T04:14:40Z</updated>
<entry>
<title>loop: reduce the loop_ctl_mutex scope</title>
<updated>2021-09-04T04:14:40Z</updated>
<author>
<name>Tetsuo Handa</name>
<email>penguin-kernel@i-love.sakura.ne.jp</email>
</author>
<published>2021-09-02T00:07:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1c500ad706383f1a6609e63d0b5d1723fd84dab9'/>
<id>urn:sha1:1c500ad706383f1a6609e63d0b5d1723fd84dab9</id>
<content type='text'>
syzbot is reporting a circular locking problem at __loop_clr_fd() [1], because
commit a160c6159d4a0cf8 ("block: add an optional probe callback to
major_names") calls the module's probe function with major_names_lock
held.

Fortunately, since commit 990e78116d38059c ("block: loop: fix deadlock
between open and remove") stopped holding loop_ctl_mutex in lo_open(),
the current role of loop_ctl_mutex is to serialize access to loop_index_idr
and loop_add()/loop_remove(); in other words, management of IDs in the IDR.
To avoid holding loop_ctl_mutex during the whole add/remove operation, use
a bool flag to indicate whether the loop device is ready for use.

loop_unregister_transfer(), which is called from cleanup_cryptoloop(),
currently has a possible use-after-free problem due to the lack of
serialization between kfree() from loop_remove() from loop_control_remove()
and mutex_lock() from unregister_transfer_cb(). But since lo-&gt;lo_encryption
should already be NULL when this function is called due to module unload,
and commit 222013f9ac30b9ce ("cryptoloop: add a deprecation warning")
indicates that we will remove this function shortly, this patch updates
this function to emit a warning instead of checking lo-&gt;lo_encryption.

Holding loop_ctl_mutex in loop_exit() is pointless, because all users must
close /dev/loop-control and /dev/loop$num (in order to drop the module's
refcount to 0) before loop_exit() starts, and nobody can open
/dev/loop-control or /dev/loop$num afterwards.

Link: https://syzkaller.appspot.com/bug?id=7bb10e8b62f83e4d445cdf4c13d69e407e629558 [1]
Reported-by: syzbot &lt;syzbot+f61766d5763f9e7a118f@syzkaller.appspotmail.com&gt;
Signed-off-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/adb1e792-fc0e-ee81-7ea0-0906fc36419d@i-love.sakura.ne.jp
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>loop: charge i/o to mem and blk cg</title>
<updated>2021-06-29T17:53:50Z</updated>
<author>
<name>Dan Schatzberg</name>
<email>schatzberg.dan@gmail.com</email>
</author>
<published>2021-06-29T02:38:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c74d40e8b5e2ac5eee1ca45b12d3e174915f1d88'/>
<id>urn:sha1:c74d40e8b5e2ac5eee1ca45b12d3e174915f1d88</id>
<content type='text'>
The current code only associates with the existing blkcg when aio is used
to access the backing file.  This patch covers all types of i/o to the
backing file and also associates the memcg so if the backing file is on
tmpfs, memory is charged appropriately.

This patch also exports cgroup_get_e_css and int_active_memcg so they can be
used by the loop module.

Link: https://lkml.kernel.org/r/20210610173944.1203706-4-schatzberg.dan@gmail.com
Signed-off-by: Dan Schatzberg &lt;schatzberg.dan@gmail.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Chris Down &lt;chris@chrisdown.name&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Ming Lei &lt;ming.lei@redhat.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>loop: use worker per cgroup instead of kworker</title>
<updated>2021-06-29T17:53:50Z</updated>
<author>
<name>Dan Schatzberg</name>
<email>schatzberg.dan@gmail.com</email>
</author>
<published>2021-06-29T02:38:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=87579e9b7d8dc36e7cfc40c03f1ae5634e16e2c5'/>
<id>urn:sha1:87579e9b7d8dc36e7cfc40c03f1ae5634e16e2c5</id>
<content type='text'>
Patch series "Charge loop device i/o to issuing cgroup", v14.

The loop device runs all i/o to the backing file on a separate kworker
thread which results in all i/o being charged to the root cgroup.  This
allows a loop device to be used to trivially bypass resource limits and
other policy.  This patch series fixes this gap in accounting.

A simple script to demonstrate this behavior on a cgroup v2 machine:

'''
#!/bin/bash
set -e

CGROUP=/sys/fs/cgroup/test.slice
LOOP_DEV=/dev/loop0

if [[ ! -d $CGROUP ]]
then
    sudo mkdir $CGROUP
fi

grep oom_kill $CGROUP/memory.events

# Set a memory limit, write more than that limit to tmpfs -&gt; OOM kill
sudo unshare -m bash -c "
echo \$\$ &gt; $CGROUP/cgroup.procs;
echo 0 &gt; $CGROUP/memory.swap.max;
echo 64M &gt; $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
dd if=/dev/zero of=/tmp/file bs=1M count=256" || true

grep oom_kill $CGROUP/memory.events

# Set a memory limit, write more than that limit through loopback
# device -&gt; no OOM kill
sudo unshare -m bash -c "
echo \$\$ &gt; $CGROUP/cgroup.procs;
echo 0 &gt; $CGROUP/memory.swap.max;
echo 64M &gt; $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
truncate -s 512m /tmp/backing_file;
losetup $LOOP_DEV /tmp/backing_file;
dd if=/dev/zero of=$LOOP_DEV bs=1M count=256;
losetup -d $LOOP_DEV" || true

grep oom_kill $CGROUP/memory.events
'''

Naively charging cgroups could result in priority inversions through the
single kworker thread in the case where multiple cgroups are
reading/writing to the same loop device.  This patch series does some
minor modification to the loop driver so that each cgroup can make forward
progress independently to avoid this inversion.

With this patch series applied, the above script triggers OOM kills when
writing through the loop device as expected.

This patch (of 3):

Existing uses of loop device may have multiple cgroups reading/writing to
the same device.  Simply charging resources for I/O to the backing file
could result in priority inversion where one cgroup gets synchronously
blocked, holding up all other I/O to the loop device.

In order to avoid this priority inversion, we use a single workqueue where
each work item is a "struct loop_worker" which contains a queue of struct
loop_cmds to issue.  The loop device maintains a tree mapping blk css_id
-&gt; loop_worker.  This allows each cgroup to independently make forward
progress issuing I/O to the backing file.

There is also a single queue for I/O associated with the rootcg which can
be used in cases of extreme memory shortage where we cannot allocate a
loop_worker.
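
The queue selection described above can be modelled in a few lines. This is only a user-space sketch in Python, not the driver code: the class and method names (LoopWorker, LoopDevice, queue_cmd, can_allocate) are hypothetical, a dict stands in for the tree, and the per-device spinlock is omitted.

```python
from collections import deque


class LoopWorker:
    """Hypothetical stand-in for struct loop_worker: one command queue per cgroup."""

    def __init__(self, css_id):
        self.css_id = css_id
        self.cmd_list = deque()


class LoopDevice:
    """Toy model of the per-device bookkeeping; locking is left out."""

    def __init__(self, can_allocate=lambda: True):
        self.workers = {}                 # css_id to LoopWorker (a tree in the commit)
        self.rootcg_cmd_list = deque()    # single fallback queue for the root cgroup
        self.can_allocate = can_allocate  # simulates allocation success or failure

    def queue_cmd(self, css_id, cmd):
        """Queue cmd for css_id and report which queue received it."""
        if css_id is None:
            # I/O with no cgroup association goes to the root queue.
            self.rootcg_cmd_list.append(cmd)
            return "root"
        worker = self.workers.get(css_id)
        if worker is None:
            if not self.can_allocate():
                # Memory shortage: no new worker, fall back to the root queue.
                self.rootcg_cmd_list.append(cmd)
                return "root"
            worker = LoopWorker(css_id)
            self.workers[css_id] = worker
        worker.cmd_list.append(cmd)
        return "worker"
```

Because each cgroup drains its own queue, a command that blocks in one worker does not hold up commands queued for another.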

The locking for the tree and queues is fairly heavy-handed: we acquire a
per-loop-device spinlock any time either is accessed.  The existing
implementation serializes all I/O through a single thread anyway, so I
don't believe this is any worse.

[colin.king@canonical.com: fixes]

Link: https://lkml.kernel.org/r/20210610173944.1203706-1-schatzberg.dan@gmail.com
Link: https://lkml.kernel.org/r/20210610173944.1203706-2-schatzberg.dan@gmail.com
Signed-off-by: Dan Schatzberg &lt;schatzberg.dan@gmail.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Acked-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Chris Down &lt;chris@chrisdown.name&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>block: loop: fix deadlock between open and remove</title>
<updated>2021-06-11T17:50:54Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2021-06-05T14:09:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=990e78116d38059c9306cf0560c1c4ed1cf358d3'/>
<id>urn:sha1:990e78116d38059c9306cf0560c1c4ed1cf358d3</id>
<content type='text'>
Commit c76f48eb5c08 ("block: take bd_mutex around delete_partitions in
del_gendisk") takes disk-&gt;part0-&gt;bd_mutex in del_gendisk(), which
causes the following AB/BA deadlock between removing a loop device and
opening one:

 1) loop_control_ioctl(LOOP_CTL_REMOVE)
     -&gt; mutex_lock(&amp;loop_ctl_mutex)
     -&gt; del_gendisk
         -&gt; mutex_lock(&amp;disk-&gt;part0-&gt;bd_mutex)

 2) blkdev_get_by_dev
     -&gt; mutex_lock(&amp;disk-&gt;part0-&gt;bd_mutex)
     -&gt; lo_open
         -&gt; mutex_lock(&amp;loop_ctl_mutex)

Add a new Lo_deleting state to remove the need for clearing
-&gt;private_data and thus holding loop_ctl_mutex in the ioctl
LOOP_CTL_REMOVE path.
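
The idea can be sketched in user space as follows. This is a simplified Python model under stated assumptions, not the kernel code: the names LoState, LoopDevice, open and mark_deleting are hypothetical, and a threading.Lock stands in for the per-device mutex. The point it illustrates is that the open path only needs the per-device lock once the remove path publishes a "deleting" state.

```python
import enum
import threading


class LoState(enum.Enum):
    UNBOUND = 0
    BOUND = 1
    DELETING = 2


class LoopDevice:
    def __init__(self):
        self.lo_mutex = threading.Lock()  # per-device lock (assumed name)
        self.state = LoState.UNBOUND
        self.refcnt = 0

    def open(self):
        """Open path: takes only the per-device lock, never a global one."""
        with self.lo_mutex:
            if self.state is LoState.DELETING:
                return False  # device is going away; refuse the open
            self.refcnt += 1
            return True

    def mark_deleting(self):
        """Remove path, first step: flip the state under the per-device lock."""
        with self.lo_mutex:
            if self.state is not LoState.UNBOUND or self.refcnt != 0:
                return False  # device is busy
            self.state = LoState.DELETING
            return True
```

Once mark_deleting() has succeeded, the caller can tear the device down without holding a global lock, because any concurrent open() observes the DELETING state and fails.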

Based on an analysis and earlier patch from
Ming Lei &lt;ming.lei@redhat.com&gt;.

Reported-by: Colin Ian King &lt;colin.king@canonical.com&gt;
Fixes: c76f48eb5c08 ("block: take bd_mutex around delete_partitions in del_gendisk")
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Tested-by: Colin Ian King &lt;colin.king@canonical.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Link: https://lore.kernel.org/r/20210605140950.5800-1-hch@lst.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>loop: scale loop device by introducing per device lock</title>
<updated>2021-01-26T20:08:54Z</updated>
<author>
<name>Pavel Tatashin</name>
<email>pasha.tatashin@soleen.com</email>
</author>
<published>2021-01-26T14:46:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6cc8e7430801fa238bd7d3acae1eb406c6e02fe1'/>
<id>urn:sha1:6cc8e7430801fa238bd7d3acae1eb406c6e02fe1</id>
<content type='text'>
Currently, the loop driver has only one global lock: loop_ctl_mutex.

This becomes hot in scenarios where many loop devices are used.

Scale it by introducing a per-device lock, lo_mutex, that protects
modifications of all fields in struct loop_device.

Keep loop_ctl_mutex to protect global data: loop_index_idr, loop_lookup,
loop_add.

The new lock ordering requirement is that loop_ctl_mutex must be taken
before lo_mutex.

Signed-off-by: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Reviewed-by: Tyler Hicks &lt;tyhicks@linux.microsoft.com&gt;
Reviewed-by: Petr Vorel &lt;pvorel@suse.cz&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block/loop: Use global lock for ioctl() operation.</title>
<updated>2018-11-08T13:30:11Z</updated>
<author>
<name>Tetsuo Handa</name>
<email>penguin-kernel@I-love.SAKURA.ne.jp</email>
</author>
<published>2018-11-08T13:01:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=310ca162d779efee8a2dc3731439680f3e9c1e86'/>
<id>urn:sha1:310ca162d779efee8a2dc3731439680f3e9c1e86</id>
<content type='text'>
syzbot is reporting a NULL pointer dereference [1] which is caused by
a race between ioctl(loop_fd, LOOP_CLR_FD, 0) and
ioctl(other_loop_fd, LOOP_SET_FD, loop_fd) due to traversing other
loop devices at loop_validate_file() without holding the corresponding
lo-&gt;lo_ctl_mutex locks.

Since ioctl() requests on loop devices are not frequent operations, we don't
need fine-grained locking. Let's use a global lock in order to allow safe
traversal at loop_validate_file().

Note that syzbot is also reporting a circular locking dependency between
bdev-&gt;bd_mutex and lo-&gt;lo_ctl_mutex [2] which is caused by calling
blkdev_reread_part() with the lock held. This patch does not address it.

[1] https://syzkaller.appspot.com/bug?id=f3cfe26e785d85f9ee259f385515291d21bd80a3
[2] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d15889

Signed-off-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Reported-by: syzbot &lt;syzbot+bf89c128e05dd6c62523@syzkaller.appspotmail.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>loop: remember whether sysfs_create_group() was done</title>
<updated>2018-05-07T21:26:36Z</updated>
<author>
<name>Tetsuo Handa</name>
<email>penguin-kernel@I-love.SAKURA.ne.jp</email>
</author>
<published>2018-05-04T16:58:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d3349b6b3c373ac1fbfb040b810fcee5e2adc7e0'/>
<id>urn:sha1:d3349b6b3c373ac1fbfb040b810fcee5e2adc7e0</id>
<content type='text'>
syzbot is hitting a WARN() triggered by memory allocation fault
injection [1] because the loop module calls sysfs_remove_group()
even when sysfs_create_group() failed.
Fix this by remembering whether sysfs_create_group() succeeded.

[1] https://syzkaller.appspot.com/bug?id=3f86c0edf75c86d2633aeb9dd69eccc70bc7e90b

Signed-off-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Reported-by: syzbot &lt;syzbot+9f03168400f56df89dbc6f1751f4458fe739ff29@syzkaller.appspotmail.com&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

Renamed sysfs_ready -&gt; sysfs_inited.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>loop: remove cmd-&gt;rq member</title>
<updated>2018-04-15T04:34:27Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2018-04-13T22:24:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1894e916546df0efec9890a5c9954f4ad281494c'/>
<id>urn:sha1:1894e916546df0efec9890a5c9954f4ad281494c</id>
<content type='text'>
We can always get at the request from the payload, no need to store
a pointer to it.

Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block/loop: make loop cgroup aware</title>
<updated>2017-09-26T13:41:22Z</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2017-09-25T19:07:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d4478e92d6186ce37947a36994de407c27446266'/>
<id>urn:sha1:d4478e92d6186ce37947a36994de407c27446266</id>
<content type='text'>
The loop block device handles IO in a separate thread. The actual IO
dispatched isn't cloned from the IO the loop device received, so the
dispatched IO loses the cgroup context.

I'm ignoring the buffered IO case for now, which is quite complicated.  Making
the loop thread aware of the cgroup context doesn't really help. The loop
device only writes to a single file. In the current writeback cgroup
implementation, the file can only belong to one cgroup.

For the direct IO case, we could work around the issue in theory. For
example, say we assign cgroup1 5M/s BW for the loop device and cgroup2
10M/s. We can create a special cgroup for the loop thread and assign at
least 15M/s for the underlying disk. In this way, we correctly throttle
the two cgroups. But this is tricky to set up.

This patch tries to address the issue. We record the bio's css in the loop
command. When the loop thread is handling the command, we then use the API
provided in patch 1 to set the css for the current task. The bio layer will
use the css for new IO (from patch 3).

Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>loop: remove union of use_aio and ref in struct loop_cmd</title>
<updated>2017-09-25T14:56:05Z</updated>
<author>
<name>Omar Sandoval</name>
<email>osandov@fb.com</email>
</author>
<published>2017-09-20T21:24:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e5313c141b49c1b1af43d1ca81398185d66ad1a6'/>
<id>urn:sha1:e5313c141b49c1b1af43d1ca81398185d66ad1a6</id>
<content type='text'>
When the request is completed, lo_complete_rq() checks cmd-&gt;use_aio.
However, if this is in fact an aio request, cmd-&gt;use_aio will have
already been reused as cmd-&gt;ref by lo_rw_aio*. Fix it by not using a
union. On x86_64, there's a hole after the union anyway, so this
doesn't make struct loop_cmd any bigger.
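
The aliasing can be reproduced in user space with Python's ctypes. This is an illustrative sketch only: CmdUnion, CmdStruct and the helper function are hypothetical names, and a plain c_int stands in for the kernel's reference counter type.

```python
import ctypes


class CmdUnion(ctypes.Union):
    """Old layout: use_aio and ref share the same storage."""
    _fields_ = [("use_aio", ctypes.c_bool), ("ref", ctypes.c_int)]


class CmdStruct(ctypes.Structure):
    """Fixed layout: use_aio and ref are separate members."""
    _fields_ = [("use_aio", ctypes.c_bool), ("ref", ctypes.c_int)]


def use_aio_after_ref_drops(cmd_type):
    """Return what a completion handler would read from use_aio."""
    cmd = cmd_type()
    cmd.use_aio = True  # request is marked as an aio request
    cmd.ref = 2         # the aio path starts using the reference count
    cmd.ref = 0         # all references are dropped before completion
    return bool(cmd.use_aio)
```

With the union, use_aio_after_ref_drops(CmdUnion) reads back False because the zeroed counter overwrote the flag; with separate members, use_aio_after_ref_drops(CmdStruct) still reads True.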

Fixes: 92d773324b7e ("block/loop: fix use after free")
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
