<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/genhd.h, branch v4.19.261</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.19.261</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.19.261'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2021-03-11T13:04:59Z</updated>
<entry>
<title>block: genhd: add 'groups' argument to device_add_disk</title>
<updated>2021-03-11T13:04:59Z</updated>
<author>
<name>Hannes Reinecke</name>
<email>hare@suse.de</email>
</author>
<published>2021-02-23T09:28:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1bf6a186c4524e62c4e46d2c9f28a70fb45f9e8b'/>
<id>urn:sha1:1bf6a186c4524e62c4e46d2c9f28a70fb45f9e8b</id>
<content type='text'>
commit fef912bf860e8e7e48a2bfb978a356bba743a8b7 upstream.

Update device_add_disk() to take a 'groups' argument so that
individual drivers can register a device with additional sysfs
attributes.
This avoids the race condition the driver would otherwise have if these
groups were to be created with sysfs_add_groups().

Signed-off-by: Martin Wilck &lt;martin.wilck@suse.com&gt;
Signed-off-by: Hannes Reinecke &lt;hare@suse.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Jeffle Xu &lt;jefflexu@linux.alibaba.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>block: nr_sects_write(): Disable preemption on seqcount write</title>
<updated>2020-06-25T13:33:07Z</updated>
<author>
<name>Ahmed S. Darwish</name>
<email>a.darwish@linutronix.de</email>
</author>
<published>2020-06-03T14:49:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e1906ca16db5d2b28dffd5785f058d22dbea1ab4'/>
<id>urn:sha1:e1906ca16db5d2b28dffd5785f058d22dbea1ab4</id>
<content type='text'>
[ Upstream commit 15b81ce5abdc4b502aa31dff2d415b79d2349d2f ]

For optimized block readers not holding a mutex, the "number of sectors"
64-bit value is protected from tearing on 32-bit architectures by a
sequence counter.

Disable preemption before entering that sequence counter's write side
critical section. Otherwise, the read side can preempt the write side
section and spin for the entire scheduler tick. If the reader belongs to
a real-time scheduling class, it can spin forever and the kernel will
livelock.
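
The retry protocol described above can be sketched in user space. This is
only an illustrative model, not the kernel's seqcount implementation: the
SeqCount class, its field names, and the read() helper are hypothetical.
It shows why a reader cannot make progress while the sequence is odd,
which is the state a preempted writer would leave behind.

```python
class SeqCount:
    """Toy sequence counter: an odd sequence means a write is in progress."""

    def __init__(self):
        self.sequence = 0
        self.nr_sects = 0

    def write(self, value):
        self.sequence += 1   # odd: readers must retry from here on
        self.nr_sects = value
        self.sequence += 1   # even again: the value is stable

    def try_read(self):
        """Return (ok, value); ok is False if a write overlapped the read."""
        start = self.sequence
        if start % 2 == 1:
            return False, None
        value = self.nr_sects
        return self.sequence == start, value


def read(counter):
    # Spin until a consistent snapshot is seen. If the writer were
    # preempted while the sequence is odd, this loop would never exit.
    while True:
        ok, value = counter.try_read()
        if ok:
            return value
```

In this model the write side must run to completion before any reader
can succeed, which is the property that disabling preemption around the
write section is meant to protect.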

Fixes: c83f6bf98dc1 ("block: add partition resize function to blkpg ioctl")
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Ahmed S. Darwish &lt;a.darwish@linutronix.de&gt;
Reviewed-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: fix use-after-free on gendisk</title>
<updated>2019-05-31T13:46:18Z</updated>
<author>
<name>Yufen Yu</name>
<email>yuyufen@huawei.com</email>
</author>
<published>2019-04-02T12:06:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ad393793794efc3770a14727880738f7ee4d636b'/>
<id>urn:sha1:ad393793794efc3770a14727880738f7ee4d636b</id>
<content type='text'>
[ Upstream commit 2c88e3c7ec32d7a40cc7c9b4a487cf90e4671bdd ]

commit 2da78092dda "block: Fix dev_t minor allocation lifetime"
specifically moved the blk_free_devt(dev-&gt;devt) call to part_release()
to avoid reallocating the device number before the device is fully
shut down.

However, it can cause a use-after-free on gendisk in get_gendisk().
We use an md device as an example to show the race:

Process1		Worker			Process2
md_free
						blkdev_open
del_gendisk
  add delete_partition_work_fn() to wq
  						__blkdev_get
						get_gendisk
put_disk
  disk_release
    kfree(disk)
    						find part from ext_devt_idr
						get_disk_and_module(disk)
    					  	cause use after free

    			delete_partition_work_fn
			put_device(part)
    		  	part_release
		    	remove part from ext_devt_idr

Before &lt;devt, hd_struct pointer&gt; is removed from ext_devt_idr by
delete_partition_work_fn(), we can find the devt and then access
gendisk by hd_struct pointer. But if we access the gendisk after
it has been freed, it can cause a use-after-free on gendisk in
get_gendisk().

We fix this by adding a new helper blk_invalidate_devt() in
delete_partition() and del_gendisk(). It replaces hd_struct
pointer in idr with value 'NULL', and deletes the entry from
idr in part_release() as we do now.
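
The two-step teardown described above can be modelled roughly as
follows. This is a sketch under stated assumptions, not the kernel code:
DevtTable and its methods are hypothetical names, and a plain dictionary
stands in for the idr. The point is that an invalidated entry keeps the
number reserved while lookups no longer reach the old object.

```python
class DevtTable:
    """Toy stand-in for ext_devt_idr: maps a device number to an object."""

    def __init__(self):
        self.entries = {}

    def alloc(self, devt, part):
        # A number stays reserved for as long as its entry exists,
        # even when the entry has been invalidated.
        if devt in self.entries:
            raise ValueError("devt still reserved")
        self.entries[devt] = part

    def invalidate(self, devt):
        # Step 1 (deletion time): keep the slot, drop the pointer.
        self.entries[devt] = None

    def lookup(self, devt):
        # Returns None for unknown and for invalidated numbers alike.
        return self.entries.get(devt)

    def free(self, devt):
        # Step 2 (release time): the number may now be reused.
        del self.entries[devt]
```

A lookup that runs between the two steps sees None instead of a stale
pointer, and the number cannot be handed out again until free() runs.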

Thanks to Jan Kara for providing the solution and clearer comments
for the code.

Fixes: 2da78092dda1 ("block: Fix dev_t minor allocation lifetime")
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Reviewed-by: Keith Busch &lt;keith.busch@intel.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Suggested-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Yufen Yu &lt;yuyufen@huawei.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: use rcu_work instead of call_rcu to avoid sleep in softirq</title>
<updated>2019-01-22T20:40:35Z</updated>
<author>
<name>Yufen Yu</name>
<email>yuyufen@huawei.com</email>
</author>
<published>2018-11-28T08:42:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4cc66cc4f81fb8b1d6e83548fa79005dcc93ee2a'/>
<id>urn:sha1:4cc66cc4f81fb8b1d6e83548fa79005dcc93ee2a</id>
<content type='text'>
commit 94a2c3a32b62e868dc1e3d854326745a7f1b8c7a upstream.

We recently got a stack trace from syzkaller like this:

BUG: sleeping function called from invalid context at mm/slab.h:361
in_atomic(): 1, irqs_disabled(): 0, pid: 6644, name: blkid
INFO: lockdep is turned off.
CPU: 1 PID: 6644 Comm: blkid Not tainted 4.4.163-514.55.6.9.x86_64+ #76
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
 0000000000000000 5ba6a6b879e50c00 ffff8801f6b07b10 ffffffff81cb2194
 0000000041b58ab3 ffffffff833c7745 ffffffff81cb2080 5ba6a6b879e50c00
 0000000000000000 0000000000000001 0000000000000004 0000000000000000
Call Trace:
 &lt;IRQ&gt;  [&lt;ffffffff81cb2194&gt;] __dump_stack lib/dump_stack.c:15 [inline]
 &lt;IRQ&gt;  [&lt;ffffffff81cb2194&gt;] dump_stack+0x114/0x1a0 lib/dump_stack.c:51
 [&lt;ffffffff8129a981&gt;] ___might_sleep+0x291/0x490 kernel/sched/core.c:7675
 [&lt;ffffffff8129ac33&gt;] __might_sleep+0xb3/0x270 kernel/sched/core.c:7637
 [&lt;ffffffff81794c13&gt;] slab_pre_alloc_hook mm/slab.h:361 [inline]
 [&lt;ffffffff81794c13&gt;] slab_alloc_node mm/slub.c:2610 [inline]
 [&lt;ffffffff81794c13&gt;] slab_alloc mm/slub.c:2692 [inline]
 [&lt;ffffffff81794c13&gt;] kmem_cache_alloc_trace+0x2c3/0x5c0 mm/slub.c:2709
 [&lt;ffffffff81cbe9a7&gt;] kmalloc include/linux/slab.h:479 [inline]
 [&lt;ffffffff81cbe9a7&gt;] kzalloc include/linux/slab.h:623 [inline]
 [&lt;ffffffff81cbe9a7&gt;] kobject_uevent_env+0x2c7/0x1150 lib/kobject_uevent.c:227
 [&lt;ffffffff81cbf84f&gt;] kobject_uevent+0x1f/0x30 lib/kobject_uevent.c:374
 [&lt;ffffffff81cbb5b9&gt;] kobject_cleanup lib/kobject.c:633 [inline]
 [&lt;ffffffff81cbb5b9&gt;] kobject_release+0x229/0x440 lib/kobject.c:675
 [&lt;ffffffff81cbb0a2&gt;] kref_sub include/linux/kref.h:73 [inline]
 [&lt;ffffffff81cbb0a2&gt;] kref_put include/linux/kref.h:98 [inline]
 [&lt;ffffffff81cbb0a2&gt;] kobject_put+0x72/0xd0 lib/kobject.c:692
 [&lt;ffffffff8216f095&gt;] put_device+0x25/0x30 drivers/base/core.c:1237
 [&lt;ffffffff81c4cc34&gt;] delete_partition_rcu_cb+0x1d4/0x2f0 block/partition-generic.c:232
 [&lt;ffffffff813c08bc&gt;] __rcu_reclaim kernel/rcu/rcu.h:118 [inline]
 [&lt;ffffffff813c08bc&gt;] rcu_do_batch kernel/rcu/tree.c:2705 [inline]
 [&lt;ffffffff813c08bc&gt;] invoke_rcu_callbacks kernel/rcu/tree.c:2973 [inline]
 [&lt;ffffffff813c08bc&gt;] __rcu_process_callbacks kernel/rcu/tree.c:2940 [inline]
 [&lt;ffffffff813c08bc&gt;] rcu_process_callbacks+0x59c/0x1c70 kernel/rcu/tree.c:2957
 [&lt;ffffffff8120f509&gt;] __do_softirq+0x299/0xe20 kernel/softirq.c:273
 [&lt;ffffffff81210496&gt;] invoke_softirq kernel/softirq.c:350 [inline]
 [&lt;ffffffff81210496&gt;] irq_exit+0x216/0x2c0 kernel/softirq.c:391
 [&lt;ffffffff82c2cd7b&gt;] exiting_irq arch/x86/include/asm/apic.h:652 [inline]
 [&lt;ffffffff82c2cd7b&gt;] smp_apic_timer_interrupt+0x8b/0xc0 arch/x86/kernel/apic/apic.c:926
 [&lt;ffffffff82c2bc25&gt;] apic_timer_interrupt+0xa5/0xb0 arch/x86/entry/entry_64.S:746
 &lt;EOI&gt;  [&lt;ffffffff814cbf40&gt;] ? audit_kill_trees+0x180/0x180
 [&lt;ffffffff8187d2f7&gt;] fd_install+0x57/0x80 fs/file.c:626
 [&lt;ffffffff8180989e&gt;] do_sys_open+0x45e/0x550 fs/open.c:1043
 [&lt;ffffffff818099c2&gt;] SYSC_open fs/open.c:1055 [inline]
 [&lt;ffffffff818099c2&gt;] SyS_open+0x32/0x40 fs/open.c:1050
 [&lt;ffffffff82c299e1&gt;] entry_SYSCALL_64_fastpath+0x1e/0x9a

In softirq context, we call the RCU callback function
delete_partition_rcu_cb(), which may allocate memory by kzalloc with the
GFP_KERNEL flag. If the allocation cannot be satisfied, it may sleep.
However, that is not allowed in softirq context.

Although we found this problem on linux 4.4, the latest kernel version
seems to have this problem as well. And it is very similar to the
previous one:
	https://lkml.org/lkml/2018/7/9/391

Fix it by using an RCU workqueue, which allows sleeping.

Reviewed-by: Paul E. McKenney &lt;paulmck@linux.ibm.com&gt;
Signed-off-by: Yufen Yu &lt;yuyufen@huawei.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>block: use nanosecond resolution for iostat</title>
<updated>2018-09-22T02:26:59Z</updated>
<author>
<name>Omar Sandoval</name>
<email>osandov@fb.com</email>
</author>
<published>2018-09-21T23:44:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b57e99b4b8b0ebdf9707424e7ddc0c392bdc5fe6'/>
<id>urn:sha1:b57e99b4b8b0ebdf9707424e7ddc0c392bdc5fe6</id>
<content type='text'>
Klaus Kusche reported that the I/O busy time in /proc/diskstats was not
updating properly on 4.18. This is because we started using ktime to
track elapsed time, and we convert nanoseconds to jiffies when we update
the partition counter. However, this gets rounded down, so any I/Os that
take less than a jiffy are not accounted for. Previously in this case,
the value of jiffies would sometimes increment while we were doing I/O,
so at least some I/Os were accounted for.

Let's convert the stats to use nanoseconds internally. We still report
milliseconds as before, now more accurately than ever. The value is
still truncated to 32 bits for backwards compatibility.
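
The rounding loss described above can be shown with a small arithmetic
sketch. The tick rate HZ = 250 and the function names are assumptions
made for illustration; only the idea of converting each I/O to whole
jiffies versus accumulating nanoseconds comes from the text above.

```python
NSEC_PER_SEC = 1_000_000_000
NSEC_PER_MSEC = 1_000_000
HZ = 250                             # assumed tick rate
NSEC_PER_JIFFY = NSEC_PER_SEC // HZ  # 4 ms per jiffy at HZ = 250


def busy_ms_via_jiffies(durations_ns):
    # Old scheme: each I/O is rounded down to whole jiffies before it
    # is added, so anything shorter than one jiffy contributes zero.
    jiffies = sum(d // NSEC_PER_JIFFY for d in durations_ns)
    return jiffies * 1000 // HZ


def busy_ms_via_ns(durations_ns):
    # New scheme: accumulate nanoseconds, convert once when reporting,
    # and truncate the reported value to 32 bits.
    return (sum(durations_ns) // NSEC_PER_MSEC) % 2**32


ios = [500_000] * 1000  # one thousand I/Os of 0.5 ms each
print(busy_ms_via_jiffies(ios), busy_ms_via_ns(ios))
```

With these assumed numbers the per-I/O jiffies conversion reports no
busy time at all, while the nanosecond accumulation reports the full
half second.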

Fixes: 522a777566f5 ("block: consolidate struct request timestamp fields")
Cc: stable@vger.kernel.org
Reported-by: Klaus Kusche &lt;klaus.kusche@computerix.info&gt;
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: Track DISCARD statistics and output them in stat and diskstat</title>
<updated>2018-07-18T14:44:22Z</updated>
<author>
<name>Michael Callahan</name>
<email>michaelcallahan@fb.com</email>
</author>
<published>2018-07-18T11:47:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bdca3c87fb7ad1cc61d231d37eb0d8f90d001e0c'/>
<id>urn:sha1:bdca3c87fb7ad1cc61d231d37eb0d8f90d001e0c</id>
<content type='text'>
Add tracking of REQ_OP_DISCARD ios to the partition statistics and
append them to the various stat files in /sys as well as
/proc/diskstats.  These are tracked with the same four stats as reads
and writes:

Number of discard ios completed
Number of discard ios merged
Number of discard sectors completed
Milliseconds spent on discard requests

This is done via adding a new STAT_DISCARD define to genhd.h and then
using it to index that stat field for discard requests.

tj: Refreshed on top of v4.17 and other previous updates.

Signed-off-by: Michael Callahan &lt;michaelcallahan@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Andy Newell &lt;newella@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: Define and use STAT_READ and STAT_WRITE</title>
<updated>2018-07-18T14:44:18Z</updated>
<author>
<name>Michael Callahan</name>
<email>michaelcallahan@fb.com</email>
</author>
<published>2018-07-18T11:47:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=dbae2c551377b6533a00c11fc7ede370100ab404'/>
<id>urn:sha1:dbae2c551377b6533a00c11fc7ede370100ab404</id>
<content type='text'>
Add defines for STAT_READ and STAT_WRITE for indexing the partition
stat entries. This clarifies some fs/ code which has hardcoded 1 for
STAT_WRITE and will make it easier to extend the stats with additional
fields.

tj: Refreshed on top of v4.17.

Signed-off-by: Michael Callahan &lt;michaelcallahan@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: Add part_stat_read_accum to read across field entries.</title>
<updated>2018-07-18T14:44:16Z</updated>
<author>
<name>Michael Callahan</name>
<email>michaelcallahan@fb.com</email>
</author>
<published>2018-07-18T11:47:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=59767fbd49d794b4499d30b314df6c0d4aca584b'/>
<id>urn:sha1:59767fbd49d794b4499d30b314df6c0d4aca584b</id>
<content type='text'>
Add a part_stat_read_accum macro to genhd.h to read and sum across
field entries, for example to sum up the number of read and write
sectors completed.  In addition to being a reasonable cleanup by
itself, this will make it easier to add new stat fields in the future.
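
As a rough illustration of the idea only: the real helper is a C macro
in genhd.h, and the dictionary layout and slot constants below are
assumptions made for this sketch. Each stat field holds one counter per
direction, and the accumulating read adds them up.

```python
# Hypothetical slot indices for the per-direction counters.
STAT_READ = 0
STAT_WRITE = 1


def part_stat_read_accum(stats, field):
    """Sum one stat field over the read and write slots."""
    return stats[field][STAT_READ] + stats[field][STAT_WRITE]


# Example counters: [read, write] for each field.
stats = {"sectors": [4096, 1024], "ios": [32, 8]}
total_sectors = part_stat_read_accum(stats, "sectors")
```

Keeping the summation in one helper means a new direction only needs
one more term there, rather than a change at every call site.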

tj: Refreshed on top of v4.17.

Signed-off-by: Michael Callahan &lt;michaelcallahan@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blk-mq: fix sysfs inflight counter</title>
<updated>2018-04-26T15:02:01Z</updated>
<author>
<name>Omar Sandoval</name>
<email>osandov@fb.com</email>
</author>
<published>2018-04-26T07:21:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bf0ddaba65ddbb2715af97041da8e7a45b2d8628'/>
<id>urn:sha1:bf0ddaba65ddbb2715af97041da8e7a45b2d8628</id>
<content type='text'>
When the blk-mq inflight implementation was added, /proc/diskstats was
converted to use it, but /sys/block/$dev/inflight was not. Fix it by
adding another helper to count in-flight requests by data direction.

Fixes: f299b7c7a9de ("blk-mq: provide internal in-flight variant")
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>genhd: Fix BUG in blkdev_open()</title>
<updated>2018-02-26T16:48:42Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2018-02-26T12:01:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=56c0908c855afbb2bdda17c15d2879949a091ad3'/>
<id>urn:sha1:56c0908c855afbb2bdda17c15d2879949a091ad3</id>
<content type='text'>
When two blkdev_open() calls for a partition race with device removal
and recreation, we can hit BUG_ON(!bd_may_claim(bdev, whole, holder)) in
blkdev_open(). The race can happen as follows:

CPU0				CPU1			CPU2
							del_gendisk()
							  bdev_unhash_inode(part1);

blkdev_open(part1, O_EXCL)	blkdev_open(part1, O_EXCL)
  bdev = bd_acquire()		  bdev = bd_acquire()
  blkdev_get(bdev)
    bd_start_claiming(bdev)
      - finds old inode 'whole'
      bd_prepare_to_claim() -&gt; 0
							  bdev_unhash_inode(whole);
							&lt;device removed&gt;
							&lt;new device under same
							 number created&gt;
				  blkdev_get(bdev);
				    bd_start_claiming(bdev)
				      - finds new inode 'whole'
				      bd_prepare_to_claim()
					- this also succeeds as we have
					  different 'whole' here...
					- bad things happen now as we
					  have two exclusive openers of
					  the same bdev

The problem here is that block device opens can see various intermediate
states while gendisk is shutting down and then being recreated.

We fix the problem by introducing a new lookup_sem in gendisk that
synchronizes gendisk deletion with get_gendisk(), and furthermore by
making sure that get_gendisk() does not return a gendisk that is being
(or has been) deleted. This makes sure that once we manage to look up
a newly created bdev inode, we are also guaranteed that the following
get_gendisk() will either return failure (and we fail the open) or
return the gendisk for the new device, and the following bdget_disk()
will return the new bdev inode (i.e., blkdev_open() follows the path as
if it ran completely after the new device was created).

Reported-and-analyzed-by: Hou Tao &lt;houtao1@huawei.com&gt;
Tested-by: Hou Tao &lt;houtao1@huawei.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
