<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/fs, branch v3.18.28</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.18.28</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.18.28'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2016-03-04T15:19:42Z</updated>
<entry>
<title>fs-writeback: unplug before cond_resched in writeback_sb_inodes</title>
<updated>2016-03-04T15:19:42Z</updated>
<author>
<name>Chris Mason</name>
<email>clm@fb.com</email>
</author>
<published>2015-09-18T17:35:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3fb0c4b7ca3edc0e2404174a01e1af52c23d7401'/>
<id>urn:sha1:3fb0c4b7ca3edc0e2404174a01e1af52c23d7401</id>
<content type='text'>
[ Upstream commit 590dca3a71875461e8fea3013af74386945191b2 ]

Commit 505a666ee3fc ("writeback: plug writeback in wb_writeback() and
writeback_inodes_wb()") has us holding a plug during writeback_sb_inodes,
which increases the merge rate when relatively contiguous small files
are written by the filesystem.  It helps both on flash and spindles.

For an fs_mark workload creating 4K files in parallel across 8 drives,
this commit improves performance ~9% more by unplugging before calling
cond_resched().  cond_resched() doesn't trigger an implicit unplug, so
explicitly getting the IO down to the device before scheduling reduces
latencies for anyone waiting on clean pages.

It also cuts down on how often we use kblockd to unplug, which means
less work bouncing from one workqueue to another.

Many more details about how we got here:

  https://lkml.org/lkml/2015/9/11/570

Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>ext4: fix crashes in dioread_nolock mode</title>
<updated>2016-03-04T15:18:43Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-02-19T05:33:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2bc4fc8104b15fc237204165d5c35f9bb069680a'/>
<id>urn:sha1:2bc4fc8104b15fc237204165d5c35f9bb069680a</id>
<content type='text'>
[ Upstream commit 74dae4278546b897eb81784fdfcce872ddd8b2b8 ]

Competing overwrite DIO in dioread_nolock mode will just overwrite
pointer to io_end in the inode. This may result in data corruption or
extent conversion happening from IO completion interrupt because we
don't properly set buffer_defer_completion() when unlocked DIO races
with locked DIO to unwritten extent.

Since unlocked DIO doesn't need io_end for anything, just avoid
allocating it and corrupting pointer from inode for locked DIO.
A cleaner fix would be to avoid these games with io_end pointer from the
inode but that requires more intrusive changes so we leave that for
later.

Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>ext4: don't read blocks from disk after extents being swapped</title>
<updated>2016-03-04T15:18:41Z</updated>
<author>
<name>Eryu Guan</name>
<email>guaneryu@gmail.com</email>
</author>
<published>2016-02-12T06:20:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b42e89beb60b3b618b180349a0e3b02ae3d4657e'/>
<id>urn:sha1:b42e89beb60b3b618b180349a0e3b02ae3d4657e</id>
<content type='text'>
[ Upstream commit bcff24887d00bce102e0857d7b0a8c44a40f53d1 ]

I notice ext4/307 fails occasionally on ppc64 host, reporting md5
checksum mismatch after moving data from original file to donor file.

The reason is that move_extent_per_page() calls __block_write_begin()
and block_commit_write() to write saved data from original inode blocks
to donor inode blocks, but __block_write_begin() not only maps buffer
heads but also reads block content from disk if the size is not block
size aligned.  At this time the physical block number in mapped buffer
head is pointing to the donor file not the original file, and that
results in reading wrong data to page, which get written to disk in
following block_commit_write call.

This also can be reproduced by the following script on 1k block size ext4
on x86_64 host:

    mnt=/mnt/ext4
    donorfile=$mnt/donor
    testfile=$mnt/testfile
    e4compact=~/xfstests/src/e4compact

    rm -f $donorfile $testfile

    # reserve space for donor file, written by 0xaa and sync to disk to
    # avoid EBUSY on EXT4_IOC_MOVE_EXT
    xfs_io -fc "pwrite -S 0xaa 0 1m" -c "fsync" $donorfile

    # create test file written by 0xbb
    xfs_io -fc "pwrite -S 0xbb 0 1023" -c "fsync" $testfile

    # compute initial md5sum
    md5sum $testfile | tee md5sum.txt
    # drop cache, force e4compact to read data from disk
    echo 3 &gt; /proc/sys/vm/drop_caches

    # test defrag
    echo "$testfile" | $e4compact -i -v -f $donorfile
    # check md5sum
    md5sum -c md5sum.txt

Fix it by creating &amp; mapping buffer heads only but not reading blocks
from disk, because all the data in page is guaranteed to be up-to-date
in mext_page_mkuptodate().

Cc: stable@vger.kernel.org
Signed-off-by: Eryu Guan &lt;guaneryu@gmail.com&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>ext4: move_extent improve bh vanishing success factor</title>
<updated>2016-03-04T15:18:41Z</updated>
<author>
<name>Dmitry Monakhov</name>
<email>dmonakhov@openvz.org</email>
</author>
<published>2014-11-05T16:52:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c0346fd76b0557bab2f80d8f1220434821c74c35'/>
<id>urn:sha1:c0346fd76b0557bab2f80d8f1220434821c74c35</id>
<content type='text'>
[ Upstream commit 88c6b61ff1cfb4013a3523227d91ad11b2892388 ]

Xiaoguang Wang has reported sporadic EBUSY failures of ext4/302
Unfortunetly there is nothing we can do if some other task holds BH's
refenrence.  So we must return EBUSY in this case.  But we can try
kicking the journal to see if the other task releases the bh reference
after the commit is complete.  Also decrease false positives by
properly checking for ENOSPC and retrying the allocation after kicking
the journal --- which is done by ext4_should_retry_alloc().

[ Modified by tytso to properly check for ENOSPC. ]

Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>ext4: fix potential integer overflow</title>
<updated>2016-03-04T15:18:41Z</updated>
<author>
<name>Insu Yun</name>
<email>wuninsu@gmail.com</email>
</author>
<published>2016-02-12T06:15:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a1bd518443ff68b7c75f534000164d23426f49a6'/>
<id>urn:sha1:a1bd518443ff68b7c75f534000164d23426f49a6</id>
<content type='text'>
[ Upstream commit 46901760b46064964b41015d00c140c83aa05bcf ]

Since sizeof(ext_new_group_data) &gt; sizeof(ext_new_flex_group_data),
integer overflow could be happened.
Therefore, need to fix integer overflow sanitization.

Cc: stable@vger.kernel.org
Signed-off-by: Insu Yun &lt;wuninsu@gmail.com&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>btrfs: properly set the termination value of ctx-&gt;pos in readdir</title>
<updated>2016-03-04T15:18:40Z</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2015-11-13T12:44:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=66620e96ab6fd3ed93eb260b39141c925cb30484'/>
<id>urn:sha1:66620e96ab6fd3ed93eb260b39141c925cb30484</id>
<content type='text'>
[ Upstream commit bc4ef7592f657ae81b017207a1098817126ad4cb ]

The value of ctx-&gt;pos in the last readdir call is supposed to be set to
INT_MAX due to 32bit compatibility, unless 'pos' is intentially set to a
larger value, then it's LLONG_MAX.

There's a report from PaX SIZE_OVERFLOW plugin that "ctx-&gt;pos++"
overflows (https://forums.grsecurity.net/viewtopic.php?f=1&amp;t=4284), on a
64bit arch, where the value is 0x7fffffffffffffff ie. LLONG_MAX before
the increment.

We can get to that situation like that:

* emit all regular readdir entries
* still in the same call to readdir, bump the last pos to INT_MAX
* next call to readdir will not emit any entries, but will reach the
  bump code again, finds pos to be INT_MAX and sets it to LLONG_MAX

Normally this is not a problem, but if we call readdir again, we'll find
'pos' set to LLONG_MAX and the unconditional increment will overflow.

The report from Victor at
(http://thread.gmane.org/gmane.comp.file-systems.btrfs/49500) with debugging
print shows that pattern:

 Overflow: e
 Overflow: 7fffffff
 Overflow: 7fffffffffffffff
 PAX: size overflow detected in function btrfs_real_readdir
   fs/btrfs/inode.c:5760 cicus.935_282 max, count: 9, decl: pos; num: 0;
   context: dir_context;
 CPU: 0 PID: 2630 Comm: polkitd Not tainted 4.2.3-grsec #1
 Hardware name: Gigabyte Technology Co., Ltd. H81ND2H/H81ND2H, BIOS F3 08/11/2015
  ffffffff81901608 0000000000000000 ffffffff819015e6 ffffc90004973d48
  ffffffff81742f0f 0000000000000007 ffffffff81901608 ffffc90004973d78
  ffffffff811cb706 0000000000000000 ffff8800d47359e0 ffffc90004973ed8
 Call Trace:
  [&lt;ffffffff81742f0f&gt;] dump_stack+0x4c/0x7f
  [&lt;ffffffff811cb706&gt;] report_size_overflow+0x36/0x40
  [&lt;ffffffff812ef0bc&gt;] btrfs_real_readdir+0x69c/0x6d0
  [&lt;ffffffff811dafc8&gt;] iterate_dir+0xa8/0x150
  [&lt;ffffffff811e6d8d&gt;] ? __fget_light+0x2d/0x70
  [&lt;ffffffff811dba3a&gt;] SyS_getdents+0xba/0x1c0
 Overflow: 1a
  [&lt;ffffffff811db070&gt;] ? iterate_dir+0x150/0x150
  [&lt;ffffffff81749b69&gt;] entry_SYSCALL_64_fastpath+0x12/0x83

The jump from 7fffffff to 7fffffffffffffff happens when new dir entries
are not yet synced and are processed from the delayed list. Then the code
could go to the bump section again even though it might not emit any new
dir entries from the delayed list.

The fix avoids entering the "bump" section again once we've finished
emitting the entries, both for synced and delayed entries.

References: https://forums.grsecurity.net/viewtopic.php?f=1&amp;t=4284
Reported-by: Victor &lt;services@swwu.com&gt;
CC: stable@vger.kernel.org
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
Tested-by: Holger Hoffstätte &lt;holger.hoffstaette@googlemail.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>cifs: fix erroneous return value</title>
<updated>2016-03-04T15:18:40Z</updated>
<author>
<name>Anton Protopopov</name>
<email>a.s.protopopov@gmail.com</email>
</author>
<published>2016-02-10T17:50:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0c7c5f3e2aa8f3be24c64e5d50a319e2db951e9a'/>
<id>urn:sha1:0c7c5f3e2aa8f3be24c64e5d50a319e2db951e9a</id>
<content type='text'>
[ Upstream commit 4b550af519854421dfec9f7732cdddeb057134b2 ]

The setup_ntlmv2_rsp() function may return positive value ENOMEM instead
of -ENOMEM in case of kmalloc failure.

Signed-off-by: Anton Protopopov &lt;a.s.protopopov@gmail.com&gt;
CC: Stable &lt;stable@vger.kernel.org&gt;
Signed-off-by: Steve French &lt;smfrench@gmail.com&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>pty: make sure super_block is still valid in final /dev/tty close</title>
<updated>2016-03-02T20:19:16Z</updated>
<author>
<name>Herton R. Krzesinski</name>
<email>herton@redhat.com</email>
</author>
<published>2016-01-14T19:56:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8965b383a051cf8055bf38afc01ab59f787b9851'/>
<id>urn:sha1:8965b383a051cf8055bf38afc01ab59f787b9851</id>
<content type='text'>
[ Upstream commit 1f55c718c290616889c04946864a13ef30f64929 ]

Considering current pty code and multiple devpts instances, it's possible
to umount a devpts file system while a program still has /dev/tty opened
pointing to a previosuly closed pty pair in that instance. In the case all
ptmx and pts/N files are closed, umount can be done. If the program closes
/dev/tty after umount is done, devpts_kill_index will use now an invalid
super_block, which was already destroyed in the umount operation after
running -&gt;kill_sb. This is another "use after free" type of issue, but now
related to the allocated super_block instance.

To avoid the problem (warning at ida_remove and potential crashes) for
this specific case, I added two functions in devpts which grabs additional
references to the super_block, which pty code now uses so it makes sure
the super block structure is still valid until pty shutdown is done.
I also moved the additional inode references to the same functions, which
also covered similar case with inode being freed before /dev/tty final
close/shutdown.

Signed-off-by: Herton R. Krzesinski &lt;herton@redhat.com&gt;
Cc: stable@vger.kernel.org # 2.6.29+
Reviewed-by: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl</title>
<updated>2016-03-02T20:18:57Z</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2016-02-03T19:17:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fb87262aec0685ef7fe89ae50a5ee900d48db4d4'/>
<id>urn:sha1:fb87262aec0685ef7fe89ae50a5ee900d48db4d4</id>
<content type='text'>
[ Upstream commit 0c0fe3b0fa45082cd752553fdb3a4b42503a118e ]

While doing some tests I ran into an hang on an extent buffer's rwlock
that produced the following trace:

[39389.800012] NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [fdm-stress:32166]
[39389.800016] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [fdm-stress:32165]
[39389.800016] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
[39389.800016] irq event stamp: 0
[39389.800016] hardirqs last  enabled at (0): [&lt;          (null)&gt;]           (null)
[39389.800016] hardirqs last disabled at (0): [&lt;ffffffff8104e58d&gt;] copy_process+0x638/0x1a35
[39389.800016] softirqs last  enabled at (0): [&lt;ffffffff8104e58d&gt;] copy_process+0x638/0x1a35
[39389.800016] softirqs last disabled at (0): [&lt;          (null)&gt;]           (null)
[39389.800016] CPU: 14 PID: 32165 Comm: fdm-stress Not tainted 4.4.0-rc6-btrfs-next-18+ #1
[39389.800016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[39389.800016] task: ffff880175b1ca40 ti: ffff8800a185c000 task.ti: ffff8800a185c000
[39389.800016] RIP: 0010:[&lt;ffffffff810902af&gt;]  [&lt;ffffffff810902af&gt;] queued_spin_lock_slowpath+0x57/0x158
[39389.800016] RSP: 0018:ffff8800a185fb80  EFLAGS: 00000202
[39389.800016] RAX: 0000000000000101 RBX: ffff8801710c4e9c RCX: 0000000000000101
[39389.800016] RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000001
[39389.800016] RBP: ffff8800a185fb98 R08: 0000000000000001 R09: 0000000000000000
[39389.800016] R10: ffff8800a185fb68 R11: 6db6db6db6db6db7 R12: ffff8801710c4e98
[39389.800016] R13: ffff880175b1ca40 R14: ffff8800a185fc10 R15: ffff880175b1ca40
[39389.800016] FS:  00007f6d37fff700(0000) GS:ffff8802be9c0000(0000) knlGS:0000000000000000
[39389.800016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[39389.800016] CR2: 00007f6d300019b8 CR3: 0000000037c93000 CR4: 00000000001406e0
[39389.800016] Stack:
[39389.800016]  ffff8801710c4e98 ffff8801710c4e98 ffff880175b1ca40 ffff8800a185fbb0
[39389.800016]  ffffffff81091e11 ffff8801710c4e98 ffff8800a185fbc8 ffffffff81091895
[39389.800016]  ffff8801710c4e98 ffff8800a185fbe8 ffffffff81486c5c ffffffffa067288c
[39389.800016] Call Trace:
[39389.800016]  [&lt;ffffffff81091e11&gt;] queued_read_lock_slowpath+0x46/0x60
[39389.800016]  [&lt;ffffffff81091895&gt;] do_raw_read_lock+0x3e/0x41
[39389.800016]  [&lt;ffffffff81486c5c&gt;] _raw_read_lock+0x3d/0x44
[39389.800016]  [&lt;ffffffffa067288c&gt;] ? btrfs_tree_read_lock+0x54/0x125 [btrfs]
[39389.800016]  [&lt;ffffffffa067288c&gt;] btrfs_tree_read_lock+0x54/0x125 [btrfs]
[39389.800016]  [&lt;ffffffffa0622ced&gt;] ? btrfs_find_item+0xa7/0xd2 [btrfs]
[39389.800016]  [&lt;ffffffffa069363f&gt;] btrfs_ref_to_path+0xd6/0x174 [btrfs]
[39389.800016]  [&lt;ffffffffa0693730&gt;] inode_to_path+0x53/0xa2 [btrfs]
[39389.800016]  [&lt;ffffffffa0693e2e&gt;] paths_from_inode+0x117/0x2ec [btrfs]
[39389.800016]  [&lt;ffffffffa0670cff&gt;] btrfs_ioctl+0xd5b/0x2793 [btrfs]
[39389.800016]  [&lt;ffffffff8108a8b0&gt;] ? arch_local_irq_save+0x9/0xc
[39389.800016]  [&lt;ffffffff81276727&gt;] ? __this_cpu_preempt_check+0x13/0x15
[39389.800016]  [&lt;ffffffff8108a8b0&gt;] ? arch_local_irq_save+0x9/0xc
[39389.800016]  [&lt;ffffffff8118b3d4&gt;] ? rcu_read_unlock+0x3e/0x5d
[39389.800016]  [&lt;ffffffff811822f8&gt;] do_vfs_ioctl+0x42b/0x4ea
[39389.800016]  [&lt;ffffffff8118b4f3&gt;] ? __fget_light+0x62/0x71
[39389.800016]  [&lt;ffffffff8118240e&gt;] SyS_ioctl+0x57/0x79
[39389.800016]  [&lt;ffffffff814872d7&gt;] entry_SYSCALL_64_fastpath+0x12/0x6f
[39389.800016] Code: b9 01 01 00 00 f7 c6 00 ff ff ff 75 32 83 fe 01 89 ca 89 f0 0f 45 d7 f0 0f b1 13 39 f0 74 04 89 c6 eb e2 ff ca 0f 84 fa 00 00 00 &lt;8b&gt; 03 84 c0 74 04 f3 90 eb f6 66 c7 03 01 00 e9 e6 00 00 00 e8
[39389.800012] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
[39389.800012] irq event stamp: 0
[39389.800012] hardirqs last  enabled at (0): [&lt;          (null)&gt;]           (null)
[39389.800012] hardirqs last disabled at (0): [&lt;ffffffff8104e58d&gt;] copy_process+0x638/0x1a35
[39389.800012] softirqs last  enabled at (0): [&lt;ffffffff8104e58d&gt;] copy_process+0x638/0x1a35
[39389.800012] softirqs last disabled at (0): [&lt;          (null)&gt;]           (null)
[39389.800012] CPU: 15 PID: 32166 Comm: fdm-stress Tainted: G             L  4.4.0-rc6-btrfs-next-18+ #1
[39389.800012] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[39389.800012] task: ffff880179294380 ti: ffff880034a60000 task.ti: ffff880034a60000
[39389.800012] RIP: 0010:[&lt;ffffffff81091e8d&gt;]  [&lt;ffffffff81091e8d&gt;] queued_write_lock_slowpath+0x62/0x72
[39389.800012] RSP: 0018:ffff880034a639f0  EFLAGS: 00000206
[39389.800012] RAX: 0000000000000101 RBX: ffff8801710c4e98 RCX: 0000000000000000
[39389.800012] RDX: 00000000000000ff RSI: 0000000000000000 RDI: ffff8801710c4e9c
[39389.800012] RBP: ffff880034a639f8 R08: 0000000000000001 R09: 0000000000000000
[39389.800012] R10: ffff880034a639b0 R11: 0000000000001000 R12: ffff8801710c4e98
[39389.800012] R13: 0000000000000001 R14: ffff880172cbc000 R15: ffff8801710c4e00
[39389.800012] FS:  00007f6d377fe700(0000) GS:ffff8802be9e0000(0000) knlGS:0000000000000000
[39389.800012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[39389.800012] CR2: 00007f6d3d3c1000 CR3: 0000000037c93000 CR4: 00000000001406e0
[39389.800012] Stack:
[39389.800012]  ffff8801710c4e98 ffff880034a63a10 ffffffff81091963 ffff8801710c4e98
[39389.800012]  ffff880034a63a30 ffffffff81486f1b ffffffffa0672cb3 ffff8801710c4e00
[39389.800012]  ffff880034a63a78 ffffffffa0672cb3 ffff8801710c4e00 ffff880034a63a58
[39389.800012] Call Trace:
[39389.800012]  [&lt;ffffffff81091963&gt;] do_raw_write_lock+0x72/0x8c
[39389.800012]  [&lt;ffffffff81486f1b&gt;] _raw_write_lock+0x3a/0x41
[39389.800012]  [&lt;ffffffffa0672cb3&gt;] ? btrfs_tree_lock+0x119/0x251 [btrfs]
[39389.800012]  [&lt;ffffffffa0672cb3&gt;] btrfs_tree_lock+0x119/0x251 [btrfs]
[39389.800012]  [&lt;ffffffffa061aeba&gt;] ? rcu_read_unlock+0x5b/0x5d [btrfs]
[39389.800012]  [&lt;ffffffffa061ce13&gt;] ? btrfs_root_node+0xda/0xe6 [btrfs]
[39389.800012]  [&lt;ffffffffa061ce83&gt;] btrfs_lock_root_node+0x22/0x42 [btrfs]
[39389.800012]  [&lt;ffffffffa062046b&gt;] btrfs_search_slot+0x1b8/0x758 [btrfs]
[39389.800012]  [&lt;ffffffff810fc6b0&gt;] ? time_hardirqs_on+0x15/0x28
[39389.800012]  [&lt;ffffffffa06365db&gt;] btrfs_lookup_inode+0x31/0x95 [btrfs]
[39389.800012]  [&lt;ffffffff8108d62f&gt;] ? trace_hardirqs_on+0xd/0xf
[39389.800012]  [&lt;ffffffff8148482b&gt;] ? mutex_lock_nested+0x397/0x3bc
[39389.800012]  [&lt;ffffffffa068821b&gt;] __btrfs_update_delayed_inode+0x59/0x1c0 [btrfs]
[39389.800012]  [&lt;ffffffffa068858e&gt;] __btrfs_commit_inode_delayed_items+0x194/0x5aa [btrfs]
[39389.800012]  [&lt;ffffffff81486ab7&gt;] ? _raw_spin_unlock+0x31/0x44
[39389.800012]  [&lt;ffffffffa0688a48&gt;] __btrfs_run_delayed_items+0xa4/0x15c [btrfs]
[39389.800012]  [&lt;ffffffffa0688d62&gt;] btrfs_run_delayed_items+0x11/0x13 [btrfs]
[39389.800012]  [&lt;ffffffffa064048e&gt;] btrfs_commit_transaction+0x234/0x96e [btrfs]
[39389.800012]  [&lt;ffffffffa0618d10&gt;] btrfs_sync_fs+0x145/0x1ad [btrfs]
[39389.800012]  [&lt;ffffffffa0671176&gt;] btrfs_ioctl+0x11d2/0x2793 [btrfs]
[39389.800012]  [&lt;ffffffff8108a8b0&gt;] ? arch_local_irq_save+0x9/0xc
[39389.800012]  [&lt;ffffffff81140261&gt;] ? __might_fault+0x4c/0xa7
[39389.800012]  [&lt;ffffffff81140261&gt;] ? __might_fault+0x4c/0xa7
[39389.800012]  [&lt;ffffffff8108a8b0&gt;] ? arch_local_irq_save+0x9/0xc
[39389.800012]  [&lt;ffffffff8118b3d4&gt;] ? rcu_read_unlock+0x3e/0x5d
[39389.800012]  [&lt;ffffffff811822f8&gt;] do_vfs_ioctl+0x42b/0x4ea
[39389.800012]  [&lt;ffffffff8118b4f3&gt;] ? __fget_light+0x62/0x71
[39389.800012]  [&lt;ffffffff8118240e&gt;] SyS_ioctl+0x57/0x79
[39389.800012]  [&lt;ffffffff814872d7&gt;] entry_SYSCALL_64_fastpath+0x12/0x6f
[39389.800012] Code: f0 0f b1 13 85 c0 75 ef eb 2a f3 90 8a 03 84 c0 75 f8 f0 0f b0 13 84 c0 75 f0 ba ff 00 00 00 eb 0a f0 0f b1 13 ff c8 74 0b f3 90 &lt;8b&gt; 03 83 f8 01 75 f7 eb ed c6 43 04 00 5b 5d c3 0f 1f 44 00 00

This happens because in the code path executed by the inode_paths ioctl we
end up nesting two calls to read lock a leaf's rwlock when after the first
call to read_lock() and before the second call to read_lock(), another
task (running the delayed items as part of a transaction commit) has
already called write_lock() against the leaf's rwlock. This situation is
illustrated by the following diagram:

         Task A                       Task B

  btrfs_ref_to_path()               btrfs_commit_transaction()
    read_lock(&amp;eb-&gt;lock);

                                      btrfs_run_delayed_items()
                                        __btrfs_commit_inode_delayed_items()
                                          __btrfs_update_delayed_inode()
                                            btrfs_lookup_inode()

                                              write_lock(&amp;eb-&gt;lock);
                                                --&gt; task waits for lock

    read_lock(&amp;eb-&gt;lock);
    --&gt; makes this task hang
        forever (and task B too
	of course)

So fix this by avoiding doing the nested read lock, which is easily
avoidable. This issue does not happen if task B calls write_lock() after
task A does the second call to read_lock(), however there does not seem
to exist anything in the documentation that mentions what is the expected
behaviour for recursive locking of rwlocks (leaving the idea that doing
so is not a good usage of rwlocks).

Also, as a side effect necessary for this fix, make sure we do not
needlessly read lock extent buffers when the input path has skip_locking
set (used when called from send).

Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup</title>
<updated>2016-02-15T20:42:41Z</updated>
<author>
<name>xuejiufei</name>
<email>xuejiufei@huawei.com</email>
</author>
<published>2016-02-05T23:36:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ed496f4fa64ec3e97725748fb3916128a9299d4c'/>
<id>urn:sha1:ed496f4fa64ec3e97725748fb3916128a9299d4c</id>
<content type='text'>
[ Upstream commit c95a51807b730e4681e2ecbdfd669ca52601959e ]

When recovery master down, dlm_do_local_recovery_cleanup() only remove
the $RECOVERY lock owned by dead node, but do not clear the refmap bit.
Which will make umount thread falling in dead loop migrating $RECOVERY
to the dead node.

Signed-off-by: xuejiufei &lt;xuejiufei@huawei.com&gt;
Reviewed-by: Joseph Qi &lt;joseph.qi@huawei.com&gt;
Cc: Mark Fasheh &lt;mfasheh@suse.de&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Cc: Junxiao Bi &lt;junxiao.bi@oracle.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
</feed>
