<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/fs.h, branch v6.6.72</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.6.72</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.6.72'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2024-10-04T14:29:48Z</updated>
<entry>
<title>fs: Create a generic is_dot_dotdot() utility</title>
<updated>2024-10-04T14:29:48Z</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2023-12-31T00:46:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ef83620438d7b08cd66269c2262fcc2007cd5454'/>
<id>urn:sha1:ef83620438d7b08cd66269c2262fcc2007cd5454</id>
<content type='text'>
commit 42c3732fa8073717dd7d924472f1c0bc5b452fdc upstream.

De-duplicate the same functionality in several places by hoisting
the is_dot_dotdot() utility function into linux/fs.h.

Suggested-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Acked-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>vfs: Don't evict inode under the inode lru traversing context</title>
<updated>2024-08-29T15:33:13Z</updated>
<author>
<name>Zhihao Cheng</name>
<email>chengzhihao1@huawei.com</email>
</author>
<published>2024-08-09T03:16:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b9bda5f6012dd00372f3a06a82ed8971a4c57c32'/>
<id>urn:sha1:b9bda5f6012dd00372f3a06a82ed8971a4c57c32</id>
<content type='text'>
commit 2a0629834cd82f05d424bbc193374f9a43d1f87d upstream.

The inode reclaiming process(See function prune_icache_sb) collects all
reclaimable inodes and mark them with I_FREEING flag at first, at that
time, other processes will be stuck if they try getting these inodes
(See function find_inode_fast), then the reclaiming process destroy the
inodes by function dispose_list(). Some filesystems(eg. ext4 with
ea_inode feature, ubifs with xattr) may do inode lookup in the inode
evicting callback function, if the inode lookup is operated under the
inode lru traversing context, deadlock problems may happen.

Case 1: In function ext4_evict_inode(), the ea inode lookup could happen
        if ea_inode feature is enabled, the lookup process will be stuck
	under the evicting context like this:

 1. File A has inode i_reg and an ea inode i_ea
 2. getfattr(A, xattr_buf) // i_ea is added into lru // lru-&gt;i_ea
 3. Then, following three processes running like this:

    PA                              PB
 echo 2 &gt; /proc/sys/vm/drop_caches
  shrink_slab
   prune_dcache_sb
   // i_reg is added into lru, lru-&gt;i_ea-&gt;i_reg
   prune_icache_sb
    list_lru_walk_one
     inode_lru_isolate
      i_ea-&gt;i_state |= I_FREEING // set inode state
     inode_lru_isolate
      __iget(i_reg)
      spin_unlock(&amp;i_reg-&gt;i_lock)
      spin_unlock(lru_lock)
                                     rm file A
                                      i_reg-&gt;nlink = 0
      iput(i_reg) // i_reg-&gt;nlink is 0, do evict
       ext4_evict_inode
        ext4_xattr_delete_inode
         ext4_xattr_inode_dec_ref_all
          ext4_xattr_inode_iget
           ext4_iget(i_ea-&gt;i_ino)
            iget_locked
             find_inode_fast
              __wait_on_freeing_inode(i_ea) ----→ AA deadlock
    dispose_list // cannot be executed by prune_icache_sb
     wake_up_bit(&amp;i_ea-&gt;i_state)

Case 2: In deleted inode writing function ubifs_jnl_write_inode(), file
        deleting process holds BASEHD's wbuf-&gt;io_mutex while getting the
	xattr inode, which could race with inode reclaiming process(The
        reclaiming process could try locking BASEHD's wbuf-&gt;io_mutex in
	inode evicting function), then an ABBA deadlock problem would
	happen as following:

 1. File A has inode ia and a xattr(with inode ixa), regular file B has
    inode ib and a xattr.
 2. getfattr(A, xattr_buf) // ixa is added into lru // lru-&gt;ixa
 3. Then, following three processes running like this:

        PA                PB                        PC
                echo 2 &gt; /proc/sys/vm/drop_caches
                 shrink_slab
                  prune_dcache_sb
                  // ib and ia are added into lru, lru-&gt;ixa-&gt;ib-&gt;ia
                  prune_icache_sb
                   list_lru_walk_one
                    inode_lru_isolate
                     ixa-&gt;i_state |= I_FREEING // set inode state
                    inode_lru_isolate
                     __iget(ib)
                     spin_unlock(&amp;ib-&gt;i_lock)
                     spin_unlock(lru_lock)
                                                   rm file B
                                                    ib-&gt;nlink = 0
 rm file A
  iput(ia)
   ubifs_evict_inode(ia)
    ubifs_jnl_delete_inode(ia)
     ubifs_jnl_write_inode(ia)
      make_reservation(BASEHD) // Lock wbuf-&gt;io_mutex
      ubifs_iget(ixa-&gt;i_ino)
       iget_locked
        find_inode_fast
         __wait_on_freeing_inode(ixa)
          |          iput(ib) // ib-&gt;nlink is 0, do evict
          |           ubifs_evict_inode
          |            ubifs_jnl_delete_inode(ib)
          ↓             ubifs_jnl_write_inode
     ABBA deadlock ←-----make_reservation(BASEHD)
                   dispose_list // cannot be executed by prune_icache_sb
                    wake_up_bit(&amp;ixa-&gt;i_state)

Fix the possible deadlock by using new inode state flag I_LRU_ISOLATING
to pin the inode in memory while inode_lru_isolate() reclaims its pages
instead of using ordinary inode reference. This way inode deletion
cannot be triggered from inode_lru_isolate() thus avoiding the deadlock.
evict() is made to wait for I_LRU_ISOLATING to be cleared before
proceeding with inode cleanup.

Link: https://lore.kernel.org/all/37c29c42-7685-d1f0-067d-63582ffac405@huaweicloud.com/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219022
Fixes: e50e5129f384 ("ext4: xattr-in-inode support")
Fixes: 7959cf3a7506 ("ubifs: journal: Handle xattrs like files")
Cc: stable@vger.kernel.org
Signed-off-by: Zhihao Cheng &lt;chengzhihao1@huawei.com&gt;
Link: https://lore.kernel.org/r/20240809031628.1069873-1-chengzhihao@huaweicloud.com
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Suggested-by: Jan Kara &lt;jack@suse.cz&gt;
Suggested-by: Mateusz Guzik &lt;mjguzik@gmail.com&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>fs: Annotate struct file_handle with __counted_by() and use struct_size()</title>
<updated>2024-08-19T04:04:28Z</updated>
<author>
<name>Gustavo A. R. Silva</name>
<email>gustavoars@kernel.org</email>
</author>
<published>2024-03-26T01:34:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=107449cfb2176318600e4763c9d9d267b32b636a'/>
<id>urn:sha1:107449cfb2176318600e4763c9d9d267b32b636a</id>
<content type='text'>
[ Upstream commit 68d6f4f3fbd9b1baae53e7cf33fb3362b5a21494 ]

Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for
array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

While there, use struct_size() helper, instead of the open-coded
version.

[brauner@kernel.org: contains a fix by Edward for an OOB access]
Reported-by: syzbot+4139435cb1b34cf759c2@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis &lt;eadavis@qq.com&gt;
Link: https://lore.kernel.org/r/tencent_A7845DD769577306D813742365E976E3A205@qq.com
Signed-off-by: Gustavo A. R. Silva &lt;gustavoars@kernel.org&gt;
Link: https://lore.kernel.org/r/ZgImCXTdGDTeBvSS@neat
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs: Convert to bdev_open_by_dev()</title>
<updated>2024-08-19T04:04:25Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2023-09-27T09:34:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4365d0d660ac38ba3ba867402e028872fec3698e'/>
<id>urn:sha1:4365d0d660ac38ba3ba867402e028872fec3698e</id>
<content type='text'>
[ Upstream commit f4a48bc36cdfae7c603e8e3f2a51e2a283f3f365 ]

Convert mount code to use bdev_open_by_dev() and propagate the handle
around to bdev_release().

Acked-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Link: https://lore.kernel.org/r/20230927093442.25915-19-jack@suse.cz
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Stable-dep-of: 6306ff39a7fc ("jfs: fix log-&gt;bdev_handle null ptr deref in lbmStartIO")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>tracefs: Use generic inode RCU for synchronizing freeing</title>
<updated>2024-08-14T11:58:56Z</updated>
<author>
<name>Steven Rostedt</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2024-08-07T22:54:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=726f4c241e17be75a9cf6870d80cd7479dc89e8f'/>
<id>urn:sha1:726f4c241e17be75a9cf6870d80cd7479dc89e8f</id>
<content type='text'>
commit 0b6743bd60a56a701070b89fb80c327a44b7b3e2 upstream.

With structure layout randomization enabled for 'struct inode' we need to
avoid overlapping any of the RCU-used / initialized-only-once members,
e.g. i_lru or i_sb_list to not corrupt related list traversals when making
use of the rcu_head.

For an unlucky structure layout of 'struct inode' we may end up with the
following splat when running the ftrace selftests:

[&lt;...&gt;] list_del corruption, ffff888103ee2cb0-&gt;next (tracefs_inode_cache+0x0/0x4e0 [slab object]) is NULL (prev is tracefs_inode_cache+0x78/0x4e0 [slab object])
[&lt;...&gt;] ------------[ cut here ]------------
[&lt;...&gt;] kernel BUG at lib/list_debug.c:54!
[&lt;...&gt;] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[&lt;...&gt;] CPU: 3 PID: 2550 Comm: mount Tainted: G                 N  6.8.12-grsec+ #122 ed2f536ca62f28b087b90e3cc906a8d25b3ddc65
[&lt;...&gt;] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[&lt;...&gt;] RIP: 0010:[&lt;ffffffff84656018&gt;] __list_del_entry_valid_or_report+0x138/0x3e0
[&lt;...&gt;] Code: 48 b8 99 fb 65 f2 ff ff ff ff e9 03 5c d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff e9 33 5a d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff &lt;0f&gt; 0b 4c 89 e9 48 89 ea 48 89 ee 48 c7 c7 60 8f dd 89 31 c0 e8 2f
[&lt;...&gt;] RSP: 0018:fffffe80416afaf0 EFLAGS: 00010283
[&lt;...&gt;] RAX: 0000000000000098 RBX: ffff888103ee2cb0 RCX: 0000000000000000
[&lt;...&gt;] RDX: ffffffff84655fe8 RSI: ffffffff89dd8b60 RDI: 0000000000000001
[&lt;...&gt;] RBP: ffff888103ee2cb0 R08: 0000000000000001 R09: fffffbd0082d5f25
[&lt;...&gt;] R10: fffffe80416af92f R11: 0000000000000001 R12: fdf99c16731d9b6d
[&lt;...&gt;] R13: 0000000000000000 R14: ffff88819ad4b8b8 R15: 0000000000000000
[&lt;...&gt;] RBX: tracefs_inode_cache+0x0/0x4e0 [slab object]
[&lt;...&gt;] RDX: __list_del_entry_valid_or_report+0x108/0x3e0
[&lt;...&gt;] RSI: __func__.47+0x4340/0x4400
[&lt;...&gt;] RBP: tracefs_inode_cache+0x0/0x4e0 [slab object]
[&lt;...&gt;] RSP: process kstack fffffe80416afaf0+0x7af0/0x8000 [mount 2550 2550]
[&lt;...&gt;] R09: kasan shadow of process kstack fffffe80416af928+0x7928/0x8000 [mount 2550 2550]
[&lt;...&gt;] R10: process kstack fffffe80416af92f+0x792f/0x8000 [mount 2550 2550]
[&lt;...&gt;] R14: tracefs_inode_cache+0x78/0x4e0 [slab object]
[&lt;...&gt;] FS:  00006dcb380c1840(0000) GS:ffff8881e0600000(0000) knlGS:0000000000000000
[&lt;...&gt;] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[&lt;...&gt;] CR2: 000076ab72b30e84 CR3: 000000000b088004 CR4: 0000000000360ef0 shadow CR4: 0000000000360ef0
[&lt;...&gt;] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[&lt;...&gt;] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[&lt;...&gt;] ASID: 0003
[&lt;...&gt;] Stack:
[&lt;...&gt;]  ffffffff818a2315 00000000f5c856ee ffffffff896f1840 ffff888103ee2cb0
[&lt;...&gt;]  ffff88812b6b9750 0000000079d714b6 fffffbfff1e9280b ffffffff8f49405f
[&lt;...&gt;]  0000000000000001 0000000000000000 ffff888104457280 ffffffff8248b392
[&lt;...&gt;] Call Trace:
[&lt;...&gt;]  &lt;TASK&gt;
[&lt;...&gt;]  [&lt;ffffffff818a2315&gt;] ? lock_release+0x175/0x380 fffffe80416afaf0
[&lt;...&gt;]  [&lt;ffffffff8248b392&gt;] list_lru_del+0x152/0x740 fffffe80416afb48
[&lt;...&gt;]  [&lt;ffffffff8248ba93&gt;] list_lru_del_obj+0x113/0x280 fffffe80416afb88
[&lt;...&gt;]  [&lt;ffffffff8940fd19&gt;] ? _atomic_dec_and_lock+0x119/0x200 fffffe80416afb90
[&lt;...&gt;]  [&lt;ffffffff8295b244&gt;] iput_final+0x1c4/0x9a0 fffffe80416afbb8
[&lt;...&gt;]  [&lt;ffffffff8293a52b&gt;] dentry_unlink_inode+0x44b/0xaa0 fffffe80416afbf8
[&lt;...&gt;]  [&lt;ffffffff8293fefc&gt;] __dentry_kill+0x23c/0xf00 fffffe80416afc40
[&lt;...&gt;]  [&lt;ffffffff8953a85f&gt;] ? __this_cpu_preempt_check+0x1f/0xa0 fffffe80416afc48
[&lt;...&gt;]  [&lt;ffffffff82949ce5&gt;] ? shrink_dentry_list+0x1c5/0x760 fffffe80416afc70
[&lt;...&gt;]  [&lt;ffffffff82949b71&gt;] ? shrink_dentry_list+0x51/0x760 fffffe80416afc78
[&lt;...&gt;]  [&lt;ffffffff82949da8&gt;] shrink_dentry_list+0x288/0x760 fffffe80416afc80
[&lt;...&gt;]  [&lt;ffffffff8294ae75&gt;] shrink_dcache_sb+0x155/0x420 fffffe80416afcc8
[&lt;...&gt;]  [&lt;ffffffff8953a7c3&gt;] ? debug_smp_processor_id+0x23/0xa0 fffffe80416afce0
[&lt;...&gt;]  [&lt;ffffffff8294ad20&gt;] ? do_one_tree+0x140/0x140 fffffe80416afcf8
[&lt;...&gt;]  [&lt;ffffffff82997349&gt;] ? do_remount+0x329/0xa00 fffffe80416afd18
[&lt;...&gt;]  [&lt;ffffffff83ebf7a1&gt;] ? security_sb_remount+0x81/0x1c0 fffffe80416afd38
[&lt;...&gt;]  [&lt;ffffffff82892096&gt;] reconfigure_super+0x856/0x14e0 fffffe80416afd70
[&lt;...&gt;]  [&lt;ffffffff815d1327&gt;] ? ns_capable_common+0xe7/0x2a0 fffffe80416afd90
[&lt;...&gt;]  [&lt;ffffffff82997436&gt;] do_remount+0x416/0xa00 fffffe80416afdd0
[&lt;...&gt;]  [&lt;ffffffff829b2ba4&gt;] path_mount+0x5c4/0x900 fffffe80416afe28
[&lt;...&gt;]  [&lt;ffffffff829b25e0&gt;] ? finish_automount+0x13a0/0x13a0 fffffe80416afe60
[&lt;...&gt;]  [&lt;ffffffff82903812&gt;] ? user_path_at_empty+0xb2/0x140 fffffe80416afe88
[&lt;...&gt;]  [&lt;ffffffff829b2ff5&gt;] do_mount+0x115/0x1c0 fffffe80416afeb8
[&lt;...&gt;]  [&lt;ffffffff829b2ee0&gt;] ? path_mount+0x900/0x900 fffffe80416afed8
[&lt;...&gt;]  [&lt;ffffffff8272461c&gt;] ? __kasan_check_write+0x1c/0xa0 fffffe80416afee0
[&lt;...&gt;]  [&lt;ffffffff829b31cf&gt;] __do_sys_mount+0x12f/0x280 fffffe80416aff30
[&lt;...&gt;]  [&lt;ffffffff829b36cd&gt;] __x64_sys_mount+0xcd/0x2e0 fffffe80416aff70
[&lt;...&gt;]  [&lt;ffffffff819f8818&gt;] ? syscall_trace_enter+0x218/0x380 fffffe80416aff88
[&lt;...&gt;]  [&lt;ffffffff8111655e&gt;] x64_sys_call+0x5d5e/0x6720 fffffe80416affa8
[&lt;...&gt;]  [&lt;ffffffff8952756d&gt;] do_syscall_64+0xcd/0x3c0 fffffe80416affb8
[&lt;...&gt;]  [&lt;ffffffff8100119b&gt;] entry_SYSCALL_64_safe_stack+0x4c/0x87 fffffe80416affe8
[&lt;...&gt;]  &lt;/TASK&gt;
[&lt;...&gt;]  &lt;PTREGS&gt;
[&lt;...&gt;] RIP: 0033:[&lt;00006dcb382ff66a&gt;] vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[&lt;...&gt;] Code: 48 8b 0d 29 18 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 &lt;48&gt; 3d 01 f0 ff ff 73 01 c3 48 8b 0d f6 17 0d 00 f7 d8 64 89 01 48
[&lt;...&gt;] RSP: 002b:0000763d68192558 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[&lt;...&gt;] RAX: ffffffffffffffda RBX: 00006dcb38433264 RCX: 00006dcb382ff66a
[&lt;...&gt;] RDX: 000017c3e0d11210 RSI: 000017c3e0d1a5a0 RDI: 000017c3e0d1ae70
[&lt;...&gt;] RBP: 000017c3e0d10fb0 R08: 000017c3e0d11260 R09: 00006dcb383d1be0
[&lt;...&gt;] R10: 000000000020002e R11: 0000000000000246 R12: 0000000000000000
[&lt;...&gt;] R13: 000017c3e0d1ae70 R14: 000017c3e0d11210 R15: 000017c3e0d10fb0
[&lt;...&gt;] RBX: vm_area_struct[mount 2550 2550 file 6dcb38433000-6dcb38434000 5b 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[&lt;...&gt;] RCX: vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[&lt;...&gt;] RDX: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[&lt;...&gt;] RSI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[&lt;...&gt;] RDI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[&lt;...&gt;] RBP: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[&lt;...&gt;] RSP: vm_area_struct[mount 2550 2550 anon 763d68173000-763d68195000 7ffffffdd 100133(read|write|mayread|maywrite|growsdown|account)]+0x0/0xb8 [userland map]
[&lt;...&gt;] R08: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[&lt;...&gt;] R09: vm_area_struct[mount 2550 2550 file 6dcb383d1000-6dcb383d3000 1cd 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[&lt;...&gt;] R13: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[&lt;...&gt;] R14: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[&lt;...&gt;] R15: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[&lt;...&gt;]  &lt;/PTREGS&gt;
[&lt;...&gt;] Modules linked in:
[&lt;...&gt;] ---[ end trace 0000000000000000 ]---

The list debug message as well as RBX's symbolic value point out that the
object in question was allocated from 'tracefs_inode_cache' and that the
list's '-&gt;next' member is at offset 0. Dumping the layout of the relevant
parts of 'struct tracefs_inode' gives the following:

  struct tracefs_inode {
    union {
      struct inode {
        struct list_head {
          struct list_head * next;                    /*     0     8 */
          struct list_head * prev;                    /*     8     8 */
        } i_lru;
        [...]
      } vfs_inode;
      struct callback_head {
        void (*func)(struct callback_head *);         /*     0     8 */
        struct callback_head * next;                  /*     8     8 */
      } rcu;
    };
    [...]
  };

Above shows that 'vfs_inode.i_lru' overlaps with 'rcu' which will
destroy the 'i_lru' list as soon as the 'rcu' member gets used, e.g. in
call_rcu() or later when calling the RCU callback. This will disturb
concurrent list traversals as well as object reuse which assumes these
list heads will keep their integrity.

For reproduction, the following diff manually overlays 'i_lru' with
'rcu' as, otherwise, one would require some good portion of luck for
gambling an unlucky RANDSTRUCT seed:

  --- a/include/linux/fs.h
  +++ b/include/linux/fs.h
  @@ -629,6 +629,7 @@ struct inode {
   	umode_t			i_mode;
   	unsigned short		i_opflags;
   	kuid_t			i_uid;
  +	struct list_head	i_lru;		/* inode LRU list */
   	kgid_t			i_gid;
   	unsigned int		i_flags;

  @@ -690,7 +691,6 @@ struct inode {
   	u16			i_wb_frn_avg_time;
   	u16			i_wb_frn_history;
   #endif
  -	struct list_head	i_lru;		/* inode LRU list */
   	struct list_head	i_sb_list;
   	struct list_head	i_wb_list;	/* backing dev writeback list */
   	union {

The tracefs inode does not need to supply its own RCU delayed destruction
of its inode. The inode code itself offers both a "destroy_inode()"
callback that gets called when the last reference of the inode is
released, and the "free_inode()" which is called after a RCU
synchronization period from the "destroy_inode()".

The tracefs code can unlink the inode from its list in the destroy_inode()
callback, and the simply free it from the free_inode() callback. This
should provide the same protection.

Link: https://lore.kernel.org/all/20240807115143.45927-3-minipli@grsecurity.net/

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Ajay Kaher &lt;ajay.kaher@broadcom.com&gt;
Cc: Ilkka =?utf-8?b?TmF1bGFww6TDpA==?= &lt;digirigawa@gmail.com&gt;
Link: https://lore.kernel.org/20240807185402.61410544@gandalf.local.home
Fixes: baa23a8d4360 ("tracefs: Reset permissions on remount if permissions are options")
Reported-by: Mathias Krause &lt;minipli@grsecurity.net&gt;
Reported-by: Brad Spengler &lt;spender@grsecurity.net&gt;
Suggested-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>quota: Properly annotate i_dquot arrays with __rcu</title>
<updated>2024-03-26T22:19:46Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2024-02-06T14:08:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=42954c374534f37dd25a4096b52d28e46dc1f8ba'/>
<id>urn:sha1:42954c374534f37dd25a4096b52d28e46dc1f8ba</id>
<content type='text'>
[ Upstream commit ccb49011bb2ebfd66164dbf68c5bff48917bb5ef ]

Dquots pointed to from i_dquot arrays in inodes are protected by
dquot_srcu. Annotate them as such and change .get_dquots callback to
return properly annotated pointer to make sparse happy.

Fixes: b9ba6f94b238 ("quota: remove dqptr_sem")
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>ovl: Always reject mounting over case-insensitive directories</title>
<updated>2024-03-26T22:19:18Z</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@suse.de</email>
</author>
<published>2024-02-21T17:14:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c6f95031cf21701db5d2b449bf77b73e98edabde'/>
<id>urn:sha1:c6f95031cf21701db5d2b449bf77b73e98edabde</id>
<content type='text'>
[ Upstream commit 2824083db76cb9d4b7910607b367e93b02912865 ]

overlayfs relies on the filesystem setting DCACHE_OP_HASH or
DCACHE_OP_COMPARE to reject mounting over case-insensitive directories.

Since commit bb9cd9106b22 ("fscrypt: Have filesystems handle their
d_ops"), we set -&gt;d_op through a hook in -&gt;d_lookup, which
means the root dentry won't have them, causing the mount to accidentally
succeed.

In v6.7-rc7, the following sequence will succeed to mount, but any
dentry other than the root dentry will be a "weird" dentry to ovl and
fail with EREMOTE.

  mkfs.ext4 -O casefold lower.img
  mount -O loop lower.img lower
  mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work ovl /mnt

Mounting on a subdirectory fails, as expected, because DCACHE_OP_HASH
and DCACHE_OP_COMPARE are properly set by -&gt;lookup.

Fix by explicitly rejecting superblocks that allow case-insensitive
dentries. Yes, this will be solved when we move d_op configuration back
to -&gt;s_d_op. Yet, we better have an explicit fix to avoid messing up
again.

While there, re-sort the entries to have more descriptive error messages
first.

Fixes: bb9cd9106b22 ("fscrypt: Have filesystems handle their d_ops")
Acked-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Link: https://lore.kernel.org/r/20240221171412.10710-2-krisman@suse.de
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio</title>
<updated>2024-03-01T12:34:59Z</updated>
<author>
<name>Bart Van Assche</name>
<email>bvanassche@acm.org</email>
</author>
<published>2024-02-15T20:47:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e7e23fc5d5fe422827c9a43ecb579448f73876c7'/>
<id>urn:sha1:e7e23fc5d5fe422827c9a43ecb579448f73876c7</id>
<content type='text'>
commit b820de741ae48ccf50dd95e297889c286ff4f760 upstream.

If kiocb_set_cancel_fn() is called for I/O submitted via io_uring, the
following kernel warning appears:

WARNING: CPU: 3 PID: 368 at fs/aio.c:598 kiocb_set_cancel_fn+0x9c/0xa8
Call trace:
 kiocb_set_cancel_fn+0x9c/0xa8
 ffs_epfile_read_iter+0x144/0x1d0
 io_read+0x19c/0x498
 io_issue_sqe+0x118/0x27c
 io_submit_sqes+0x25c/0x5fc
 __arm64_sys_io_uring_enter+0x104/0xab0
 invoke_syscall+0x58/0x11c
 el0_svc_common+0xb4/0xf4
 do_el0_svc+0x2c/0xb0
 el0_svc+0x2c/0xa4
 el0t_64_sync_handler+0x68/0xb4
 el0t_64_sync+0x1a4/0x1a8

Fix this by setting the IOCB_AIO_RW flag for read and write I/O that is
submitted by libaio.

Suggested-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Avi Kivity &lt;avi@scylladb.com&gt;
Cc: Sandeep Dhavale &lt;dhavale@google.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Link: https://lore.kernel.org/r/20240215204739.2677806-2-bvanassche@acm.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>fs: new accessor methods for atime and mtime</title>
<updated>2024-01-05T14:19:40Z</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2023-10-04T18:52:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5b5599a7eee5e6101c6ae738682331bc6299a54d'/>
<id>urn:sha1:5b5599a7eee5e6101c6ae738682331bc6299a54d</id>
<content type='text'>
[ Upstream commit 077c212f0344ae4198b2b51af128a94b614ccdf4 ]

Recently, we converted the ctime accesses in the kernel to use new
accessor functions. Linus recently pointed out though that if we add
accessors for the atime and mtime, then that would allow us to
seamlessly change how these timestamps are stored in the inode.

Add new accessor functions for the atime and mtime that mirror the
accessors for the ctime.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Link: https://lore.kernel.org/r/20231004185239.80830-1-jlayton@kernel.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Stable-dep-of: 01fe654f78fd ("fs: cifs: Fix atime update check")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>audit,io_uring: io_uring openat triggers audit reference count underflow</title>
<updated>2023-10-13T16:34:46Z</updated>
<author>
<name>Dan Clash</name>
<email>daclash@linux.microsoft.com</email>
</author>
<published>2023-10-12T21:55:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=03adc61edad49e1bbecfb53f7ea5d78f398fe368'/>
<id>urn:sha1:03adc61edad49e1bbecfb53f7ea5d78f398fe368</id>
<content type='text'>
An io_uring openat operation can update an audit reference count
from multiple threads resulting in the call trace below.

A call to io_uring_submit() with a single openat op with a flag of
IOSQE_ASYNC results in the following reference count updates.

These first part of the system call performs two increments that do not race.

do_syscall_64()
  __do_sys_io_uring_enter()
    io_submit_sqes()
      io_openat_prep()
        __io_openat_prep()
          getname()
            getname_flags()       /* update 1 (increment) */
              __audit_getname()   /* update 2 (increment) */

The openat op is queued to an io_uring worker thread which starts the
opportunity for a race.  The system call exit performs one decrement.

do_syscall_64()
  syscall_exit_to_user_mode()
    syscall_exit_to_user_mode_prepare()
      __audit_syscall_exit()
        audit_reset_context()
           putname()              /* update 3 (decrement) */

The io_uring worker thread performs one increment and two decrements.
These updates can race with the system call decrement.

io_wqe_worker()
  io_worker_handle_work()
    io_wq_submit_work()
      io_issue_sqe()
        io_openat()
          io_openat2()
            do_filp_open()
              path_openat()
                __audit_inode()   /* update 4 (increment) */
            putname()             /* update 5 (decrement) */
        __audit_uring_exit()
          audit_reset_context()
            putname()             /* update 6 (decrement) */

The fix is to change the refcnt member of struct audit_names
from int to atomic_t.

kernel BUG at fs/namei.c:262!
Call Trace:
...
 ? putname+0x68/0x70
 audit_reset_context.part.0.constprop.0+0xe1/0x300
 __audit_uring_exit+0xda/0x1c0
 io_issue_sqe+0x1f3/0x450
 ? lock_timer_base+0x3b/0xd0
 io_wq_submit_work+0x8d/0x2b0
 ? __try_to_del_timer_sync+0x67/0xa0
 io_worker_handle_work+0x17c/0x2b0
 io_wqe_worker+0x10a/0x350

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/MW2PR2101MB1033FFF044A258F84AEAA584F1C9A@MW2PR2101MB1033.namprd21.prod.outlook.com/
Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
Signed-off-by: Dan Clash &lt;daclash@linux.microsoft.com&gt;
Link: https://lore.kernel.org/r/20231012215518.GA4048@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Reviewed-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
</feed>
