<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/fs/inode.c, branch v3.1.1</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.1.1</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.1.1'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2011-08-25T17:50:18Z</updated>
<entry>
<title>lockdep: Add helper function for dir vs file i_mutex annotation</title>
<updated>2011-08-25T17:50:18Z</updated>
<author>
<name>Josh Boyer</name>
<email>jwboyer@redhat.com</email>
</author>
<published>2011-08-25T11:48:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e096d0c7e2e4e5893792db865dd065ac73cf1f00'/>
<id>urn:sha1:e096d0c7e2e4e5893792db865dd065ac73cf1f00</id>
<content type='text'>
Purely in-memory filesystems do not use the inode hash, because the
dcache already tells us whether an entry exists.  As a result, they do
not call unlock_new_inode, and thus directory inodes do not get put into
a different lockdep class for i_mutex.

We need the different lockdep classes, because the locking order for
i_mutex is different for directory inodes and regular inodes.  Directory
inodes can do "readdir()", which takes i_mutex *before* possibly taking
mm-&gt;mmap_sem (due to a page fault while copying the directory entry to
user space).

In contrast, regular inodes can be mmap'ed, which takes mm-&gt;mmap_sem
before accessing i_mutex.

The two cases can never happen for the same inode, so no real deadlock
can occur, but without the different lockdep classes, lockdep cannot
understand that.  As a result, if CONFIG_DEBUG_LOCK_ALLOC is set, this
can lead to false positives from lockdep like below:

    find/645 is trying to acquire lock:
     (&amp;mm-&gt;mmap_sem){++++++}, at: [&lt;ffffffff81109514&gt;] might_fault+0x5c/0xac

    but task is already holding lock:
     (&amp;sb-&gt;s_type-&gt;i_mutex_key#15){+.+.+.}, at: [&lt;ffffffff81149f34&gt;] vfs_readdir+0x5b/0xb4

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -&gt; #1 (&amp;sb-&gt;s_type-&gt;i_mutex_key#15){+.+.+.}:
          [&lt;ffffffff8108ac26&gt;] lock_acquire+0xbf/0x103
          [&lt;ffffffff814db822&gt;] __mutex_lock_common+0x4c/0x361
          [&lt;ffffffff814dbc46&gt;] mutex_lock_nested+0x40/0x45
          [&lt;ffffffff811daa87&gt;] hugetlbfs_file_mmap+0x82/0x110
          [&lt;ffffffff81111557&gt;] mmap_region+0x258/0x432
          [&lt;ffffffff811119dd&gt;] do_mmap_pgoff+0x2ac/0x306
          [&lt;ffffffff81111b4f&gt;] sys_mmap_pgoff+0x118/0x16a
          [&lt;ffffffff8100c858&gt;] sys_mmap+0x22/0x24
          [&lt;ffffffff814e3ec2&gt;] system_call_fastpath+0x16/0x1b

    -&gt; #0 (&amp;mm-&gt;mmap_sem){++++++}:
          [&lt;ffffffff8108a4bc&gt;] __lock_acquire+0xa1a/0xcf7
          [&lt;ffffffff8108ac26&gt;] lock_acquire+0xbf/0x103
          [&lt;ffffffff81109541&gt;] might_fault+0x89/0xac
          [&lt;ffffffff81149cff&gt;] filldir+0x6f/0xc7
          [&lt;ffffffff811586ea&gt;] dcache_readdir+0x67/0x205
          [&lt;ffffffff81149f54&gt;] vfs_readdir+0x7b/0xb4
          [&lt;ffffffff8114a073&gt;] sys_getdents+0x7e/0xd1
          [&lt;ffffffff814e3ec2&gt;] system_call_fastpath+0x16/0x1b

This patch moves the directory vs file lockdep annotation into a helper
function that can be called by in-memory filesystems and has hugetlbfs
call it.
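
The false positive can be illustrated with a toy model of a class-level
lock ordering graph.  This is a Python sketch only, not kernel code:
LockOrderGraph, acquire() and run() are hypothetical names, and the
class strings are just labels.  It records "held, then acquired" edges
between lock classes and refuses an edge that would close a cycle,
first with one shared i_mutex class and then with a separate class for
directories.

```python
from collections import defaultdict


class LockOrderGraph:
    """Toy stand-in for lockdep: tracks ordering between lock classes."""

    def __init__(self):
        # edges[a] holds every class that was acquired while a was held
        self.edges = defaultdict(set)

    def _reaches(self, start, goal, seen=None):
        # Depth-first search: is goal reachable from start along edges?
        seen = set() if seen is None else seen
        if start == goal:
            return True
        seen.add(start)
        return any(
            self._reaches(nxt, goal, seen)
            for nxt in self.edges[start]
            if nxt not in seen
        )

    def acquire(self, held, new):
        """Record 'held, then new'; return False if it inverts a known order."""
        if self._reaches(new, held):
            return False  # a real checker would print a dependency report
        self.edges[held].add(new)
        return True


def run(dir_class, file_class):
    graph = LockOrderGraph()
    # mmap() on a regular file: mmap_sem is held, then i_mutex is taken
    ok_mmap = graph.acquire("mmap_sem", file_class)
    # readdir() on a directory: i_mutex is held, then a fault takes mmap_sem
    ok_readdir = graph.acquire(dir_class, "mmap_sem")
    return ok_mmap and ok_readdir


# One shared class for all inodes: the second path looks like an inversion.
shared = run("i_mutex_key", "i_mutex_key")
# Separate class for directories: the two orders never meet.
split = run("i_mutex_dir_key", "i_mutex_key")
print(shared, split)
```

With a shared class the readdir path is rejected even though it runs on
a different inode; with a separate directory class both paths are
accepted.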

Signed-off-by: Josh Boyer &lt;jwboyer@redhat.com&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>vfs: optimize inode cache access patterns</title>
<updated>2011-08-07T05:53:23Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-08-07T05:45:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3ddcd0569cd68f00f3beae9a7959b72918bb91f4'/>
<id>urn:sha1:3ddcd0569cd68f00f3beae9a7959b72918bb91f4</id>
<content type='text'>
The inode structure layout is largely random, and some of the vfs paths
really do care.  The path lookup in particular is already quite D$
intensive, and profiles show that accessing the 'inode-&gt;i_op-&gt;xyz'
fields is quite costly.

We already optimized the dcache to not unnecessarily load the d_op
structure for members that are often NULL using the DCACHE_OP_xyz bits
in dentry-&gt;d_flags, and this does something very similar for the inode
ops that are used during pathname lookup.

It also re-orders the fields so that the fields accessed by 'stat' are
together at the beginning of the inode structure, and roughly in the
order accessed.

The effect of this seems to be in the 1-2% range for an empty kernel
"make -j" run (which is fairly kernel-intensive, mostly in filename
lookup), so it's visible.  The numbers are fairly noisy, though, and
likely depend a lot on exact microarchitecture.  So there's more tuning
to be done.
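
The cached-flag idea described above can be sketched as follows.  This
is a Python model under stated assumptions, not the kernel
implementation: OpFlags, InodeOps, Inode and can_lookup() are
hypothetical names, and the "loads" counter merely stands in for the
cost of touching the separate operations table.

```python
import enum


class OpFlags(enum.Flag):
    """Hypothetical per-inode summary bits, filled in once and cached."""
    NONE = 0
    HAS_LOOKUP = 1
    NO_PERMISSION_HOOK = 2


class InodeOps:
    """Stand-in for the (cold) operations table reached through i_op."""

    def __init__(self, lookup=None, permission=None):
        self.lookup = lookup
        self.permission = permission
        self.loads = 0  # counts how often the table itself was consulted

    def get(self, name):
        self.loads += 1
        return getattr(self, name)


class Inode:
    def __init__(self, ops):
        self.ops = ops
        self.opflags = OpFlags.NONE
        self.flags_valid = False

    def can_lookup(self):
        # Slow path runs once; later calls only touch the inode itself.
        if not self.flags_valid:
            if self.ops.get("lookup") is not None:
                self.opflags |= OpFlags.HAS_LOOKUP
            if self.ops.get("permission") is None:
                self.opflags |= OpFlags.NO_PERMISSION_HOOK
            self.flags_valid = True
        return OpFlags.HAS_LOOKUP in self.opflags


directory = Inode(InodeOps(lookup=lambda name: name))
results = [directory.can_lookup() for _ in range(1000)]
print(all(results), directory.ops.loads)
```

In this model, a thousand lookups consult the operations table only
twice, both during the first call.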

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>vfs: avoid call to inode_lru_list_del() if possible</title>
<updated>2011-08-01T05:41:17Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-07-28T04:55:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c4ae0c65455c1bb30d1b71c6dd9a1a62aadde8ef'/>
<id>urn:sha1:c4ae0c65455c1bb30d1b71c6dd9a1a62aadde8ef</id>
<content type='text'>
inode_lru_list_del() is expensive because of per-superblock LRU locking,
yet some inodes are never on an LRU list.

Adding a check in iput_final() to skip the call for such inodes can
speed up pipe/socket workloads on SMP.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>vfs: avoid taking inode_hash_lock on pipes and sockets</title>
<updated>2011-08-01T05:41:17Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-07-28T04:41:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f2ee7abf4c40c8e6bffced923a7c01ea2d1f6c97'/>
<id>urn:sha1:f2ee7abf4c40c8e6bffced923a7c01ea2d1f6c97</id>
<content type='text'>
Some inodes (pipes, sockets, ...) are not hashed, so there is no need to
take the contended inode_hash_lock at dismantle time.

This gives a nice speedup on SMP machines for socket-intensive workloads.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>vfs: conditionally call inode_wb_list_del()</title>
<updated>2011-08-01T05:41:17Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-07-28T04:11:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b12362bdb61a230a67daa77bcd2a11e59b2802e1'/>
<id>urn:sha1:b12362bdb61a230a67daa77bcd2a11e59b2802e1</id>
<content type='text'>
Some inodes (pipes, sockets, ...) are not on a bdi writeback list.

evict() can avoid calling inode_wb_list_del() and its expensive spinlock
by first checking whether the inode's i_wb_list is empty.

At this point, no other CPU or user can concurrently manipulate this
inode's i_wb_list.
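
The pattern can be sketched as below.  This is a toy Python model, not
the kernel code: WritebackList, Inode, evict() and the on_wb_list field
are hypothetical names, and the acquisition counter exists only to make
the saving visible.

```python
import threading


class WritebackList:
    """Toy model of a shared list guarded by one contended lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.members = set()
        self.lock_acquisitions = 0

    def add(self, inode):
        with self.lock:
            self.lock_acquisitions += 1
            self.members.add(inode)
            inode.on_wb_list = True

    def remove(self, inode):
        with self.lock:
            self.lock_acquisitions += 1
            self.members.discard(inode)
            inode.on_wb_list = False


class Inode:
    def __init__(self, name):
        self.name = name
        self.on_wb_list = False  # plays the role of a non-empty i_wb_list


def evict(inode, wb_list):
    # The inode is being torn down and nobody else can reach it, so this
    # unlocked check cannot race with a concurrent add or remove.
    if inode.on_wb_list:
        wb_list.remove(inode)


wb = WritebackList()
regular = Inode("file")
pipe = Inode("pipe")      # never queued for writeback
wb.add(regular)           # 1 acquisition
evict(regular, wb)        # 2 acquisitions
evict(pipe, wb)           # still 2: the lock is skipped entirely
print(wb.lock_acquisitions)
```

The unlocked check is only safe because of the guarantee stated above:
by the time evict() runs, nothing else can touch the inode's list entry.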

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6</title>
<updated>2011-07-27T01:30:20Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-07-27T01:30:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e371d46ae45488bcb112a99a7de462e9e3aa6764'/>
<id>urn:sha1:e371d46ae45488bcb112a99a7de462e9e3aa6764</id>
<content type='text'>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  merge fchmod() and fchmodat() guts, kill ancient broken kludge
  xfs: fix misspelled S_IS...()
  xfs: get rid of open-coded S_ISREG(), etc.
  vfs: document locking requirements for d_move, __d_move and d_materialise_unique
  omfs: fix (mode &amp; S_IFDIR) abuse
  btrfs: S_ISREG(mode) is not mode &amp; S_IFREG...
  ima: fmode_t misspelled as mode_t...
  pci-label.c: size_t misspelled as mode_t
  jffs2: S_ISLNK(mode &amp; S_IFMT) is pointless
  snd_msnd -&gt;mode is fmode_t, not mode_t
  v9fs_iop_get_acl: get rid of unused variable
  vfs: dont chain pipe/anon/socket on superblock s_inodes list
  Documentation: Exporting: update description of d_splice_alias
  fs: add missing unlock in default_llseek()
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback</title>
<updated>2011-07-26T17:39:54Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-07-26T17:39:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f01ef569cddb1a8627b1c6b3a134998ad1cf4b22'/>
<id>urn:sha1:f01ef569cddb1a8627b1c6b3a134998ad1cf4b22</id>
<content type='text'>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback: (27 commits)
  mm: properly reflect task dirty limits in dirty_exceeded logic
  writeback: don't busy retry writeback on new/freeing inodes
  writeback: scale IO chunk size up to half device bandwidth
  writeback: trace global_dirty_state
  writeback: introduce max-pause and pass-good dirty limits
  writeback: introduce smoothed global dirty limit
  writeback: consolidate variable names in balance_dirty_pages()
  writeback: show bdi write bandwidth in debugfs
  writeback: bdi write bandwidth estimation
  writeback: account per-bdi accumulated written pages
  writeback: make writeback_control.nr_to_write straight
  writeback: skip tmpfs early in balance_dirty_pages_ratelimited_nr()
  writeback: trace event writeback_queue_io
  writeback: trace event writeback_single_inode
  writeback: remove .nonblocking and .encountered_congestion
  writeback: remove writeback_control.more_io
  writeback: skip balance_dirty_pages() for in-memory fs
  writeback: add bdi_dirty_limit() kernel-doc
  writeback: avoid extra sync work at enqueue time
  writeback: elevate queue_io() into wb_writeback()
  ...

Fix up trivial conflicts in fs/fs-writeback.c and mm/filemap.c
</content>
</entry>
<entry>
<title>vfs: dont chain pipe/anon/socket on superblock s_inodes list</title>
<updated>2011-07-26T16:57:09Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-07-26T09:36:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a209dfc7b0d94bd6fa94553c097836a2e6d0f0ba'/>
<id>urn:sha1:a209dfc7b0d94bd6fa94553c097836a2e6d0f0ba</id>
<content type='text'>
Workloads using pipes and sockets hit inode_sb_list_lock contention.

The superblock s_inodes list is needed for quota, dirty, pagecache and
fsnotify management.  The pipe, anon and socket filesystems are clearly
not candidates for these.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>fs: kill i_alloc_sem</title>
<updated>2011-07-21T00:47:46Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@infradead.org</email>
</author>
<published>2011-06-24T18:29:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bd5fe6c5eb9c548d7f07fe8f89a150bb6705e8e3'/>
<id>urn:sha1:bd5fe6c5eb9c548d7f07fe8f89a150bb6705e8e3</id>
<content type='text'>
i_alloc_sem is a rather special rw_semaphore.  It's the last one that may
be released by a non-owner, and its write side is always mirrored by
real exclusion.  Its intended use is to wait for all pending direct I/O
requests to finish before starting a truncate.

Replace it with a hand-grown construct:

 - exclusion for truncates is already guaranteed by i_mutex, so it can
   simply fall away
 - the reader side is replaced by an i_dio_count member in struct inode
   that counts the number of pending direct I/O requests.  Truncate can't
   proceed as long as it's non-zero
 - when i_dio_count reaches zero we wake up a pending truncate using
   wake_up_bit on a new bit in i_flags
 - new references to i_dio_count can't appear while we are waiting for
   it to read zero because the direct I/O count always needs i_mutex
   (or an equivalent like XFS's i_iolock) for starting a new operation.

This scheme is much simpler, and saves the space of a spinlock_t and a
struct list_head in struct inode (typically 160 bits on a non-debug 64-bit
system).
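
The counting scheme can be sketched with a toy Python model.  This is
an illustration under stated assumptions, not the kernel
implementation: the method names dio_start(), dio_done() and dio_wait()
are hypothetical, and a condition variable stands in both for the
atomic counter and for the wake_up_bit mechanism.

```python
import threading


class Inode:
    """Toy model: a pending direct-I/O counter replaces the old semaphore."""

    def __init__(self):
        self.i_mutex = threading.Lock()
        self.dio_count = 0
        # Stands in for the wait-bit wakeup; it also guards dio_count here,
        # whereas the scheme described above relies on a plain counter.
        self.dio_cond = threading.Condition()

    def dio_start(self):
        # Caller must hold i_mutex, so no new I/O can begin during truncate.
        with self.dio_cond:
            self.dio_count += 1

    def dio_done(self):
        # Completion may run in another context and does not need i_mutex.
        with self.dio_cond:
            self.dio_count -= 1
            if self.dio_count == 0:
                self.dio_cond.notify_all()

    def dio_wait(self):
        with self.dio_cond:
            self.dio_cond.wait_for(lambda: self.dio_count == 0)


def truncate(inode, log):
    with inode.i_mutex:          # excludes other truncates and new direct I/O
        inode.dio_wait()         # drain requests that are already in flight
        log.append("truncate")


inode, log = Inode(), []
with inode.i_mutex:
    inode.dio_start()            # one request in flight before truncate runs


def complete():
    log.append("dio_done")
    inode.dio_done()


timer = threading.Timer(0.05, complete)
timer.start()
truncate(inode, log)
timer.join()
print(log)
```

The truncate cannot proceed until the in-flight request has completed,
and no new request can start meanwhile because starting one requires
i_mutex.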

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>inode: remove iprune_sem</title>
<updated>2011-07-21T00:47:40Z</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2011-07-08T04:14:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4f8c19fdf3f97402b68f058b1c72a6c7166c9e59'/>
<id>urn:sha1:4f8c19fdf3f97402b68f058b1c72a6c7166c9e59</id>
<content type='text'>
Now that we have per-sb shrinkers with a lifecycle that is a subset
of the superblock lifecycle and can reliably detect a filesystem
being unmounted, there is no longer any race condition for the
iprune_sem to protect against. Hence we can remove it.

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
</feed>
