<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/fs/super.c, branch v4.4.166</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.166</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.166'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-03-03T09:19:41Z</updated>
<entry>
<title>sget(): handle failures of register_shrinker()</title>
<updated>2018-03-03T09:19:41Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2017-12-18T20:05:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4cf1dead545b29f92722c8d1021db26beee31c83'/>
<id>urn:sha1:4cf1dead545b29f92722c8d1021db26beee31c83</id>
<content type='text'>
[ Upstream commit 9ee332d99e4d5a97548943b81c54668450ce641b ]

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>fs/super.c: fix race between freeze_super() and thaw_super()</title>
<updated>2016-10-28T07:01:32Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2016-09-26T16:07:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2f634e1322f7e1ca0973a606cb24f5f669884359'/>
<id>urn:sha1:2f634e1322f7e1ca0973a606cb24f5f669884359</id>
<content type='text'>
commit 89f39af129382a40d7cd1f6914617282cfeee28e upstream.

Change thaw_super() to check frozen != SB_FREEZE_COMPLETE rather than
frozen == SB_UNFROZEN, otherwise it can race with freeze_super() which
drops sb-&gt;s_umount after SB_FREEZE_WRITE to preserve the lock ordering.

In this case thaw_super() will wrongly call s_op-&gt;unfreeze_fs() before
it was actually frozen, and call sb_freeze_unlock() which leads to the
unbalanced percpu_up_write(). Unfortunately lockdep can't detect this,
so this triggers misc BUG_ON()'s in kernel/rcu/sync.c.

Reported-and-tested-by: Nikolay Borisov &lt;kernel@kyup.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>writeback: flush inode cgroup wb switches instead of pinning super_block</title>
<updated>2016-03-09T23:34:52Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2016-02-29T23:28:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c5cbbec54fe71c4de2d34f8c0ec8fbfdd7f17339'/>
<id>urn:sha1:c5cbbec54fe71c4de2d34f8c0ec8fbfdd7f17339</id>
<content type='text'>
commit a1a0e23e49037c23ea84bc8cc146a03584d13577 upstream.

If cgroup writeback is in use, inodes can be scheduled for
asynchronous wb switching.  Before 5ff8eaac1636 ("writeback: keep
superblock pinned during cgroup writeback association switches"), this
could race with umount leading to super_block being destroyed while
inodes are pinned for wb switching.  5ff8eaac1636 fixed it by bumping
s_active while wb switches are in flight; however, this allowed
in-flight wb switches to make umounts asynchronous when the userland
expected synchronosity - e.g. fsck immediately following umount may
fail because the device is still busy.

This patch removes the problematic super_block pinning and instead
makes generic_shutdown_super() flush in-flight wb switches.  wb
switches are now executed on a dedicated isw_wq so that they can be
flushed and isw_nr_in_flight keeps track of the number of in-flight wb
switches so that flushing can be avoided in most cases.

v2: Move cgroup_writeback_umount() further below and add MS_ACTIVE
    check in inode_switch_wbs() as Jan an Al suggested.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Tahsin Erdogan &lt;tahsin@google.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Link: http://lkml.kernel.org/g/CAAeU0aNCq7LGODvVGRU-oU_o-6enii5ey0p1c26D1ZzYwkDc5A@mail.gmail.com
Fixes: 5ff8eaac1636 ("writeback: keep superblock pinned during cgroup writeback association switches")
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Tested-by: Tahsin Erdogan &lt;tahsin@google.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Merge branch 'superblock-scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into for-next</title>
<updated>2015-08-21T06:31:20Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2015-08-21T06:31:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=061f98e959ea025a5d87c3e089d59ec2ec5f5d6d'/>
<id>urn:sha1:061f98e959ea025a5d87c3e089d59ec2ec5f5d6d</id>
<content type='text'>
Conflicts:
	include/linux/fs.h
</content>
</entry>
<entry>
<title>sync: serialise per-superblock sync operations</title>
<updated>2015-08-17T22:39:47Z</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2015-03-04T18:40:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e97fedb9ef9868ff24d588be781906cf7c1b59ae'/>
<id>urn:sha1:e97fedb9ef9868ff24d588be781906cf7c1b59ae</id>
<content type='text'>
When competing sync(2) calls walk the same filesystem, they need to
walk the list of inodes on the superblock to find all the inodes
that we need to wait for IO completion on. However, when multiple
wait_sb_inodes() calls do this at the same time, they contend on the
the inode_sb_list_lock and the contention causes system wide
slowdowns. In effect, concurrent sync(2) calls can take longer and
burn more CPU than if they were serialised.

Stop the worst of the contention by adding a per-sb mutex to wrap
around wait_sb_inodes() so that we only execute one sync(2) IO
completion walk per superblock superblock at a time and hence avoid
contention being triggered by concurrent sync(2) calls.

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Tested-by: Dave Chinner &lt;dchinner@redhat.com&gt;
</content>
</entry>
<entry>
<title>inode: convert inode_sb_list_lock to per-sb</title>
<updated>2015-08-17T22:39:46Z</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2015-03-04T17:37:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=74278da9f70d84d715601fe794567a6d2bfdf078'/>
<id>urn:sha1:74278da9f70d84d715601fe794567a6d2bfdf078</id>
<content type='text'>
The process of reducing contention on per-superblock inode lists
starts with moving the locking to match the per-superblock inode
list. This takes the global lock out of the picture and reduces the
contention problems to within a single filesystem. This doesn't get
rid of contention as the locks still have global CPU scope, but it
does isolate operations on different superblocks form each other.

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Tested-by: Dave Chinner &lt;dchinner@redhat.com&gt;
</content>
</entry>
<entry>
<title>change sb_writers to use percpu_rw_semaphore</title>
<updated>2015-08-15T11:52:13Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2015-08-11T15:05:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8129ed29644bf56ed17ec1bbbeed5c568b43d6a0'/>
<id>urn:sha1:8129ed29644bf56ed17ec1bbbeed5c568b43d6a0</id>
<content type='text'>
We can remove everything from struct sb_writers except frozen
and add the array of percpu_rw_semaphore's instead.

This patch doesn't remove sb_writers-&gt;wait_unfrozen yet, we keep
it for get_super_thawed(). We will probably remove it later.

This change tries to address the following problems:

	- Firstly, __sb_start_write() looks simply buggy. It does
	  __sb_end_write() if it sees -&gt;frozen, but if it migrates
	  to another CPU before percpu_counter_dec(), sb_wait_write()
	  can wrongly succeed if there is another task which holds
	  the same "semaphore": sb_wait_write() can miss the result
	  of the previous percpu_counter_inc() but see the result
	  of this percpu_counter_dec().

	- As Dave Hansen reports, it is suboptimal. The trivial
	  microbenchmark that writes to a tmpfs file in a loop runs
	  12% faster if we change this code to rely on RCU and kill
	  the memory barriers.

	- This code doesn't look simple. It would be better to rely
	  on the generic locking code.

	  According to Dave, this change adds the same performance
	  improvement.

Note: with this change both freeze_super() and thaw_super() will do
synchronize_sched_expedited() 3 times. This is just ugly. But:

	- This will be "fixed" by the rcu_sync changes we are going
	  to merge. After that freeze_super()-&gt;percpu_down_write()
	  will use synchronize_sched(), and thaw_super() won't use
	  synchronize() at all.

	  This doesn't need any changes in fs/super.c.

	- Once we merge rcu_sync changes, we can also change super.c
	  so that all wb_write-&gt;rw_sem's will share the single -&gt;rss
	  in struct sb_writes, then freeze_super() will need only one
	  synchronize_sched().

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.com&gt;
</content>
</entry>
<entry>
<title>shift percpu_counter_destroy() into destroy_super_work()</title>
<updated>2015-08-15T11:52:11Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2015-07-22T18:21:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=853b39a7c82826b8413048feec7bf08e98ce7a84'/>
<id>urn:sha1:853b39a7c82826b8413048feec7bf08e98ce7a84</id>
<content type='text'>
Of course, this patch is ugly as hell. It will be (partially)
reverted later. We add it to ensure that other WIP changes in
percpu_rw_semaphore won't break fs/super.c.

We do not even need this change right now, percpu_free_rwsem()
is fine in atomic context. But we are going to change this, it
will be might_sleep() after we merge the rcu_sync() patches.

And even after that we do not really need destroy_super_work(),
we will kill it in any case. Instead, destroy_super_rcu() should
just check that rss-&gt;cb_state == CB_IDLE and do call_rcu() again
in the (very unlikely) case this is not true.

So this is just the temporary kludge which helps us to avoid the
conflicts with the changes which will be (hopefully) routed via
rcu tree.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.com&gt;
</content>
</entry>
<entry>
<title>document rwsem_release() in sb_wait_write()</title>
<updated>2015-08-15T11:52:09Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2015-08-11T14:28:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0e28e01f1e73015d8e1b8fa1cda071d0bd9a2600'/>
<id>urn:sha1:0e28e01f1e73015d8e1b8fa1cda071d0bd9a2600</id>
<content type='text'>
Not only we need to avoid the warning from lockdep_sys_exit(), the
caller of freeze_super() can never release this lock. Another thread
can do this, so there is another reason for rwsem_release().

Plus the comment should explain why we have to fool lockdep.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.com&gt;
</content>
</entry>
<entry>
<title>fix the broken lockdep logic in __sb_start_write()</title>
<updated>2015-08-15T11:52:09Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2015-07-19T22:50:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f4b554af9931585174d4913b482eacab75858964'/>
<id>urn:sha1:f4b554af9931585174d4913b482eacab75858964</id>
<content type='text'>
1. wait_event(frozen &lt; level) without rwsem_acquire_read() is just
   wrong from lockdep perspective. If we are going to deadlock
   because the caller is buggy, lockdep can't detect this problem.

2. __sb_start_write() can race with thaw_super() + freeze_super(),
   and after "goto retry" the 2nd  acquire_freeze_lock() is wrong.

3. The "tell lockdep we are doing trylock" hack doesn't look nice.

   I think this is correct, but this logic should be more explicit.
   Yes, the recursive read_lock() is fine if we hold the lock on a
   higher level. But we do not need to fool lockdep. If we can not
   deadlock in this case then try-lock must not fail and we can use
   use wait == F throughout this code.

Note: as Dave Chinner explains, the "trylock" hack and the fat comment
can be probably removed. But this needs a separate change and it will
be trivial: just kill __sb_start_write() and rename do_sb_start_write()
back to __sb_start_write().

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.com&gt;
</content>
</entry>
</feed>
