<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/fs.h, branch v3.16.67</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.16.67</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.16.67'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-11-20T18:04:53Z</updated>
<entry>
<title>vfs: add the sb_start_intwrite_trylock() helper</title>
<updated>2018-11-20T18:04:53Z</updated>
<author>
<name>Amir Goldstein</name>
<email>amir73il@gmail.com</email>
</author>
<published>2018-05-14T02:40:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1a1c235b391833f0c2fe0a02592f1b8c3d04eaa8'/>
<id>urn:sha1:1a1c235b391833f0c2fe0a02592f1b8c3d04eaa8</id>
<content type='text'>
commit 0c8e3fe35db9b66ae0030849545030ec7c0fc45c upstream.

Needed by ext4 to test frozen fs before updating s_last_mounted.

Signed-off-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>rmap: drop support of non-linear mappings</title>
<updated>2018-10-03T03:09:51Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2015-02-10T22:09:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e4c2e72564fbefbe691e445d68c86967ec559bb2'/>
<id>urn:sha1:e4c2e72564fbefbe691e445d68c86967ec559bb2</id>
<content type='text'>
commit 27ba0644ea9dfe6e7693abc85837b60e40583b96 upstream.

We don't create non-linear mappings anymore.  Let's drop code which
handles them in rmap.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[bwh: Backported to 3.16:
 - Deleted code is slightly different
 - Adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>mm: drop vm_ops-&gt;remap_pages and generic_file_remap_pages() stub</title>
<updated>2018-10-03T03:09:51Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2015-02-10T22:09:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ae347f225e91a960c61f7470a3b21bcb4ea5fcfa'/>
<id>urn:sha1:ae347f225e91a960c61f7470a3b21bcb4ea5fcfa</id>
<content type='text'>
commit d83a08db5ba6072caa658745881f4baa9bad6a08 upstream.

Nobody uses it anymore.

[akpm@linux-foundation.org: fix filemap_xip.c]
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[bwh: Backported to 3.16:
 - Deleted code is slightly different
 - Adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>mm: replace remap_file_pages() syscall with emulation</title>
<updated>2018-10-03T03:09:50Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2015-02-10T22:09:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=097f98edde717ce09f217d8a285fe357dcd29fd1'/>
<id>urn:sha1:097f98edde717ce09f217d8a285fe357dcd29fd1</id>
<content type='text'>
commit c8d78c1823f46519473949d33f0d1d33fe21ea16 upstream.

remap_file_pages(2) was invented to be able efficiently map parts of
huge file into limited 32-bit virtual address space such as in database
workloads.

Nonlinear mappings are pain to support and it seems there's no
legitimate use-cases nowadays since 64-bit systems are widely available.

Let's drop it and get rid of all these special-cased code.

The patch replaces the syscall with emulation which creates new VMA on
each remap_file_pages(), unless they it can be merged with an adjacent
one.

I didn't find *any* real code that uses remap_file_pages(2) to test
emulation impact on.  I've checked Debian code search and source of all
packages in ALT Linux.  No real users: libc wrappers, mentions in
strace, gdb, valgrind and this kind of stuff.

There are few basic tests in LTP for the syscall.  They work just fine
with emulation.

To test performance impact, I've written small test case which
demonstrate pretty much worst case scenario: map 4G shmfs file, write to
begin of every page pgoff of the page, remap pages in reverse order,
read every page.

The test creates 1 million of VMAs if emulation is in use, so I had to
set vm.max_map_count to 1100000 to avoid -ENOMEM.

Before:		23.3 ( +-  4.31% ) seconds
After:		43.9 ( +-  0.85% ) seconds
Slowdown:	1.88x

I believe we can live with that.

Test case:

        #define _GNU_SOURCE
        #include &lt;assert.h&gt;
        #include &lt;stdlib.h&gt;
        #include &lt;stdio.h&gt;
        #include &lt;sys/mman.h&gt;

        #define MB	(1024UL * 1024)
        #define SIZE	(4096 * MB)

        int main(int argc, char **argv)
        {
                unsigned long *p;
                long i, pass;

                for (pass = 0; pass &lt; 10; pass++) {
                        p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
                                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
                        if (p == MAP_FAILED) {
                                perror("mmap");
                                return -1;
                        }

                        for (i = 0; i &lt; SIZE / 4096; i++)
                                p[i * 4096 / sizeof(*p)] = i;

                        for (i = 0; i &lt; SIZE / 4096; i++) {
                                if (remap_file_pages(p + i * 4096 / sizeof(*p), 4096,
                                                0, (SIZE - 4096 * (i + 1)) &gt;&gt; 12, 0)) {
                                        perror("remap_file_pages");
                                        return -1;
                                }
                        }

                        for (i = SIZE / 4096 - 1; i &gt;= 0; i--)
                                assert(p[i * 4096 / sizeof(*p)] == SIZE / 4096 - i - 1);

                        munmap(p, SIZE);
                }

                return 0;
        }

[akpm@linux-foundation.org: fix spello]
[sasha.levin@oracle.com: initialize populate before usage]
[sasha.levin@oracle.com: grab file ref to prevent race while mmaping]
Signed-off-by: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Armin Rigo &lt;arigo@tunes.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[bwh: Backported to 3.16:
 - Deleted code is slightly different
 - Adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>fs: Teach path_connected to handle nfs filesystems with multiple roots.</title>
<updated>2018-06-16T21:22:41Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2018-03-14T23:20:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=854ef66e18324ca2bd7da4b30c9817c0d6b46ad4'/>
<id>urn:sha1:854ef66e18324ca2bd7da4b30c9817c0d6b46ad4</id>
<content type='text'>
commit 95dd77580ccd66a0da96e6d4696945b8cea39431 upstream.

On nfsv2 and nfsv3 the nfs server can export subsets of the same
filesystem and report the same filesystem identifier, so that the nfs
client can know they are the same filesystem.  The subsets can be from
disjoint directory trees.  The nfsv2 and nfsv3 filesystems provides no
way to find the common root of all directory trees exported form the
server with the same filesystem identifier.

The practical result is that in struct super s_root for nfs s_root is
not necessarily the root of the filesystem.  The nfs mount code sets
s_root to the root of the first subset of the nfs filesystem that the
kernel mounts.

This effects the dcache invalidation code in generic_shutdown_super
currently called shrunk_dcache_for_umount and that code for years
has gone through an additional list of dentries that might be dentry
trees that need to be freed to accomodate nfs.

When I wrote path_connected I did not realize nfs was so special, and
it's hueristic for avoiding calling is_subdir can fail.

The practical case where this fails is when there is a move of a
directory from the subtree exposed by one nfs mount to the subtree
exposed by another nfs mount.  This move can happen either locally or
remotely.  With the remote case requiring that the move directory be cached
before the move and that after the move someone walks the path
to where the move directory now exists and in so doing causes the
already cached directory to be moved in the dcache through the magic
of d_splice_alias.

If someone whose working directory is in the move directory or a
subdirectory and now starts calling .. from the initial mount of nfs
(where s_root == mnt_root), then path_connected as a heuristic will
not bother with the is_subdir check.  As s_root really is not the root
of the nfs filesystem this heuristic is wrong, and the path may
actually not be connected and path_connected can fail.

The is_subdir function might be cheap enough that we can call it
unconditionally.  Verifying that will take some benchmarking and
the result may not be the same on all kernels this fix needs
to be backported to.  So I am avoiding that for now.

Filesystems with snapshots such as nilfs and btrfs do something
similar.  But as the directory tree of the snapshots are disjoint
from one another and from the main directory tree rename won't move
things between them and this problem will not occur.

Reported-by: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Fixes: 397d425dc26d ("vfs: Test for and handle paths that are unreachable from their mnt_root")
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
[bwh: Backported to 3.16:
 - Add the super_block::s_iflags field
 - Adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>Clarify (and fix) MAX_LFS_FILESIZE macros</title>
<updated>2017-11-11T13:33:29Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-08-27T19:12:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0ad3abfdc63abef74a235389c311d89891929c00'/>
<id>urn:sha1:0ad3abfdc63abef74a235389c311d89891929c00</id>
<content type='text'>
commit 0cc3b0ec23ce4c69e1e890ed2b8d2fa932b14aad upstream.

We have a MAX_LFS_FILESIZE macro that is meant to be filled in by
filesystems (and other IO targets) that know they are 64-bit clean and
don't have any 32-bit limits in their IO path.

It turns out that our 32-bit value for that limit was bogus.  On 32-bit,
the VM layer is limited by the page cache to only 32-bit index values,
but our logic for that was confusing and actually wrong.  We used to
define that value to

	(((loff_t)PAGE_SIZE &lt;&lt; (BITS_PER_LONG-1))-1)

which is actually odd in several ways: it limits the index to 31 bits,
and then it limits files so that they can't have data in that last byte
of a page that has the highest 31-bit index (ie page index 0x7fffffff).

Neither of those limitations make sense.  The index is actually the full
32 bit unsigned value, and we can use that whole full page.  So the
maximum size of the file would logically be "PAGE_SIZE &lt;&lt; BITS_PER_LONG".

However, we do wan tto avoid the maximum index, because we have code
that iterates over the page indexes, and we don't want that code to
overflow.  So the maximum size of a file on a 32-bit host should
actually be one page less than the full 32-bit index.

So the actual limit is ULONG_MAX &lt;&lt; PAGE_SHIFT.  That means that we will
not actually be using the page of that last index (ULONG_MAX), but we
can grow a file up to that limit.

The wrong value of MAX_LFS_FILESIZE actually caused problems for Doug
Nazar, who was still using a 32-bit host, but with a 9.7TB 2 x RAID5
volume.  It turns out that our old MAX_LFS_FILESIZE was 8TiB (well, one
byte less), but the actual true VM limit is one page less than 16TiB.

This was invisible until commit c2a9737f45e2 ("vfs,mm: fix a dead loop
in truncate_inode_pages_range()"), which started applying that
MAX_LFS_FILESIZE limit to block devices too.

NOTE! On 64-bit, the page index isn't a limiter at all, and the limit is
actually just the offset type itself (loff_t), which is signed.  But for
clarity, on 64-bit, just use the maximum signed value, and don't make
people have to count the number of 'f' characters in the hex constant.

So just use LLONG_MAX for the 64-bit case.  That was what the value had
been before too, just written out as a hex constant.

Fixes: c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
Reported-and-tested-by: Doug Nazar &lt;nazard@nazar.ca&gt;
Cc: Andreas Dilger &lt;adilger@dilger.ca&gt;
Cc: Mark Fasheh &lt;mfasheh@versity.com&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Cc: Dave Kleikamp &lt;shaggy@kernel.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[bwh: Backported to 3.16: 32-bit definition still used PAGE_CACHE_SIZE]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>vfs: Commit to never having exectuables on proc and sysfs.</title>
<updated>2017-04-04T21:21:56Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2015-06-29T19:42:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=495d1af4041fcb3cb726ca414d3c09cbb251fefa'/>
<id>urn:sha1:495d1af4041fcb3cb726ca414d3c09cbb251fefa</id>
<content type='text'>
commit 22f6b4d34fcf039c63a94e7670e0da24f8575a5a upstream.

Today proc and sysfs do not contain any executable files.  Several
applications today mount proc or sysfs without noexec and nosuid and
then depend on there being no exectuables files on proc or sysfs.
Having any executable files show on proc or sysfs would cause
a user space visible regression, and most likely security problems.

Therefore commit to never allowing executables on proc and sysfs by
adding a new flag to mark them as filesystems without executables and
enforce that flag.

Test the flag where MNT_NOEXEC is tested today, so that the only user
visible effect will be that exectuables will be treated as if the
execute bit is cleared.

The filesystems proc and sysfs do not currently incoporate any
executable files so this does not result in any user visible effects.

This makes it unnecessary to vet changes to proc and sysfs tightly for
adding exectuable files or changes to chattr that would modify
existing files, as no matter what the individual file say they will
not be treated as exectuable files by the vfs.

Not having to vet changes to closely is important as without this we
are only one proc_create call (or another goof up in the
implementation of notify_change) from having problematic executables
on proc.  Those mistakes are all too easy to make and would create
a situation where there are security issues or the assumptions of
some program having to be broken (and cause userspace regressions).

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
[bwh: Backported to 3.16: we don't have super_block::s_iflags; use
 file_system_type::fs_flags instead]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>fs: Give dentry to inode_change_ok() instead of inode</title>
<updated>2017-02-23T03:53:52Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-05-26T14:55:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=50b070e8224f7bf86622ede1abee9fa3d3dc2f10'/>
<id>urn:sha1:50b070e8224f7bf86622ede1abee9fa3d3dc2f10</id>
<content type='text'>
commit 31051c85b5e2aaaf6315f74c72a732673632a905 upstream.

inode_change_ok() will be resposible for clearing capabilities and IMA
extended attributes and as such will need dentry. Give it as an argument
to inode_change_ok() instead of an inode. Also rename inode_change_ok()
to setattr_prepare() to better relect that it does also some
modifications in addition to checks.

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
[bwh: Backported to 3.16:
 - Drop changes to orangefs, overlayfs
 - Adjust filenames, context
 - In nfsd, pass dentry to nfsd_sanitize_attrs()
 - Update ext3 as well]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>Revert "fs: Give dentry to inode_change_ok() instead of inode"</title>
<updated>2017-02-23T03:53:52Z</updated>
<author>
<name>Ben Hutchings</name>
<email>ben@decadent.org.uk</email>
</author>
<published>2016-11-30T23:13:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1c608c2d1aefca2bf63497663e17cfb49e6b022c'/>
<id>urn:sha1:1c608c2d1aefca2bf63497663e17cfb49e6b022c</id>
<content type='text'>
This reverts commit be9df699432235753c3824b0f5a27d46de7fdc9e, which was
commit 31051c85b5e2aaaf6315f74c72a732673632a905 upstream.  The backport
breaks fuse and makes a mess of xfs, which can be improved by picking
further upstream commits as I should have done in the first place.

Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>fs: Give dentry to inode_change_ok() instead of inode</title>
<updated>2016-11-20T01:17:38Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-05-26T14:55:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=be9df699432235753c3824b0f5a27d46de7fdc9e'/>
<id>urn:sha1:be9df699432235753c3824b0f5a27d46de7fdc9e</id>
<content type='text'>
commit 31051c85b5e2aaaf6315f74c72a732673632a905 upstream.

inode_change_ok() will be resposible for clearing capabilities and IMA
extended attributes and as such will need dentry. Give it as an argument
to inode_change_ok() instead of an inode. Also rename inode_change_ok()
to setattr_prepare() to better relect that it does also some
modifications in addition to checks.

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
[bwh: Backported to 3.16:
 - Drop changes to orangefs, overlayfs
 - Adjust filenames, context
 - In fuse, pass dentry to fuse_do_setattr()
 - In nfsd, pass dentry to nfsd_sanitize_attrs()
 - In xfs, pass dentry to xfs_setattr_nonsize() and xfs_setattr_size()
 - Update ext3 as well]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
</feed>
