<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/fs/pnode.c, branch v4.9.5</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.9.5</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.9.5'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2016-09-30T17:46:48Z</updated>
<entry>
<title>mnt: Add a per mount namespace limit on the number of mounts</title>
<updated>2016-09-30T17:46:48Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2016-09-28T05:27:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d29216842a85c7970c536108e093963f02714498'/>
<id>urn:sha1:d29216842a85c7970c536108e093963f02714498</id>
<content type='text'>
CAI Qian &lt;caiqian@redhat.com&gt; pointed out that the semantics
of shared subtrees make it possible to create an exponentially
increasing number of mounts in a mount namespace.

    mkdir /tmp/1 /tmp/2
    mount --make-rshared /
    for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done

This will create 2^20 or 1048576 mounts, which is a practical problem
as some people have managed to hit this by accident.
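
As a rough sketch of the arithmetic, assuming each pass of the loop
simply doubles the number of mounts (a simplification of real
shared-subtree propagation; mounts_after is a hypothetical helper,
not kernel code):

```python
def mounts_after(iterations, start=1):
    """Model the growth: every bind mount in the loop doubles the count."""
    count = start
    for _ in range(iterations):
        count *= 2  # assumption: exactly one doubling per loop iteration
    return count


if __name__ == "__main__":
    print(mounts_after(20))
```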

As such CVE-2016-6213 was assigned.

Ian Kent &lt;raven@themaw.net&gt; described the situation for autofs users
as follows:

&gt; The number of mounts for direct mount maps is usually not very large because of
&gt; the way they are implemented, large direct mount maps can have performance
&gt; problems. There can be anywhere from a few (likely case a few hundred) to less
&gt; than 10000, plus mounts that have been triggered and not yet expired.
&gt;
&gt; Indirect mounts have one autofs mount at the root plus the number of mounts that
&gt; have been triggered and not yet expired.
&gt;
&gt; The number of autofs indirect map entries can range from a few to the common
&gt; case of several thousand and in rare cases up to between 30000 and 50000. I've
&gt; not heard of people with maps larger than 50000 entries.
&gt;
&gt; The larger the number of map entries the greater the possibility for a large
&gt; number of active mounts so it's not hard to expect cases of a 1000 or somewhat
&gt; more active mounts.

So I am setting the default number of mounts allowed per mount
namespace at 100,000.  This is more than enough for any use case I
know of, but small enough to quickly stop an exponential increase
in mounts, which should be perfect to catch misconfigurations and
malfunctioning programs.

For anyone who needs a higher limit this can be changed by writing
to the new /proc/sys/fs/mount-max sysctl.
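
A minimal sketch of how the limit interacts with the doubling pattern
from the reproducer, assuming a namespace that starts with a single
mount and doubles on every iteration.  The helper names are
hypothetical; only the sysctl path and the default of 100,000 come
from this commit:

```python
from pathlib import Path

DEFAULT_MOUNT_MAX = 100_000  # default stated in this commit message


def read_mount_max(path="/proc/sys/fs/mount-max"):
    """Return the configured limit, or the default if the file is absent."""
    sysctl = Path(path)
    if not sysctl.is_file():
        return DEFAULT_MOUNT_MAX
    return int(sysctl.read_text().strip())


def first_refused_doubling(limit, start=1):
    """Return the 1-based iteration whose doubling would exceed the limit."""
    if not start >= 1:
        raise ValueError("start must be at least 1")
    count = start
    iteration = 1
    # Keep doubling while the doubled count still fits under the limit.
    while not count * 2 > limit:
        count *= 2
        iteration += 1
    return iteration


if __name__ == "__main__":
    limit = read_mount_max()
    print(limit, first_refused_doubling(limit))
```

With the default limit this model refuses the 17th doubling, since
2^17 = 131072 exceeds 100,000.  A real namespace already holds other
mounts, so the actual cutoff can come earlier.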

Tested-by: CAI Qian &lt;caiqian@redhat.com&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>propagate_mnt: Handle the first propagated copy being a slave</title>
<updated>2016-05-05T14:54:45Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2016-05-05T14:29:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5ec0811d30378ae104f250bfc9b3640242d81e3f'/>
<id>urn:sha1:5ec0811d30378ae104f250bfc9b3640242d81e3f</id>
<content type='text'>
When the first propagated copy was a slave the following oops would result:
&gt; BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
&gt; IP: [&lt;ffffffff811fba4e&gt;] propagate_one+0xbe/0x1c0
&gt; PGD bacd4067 PUD bac66067 PMD 0
&gt; Oops: 0000 [#1] SMP
&gt; Modules linked in:
&gt; CPU: 1 PID: 824 Comm: mount Not tainted 4.6.0-rc5userns+ #1523
&gt; Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
&gt; task: ffff8800bb0a8000 ti: ffff8800bac3c000 task.ti: ffff8800bac3c000
&gt; RIP: 0010:[&lt;ffffffff811fba4e&gt;]  [&lt;ffffffff811fba4e&gt;] propagate_one+0xbe/0x1c0
&gt; RSP: 0018:ffff8800bac3fd38  EFLAGS: 00010283
&gt; RAX: 0000000000000000 RBX: ffff8800bb77ec00 RCX: 0000000000000010
&gt; RDX: 0000000000000000 RSI: ffff8800bb58c000 RDI: ffff8800bb58c480
&gt; RBP: ffff8800bac3fd48 R08: 0000000000000001 R09: 0000000000000000
&gt; R10: 0000000000001ca1 R11: 0000000000001c9d R12: 0000000000000000
&gt; R13: ffff8800ba713800 R14: ffff8800bac3fda0 R15: ffff8800bb77ec00
&gt; FS:  00007f3c0cd9b7e0(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
&gt; CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
&gt; CR2: 0000000000000010 CR3: 00000000bb79d000 CR4: 00000000000006e0
&gt; Stack:
&gt;  ffff8800bb77ec00 0000000000000000 ffff8800bac3fd88 ffffffff811fbf85
&gt;  ffff8800bac3fd98 ffff8800bb77f080 ffff8800ba713800 ffff8800bb262b40
&gt;  0000000000000000 0000000000000000 ffff8800bac3fdd8 ffffffff811f1da0
&gt; Call Trace:
&gt;  [&lt;ffffffff811fbf85&gt;] propagate_mnt+0x105/0x140
&gt;  [&lt;ffffffff811f1da0&gt;] attach_recursive_mnt+0x120/0x1e0
&gt;  [&lt;ffffffff811f1ec3&gt;] graft_tree+0x63/0x70
&gt;  [&lt;ffffffff811f1f6b&gt;] do_add_mount+0x9b/0x100
&gt;  [&lt;ffffffff811f2c1a&gt;] do_mount+0x2aa/0xdf0
&gt;  [&lt;ffffffff8117efbe&gt;] ? strndup_user+0x4e/0x70
&gt;  [&lt;ffffffff811f3a45&gt;] SyS_mount+0x75/0xc0
&gt;  [&lt;ffffffff8100242b&gt;] do_syscall_64+0x4b/0xa0
&gt;  [&lt;ffffffff81988f3c&gt;] entry_SYSCALL64_slow_path+0x25/0x25
&gt; Code: 00 00 75 ec 48 89 0d 02 22 22 01 8b 89 10 01 00 00 48 89 05 fd 21 22 01 39 8e 10 01 00 00 0f 84 e0 00 00 00 48 8b 80 d8 00 00 00 &lt;48&gt; 8b 50 10 48 89 05 df 21 22 01 48 89 15 d0 21 22 01 8b 53 30
&gt; RIP  [&lt;ffffffff811fba4e&gt;] propagate_one+0xbe/0x1c0
&gt;  RSP &lt;ffff8800bac3fd38&gt;
&gt; CR2: 0000000000000010
&gt; ---[ end trace 2725ecd95164f217 ]---

This oops happens with the namespace_sem held and can be triggered by
non-root users.  An all-around unpleasant experience.

To avoid this scenario, when finding the appropriate source mount to
copy, stop the walk up the mnt_master chain when the first source mount
is encountered.

Further, rewrite the walk up the last_source mnt_master chain so that
it is clear what is going on.

The reason why the first source mount is special is that its
mnt_parent is not a mount in the dest_mnt propagation tree, and as
such termination conditions based upon the dest_mnt mount propagation
tree do not make sense.

To avoid other kinds of confusion last_dest is not changed when
computing last_source.  last_dest is only used once in propagate_one
and that is above the point of the code being modified, so changing
the global variable is meaningless and confusing.

Cc: stable@vger.kernel.org
Fixes: f2ebb3a921c1ca1e2ddd9242e95a1989a50c4c68 ("smarter propagate_mnt()")
Reported-by: Tycho Andersen &lt;tycho.andersen@canonical.com&gt;
Reviewed-by: Seth Forshee &lt;seth.forshee@canonical.com&gt;
Tested-by: Seth Forshee &lt;seth.forshee@canonical.com&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>fs/pnode.c: treat zero mnt_group_id-s as unequal</title>
<updated>2016-02-20T05:15:52Z</updated>
<author>
<name>Maxim Patlasov</name>
<email>mpatlasov@virtuozzo.com</email>
</author>
<published>2016-02-16T19:45:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7ae8fd0351f912b075149a1e03a017be8b903b9a'/>
<id>urn:sha1:7ae8fd0351f912b075149a1e03a017be8b903b9a</id>
<content type='text'>
propagate_one(m) calculates the "type" argument for copy_tree() like this:

&gt;    if (m-&gt;mnt_group_id == last_dest-&gt;mnt_group_id) {
&gt;        type = CL_MAKE_SHARED;
&gt;    } else {
&gt;        type = CL_SLAVE;
&gt;        if (IS_MNT_SHARED(m))
&gt;           type |= CL_MAKE_SHARED;
&gt;   }

The "type" argument then governs clone_mnt() behavior with respect to the flags
and mnt_master of the new mount. When we iterate through a slave group, it is
possible that both the current "m" and "last_dest" are not shared (although
both are slaves, i.e. have non-NULL mnt_master-s). Then the comparison
above erroneously makes the new mount shared and sets its mnt_master to
last_source-&gt;mnt_master. The patch fixes the problem by handling zero
mnt_group_id-s as though they are unequal.

A similar problem exists in the implementation of the "else" clause above
when we have to ascend upward in the master/slave tree by calling:

&gt;    last_source = last_source-&gt;mnt_master;
&gt;    last_dest = last_source-&gt;mnt_parent;

the proper number of times. The last step is governed by the
"n-&gt;mnt_group_id != last_dest-&gt;mnt_group_id" condition, which may lie if
both are zero. The patch fixes this case in the same way as the former one.
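
The idea of treating zero group ids as unequal can be sketched as
follows.  This is an illustration only, not the kernel helper: plain
integers stand in for mnt_group_id, strings stand in for the CL_*
flags, and the function names are assumptions made for the example.

```python
def peers(group_id_a, group_id_b):
    """Two mounts are peers only if they share a nonzero group id."""
    return group_id_a == group_id_b and group_id_a != 0


def clone_type(m_group_id, last_dest_group_id, m_is_shared):
    """Mirror the "type" calculation quoted above, using peers()."""
    if peers(m_group_id, last_dest_group_id):
        return {"CL_MAKE_SHARED"}
    flags = {"CL_SLAVE"}
    if m_is_shared:
        flags.add("CL_MAKE_SHARED")
    return flags


if __name__ == "__main__":
    # Two unshared slaves both have group id 0 and must not compare equal.
    print(clone_type(0, 0, False))
```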

[AV: don't open-code an obvious helper...]

Signed-off-by: Maxim Patlasov &lt;mpatlasov@virtuozzo.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>mnt: Don't propagate unmounts to locked mounts</title>
<updated>2015-04-03T01:34:20Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2015-01-05T19:38:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0c56fe31420ca599c90240315f7959bf1b4eb6ce'/>
<id>urn:sha1:0c56fe31420ca599c90240315f7959bf1b4eb6ce</id>
<content type='text'>
If the first mount in a shared subtree is locked, don't unmount the
shared subtree.

This is ensured by walking through the mounts, parents before children,
and marking a mount as unmountable if it is not locked or it is locked
but its parent is marked.
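
The marking rule above can be sketched as follows, assuming the mounts
are given as (name, parent, locked) tuples already ordered parents
before children.  The data layout and the function name are
hypothetical and only illustrate the rule:

```python
def mark_unmountable(mounts):
    """Return the names of mounts marked as unmountable.

    mounts: iterable of (name, parent_name_or_None, locked) tuples,
    ordered so that every parent appears before its children.
    """
    marked = set()
    for name, parent, locked in mounts:
        # Mark if not locked, or if locked but the parent is marked.
        if not locked or parent in marked:
            marked.add(name)
    return marked


if __name__ == "__main__":
    tree = [
        ("a", None, True),   # locked first mount: stays unmarked
        ("b", "a", False),   # not locked: marked
        ("c", "b", True),    # locked, but parent "b" is marked: marked
    ]
    print(sorted(mark_unmountable(tree)))
```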

This allows recursive mount detach to propagate through a set of
mounts when unmounting them would not reveal what is under any locked
mount.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>mnt: On an unmount propagate clearing of MNT_LOCKED</title>
<updated>2015-04-03T01:34:19Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2015-01-03T11:39:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5d88457eb5b86b475422dc882f089203faaeedb5'/>
<id>urn:sha1:5d88457eb5b86b475422dc882f089203faaeedb5</id>
<content type='text'>
A prerequisite of calling umount_tree is that the point where the tree
is mounted is valid to unmount.

If we are propagating the effect of the unmount, clear MNT_LOCKED in
every instance where the same filesystem is mounted on the same
mountpoint in the mount tree, as we know (by virtue of the fact
that umount_tree was called) that it is safe to reveal what
is at that mountpoint.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>mnt: Delay removal from the mount hash.</title>
<updated>2015-04-03T01:34:19Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2014-12-23T01:12:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=411a938b5abc9cb126c41cccf5975ae464fe0f3e'/>
<id>urn:sha1:411a938b5abc9cb126c41cccf5975ae464fe0f3e</id>
<content type='text'>
- Modify __lookup_mnt_hash_last to ignore mounts that have MNT_UMOUNT set.
- Don't remove mounts from the mount hash table in propagate_umount
- Don't remove mounts from the mount hash table in umount_tree before
  the entire list of mounts to be umounted is selected.
- Remove mounts from the mount hash table as the last thing that
  happens in the case where a mount has a parent in umount_tree.
  Mounts without parents are not hashed (by definition).
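
The first point in the list above can be sketched as follows.  This is
a simplified model, not the kernel data structure: a dict keyed by
(parent, mountpoint) stands in for the mount hash table, a boolean
"umount" field stands in for the unmount flag, and lookup_last is a
hypothetical name.

```python
def lookup_last(mount_hash, parent, mountpoint):
    """Return the last hashed mount at the key that is not being unmounted."""
    candidates = mount_hash.get((parent, mountpoint), [])
    for mnt in reversed(candidates):
        if not mnt["umount"]:  # skip mounts flagged as being unmounted
            return mnt
    return None


if __name__ == "__main__":
    table = {
        ("root", "/mnt"): [
            {"name": "a", "umount": False},
            {"name": "b", "umount": True},  # still hashed, but ignored
        ],
    }
    print(lookup_last(table, "root", "/mnt")["name"])
```

Because flagged mounts are skipped during lookup, they can stay in the
table until the whole set of mounts to unmount has been selected.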

This paves the way for delaying removal from the mount hash table even
further and fixing the MNT_LOCKED vs MNT_DETACH issue.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>mnt: Add MNT_UMOUNT flag</title>
<updated>2015-04-03T01:34:18Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2014-12-23T00:30:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=590ce4bcbfb4e0462a720a4ad901e84416080bba'/>
<id>urn:sha1:590ce4bcbfb4e0462a720a4ad901e84416080bba</id>
<content type='text'>
In some instances it is necessary to know if the unmounting
process has begun on a mount.  Add MNT_UMOUNT to make that reliably
testable.

This gets used in fixing locked mounts in MNT_DETACH.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>mnt: In umount_tree reuse mnt_list instead of mnt_hash</title>
<updated>2015-04-03T01:34:18Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2014-12-18T19:10:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c003b26ff98ca04a180ff34c38c007a3998d62f9'/>
<id>urn:sha1:c003b26ff98ca04a180ff34c38c007a3998d62f9</id>
<content type='text'>
umount_tree builds a list of mounts that need to be unmounted.
Utilize mnt_list for this purpose instead of mnt_hash.  This begins to
allow keeping a mount on the mnt_hash after it is unmounted, which is
necessary for a properly functioning MNT_LOCKED implementation.

The fact that mnt_list is an ordinary list, making list_move available,
is a nice bonus.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>mnt: Move the clear of MNT_LOCKED from copy_tree to its callers.</title>
<updated>2014-12-02T16:46:50Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2014-10-07T23:22:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8486a7882b5ba906992fd78bbfcefaae7fe285cc'/>
<id>urn:sha1:8486a7882b5ba906992fd78bbfcefaae7fe285cc</id>
<content type='text'>
Clear MNT_LOCKED in the callers of copy_tree except copy_mnt_ns and
collect_mounts.  In copy_mnt_ns it is necessary to create an exact
copy of a mount tree, so not clearing MNT_LOCKED is important.
Similarly, collect_mounts is used to take a snapshot of the mount tree
for audit logging purposes, and auditing using a faithful copy of the
tree is important.

This becomes particularly significant when we start setting MNT_LOCKED
on rootfs to prevent it from being unmounted.

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>get rid of propagate_umount() mistakenly treating slaves as busy.</title>
<updated>2014-08-30T22:31:41Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2014-08-18T19:09:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=88b368f27a094277143d8ecd5a056116f6a41520'/>
<id>urn:sha1:88b368f27a094277143d8ecd5a056116f6a41520</id>
<content type='text'>
The check in __propagate_umount() ("has somebody explicitly mounted
something on that slave?") is done *before* taking the already doomed
victims out of the child lists.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
</feed>
