<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/user_namespace.c, branch v4.15.6</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.15.6</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.15.6'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2017-11-16T20:20:15Z</updated>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace</title>
<updated>2017-11-16T20:20:15Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-11-16T20:20:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=758f875848d78148cf9a9cdb3ff1ddf29b234056'/>
<id>urn:sha1:758f875848d78148cf9a9cdb3ff1ddf29b234056</id>
<content type='text'>
Pull user namespace update from Eric Biederman:
 "The only change that is production ready this round is the work to
  increase the number of uid and gid mappings a user namespace can
  support from 5 to 340.

  This code was carefully benchmarked and it was confirmed that in the
  existing cases the performance remains the same. In the worst case
  with 340 mappings an cache cold stat times go from 158ns to 248ns.
  That is noticable but still quite small, and only the people who are
  doing crazy things pay the cost.

  This work uncovered some documentation and cleanup opportunities in
  the mapping code, and patches to make those cleanups and improve the
  documentation will be coming in the next merge window"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  userns: Simplify insert_extent
  userns: Make map_id_down a wrapper for map_id_range_down
  userns: Don't read extents twice in m_start
  userns: Simplify the user and group mapping functions
  userns: Don't special case a count of 0
  userns: bump idmap limits to 340
  userns: use union in {g,u}idmap struct
</content>
</entry>
<entry>
<title>userns: Simplify insert_extent</title>
<updated>2017-10-31T22:23:13Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2017-10-31T22:15:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3fda0e737e906ce73220b20c27e7f792d0aac6a8'/>
<id>urn:sha1:3fda0e737e906ce73220b20c27e7f792d0aac6a8</id>
<content type='text'>
Consolidate the code to write to the new mapping at the end of the
function to remove the duplication.  Move the increase in the number
of mappings into insert_extent, keeping the logic together.

Just a small increase in readability and maintainability.

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>userns: Make map_id_down a wrapper for map_id_range_down</title>
<updated>2017-10-31T22:23:13Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2017-10-31T21:53:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ece66133979b211324cc6aff9285889b425243d2'/>
<id>urn:sha1:ece66133979b211324cc6aff9285889b425243d2</id>
<content type='text'>
There is no good reason for this code duplication, the number of cache
line accesses not the number of instructions are the bottleneck in
this code.

Therefore simplify maintenance by removing unnecessary code.

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>userns: Don't read extents twice in m_start</title>
<updated>2017-10-31T22:23:12Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2017-10-31T22:09:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d5e7b3c5f51fc6d34e12b6d87bfd30ab277c4625'/>
<id>urn:sha1:d5e7b3c5f51fc6d34e12b6d87bfd30ab277c4625</id>
<content type='text'>
This is important so reading /proc/&lt;pid&gt;/{uid_map,gid_map,projid_map} while
the map is being written does not do strange things.

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>userns: Simplify the user and group mapping functions</title>
<updated>2017-10-31T22:23:12Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2017-10-31T21:27:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3edf652fa16562fb57a5a4b996ba72e2d7cdc38b'/>
<id>urn:sha1:3edf652fa16562fb57a5a4b996ba72e2d7cdc38b</id>
<content type='text'>
Consolidate reading the number of extents and computing the return
value in the map_id_down, map_id_range_down and map_id_range.

This removal of one read of extents makes one smp_rmb unnecessary
and makes the code safe it is executed during the map write.  Reading
the number of extents twice and depending on the result being the same
is not safe, as it could be 0 the first time and &gt; 5 the second time,
which would lead to misinterpreting the union fields.

The consolidation of the return value just removes a duplicate
caluculation which should make it easier to understand and maintain
the code.

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>userns: Don't special case a count of 0</title>
<updated>2017-10-31T22:23:11Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2017-10-31T20:54:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=11a8b9270e16e36d5fb607ba4b60db2958b7c625'/>
<id>urn:sha1:11a8b9270e16e36d5fb607ba4b60db2958b7c625</id>
<content type='text'>
We can always use a count of 1 so there is no reason to have
a special case of a count of 0.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>userns: bump idmap limits to 340</title>
<updated>2017-10-31T22:23:04Z</updated>
<author>
<name>Christian Brauner</name>
<email>christian.brauner@ubuntu.com</email>
</author>
<published>2017-10-24T22:04:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6397fac4915ab3002dc15aae751455da1a852f25'/>
<id>urn:sha1:6397fac4915ab3002dc15aae751455da1a852f25</id>
<content type='text'>
There are quite some use cases where users run into the current limit for
{g,u}id mappings. Consider a user requesting us to map everything but 999, and
1001 for a given range of 1000000000 with a sub{g,u}id layout of:

some-user:100000:1000000000
some-user:999:1
some-user:1000:1
some-user:1001:1
some-user:1002:1

This translates to:

MAPPING-TYPE | CONTAINER |    HOST |     RANGE |
-------------|-----------|---------|-----------|
         uid |       999 |     999 |         1 |
         uid |      1001 |    1001 |         1 |
         uid |         0 | 1000000 |       999 |
         uid |      1000 | 1001000 |         1 |
         uid |      1002 | 1001002 | 999998998 |
------------------------------------------------
         gid |       999 |     999 |         1 |
         gid |      1001 |    1001 |         1 |
         gid |         0 | 1000000 |       999 |
         gid |      1000 | 1001000 |         1 |
         gid |      1002 | 1001002 | 999998998 |

which is already the current limit.

As discussed at LPC simply bumping the number of limits is not going to work
since this would mean that struct uid_gid_map won't fit into a single cache-line
anymore thereby regressing performance for the base-cases. The same problem
seems to arise when using a single pointer. So the idea is to use

struct uid_gid_extent {
	u32 first;
	u32 lower_first;
	u32 count;
};

struct uid_gid_map { /* 64 bytes -- 1 cache line */
	u32 nr_extents;
	union {
		struct uid_gid_extent extent[UID_GID_MAP_MAX_BASE_EXTENTS];
		struct {
			struct uid_gid_extent *forward;
			struct uid_gid_extent *reverse;
		};
	};
};

For the base cases we will only use the struct uid_gid_extent extent member. If
we go over UID_GID_MAP_MAX_BASE_EXTENTS mappings we perform a single 4k
kmalloc() which means we can have a maximum of 340 mappings
(340 * size(struct uid_gid_extent) = 4080). For the latter case we use two
pointers "forward" and "reverse". The forward pointer points to an array sorted
by "first" and the reverse pointer points to an array sorted by "lower_first".
We can then perform binary search on those arrays.

Performance Testing:
When Eric introduced the extent-based struct uid_gid_map approach he measured
the performanc impact of his idmap changes:

&gt; My benchmark consisted of going to single user mode where nothing else was
&gt; running. On an ext4 filesystem opening 1,000,000 files and looping through all
&gt; of the files 1000 times and calling fstat on the individuals files. This was
&gt; to ensure I was benchmarking stat times where the inodes were in the kernels
&gt; cache, but the inode values were not in the processors cache. My results:

&gt; v3.4-rc1:         ~= 156ns (unmodified v3.4-rc1 with user namespace support disabled)
&gt; v3.4-rc1-userns-: ~= 155ns (v3.4-rc1 with my user namespace patches and user namespace support disabled)
&gt; v3.4-rc1-userns+: ~= 164ns (v3.4-rc1 with my user namespace patches and user namespace support enabled)

I used an identical approach on my laptop. Here's a thorough description of what
I did. I built a 4.14.0-rc4 mainline kernel with my new idmap patches applied. I
booted into single user mode and used an ext4 filesystem to open/create
1,000,000 files. Then I looped through all of the files calling fstat() on each
of them 1000 times and calculated the mean fstat() time for a single file. (The
test program can be found below.)

Here are the results. For fun, I compared the first version of my patch which
scaled linearly with the new version of the patch:

|   # MAPPINGS |   PATCH-V1 | PATCH-NEW |
|--------------|------------|-----------|
|   0 mappings |     158 ns |   158 ns  |
|   1 mappings |     164 ns |   157 ns  |
|   2 mappings |     170 ns |   158 ns  |
|   3 mappings |     175 ns |   161 ns  |
|   5 mappings |     187 ns |   165 ns  |
|  10 mappings |     218 ns |   199 ns  |
|  50 mappings |     528 ns |   218 ns  |
| 100 mappings |     980 ns |   229 ns  |
| 200 mappings |    1880 ns |   239 ns  |
| 300 mappings |    2760 ns |   240 ns  |
| 340 mappings | not tested |   248 ns  |

Here's the test program I used. I asked Eric what he did and this is a more
"advanced" implementation of the idea. It's pretty straight-forward:

 #define __GNU_SOURCE
 #define __STDC_FORMAT_MACROS
 #include &lt;errno.h&gt;
 #include &lt;dirent.h&gt;
 #include &lt;fcntl.h&gt;
 #include &lt;inttypes.h&gt;
 #include &lt;stdio.h&gt;
 #include &lt;stdlib.h&gt;
 #include &lt;string.h&gt;
 #include &lt;unistd.h&gt;
 #include &lt;sys/stat.h&gt;
 #include &lt;sys/time.h&gt;
 #include &lt;sys/types.h&gt;

 int main(int argc, char *argv[])
 {
 	int ret;
 	size_t i, k;
 	int fd[1000000];
 	int times[1000];
 	char pathname[4096];
 	struct stat st;
 	struct timeval t1, t2;
 	uint64_t time_in_mcs;
 	uint64_t sum = 0;

 	if (argc != 2) {
 		fprintf(stderr, "Please specify a directory where to create "
 				"the test files\n");
 		exit(EXIT_FAILURE);
 	}

 	for (i = 0; i &lt; sizeof(fd) / sizeof(fd[0]); i++) {
 		sprintf(pathname, "%s/idmap_test_%zu", argv[1], i);
 		fd[i]= open(pathname, O_RDWR | O_CREAT, S_IXUSR | S_IXGRP | S_IXOTH);
 		if (fd[i] &lt; 0) {
 			ssize_t j;
 			for (j = i; j &gt;= 0; j--)
 				close(fd[j]);
 			exit(EXIT_FAILURE);
 		}
 	}

 	for (k = 0; k &lt; 1000; k++) {
 		ret = gettimeofday(&amp;t1, NULL);
 		if (ret &lt; 0)
 			goto close_all;

 		for (i = 0; i &lt; sizeof(fd) / sizeof(fd[0]); i++) {
 			ret = fstat(fd[i], &amp;st);
 			if (ret &lt; 0)
 				goto close_all;
 		}

 		ret = gettimeofday(&amp;t2, NULL);
 		if (ret &lt; 0)
 			goto close_all;

 		time_in_mcs = (1000000 * t2.tv_sec + t2.tv_usec) -
 			      (1000000 * t1.tv_sec + t1.tv_usec);
 		printf("Total time in micro seconds:       %" PRIu64 "\n",
 		       time_in_mcs);
 		printf("Total time in nanoseconds:         %" PRIu64 "\n",
 		       time_in_mcs * 1000);
 		printf("Time per file in nanoseconds:      %" PRIu64 "\n",
 		       (time_in_mcs * 1000) / 1000000);
 		times[k] = (time_in_mcs * 1000) / 1000000;
 	}

 close_all:
 	for (i = 0; i &lt; sizeof(fd) / sizeof(fd[0]); i++)
 		close(fd[i]);

 	if (ret &lt; 0)
 		exit(EXIT_FAILURE);

 	for (k = 0; k &lt; 1000; k++) {
 		sum += times[k];
 	}

 	printf("Mean time per file in nanoseconds: %" PRIu64 "\n", sum / 1000);

 	exit(EXIT_SUCCESS);;
 }

Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
CC: Serge Hallyn &lt;serge@hallyn.com&gt;
CC: Eric Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()</title>
<updated>2017-10-25T09:01:08Z</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2017-10-23T21:07:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6aa7de059173a986114ac43b8f50b297a86f09a8'/>
<id>urn:sha1:6aa7de059173a986114ac43b8f50b297a86f09a8</id>
<content type='text'>
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>userns,pidns: Verify the userns for new pid namespaces</title>
<updated>2017-07-20T12:43:58Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2017-04-29T19:12:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a2b426267c56773201f968fdb5eda6ab9ae94e34'/>
<id>urn:sha1:a2b426267c56773201f968fdb5eda6ab9ae94e34</id>
<content type='text'>
It is pointless and confusing to allow a pid namespace hierarchy and
the user namespace hierarchy to get out of sync.  The owner of a child
pid namespace should be the owner of the parent pid namespace or
a descendant of the owner of the parent pid namespace.

Otherwise it is possible to construct scenarios where a process has a
capability over a parent pid namespace but does not have the
capability over a child pid namespace.  Which confusingly makes
permission checks non-transitive.

It requires use of setns into a pid namespace (but not into a user
namespace) to create such a scenario.

Add the function in_userns to help in making this determination.

v2: Optimized in_userns by using level as suggested
    by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;

Ref: 49f4d8b93ccf ("pidns: Capture the user namespace and filter ns_last_pid")
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>sched/headers: Prepare for new header dependencies before moving code to &lt;linux/sched/signal.h&gt;</title>
<updated>2017-03-02T07:42:29Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2017-02-08T17:51:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3f07c0144132e4f59d88055ac8ff3e691a5fa2b8'/>
<id>urn:sha1:3f07c0144132e4f59d88055ac8ff3e691a5fa2b8</id>
<content type='text'>
We are going to split &lt;linux/sched/signal.h&gt; out of &lt;linux/sched.h&gt;, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder &lt;linux/sched/signal.h&gt; file that just
maps to &lt;linux/sched.h&gt; to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
