<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/sys.c, branch v3.18.40</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.18.40</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.18.40'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2014-10-13T14:23:15Z</updated>
<entry>
<title>Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2014-10-13T14:23:15Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-10-13T14:23:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=faafcba3b5e15999cf75d5c5a513ac8e47e2545f'/>
<id>urn:sha1:faafcba3b5e15999cf75d5c5a513ac8e47e2545f</id>
<content type='text'>
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
     Hansen)

   - Various sched/idle refinements for better idle handling (Nicolas
     Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)

   - sched/numa updates and optimizations (Rik van Riel)

   - sysbench speedup (Vincent Guittot)

   - capacity calculation cleanups/refactoring (Vincent Guittot)

   - Various cleanups to thread group iteration (Oleg Nesterov)

   - Double-rq-lock removal optimization and various refactorings
     (Kirill Tkhai)

   - various sched/deadline fixes

  ... and lots of other changes"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
  sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
  sched/fair: Delete resched_cpu() from idle_balance()
  sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
  sched: Improve sysbench performance by fixing spurious active migration
  sched/x86: Fix up typo in topology detection
  x86, sched: Add new topology for multi-NUMA-node CPUs
  sched/rt: Use resched_curr() in task_tick_rt()
  sched: Use rq-&gt;rd in sched_setaffinity() under RCU read lock
  sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
  sched: Use dl_bw_of() under RCU read lock
  sched/fair: Remove duplicate code from can_migrate_task()
  sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
  sched: print_rq(): Don't use tasklist_lock
  sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
  sched: Fix the task-group check in tg_has_rt_tasks()
  sched/fair: Leverage the idle state info when choosing the "idlest" cpu
  sched: Let the scheduler see CPU idle states
  sched/deadline: Fix inter- exclusive cpusets migrations
  sched/deadline: Clear dl_entity params when setscheduling to different class
  sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
  ...
</content>
</entry>
<entry>
<title>kernel/sys.c: compat sysinfo syscall: fix undefined behavior</title>
<updated>2014-10-10T02:26:04Z</updated>
<author>
<name>Scotty Bauer</name>
<email>sbauer@eng.utah.edu</email>
</author>
<published>2014-10-09T22:30:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0baae41ea8365a7b5a34c6474a77d7eb1126f6b2'/>
<id>urn:sha1:0baae41ea8365a7b5a34c6474a77d7eb1126f6b2</id>
<content type='text'>
Fix undefined behavior and compiler warning by replacing right shift 32
with upper_32_bits macro

Signed-off-by: Scotty Bauer &lt;sbauer@eng.utah.edu&gt;
Cc: Clemens Ladisch &lt;clemens@ladisch.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kernel/sys.c: whitespace fixes</title>
<updated>2014-10-10T02:26:04Z</updated>
<author>
<name>vishnu.ps</name>
<email>vishnu.ps@samsung.com</email>
</author>
<published>2014-10-09T22:30:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ec94fc3d59b54561da03a0e433d93217b08c1481'/>
<id>urn:sha1:ec94fc3d59b54561da03a0e433d93217b08c1481</id>
<content type='text'>
Fix minor errors and warning messages in kernel/sys.c.  These errors were
reported by checkpatch while working with some modifications in sys.c
file.  Fixing this first will help me to improve my further patches.

ERROR: trailing whitespace - 9
ERROR: do not use assignment in if condition - 4
ERROR: spaces required around that '?' (ctx:VxO) - 10
ERROR: switch and case should be at the same indent - 3

total 26 errors &amp; 3 warnings fixed.

Signed-off-by: vishnu.ps &lt;vishnu.ps@samsung.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: use VM_BUG_ON_MM where possible</title>
<updated>2014-10-10T02:25:58Z</updated>
<author>
<name>Sasha Levin</name>
<email>sasha.levin@oracle.com</email>
</author>
<published>2014-10-09T22:28:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=96dad67ff244e797c4bc3e4f7f0fdaa0cfdf0a7d'/>
<id>urn:sha1:96dad67ff244e797c4bc3e4f7f0fdaa0cfdf0a7d</id>
<content type='text'>
Dump the contents of the relevant struct_mm when we hit the bug condition.

Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>prctl: PR_SET_MM -- introduce PR_SET_MM_MAP operation</title>
<updated>2014-10-10T02:25:55Z</updated>
<author>
<name>Cyrill Gorcunov</name>
<email>gorcunov@openvz.org</email>
</author>
<published>2014-10-09T22:27:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f606b77f1a9e362451aca8f81d8f36a3a112139e'/>
<id>urn:sha1:f606b77f1a9e362451aca8f81d8f36a3a112139e</id>
<content type='text'>
During development of c/r we've noticed that in case if we need to support
user namespaces we face a problem with capabilities in prctl(PR_SET_MM,
...) call, in particular once new user namespace is created
capable(CAP_SYS_RESOURCE) no longer passes.

A approach is to eliminate CAP_SYS_RESOURCE check but pass all new values
in one bundle, which would allow the kernel to make more intensive test
for sanity of values and same time allow us to support checkpoint/restore
of user namespaces.

Thus a new command PR_SET_MM_MAP introduced. It takes a pointer of
prctl_mm_map structure which carries all the members to be updated.

	prctl(PR_SET_MM, PR_SET_MM_MAP, struct prctl_mm_map *, size)

	struct prctl_mm_map {
		__u64	start_code;
		__u64	end_code;
		__u64	start_data;
		__u64	end_data;
		__u64	start_brk;
		__u64	brk;
		__u64	start_stack;
		__u64	arg_start;
		__u64	arg_end;
		__u64	env_start;
		__u64	env_end;
		__u64	*auxv;
		__u32	auxv_size;
		__u32	exe_fd;
	};

All members except @exe_fd correspond ones of struct mm_struct.  To figure
out which available values these members may take here are meanings of the
members.

 - start_code, end_code: represent bounds of executable code area
 - start_data, end_data: represent bounds of data area
 - start_brk, brk: used to calculate bounds for brk() syscall
 - start_stack: used when accounting space needed for command
   line arguments, environment and shmat() syscall
 - arg_start, arg_end, env_start, env_end: represent memory area
   supplied for command line arguments and environment variables
 - auxv, auxv_size: carries auxiliary vector, Elf format specifics
 - exe_fd: file descriptor number for executable link (/proc/self/exe)

Thus we apply the following requirements to the values

1) Any member except @auxv, @auxv_size, @exe_fd is rather an address
   in user space thus it must be laying inside [mmap_min_addr, mmap_max_addr)
   interval.

2) While @[start|end]_code and @[start|end]_data may point to an nonexisting
   VMAs (say a program maps own new .text and .data segments during execution)
   the rest of members should belong to VMA which must exist.

3) Addresses must be ordered, ie @start_ member must not be greater or
   equal to appropriate @end_ member.

4) As in regular Elf loading procedure we require that @start_brk and
   @brk be greater than @end_data.

5) If RLIMIT_DATA rlimit is set to non-infinity new values should not
   exceed existing limit. Same applies to RLIMIT_STACK.

6) Auxiliary vector size must not exceed existing one (which is
   predefined as AT_VECTOR_SIZE and depends on architecture).

7) File descriptor passed in @exe_file should be pointing
   to executable file (because we use existing prctl_set_mm_exe_file_locked
   helper it ensures that the file we are going to use as exe link has all
   required permission granted).

Now about where these members are involved inside kernel code:

 - @start_code and @end_code are used in /proc/$pid/[stat|statm] output;

 - @start_data and @end_data are used in /proc/$pid/[stat|statm] output,
   also they are considered if there enough space for brk() syscall
   result if RLIMIT_DATA is set;

 - @start_brk shown in /proc/$pid/stat output and accounted in brk()
   syscall if RLIMIT_DATA is set; also this member is tested to
   find a symbolic name of mmap event for perf system (we choose
   if event is generated for "heap" area); one more aplication is
   selinux -- we test if a process has PROCESS__EXECHEAP permission
   if trying to make heap area being executable with mprotect() syscall;

 - @brk is a current value for brk() syscall which lays inside heap
   area, it's shown in /proc/$pid/stat. When syscall brk() succesfully
   provides new memory area to a user space upon brk() completion the
   mm::brk is updated to carry new value;

   Both @start_brk and @brk are actively used in /proc/$pid/maps
   and /proc/$pid/smaps output to find a symbolic name "heap" for
   VMA being scanned;

 - @start_stack is printed out in /proc/$pid/stat and used to
   find a symbolic name "stack" for task and threads in
   /proc/$pid/maps and /proc/$pid/smaps output, and as the same
   as with @start_brk -- perf system uses it for event naming.
   Also kernel treat this member as a start address of where
   to map vDSO pages and to check if there is enough space
   for shmat() syscall;

 - @arg_start, @arg_end, @env_start and @env_end are printed out
   in /proc/$pid/stat. Another access to the data these members
   represent is to read /proc/$pid/environ or /proc/$pid/cmdline.
   Any attempt to read these areas kernel tests with access_process_vm
   helper so a user must have enough rights for this action;

 - @auxv and @auxv_size may be read from /proc/$pid/auxv. Strictly
   speaking kernel doesn't care much about which exactly data is
   sitting there because it is solely for userspace;

 - @exe_fd is referred from /proc/$pid/exe and when generating
   coredump. We uses prctl_set_mm_exe_file_locked helper to update
   this member, so exe-file link modification remains one-shot
   action.

Still note that updating exe-file link now doesn't require sys-resource
capability anymore, after all there is no much profit in preventing setup
own file link (there are a number of ways to execute own code -- ptrace,
ld-preload, so that the only reliable way to find which exactly code is
executed is to inspect running program memory).  Still we require the
caller to be at least user-namespace root user.

I believe the old interface should be deprecated and ripped off in a
couple of kernel releases if no one against.

To test if new interface is implemented in the kernel one can pass
PR_SET_MM_MAP_SIZE opcode and the kernel returns the size of currently
supported struct prctl_mm_map.

[akpm@linux-foundation.org: fix 80-col wordwrap in macro definitions]
Signed-off-by: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Andrew Vagin &lt;avagin@openvz.org&gt;
Tested-by: Andrew Vagin &lt;avagin@openvz.org&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Cc: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Cc: Vasiliy Kulikov &lt;segoon@openwall.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Cc: Julien Tinnes &lt;jln@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>prctl: PR_SET_MM -- factor out mmap_sem when updating mm::exe_file</title>
<updated>2014-10-10T02:25:55Z</updated>
<author>
<name>Cyrill Gorcunov</name>
<email>gorcunov@openvz.org</email>
</author>
<published>2014-10-09T22:27:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=71fe97e185040c5dac3216cd54e186dfa534efa0'/>
<id>urn:sha1:71fe97e185040c5dac3216cd54e186dfa534efa0</id>
<content type='text'>
Instead of taking mm-&gt;mmap_sem inside prctl_set_mm_exe_file() move it out
and rename the helper to prctl_set_mm_exe_file_locked().  This will allow
to reuse this function in a next patch.

Signed-off-by: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Andrew Vagin &lt;avagin@openvz.org&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Cc: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Cc: Vasiliy Kulikov &lt;segoon@openwall.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Cc: Julien Tinnes &lt;jln@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: use may_adjust_brk helper</title>
<updated>2014-10-10T02:25:55Z</updated>
<author>
<name>Cyrill Gorcunov</name>
<email>gorcunov@openvz.org</email>
</author>
<published>2014-10-09T22:27:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8764b338b37524ab1a78aee527318ebee9762487'/>
<id>urn:sha1:8764b338b37524ab1a78aee527318ebee9762487</id>
<content type='text'>
Signed-off-by: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Andrew Vagin &lt;avagin@openvz.org&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Cc: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Cc: Vasiliy Kulikov &lt;segoon@openwall.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Cc: Julien Tinnes &lt;jln@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>time, signal: Protect resource use statistics with seqlock</title>
<updated>2014-09-08T06:17:01Z</updated>
<author>
<name>Rik van Riel</name>
<email>riel@redhat.com</email>
</author>
<published>2014-08-16T17:40:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e78c3496790ee8a36522a838b59b388e8a709e65'/>
<id>urn:sha1:e78c3496790ee8a36522a838b59b388e8a709e65</id>
<content type='text'>
Both times() and clock_gettime(CLOCK_PROCESS_CPUTIME_ID) have scalability
issues on large systems, due to both functions being serialized with a
lock.

The lock protects against reporting a wrong value, due to a thread in the
task group exiting, its statistics reporting up to the signal struct, and
that exited task's statistics being counted twice (or not at all).

Protecting that with a lock results in times() and clock_gettime() being
completely serialized on large systems.

This can be fixed by using a seqlock around the events that gather and
propagate statistics. As an additional benefit, the protection code can
be moved into thread_group_cputime(), slightly simplifying the calling
functions.

In the case of posix_cpu_clock_get_task() things can be simplified a
lot, because the calling function already ensures that the task sticks
around, and the rest is now taken care of in thread_group_cputime().

This way the statistics reporting code can run lockless.

Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Alex Thorlton &lt;athorlton@sgi.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Daeseok Youn &lt;daeseok.youn@gmail.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Dongsheng Yang &lt;yangds.fnst@cn.fujitsu.com&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Guillaume Morin &lt;guillaume@morinfr.org&gt;
Cc: Ionut Alexa &lt;ionut.m.alexa@gmail.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Cc: umgwanakikbuti@gmail.com
Cc: fweisbec@gmail.com
Cc: srao@redhat.com
Cc: lwoodman@redhat.com
Cc: atheurer@redhat.com
Link: http://lkml.kernel.org/r/20140816134010.26a9b572@annuminas.surriel.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: move no_new_privs into new atomic flags</title>
<updated>2014-07-18T19:13:38Z</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2014-05-21T22:23:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1d4457f99928a968767f6405b4a1f50845aa15fd'/>
<id>urn:sha1:1d4457f99928a968767f6405b4a1f50845aa15fd</id>
<content type='text'>
Since seccomp transitions between threads requires updates to the
no_new_privs flag to be atomic, the flag must be part of an atomic flag
set. This moves the nnp flag into a separate task field, and introduces
accessors.

Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reviewed-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
</content>
</entry>
<entry>
<title>sched: Consolidate open coded implementations of nice level frobbing into nice_to_rlimit() and rlimit_to_nice()</title>
<updated>2014-05-22T09:16:36Z</updated>
<author>
<name>Dongsheng Yang</name>
<email>yangds.fnst@cn.fujitsu.com</email>
</author>
<published>2014-05-08T09:33:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7aa2c016db2162defff77f6f5731bff3f25e5175'/>
<id>urn:sha1:7aa2c016db2162defff77f6f5731bff3f25e5175</id>
<content type='text'>
Signed-off-by: Dongsheng Yang &lt;yangds.fnst@cn.fujitsu.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/a568a1e3cc8e78648f41b5035fa5e381d36274da.1399532322.git.yangds.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
