<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/fork.c, branch v4.2-rc4</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.2-rc4</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.2-rc4'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2015-07-18T01:42:51Z</updated>
<entry>
<title>x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86</title>
<updated>2015-07-18T01:42:51Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2015-07-17T10:28:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5aaeb5c01c5b6c0be7b7aadbf3ace9f3a4458c3d'/>
<id>urn:sha1:5aaeb5c01c5b6c0be7b7aadbf3ace9f3a4458c3d</id>
<content type='text'>
Don't burden architectures without dynamic task_struct sizing
with the overhead of dynamic sizing.

Also optimize the x86 code a bit by caching task_struct_size.

Acked-and-Tested-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: Dave Hansen &lt;dave@sr71.net&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>x86/fpu, sched: Dynamically allocate 'struct fpu'</title>
<updated>2015-07-18T01:42:35Z</updated>
<author>
<name>Dave Hansen</name>
<email>dave@sr71.net</email>
</author>
<published>2015-07-17T10:28:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0c8c0f03e3a292e031596484275c14cf39c0ab7a'/>
<id>urn:sha1:0c8c0f03e3a292e031596484275c14cf39c0ab7a</id>
<content type='text'>
The FPU rewrite removed the dynamic allocations of 'struct fpu'.
But, this potentially wastes massive amounts of memory (2k per
task on systems that do not have AVX-512 for instance).

Instead of having a separate slab, this patch just appends the
space that we need to the 'task_struct' which we dynamically
allocate already.  This saves from doing an extra slab
allocation at fork().

The only real downside here is that we have to stick everything
and the end of the task_struct.  But, I think the
BUILD_BUG_ON()s I stuck in there should keep that from being too
fragile.

Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: Dave Hansen &lt;dave@sr71.net&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1437128892-9831-2-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2015-06-27T02:50:04Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-06-27T02:50:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bbe179f88d39274630823a0dc07d2714fd19a103'/>
<id>urn:sha1:bbe179f88d39274630823a0dc07d2714fd19a103</id>
<content type='text'>
Pull cgroup updates from Tejun Heo:

 - threadgroup_lock got reorganized so that its users can pick the
   actual locking mechanism to use.  Its only user - cgroups - is
   updated to use a percpu_rwsem instead of per-process rwsem.

   This makes things a bit lighter on hot paths and allows cgroups to
   perform and fail multi-task (a process) migrations atomically.
   Multi-task migrations are used in several places including the
   unified hierarchy.

 - Delegation rule and documentation added to unified hierarchy.  This
   will likely be the last interface update from the cgroup core side
   for unified hierarchy before lifting the devel mask.

 - Some groundwork for the pids controller which is scheduled to be
   merged in the coming devel cycle.

* 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: add delegation section to unified hierarchy documentation
  cgroup: require write perm on common ancestor when moving processes on the default hierarchy
  cgroup: separate out cgroup_procs_write_permission() from __cgroup_procs_write()
  kernfs: make kernfs_get_inode() public
  MAINTAINERS: add a cgroup core co-maintainer
  cgroup: fix uninitialised iterator in for_each_subsys_which
  cgroup: replace explicit ss_mask checking with for_each_subsys_which
  cgroup: use bitmask to filter for_each_subsys
  cgroup: add seq_file forward declaration for struct cftype
  cgroup: simplify threadgroup locking
  sched, cgroup: replace signal_struct-&gt;group_rwsem with a global percpu_rwsem
  sched, cgroup: reorganize threadgroup locking
  cgroup: switch to unsigned long for bitmasks
  cgroup: reorganize include/linux/cgroup.h
  cgroup: separate out include/linux/cgroup-defs.h
  cgroup: fix some comment typos
</content>
</entry>
<entry>
<title>clone: support passing tls argument via C rather than pt_regs magic</title>
<updated>2015-06-26T00:00:38Z</updated>
<author>
<name>Josh Triplett</name>
<email>josh@joshtriplett.org</email>
</author>
<published>2015-06-25T22:01:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3033f14ab78c326871a4902591c2518410add24a'/>
<id>urn:sha1:3033f14ab78c326871a4902591c2518410add24a</id>
<content type='text'>
clone has some of the quirkiest syscall handling in the kernel, with a
pile of special cases, historical curiosities, and architecture-specific
calling conventions.  In particular, clone with CLONE_SETTLS accepts a
parameter "tls" that the C entry point completely ignores and some
assembly entry points overwrite; instead, the low-level arch-specific
code pulls the tls parameter out of the arch-specific register captured
as part of pt_regs on entry to the kernel.  That's a massive hack, and
it makes the arch-specific code only work when called via the specific
existing syscall entry points; because of this hack, any new clone-like
system call would have to accept an identical tls argument in exactly
the same arch-specific position, rather than providing a unified system
call entry point across architectures.

The first patch allows architectures to handle the tls argument via
normal C parameter passing, if they opt in by selecting
HAVE_COPY_THREAD_TLS.  The second patch makes 32-bit and 64-bit x86 opt
into this.

These two patches came out of the clone4 series, which isn't ready for
this merge window, but these first two cleanup patches were entirely
uncontroversial and have acks.  I'd like to go ahead and submit these
two so that other architectures can begin building on top of this and
opting into HAVE_COPY_THREAD_TLS.  However, I'm also happy to wait and
send these through the next merge window (along with v3 of clone4) if
anyone would prefer that.

This patch (of 2):

clone with CLONE_SETTLS accepts an argument to set the thread-local
storage area for the new thread.  sys_clone declares an int argument
tls_val in the appropriate point in the argument list (based on the
various CLONE_BACKWARDS variants), but doesn't actually use or pass along
that argument.  Instead, sys_clone calls do_fork, which calls
copy_process, which calls the arch-specific copy_thread, and copy_thread
pulls the corresponding syscall argument out of the pt_regs captured at
kernel entry (knowing what argument of clone that architecture passes tls
in).

Apart from being awful and inscrutable, that also only works because only
one code path into copy_thread can pass the CLONE_SETTLS flag, and that
code path comes from sys_clone with its architecture-specific
argument-passing order.  This prevents introducing a new version of the
clone system call without propagating the same architecture-specific
position of the tls argument.

However, there's no reason to pull the argument out of pt_regs when
sys_clone could just pass it down via C function call arguments.

Introduce a new CONFIG_HAVE_COPY_THREAD_TLS for architectures to opt into,
and a new copy_thread_tls that accepts the tls parameter as an additional
unsigned long (syscall-argument-sized) argument.  Change sys_clone's tls
argument to an unsigned long (which does not change the ABI), and pass
that down to copy_thread_tls.

Architectures that don't opt into copy_thread_tls will continue to ignore
the C argument to sys_clone in favor of the pt_regs captured at kernel
entry, and thus will be unable to introduce new versions of the clone
syscall.

Patch co-authored by Josh Triplett and Thiago Macieira.

Signed-off-by: Josh Triplett &lt;josh@joshtriplett.org&gt;
Acked-by: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Thiago Macieira &lt;thiago.macieira@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched, cgroup: replace signal_struct-&gt;group_rwsem with a global percpu_rwsem</title>
<updated>2015-05-27T00:35:00Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-05-13T20:35:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d59cfc09c32a2ae31f1c3bc2983a0cd79afb3f14'/>
<id>urn:sha1:d59cfc09c32a2ae31f1c3bc2983a0cd79afb3f14</id>
<content type='text'>
The cgroup side of threadgroup locking uses signal_struct-&gt;group_rwsem
to synchronize against threadgroup changes.  This per-process rwsem
adds small overhead to thread creation, exit and exec paths, forces
cgroup code paths to do lock-verify-unlock-retry dance in a couple
places and makes it impossible to atomically perform operations across
multiple processes.

This patch replaces signal_struct-&gt;group_rwsem with a global
percpu_rwsem cgroup_threadgroup_rwsem which is cheaper on the reader
side and contained in cgroups proper.  This patch converts one-to-one.

This does make writer side heavier and lower the granularity; however,
cgroup process migration is a fairly cold path, we do want to optimize
thread operations over it and cgroup migration operations don't take
enough time for the lower granularity to matter.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
</content>
</entry>
<entry>
<title>sched/preempt, mm/fault: Count pagefault_disable() levels in pagefault_disabled</title>
<updated>2015-05-19T06:39:13Z</updated>
<author>
<name>David Hildenbrand</name>
<email>dahi@linux.vnet.ibm.com</email>
</author>
<published>2015-05-11T15:52:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8bcbde5480f9777f8b74d71493722c663e22c21b'/>
<id>urn:sha1:8bcbde5480f9777f8b74d71493722c663e22c21b</id>
<content type='text'>
Until now, pagefault_disable()/pagefault_enabled() used the preempt
count to track whether in an environment with pagefaults disabled (can
be queried via in_atomic()).

This patch introduces a separate counter in task_struct to count the
level of pagefault_disable() calls. We'll keep manipulating the preempt
count to retain compatibility to existing pagefault handlers.

It is now possible to verify whether in a pagefault_disable() envionment
by calling pagefault_disabled(). In contrast to in_atomic() it will not
be influenced by preempt_enable()/preempt_disable().

This patch is based on a patch from Ingo Molnar.

Reviewed-and-tested-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: David Hildenbrand &lt;dahi@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-2-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched, timer: Replace spinlocks with atomics in thread_group_cputimer(), to improve scalability</title>
<updated>2015-05-08T10:15:31Z</updated>
<author>
<name>Jason Low</name>
<email>jason.low2@hp.com</email>
</author>
<published>2015-04-28T20:00:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1018016c706f7ff9f56fde3a649789c47085a293'/>
<id>urn:sha1:1018016c706f7ff9f56fde3a649789c47085a293</id>
<content type='text'>
While running a database workload, we found a scalability issue with itimers.

Much of the problem was caused by the thread_group_cputimer spinlock.
Each time we account for group system/user time, we need to obtain a
thread_group_cputimer's spinlock to update the timers. On larger systems
(such as a 16 socket machine), this caused more than 30% of total time
spent trying to obtain this kernel lock to update these group timer stats.

This patch converts the timers to 64-bit atomic variables and use
atomic add to update them without a lock. With this patch, the percent
of total time spent updating thread group cputimer timers was reduced
from 30% down to less than 1%.

Note: On 32-bit systems using the generic 64-bit atomics, this causes
sample_group_cputimer() to take locks 3 times instead of just 1 time.
However, we tested this patch on a 32-bit system ARM system using the
generic atomics and did not find the overhead to be much of an issue.
An explanation for why this isn't an issue is that 32-bit systems usually
have small numbers of CPUs, and cacheline contention from extra spinlocks
called periodically is not really apparent on smaller systems.

Signed-off-by: Jason Low &lt;jason.low2@hp.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Aswin Chandramouleeswaran &lt;aswin@hp.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Preeti U Murthy &lt;preeti@linux.vnet.ibm.com&gt;
Cc: Scott J Norton &lt;scott.norton@hp.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Waiman Long &lt;Waiman.Long@hp.com&gt;
Link: http://lkml.kernel.org/r/1430251224-5764-4-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched, timer: Convert usages of ACCESS_ONCE() in the scheduler to READ_ONCE()/WRITE_ONCE()</title>
<updated>2015-05-08T10:11:32Z</updated>
<author>
<name>Jason Low</name>
<email>jason.low2@hp.com</email>
</author>
<published>2015-04-28T20:00:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=316c1608d15c736439d4065ed12f306db554b3da'/>
<id>urn:sha1:316c1608d15c736439d4065ed12f306db554b3da</id>
<content type='text'>
ACCESS_ONCE doesn't work reliably on non-scalar types. This patch removes
the rest of the existing usages of ACCESS_ONCE() in the scheduler, and use
the new READ_ONCE() and WRITE_ONCE() APIs as appropriate.

Signed-off-by: Jason Low &lt;jason.low2@hp.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Waiman Long &lt;Waiman.Long@hp.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Aswin Chandramouleeswaran &lt;aswin@hp.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Preeti U Murthy &lt;preeti@linux.vnet.ibm.com&gt;
Cc: Scott J Norton &lt;scott.norton@hp.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Link: http://lkml.kernel.org/r/1430251224-5764-2-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>oprofile: reduce mmap_sem hold for mm-&gt;exe_file</title>
<updated>2015-04-17T13:04:11Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2015-04-16T19:49:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=11163348a23cdbcdca5fb42485418e75f8566a5c'/>
<id>urn:sha1:11163348a23cdbcdca5fb42485418e75f8566a5c</id>
<content type='text'>
sync_buffer() needs the mmap_sem for two distinct operations, both only
occurring upon user context switch handling:

 1) Dealing with the exe_file.

 2) Adding the dcookie data as we need to lookup the vma that
   backs it. This is done via add_sample() and add_data().

This patch isolates 1), for it will no longer need the mmap_sem for
serialization.  However, for now, make of the more standard
get_mm_exe_file(), requiring only holding the mmap_sem to read the value,
and relying on reference counting to make sure that the exe file won't
dissappear underneath us while doing the get dcookie.

As a consequence, for 2) we move the mmap_sem locking into where we really
need it, in lookup_dcookie().  The benefits are twofold: reduce mmap_sem
hold times, and cleaner code.

[akpm@linux-foundation.org: export get_mm_exe_file for arch/x86/oprofile/oprofile.ko]
Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: Robert Richter &lt;rric@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>prctl: avoid using mmap_sem for exe_file serialization</title>
<updated>2015-04-17T13:04:07Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2015-04-16T19:47:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6e399cd144d8500ffb5d40fa6848890e2580a80a'/>
<id>urn:sha1:6e399cd144d8500ffb5d40fa6848890e2580a80a</id>
<content type='text'>
Oleg cleverly suggested using xchg() to set the new mm-&gt;exe_file instead
of calling set_mm_exe_file() which requires some form of serialization --
mmap_sem in this case.  For archs that do not have atomic rmw instructions
we still fallback to a spinlock alternative, so this should always be
safe.  As such, we only need the mmap_sem for looking up the backing
vm_file, which can be done sharing the lock.  Naturally, this means we
need to manually deal with both the new and old file reference counting,
and we need not worry about the MMF_EXE_FILE_CHANGED bits, which can
probably be deleted in the future anyway.

Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Suggested-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reviewed-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
