<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/cpuset.c, branch v3.0.50</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.0.50</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.0.50'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2012-10-12T20:28:15Z</updated>
<entry>
<title>CPU hotplug, cpusets, suspend: Don't modify cpusets during suspend/resume</title>
<updated>2012-10-12T20:28:15Z</updated>
<author>
<name>Srivatsa S. Bhat</name>
<email>srivatsa.bhat@linux.vnet.ibm.com</email>
</author>
<published>2012-05-24T14:16:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8f48f1a28ee27909afbba8a3c2c653a15f810c3e'/>
<id>urn:sha1:8f48f1a28ee27909afbba8a3c2c653a15f810c3e</id>
<content type='text'>
commit d35be8bab9b0ce44bed4b9453f86ebf64062721e upstream.

In the event of CPU hotplug, the kernel modifies the cpusets' cpus_allowed
masks as and when necessary to ensure that the tasks belonging to the cpusets
have some place (online CPUs) to run on. And regular CPU hotplug is
destructive in the sense that the kernel doesn't remember the original cpuset
configurations set by the user, across hotplug operations.

However, suspend/resume (which uses CPU hotplug) is a special case in which
the kernel has the responsibility to restore the system (during resume), to
exactly the same state it was in before suspend.

In order to achieve that, do the following:

1. Don't modify cpusets during suspend/resume. At all.
   In particular, don't move the tasks from one cpuset to another, and
   don't modify any cpuset's cpus_allowed mask. So, simply ignore cpusets
   during the CPU hotplug operations that are carried out in the
   suspend/resume path.

2. However, cpusets and sched domains are related. We just want to avoid
   altering cpusets alone. So, to keep the sched domains updated, build
   a single sched domain (containing all active cpus) during each of the
   CPU hotplug operations carried out in s/r path, effectively ignoring
   the cpusets' cpus_allowed masks.

   (Since userspace is frozen while doing all this, it will go unnoticed.)

3. During the last CPU online operation during resume, build the sched
   domains by looking up the (unaltered) cpusets' cpus_allowed masks.
   That will bring back the system to the same original state as it was in
   before suspend.

Ultimately, this will not only solve the cpuset problem related to suspend
resume (ie., restores the cpusets to exactly what it was before suspend, by
not touching it at all) but also speeds up suspend/resume because we avoid
running cpuset update code for every CPU being offlined/onlined.

Signed-off-by: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/20120524141611.3692.20155.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Preeti U Murthy &lt;preeti@linux.vnet.ibm.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</content>
</entry>
<entry>
<title>cpuset: mm: reduce large amounts of memory barrier related damage v3</title>
<updated>2012-08-01T19:27:20Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2012-03-21T23:34:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=627c5c60b4ac673e9f4be758858073071684dce9'/>
<id>urn:sha1:627c5c60b4ac673e9f4be758858073071684dce9</id>
<content type='text'>
commit cc9a6c8776615f9c194ccf0b63a0aa5628235545 upstream.

Stable note:  Not tracked in Bugzilla. [get|put]_mems_allowed() is extremely
	expensive and severely impacted page allocator performance. This
	is part of a series of patches that reduce page allocator overhead.

Commit c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when
changing cpuset's mems") wins a super prize for the largest number of
memory barriers entered into fast paths for one commit.

[get|put]_mems_allowed is incredibly heavy with pairs of full memory
barriers inserted into a number of hot paths.  This was detected while
investigating at large page allocator slowdown introduced some time
after 2.6.32.  The largest portion of this overhead was shown by
oprofile to be at an mfence introduced by this commit into the page
allocator hot path.

For extra style points, the commit introduced the use of yield() in an
implementation of what looks like a spinning mutex.

This patch replaces the full memory barriers on both read and write
sides with a sequence counter with just read barriers on the fast path
side.  This is much cheaper on some architectures, including x86.  The
main bulk of the patch is the retry logic if the nodemask changes in a
manner that can cause a false failure.

While updating the nodemask, a check is made to see if a false failure
is a risk.  If it is, the sequence number gets bumped and parallel
allocators will briefly stall while the nodemask update takes place.

In a page fault test microbenchmark, oprofile samples from
__alloc_pages_nodemask went from 4.53% of all samples to 1.15%.  The
actual results were

                             3.3.0-rc3          3.3.0-rc3
                             rc3-vanilla        nobarrier-v2r1
    Clients   1 UserTime       0.07 (  0.00%)   0.08 (-14.19%)
    Clients   2 UserTime       0.07 (  0.00%)   0.07 (  2.72%)
    Clients   4 UserTime       0.08 (  0.00%)   0.07 (  3.29%)
    Clients   1 SysTime        0.70 (  0.00%)   0.65 (  6.65%)
    Clients   2 SysTime        0.85 (  0.00%)   0.82 (  3.65%)
    Clients   4 SysTime        1.41 (  0.00%)   1.41 (  0.32%)
    Clients   1 WallTime       0.77 (  0.00%)   0.74 (  4.19%)
    Clients   2 WallTime       0.47 (  0.00%)   0.45 (  3.73%)
    Clients   4 WallTime       0.38 (  0.00%)   0.37 (  1.58%)
    Clients   1 Flt/sec/cpu  497620.28 (  0.00%) 520294.53 (  4.56%)
    Clients   2 Flt/sec/cpu  414639.05 (  0.00%) 429882.01 (  3.68%)
    Clients   4 Flt/sec/cpu  257959.16 (  0.00%) 258761.48 (  0.31%)
    Clients   1 Flt/sec      495161.39 (  0.00%) 517292.87 (  4.47%)
    Clients   2 Flt/sec      820325.95 (  0.00%) 850289.77 (  3.65%)
    Clients   4 Flt/sec      1020068.93 (  0.00%) 1022674.06 (  0.26%)
    MMTests Statistics: duration
    Sys Time Running Test (seconds)             135.68    132.17
    User+Sys Time Running Test (seconds)         164.2    160.13
    Total Elapsed Time (seconds)                123.46    120.87

The overall improvement is small but the System CPU time is much
improved and roughly in correlation to what oprofile reported (these
performance figures are without profiling so skew is expected).  The
actual number of page faults is noticeably improved.

For benchmarks like kernel builds, the overall benefit is marginal but
the system CPU time is slightly reduced.

To test the actual bug the commit fixed I opened two terminals.  The
first ran within a cpuset and continually ran a small program that
faulted 100M of anonymous data.  In a second window, the nodemask of the
cpuset was continually randomised in a loop.

Without the commit, the program would fail every so often (usually
within 10 seconds) and obviously with the commit everything worked fine.
With this patch applied, it also worked fine so the fix should be
functionally equivalent.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</content>
</entry>
<entry>
<title>cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask</title>
<updated>2012-08-01T19:27:19Z</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2011-12-20T01:11:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ba204b545c29176659273c5316afb147eec332a3'/>
<id>urn:sha1:ba204b545c29176659273c5316afb147eec332a3</id>
<content type='text'>
commit b246272ecc5ac68c743b15c9e41a2275f7ce70e2 upstream.

Stable note: Not tracked in Bugzilla. [get|put]_mems_allowed() is extremely
	expensive and severely impacted page allocator performance. This is
	part of a series of patches that reduce page allocator overhead.

Kernels where MAX_NUMNODES &gt; BITS_PER_LONG may temporarily see an empty
nodemask in a tsk's mempolicy if its previous nodemask is remapped onto a
new set of allowed cpuset nodes where the two nodemasks, as a result of
the remap, are now disjoint.

c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when changing
cpuset's mems") adds get_mems_allowed() to prevent the set of allowed
nodes from changing for a thread.  This causes any update to a set of
allowed nodes to stall until put_mems_allowed() is called.

This stall is unncessary, however, if at least one node remains unchanged
in the update to the set of allowed nodes.  This was addressed by
89e8a244b97e ("cpusets: avoid looping when storing to mems_allowed if one
node remains set"), but it's still possible that an empty nodemask may be
read from a mempolicy because the old nodemask may be remapped to the new
nodemask during rebind.  To prevent this, only avoid the stall if there is
no mempolicy for the thread being changed.

This is a temporary solution until all reads from mempolicy nodemasks can
be guaranteed to not be empty without the get_mems_allowed()
synchronization.

Also moves the check for nodemask intersection inside task_lock() so that
tsk-&gt;mems_allowed cannot change.  This ensures that nothing can set this
tsk's mems_allowed out from under us and also protects tsk-&gt;mempolicy.

Reported-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Paul Menage &lt;paul@paulmenage.org&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>cpusets: avoid looping when storing to mems_allowed if one node remains set</title>
<updated>2012-08-01T19:27:19Z</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2011-11-02T20:38:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6b63ea81d831b2b6f2ce6d60cfcfa05b1858ad97'/>
<id>urn:sha1:6b63ea81d831b2b6f2ce6d60cfcfa05b1858ad97</id>
<content type='text'>
commit 89e8a244b97e48f1f30e898b6f32acca477f2a13 upstream.

Stable note: Not tracked in Bugzilla. [get|put]_mems_allowed() is
	extremely expensive and severely impacted page allocator performance.
	This is part of a series of patches that reduce page allocator
	overhead.

{get,put}_mems_allowed() exist so that general kernel code may locklessly
access a task's set of allowable nodes without having the chance that a
concurrent write will cause the nodemask to be empty on configurations
where MAX_NUMNODES &gt; BITS_PER_LONG.

This could incur a significant delay, however, especially in low memory
conditions because the page allocator is blocking and reclaim requires
get_mems_allowed() itself.  It is not atypical to see writes to
cpuset.mems take over 2 seconds to complete, for example.  In low memory
conditions, this is problematic because it's one of the most imporant
times to change cpuset.mems in the first place!

The only way a task's set of allowable nodes may change is through cpusets
by writing to cpuset.mems and when attaching a task to a generic code is
not reading the nodemask with get_mems_allowed() at the same time, and
then clearing all the old nodes.  This prevents the possibility that a
reader will see an empty nodemask at the same time the writer is storing a
new nodemask.

If at least one node remains unchanged, though, it's possible to simply
set all new nodes and then clear all the old nodes.  Changing a task's
nodemask is protected by cgroup_mutex so it's guaranteed that two threads
are not changing the same task's nodemask at the same time, so the
nodemask is guaranteed to be stored before another thread changes it and
determines whether a node remains set or not.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Nick Piggin &lt;npiggin@kernel.dk&gt;
Cc: Paul Menage &lt;paul@paulmenage.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>cpuset: Fix cpuset_cpus_allowed_fallback(), don't update tsk-&gt;rt.nr_cpus_allowed</title>
<updated>2011-05-28T15:02:57Z</updated>
<author>
<name>KOSAKI Motohiro</name>
<email>kosaki.motohiro@jp.fujitsu.com</email>
</author>
<published>2011-05-19T06:08:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1e1b6c511d1b23cb7c3b619d82fc7bd9f620565d'/>
<id>urn:sha1:1e1b6c511d1b23cb7c3b619d82fc7bd9f620565d</id>
<content type='text'>
The rule is, we have to update tsk-&gt;rt.nr_cpus_allowed if we change
tsk-&gt;cpus_allowed. Otherwise RT scheduler may confuse.

Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/4DD4B3FA.5060901@jp.fujitsu.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>cgroup: remove the ns_cgroup</title>
<updated>2011-05-27T00:12:34Z</updated>
<author>
<name>Daniel Lezcano</name>
<email>daniel.lezcano@free.fr</email>
</author>
<published>2011-05-26T23:25:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a77aea92010acf54ad785047234418d5d68772e2'/>
<id>urn:sha1:a77aea92010acf54ad785047234418d5d68772e2</id>
<content type='text'>
The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
leads to some problems:

  * cgroup creation is out-of-control
  * cgroup name can conflict when pids are looping
  * it is not possible to have a single process handling a lot of
    namespaces without falling in a exponential creation time
  * we may want to create a namespace without creating a cgroup

  The ns_cgroup was replaced by a compatibility flag 'clone_children',
  where a newly created cgroup will copy the parent cgroup values.
  The userspace has to manually create a cgroup and add a task to
  the 'tasks' file.

This patch removes the ns_cgroup as suggested in the following thread:

https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html

The 'cgroup_clone' function is removed because it is no longer used.

This is a userspace-visible change.  Commit 45531757b45c ("cgroup: notify
ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
printk warning users that the feature is planned for removal.  Since that
time we have heard from XXX users who were affected by this.

Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@free.fr&gt;
Signed-off-by: Serge E. Hallyn &lt;serge.hallyn@canonical.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: Jamal Hadi Salim &lt;hadi@cyberus.ca&gt;
Reviewed-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Acked-by: Paul Menage &lt;menage@google.com&gt;
Acked-by: Matt Helsley &lt;matthltc@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroups: add per-thread subsystem callbacks</title>
<updated>2011-05-27T00:12:34Z</updated>
<author>
<name>Ben Blum</name>
<email>bblum@andrew.cmu.edu</email>
</author>
<published>2011-05-26T23:25:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f780bdb7c1c73009cb57adcf99ef50027d80bf3c'/>
<id>urn:sha1:f780bdb7c1c73009cb57adcf99ef50027d80bf3c</id>
<content type='text'>
Add cgroup subsystem callbacks for per-thread attachment in atomic contexts

Add can_attach_task(), pre_attach(), and attach_task() as new callbacks
for cgroups's subsystem interface.  Unlike can_attach and attach, these
are for per-thread operations, to be called potentially many times when
attaching an entire threadgroup.

Also, the old "bool threadgroup" interface is removed, as replaced by
this.  All subsystems are modified for the new interface - of note is
cpuset, which requires from/to nodemasks for attach to be globally scoped
(though per-cpuset would work too) to persist from its pre_attach to
attach_task and attach.

This is a pre-patch for cgroup-procs-writable.patch.

Signed-off-by: Ben Blum &lt;bblum@andrew.cmu.edu&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Matt Helsley &lt;matthltc@us.ibm.com&gt;
Reviewed-by: Paul Menage &lt;menage@google.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched: Dynamic sched_domain::level</title>
<updated>2011-04-11T12:09:32Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2011-04-07T12:10:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=60495e7760d8ee364695006af37309b0755e0e17'/>
<id>urn:sha1:60495e7760d8ee364695006af37309b0755e0e17</id>
<content type='text'>
Remove the SD_LV_ enum and use dynamic level assignments.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Nick Piggin &lt;npiggin@kernel.dk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20110407122942.969433965@chello.nl
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>cpuset: hold callback_mutex in cpuset_post_clone()</title>
<updated>2011-03-24T02:46:35Z</updated>
<author>
<name>Li Zefan</name>
<email>lizf@cn.fujitsu.com</email>
</author>
<published>2011-03-23T23:42:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=523fb486bfd94e3a3b16a42bcb21b1959cf14df8'/>
<id>urn:sha1:523fb486bfd94e3a3b16a42bcb21b1959cf14df8</id>
<content type='text'>
Chaning cpuset-&gt;mems/cpuset-&gt;cpus should be protected under
callback_mutex.

cpuset_clone() doesn't follow this rule. It's ok because it's
called when creating and initializing a cgroup, but we'd better
hold the lock to avoid subtil break in the future.

Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Acked-by: Paul Menage &lt;menage@google.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cpuset: fix unchecked calls to NODEMASK_ALLOC()</title>
<updated>2011-03-24T02:46:35Z</updated>
<author>
<name>Li Zefan</name>
<email>lizf@cn.fujitsu.com</email>
</author>
<published>2011-03-23T23:42:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ee24d3797780eee6ffe581a7b78d27896f9b494a'/>
<id>urn:sha1:ee24d3797780eee6ffe581a7b78d27896f9b494a</id>
<content type='text'>
Those functions that use NODEMASK_ALLOC() can't propagate errno
to users, but will fail silently.

Fix it by using a static nodemask_t variable for each function, and
those variables are protected by cgroup_mutex;

[akpm@linux-foundation.org: fix comment spelling, strengthen cgroup_lock comment]
Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
