<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/cpu.c, branch v5.14.9</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.14.9</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.14.9'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2021-06-29T19:23:02Z</updated>
<entry>
<title>Merge tag 'smp-urgent-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2021-06-29T19:23:02Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-06-29T19:23:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=62180152e0944e815ebbfd0ffd822d2b0e2cd8e7'/>
<id>urn:sha1:62180152e0944e815ebbfd0ffd822d2b0e2cd8e7</id>
<content type='text'>
Pull CPU hotplug fix from Thomas Gleixner:
 "A fix for the CPU hotplug and cpusets interaction:

  cpusets delegate the hotplug work to a workqueue to prevent a lock
  order inversion vs. the CPU hotplug lock. The work is not flushed
  before the hotplug operation returns which creates user visible
  inconsistent state. Prevent this by flushing the work after dropping
  CPU hotplug lock and before releasing the outer mutex which serializes
  the CPU hotplug related sysfs interface operations"

* tag 'smp-urgent-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu/hotplug: Cure the cpusets trainwreck
</content>
</entry>
<entry>
<title>cpu/hotplug: Cure the cpusets trainwreck</title>
<updated>2021-06-21T08:31:06Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-03-27T21:01:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b22afcdf04c96ca58327784e280e10288cfd3303'/>
<id>urn:sha1:b22afcdf04c96ca58327784e280e10288cfd3303</id>
<content type='text'>
Alexey and Joshua tried to solve a cpusets related hotplug problem which is
user space visible and results in unexpected behaviour for some time after
a CPU has been plugged in and the corresponding uevent was delivered.

cpusets delegate the hotplug work (rebuilding cpumasks etc.) to a
workqueue. This is done because the cpusets code has already a lock
nesting of cgroups_mutex -&gt; cpu_hotplug_lock. A synchronous callback or
waiting for the work to finish with cpu_hotplug_lock held can and will
deadlock because that results in the reverse lock order.

As a consequence the uevent can be delivered before cpusets have consistent
state which means that a user space invocation of sched_setaffinity() to
move a task to the plugged CPU fails up to the point where the scheduled
work has been processed.

The same is true for CPU unplug, but that does not create user observable
failure (yet).

It's still inconsistent to claim that an operation is finished before it
actually is and that's the real issue at hand. uevents just make it
reliably observable.

Obviously the problem should be fixed in cpusets/cgroups, but untangling
that is pretty much impossible because according to the changelog of the
commit which introduced this 8 years ago:

 3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()")

the lock order cgroups_mutex -&gt; cpu_hotplug_lock is a design decision and
the whole code is built around that.

So bite the bullet and invoke the relevant cpuset function, which waits for
the work to finish, in _cpu_up/down() after dropping cpu_hotplug_lock and
only when tasks are not frozen by suspend/hibernate because that would
obviously wait forever.

Waiting there with cpu_add_remove_lock, which is protecting the present
and possible CPU maps, held is not a problem at all because neither work
queues nor cpusets/cgroups have any lockchains related to that lock.

Waiting in the hotplug machinery is not problematic either because there
are already state callbacks which wait for hardware queues to drain. It
makes the operations slightly slower, but hotplug is slow anyway.

This ensures that state is consistent before returning from a hotplug
up/down operation. It's still inconsistent during the operation, but that's
a different story.

Add a large comment which explains why this is done and why this is not a
dump ground for the hack of the day to work around half thought out locking
schemes. Document also the implications vs. hotplug operations and
serialization or the lack of it.

Thanks to Alexy and Joshua for analyzing why this temporary
sched_setaffinity() failure happened.

Fixes: 3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()")
Reported-by: Alexey Klimov &lt;aklimov@redhat.com&gt;
Reported-by: Joshua Baker &lt;jobaker@redhat.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Alexey Klimov &lt;aklimov@redhat.com&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87tuowcnv3.ffs@nanos.tec.linutronix.de
</content>
</entry>
<entry>
<title>cpu/hotplug: Simplify access to percpu cpuhp_state</title>
<updated>2021-05-25T15:24:52Z</updated>
<author>
<name>Yuan ZhaoXiong</name>
<email>yuanzhaoxiong@baidu.com</email>
</author>
<published>2021-05-23T13:31:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=130708331bc6b03a3c3a78599333faddfebbd0f3'/>
<id>urn:sha1:130708331bc6b03a3c3a78599333faddfebbd0f3</id>
<content type='text'>
It is unnecessary to invoke per_cpu_ptr() everytime to access cpuhp_state.
Use the available pointer instead.

Signed-off-by: Yuan ZhaoXiong &lt;yuanzhaoxiong@baidu.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Link: https://lore.kernel.org/r/1621776690-13264-1-git-send-email-yuanzhaoxiong@baidu.com

</content>
</entry>
<entry>
<title>cpumask/hotplug: Fix cpu_dying() state tracking</title>
<updated>2021-04-21T11:55:43Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2021-04-20T18:04:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2ea46c6fc9452ac100ad907b051d797225847e33'/>
<id>urn:sha1:2ea46c6fc9452ac100ad907b051d797225847e33</id>
<content type='text'>
Vincent reported that for states with a NULL startup/teardown function
we do not call cpuhp_invoke_callback() (because there is none) and as
such we'll not update the cpu_dying() state.

The stale cpu_dying() can eventually lead to triggering BUG().

Rectify this by updating cpu_dying() in the exact same places the
hotplug machinery tracks its directional state, namely
cpuhp_set_state() and cpuhp_reset_state().

Reported-by: Vincent Donnefort &lt;vincent.donnefort@arm.com&gt;
Suggested-by: Vincent Donnefort &lt;vincent.donnefort@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Vincent Donnefort &lt;vincent.donnefort@arm.com&gt;
Reviewed-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Link: https://lkml.kernel.org/r/YH7r+AoQEReSvxBI@hirez.programming.kicks-ass.net
</content>
</entry>
<entry>
<title>cpumask: Introduce DYING mask</title>
<updated>2021-04-16T15:06:32Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2021-01-19T17:43:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e40f74c535b8a0ecf3ef0388b51a34cdadb34fb5'/>
<id>urn:sha1:e40f74c535b8a0ecf3ef0388b51a34cdadb34fb5</id>
<content type='text'>
Introduce a cpumask that indicates (for each CPU) what direction the
CPU hotplug is currently going. Notably, it tracks rollbacks. Eg. when
an up fails and we do a roll-back down, it will accurately reflect the
direction.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Link: https://lkml.kernel.org/r/20210310150109.151441252@infradead.org
</content>
</entry>
<entry>
<title>cpu/hotplug: Add cpuhp_invoke_callback_range()</title>
<updated>2021-03-06T11:40:22Z</updated>
<author>
<name>Vincent Donnefort</name>
<email>vincent.donnefort@arm.com</email>
</author>
<published>2021-02-16T10:35:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=453e41085183980087f8a80dada523caf1131c3c'/>
<id>urn:sha1:453e41085183980087f8a80dada523caf1131c3c</id>
<content type='text'>
Factorizing and unifying cpuhp callback range invocations, especially for
the hotunplug path, where two different ways of decrementing were used. The
first one, decrements before the callback is called:

 cpuhp_thread_fun()
     state = st-&gt;state;
     st-&gt;state--;
     cpuhp_invoke_callback(state);

The second one, after:

 take_down_cpu()|cpuhp_down_callbacks()
     cpuhp_invoke_callback(st-&gt;state);
     st-&gt;state--;

This is problematic for rolling back the steps in case of error, as
depending on the decrement, the rollback will start from N or N-1. It also
makes tracing inconsistent, between steps run in the cpuhp thread and
the others.

Additionally, avoid useless cpuhp_thread_fun() loops by skipping empty
steps.

Signed-off-by: Vincent Donnefort &lt;vincent.donnefort@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/20210216103506.416286-4-vincent.donnefort@arm.com
</content>
</entry>
<entry>
<title>cpu/hotplug: CPUHP_BRINGUP_CPU failure exception</title>
<updated>2021-03-06T11:40:22Z</updated>
<author>
<name>Vincent Donnefort</name>
<email>vincent.donnefort@arm.com</email>
</author>
<published>2021-02-16T10:35:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=62f250694092dd5fef9900dc3126f07110bf9d48'/>
<id>urn:sha1:62f250694092dd5fef9900dc3126f07110bf9d48</id>
<content type='text'>
The atomic states (between CPUHP_AP_IDLE_DEAD and CPUHP_AP_ONLINE) are
triggered by the CPUHP_BRINGUP_CPU step. If the latter fails, no atomic
state can be rolled back.

DEAD callbacks too can't fail and disallow recovery. As a consequence,
during hotunplug, the fail injection interface should prohibit all states
from CPUHP_BRINGUP_CPU to CPUHP_ONLINE.

Signed-off-by: Vincent Donnefort &lt;vincent.donnefort@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/20210216103506.416286-3-vincent.donnefort@arm.com
</content>
</entry>
<entry>
<title>cpu/hotplug: Allowing to reset fail injection</title>
<updated>2021-03-06T11:40:22Z</updated>
<author>
<name>Vincent Donnefort</name>
<email>vincent.donnefort@arm.com</email>
</author>
<published>2021-02-16T10:35:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3ae70c251f344976428d1f6ee61ea7b4e170fec3'/>
<id>urn:sha1:3ae70c251f344976428d1f6ee61ea7b4e170fec3</id>
<content type='text'>
Currently, the only way of resetting the fail injection is to trigger a
hotplug, hotunplug or both. This is rather annoying for testing
and, as the default value for this file is -1, it seems pretty natural to
let a user write it.

Signed-off-by: Vincent Donnefort &lt;vincent.donnefort@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/20210216103506.416286-2-vincent.donnefort@arm.com
</content>
</entry>
<entry>
<title>cpu/hotplug: Add lockdep_is_cpus_held()</title>
<updated>2021-01-07T00:24:59Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2020-11-11T22:53:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=43759fe5a137389e94ed6d4680c3c63c17273158'/>
<id>urn:sha1:43759fe5a137389e94ed6d4680c3c63c17273158</id>
<content type='text'>
This commit adds a lockdep_is_cpus_held() function to verify that the
proper locks are held and that various operations are running in the
correct context.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Cc: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Cc: Josh Triplett &lt;josh@joshtriplett.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
Cc: Joel Fernandes &lt;joel@joelfernandes.org&gt;
Cc: Neeraj Upadhyay &lt;neeraju@codeaurora.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2020-12-15T02:29:11Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-12-15T02:29:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=adb35e8dc98ba9bda99ff79ac6a05b8fcde2a762'/>
<id>urn:sha1:adb35e8dc98ba9bda99ff79ac6a05b8fcde2a762</id>
<content type='text'>
Pull scheduler updates from Thomas Gleixner:

 - migrate_disable/enable() support which originates from the RT tree
   and is now a prerequisite for the new preemptible kmap_local() API
   which aims to replace kmap_atomic().

 - A fair amount of topology and NUMA related improvements

 - Improvements for the frequency invariant calculations

 - Enhanced robustness for the global CPU priority tracking and decision
   making

 - The usual small fixes and enhancements all over the place

* tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
  sched/fair: Trivial correction of the newidle_balance() comment
  sched/fair: Clear SMT siblings after determining the core is not idle
  sched: Fix kernel-doc markup
  x86: Print ratio freq_max/freq_base used in frequency invariance calculations
  x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC
  x86, sched: Calculate frequency invariance for AMD systems
  irq_work: Optimize irq_work_single()
  smp: Cleanup smp_call_function*()
  irq_work: Cleanup
  sched: Limit the amount of NUMA imbalance that can exist at fork time
  sched/numa: Allow a floating imbalance between NUMA nodes
  sched: Avoid unnecessary calculation of load imbalance at clone time
  sched/numa: Rename nr_running and break out the magic number
  sched: Make migrate_disable/enable() independent of RT
  sched/topology: Condition EAS enablement on FIE support
  arm64: Rebuild sched domains on invariance status changes
  sched/topology,schedutil: Wrap sched domains rebuild
  sched/uclamp: Allow to reset a task uclamp constraint value
  sched/core: Fix typos in comments
  Documentation: scheduler: fix information on arch SD flags, sched_domain and sched_debug
  ...
</content>
</entry>
</feed>
