user/sven/linux.git/kernel/cpu.c, branch v6.6.25

hrtimers: Push pending hrtimers away from outgoing CPU earlier

2023-12-13T17:44:56Z

[ Upstream commit 5c0930ccaad5a74d74e8b18b648c5eb21ed2fe94 ] 2b8272ff4a70 ("cpu/hotplug: Prevent self deadlock on CPU hot-unplug") solved the straight forward CPU hotplug deadlock vs. the scheduler bandwidth timer. Yu discovered a more involved variant where a task which has a bandwidth timer started on the outgoing CPU holds a lock and then gets throttled. If the lock required by one of the CPU hotplug callbacks the hotplug operation deadlocks because the unthrottling timer event is not handled on the dying CPU and can only be recovered once the control CPU reaches the hotplug state which pulls the pending hrtimers from the dead CPU. Solve this by pushing the hrtimers away from the dying CPU in the dying callbacks. Nothing can queue a hrtimer on the dying CPU at that point because all other CPUs spin in stop_machine() with interrupts disabled and once the operation is finished the CPU is marked offline. Reported-by: Yu Liao Signed-off-by: Thomas Gleixner Tested-by: Liu Tie Link: https://lore.kernel.org/r/87a5rphara.ffs@tglx Signed-off-by: Sasha Levin

cpu/hotplug: Don't offline the last non-isolated CPU

2023-11-28T17:19:36Z

[ Upstream commit 38685e2a0476127db766f81b1c06019ddc4c9ffa ] If a system has isolated CPUs via the "isolcpus=" command line parameter, then an attempt to offline the last housekeeping CPU will result in a WARN_ON() when rebuilding the scheduler domains and a subsequent panic due to and unhandled empty CPU mas in partition_sched_domains_locked(). cpuset_hotplug_workfn() rebuild_sched_domains_locked() ndoms = generate_sched_domains(&doms, &attr); cpumask_and(doms[0], top_cpuset.effective_cpus, housekeeping_cpumask(HK_FLAG_DOMAIN)); Thus results in an empty CPU mask which triggers the warning and then the subsequent crash: WARNING: CPU: 4 PID: 80 at kernel/sched/topology.c:2366 build_sched_domains+0x120c/0x1408 Call trace: build_sched_domains+0x120c/0x1408 partition_sched_domains_locked+0x234/0x880 rebuild_sched_domains_locked+0x37c/0x798 rebuild_sched_domains+0x30/0x58 cpuset_hotplug_workfn+0x2a8/0x930 Unable to handle kernel paging request at virtual address fffe80027ab37080 partition_sched_domains_locked+0x318/0x880 rebuild_sched_domains_locked+0x37c/0x798 Aside of the resulting crash, it does not make any sense to offline the last last housekeeping CPU. Prevent this by masking out the non-housekeeping CPUs when selecting a target CPU for initiating the CPU unplug operation via the work queue. Suggested-by: Thomas Gleixner Signed-off-by: Ran Xiaokai Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/r/202310171709530660462@zte.com.cn Signed-off-by: Sasha Levin

cpu/SMT: Make SMT control more robust against enumeration failures

2023-11-20T10:58:53Z

[ Upstream commit d91bdd96b55cc3ce98d883a60f133713821b80a6 ] The SMT control mechanism got added as speculation attack vector mitigation. The implemented logic relies on the primary thread mask to be set up properly. This turns out to be an issue with XEN/PV guests because their CPU hotplug mechanics do not enumerate APICs and therefore the mask is never correctly populated. This went unnoticed so far because by chance XEN/PV ends up with smp_num_siblings == 2. So smt_hotplug_control stays at its default value CPU_SMT_ENABLED and the primary thread mask is never evaluated in the context of CPU hotplug. This stopped "working" with the upcoming overhaul of the topology evaluation which legitimately provides a fake topology for XEN/PV. That sets smp_num_siblings to 1, which causes the core CPU hot-plug core to refuse to bring up the APs. This happens because smt_hotplug_control is set to CPU_SMT_NOT_SUPPORTED which causes cpu_smt_allowed() to evaluate the unpopulated primary thread mask with the conclusion that all non-boot CPUs are not valid to be plugged. Make cpu_smt_allowed() more robust and take CPU_SMT_NOT_SUPPORTED and CPU_SMT_NOT_IMPLEMENTED into account. Rename it to cpu_bootable() while at it as that makes it more clear what the function is about. The primary mask issue on x86 XEN/PV needs to be addressed separately as there are users outside of the CPU hotplug code too. Fixes: 05736e4ac13c ("cpu/hotplug: Provide knobs to control SMT") Reported-by: Juergen Gross Signed-off-by: Thomas Gleixner Tested-by: Juergen Gross Tested-by: Sohil Mehta Tested-by: Michael Kelley Tested-by: Peter Zijlstra (Intel) Tested-by: Zhang Rui Acked-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20230814085112.149440843@linutronix.de Signed-off-by: Sasha Levin

cpu/hotplug: Prevent self deadlock on CPU hot-unplug

2023-08-30T10:24:22Z

Xiongfeng reported and debugged a self deadlock of the task which initiates and controls a CPU hot-unplug operation vs. the CFS bandwidth timer. CPU1 CPU2 T1 sets cfs_quota starts hrtimer cfs_bandwidth 'period_timer' T1 is migrated to CPU2 T1 initiates offlining of CPU1 Hotplug operation starts ... 'period_timer' expires and is re-enqueued on CPU1 ... take_cpu_down() CPU1 shuts down and does not handle timers anymore. They have to be migrated in the post dead hotplug steps by the control task. T1 runs the post dead offline operation T1 is scheduled out T1 waits for 'period_timer' to expire T1 waits there forever if it is scheduled out before it can execute the hrtimer offline callback hrtimers_dead_cpu(). Cure this by delegating the hotplug control operation to a worker thread on an online CPU. This takes the initiating user space task, which might be affected by the bandwidth timer, completely out of the picture. Reported-by: Xiongfeng Wang Signed-off-by: Thomas Gleixner Tested-by: Yu Liao Acked-by: Vincent Guittot Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com Link: https://lore.kernel.org/r/87h6oqdq0i.ffs@tglx

cpu/SMT: Fix cpu_smt_possible() comment

2023-07-31T15:32:44Z

Commit e1572f1d08be ("cpu/SMT: create and export cpu_smt_possible()") introduces cpu_smt_possible() to represent if SMT is theoretically possible. It returns true when SMT is supported and not forcefully disabled ('nosmt=force'). But the comment of it says "Returns true if SMT is not supported of forcefully (irreversibly) disabled", which is wrong. Fix that comment accordingly. Signed-off-by: Zhang Rui Signed-off-by: Thomas Gleixner Reviewed-by: Vitaly Kuznetsov Link: https://lore.kernel.org/r/20230728155313.44170-1-rui.zhang@intel.com

cpu/SMT: Allow enabling partial SMT states via sysfs

2023-07-28T07:53:37Z

Add support to the /sys/devices/system/cpu/smt/control interface for enabling a specified number of SMT threads per core, including partial SMT states where not all threads are brought online. The current interface accepts "on" and "off", to enable either 1 or all SMT threads per core. This commit allows writing an integer, between 1 and the number of SMT threads supported by the machine. Writing 1 is a synonym for "off", 2 or more enables SMT with the specified number of threads. When reading the file, if all threads are online "on" is returned, to avoid changing behaviour for existing users. If some other number of threads is online then the integer value is returned. Architectures like x86 only supporting 1 thread or all threads, should not define CONFIG_SMT_NUM_THREADS_DYNAMIC. Architecture supporting partial SMT states, like PowerPC, should define it. [ ldufour: Slightly reword the commit's description ] [ ldufour: Remove switch() in __store_smt_control() ] [ ldufour: Rix build issue in control_show() ] Reported-by: kernel test robot Signed-off-by: Michael Ellerman Signed-off-by: Laurent Dufour Signed-off-by: Thomas Gleixner Tested-by: Zhang Rui Link: https://lore.kernel.org/r/20230705145143.40545-8-ldufour@linux.ibm.com

cpu/SMT: Create topology_smt_thread_allowed()

2023-07-28T07:53:37Z

Some architectures allows partial SMT states, i.e. when not all SMT threads are brought online. To support that, add an architecture helper which checks whether a given CPU is allowed to be brought online depending on how many SMT threads are currently enabled. Since this is only applicable to architecture supporting partial SMT, only these architectures should select the new configuration variable CONFIG_SMT_NUM_THREADS_DYNAMIC. For the other architectures, not supporting the partial SMT states, there is no need to define topology_cpu_smt_allowed(), the generic code assumed that all the threads are allowed or only the primary ones. Call the helper from cpu_smt_enable(), and cpu_smt_allowed() when SMT is enabled, to check if the particular thread should be onlined. Notably, also call it from cpu_smt_disable() if CPU_SMT_ENABLED, to allow offlining some threads to move from a higher to lower number of threads online. [ ldufour: Slightly reword the commit's description ] [ ldufour: Introduce CONFIG_SMT_NUM_THREADS_DYNAMIC ] Suggested-by: Thomas Gleixner Signed-off-by: Michael Ellerman Signed-off-by: Laurent Dufour Signed-off-by: Thomas Gleixner Tested-by: Zhang Rui Link: https://lore.kernel.org/r/20230705145143.40545-7-ldufour@linux.ibm.com

cpu/SMT: Remove topology_smt_supported()

2023-07-28T07:53:37Z

Since the maximum number of threads is now passed to cpu_smt_set_num_threads(), checking that value is enough to know whether SMT is supported. Suggested-by: Thomas Gleixner Signed-off-by: Laurent Dufour Signed-off-by: Thomas Gleixner Tested-by: Zhang Rui Link: https://lore.kernel.org/r/20230705145143.40545-6-ldufour@linux.ibm.com

cpu/SMT: Store the current/max number of threads

2023-07-28T07:53:37Z

Some architectures allow partial SMT states at boot time, ie. when not all SMT threads are brought online. To support that the SMT code needs to know the maximum number of SMT threads, and also the currently configured number. The architecture code knows the max number of threads, so have the architecture code pass that value to cpu_smt_set_num_threads(). Note that although topology_max_smt_threads() exists, it is not configured early enough to be used here. As architecture, like PowerPC, allows the threads number to be set through the kernel command line, also pass that value. [ ldufour: Slightly reword the commit message ] [ ldufour: Rename cpu_smt_check_topology and add a num_threads argument ] Signed-off-by: Michael Ellerman Signed-off-by: Laurent Dufour Signed-off-by: Thomas Gleixner Tested-by: Zhang Rui Link: https://lore.kernel.org/r/20230705145143.40545-5-ldufour@linux.ibm.com

cpu/SMT: Move smt/control simple exit cases earlier

2023-07-28T07:53:36Z

Move the simple exit cases, i.e. those which don't depend on the value written, earlier in the function. That makes it clearer that regardless of the input those states cannot be transitioned out of. That does have a user-visible effect, in that the error returned will now always be EPERM/ENODEV for those states, regardless of the value written. Previously writing an invalid value would return EINVAL even when in those states. Signed-off-by: Michael Ellerman Signed-off-by: Thomas Gleixner Tested-by: Zhang Rui Link: https://lore.kernel.org/r/20230705145143.40545-4-ldufour@linux.ibm.com