<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/cpu.c, branch v6.7.9</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.7.9</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.7.9'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2023-11-19T21:35:07Z</updated>
<entry>
<title>Merge tag 'timers_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2023-11-19T21:35:07Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-11-19T21:35:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b0014556a2a1991df08b2b5d586a1bcc9e762ffd'/>
<id>urn:sha1:b0014556a2a1991df08b2b5d586a1bcc9e762ffd</id>
<content type='text'>
Pull timer fix from Borislav Petkov:

 - Do the push of pending hrtimers away from a CPU which is being
   offlined earlier in the offlining process in order to prevent a
   deadlock

* tag 'timers_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimers: Push pending hrtimers away from outgoing CPU earlier
</content>
</entry>
<entry>
<title>hrtimers: Push pending hrtimers away from outgoing CPU earlier</title>
<updated>2023-11-11T17:06:42Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2023-11-07T14:57:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5c0930ccaad5a74d74e8b18b648c5eb21ed2fe94'/>
<id>urn:sha1:5c0930ccaad5a74d74e8b18b648c5eb21ed2fe94</id>
<content type='text'>
2b8272ff4a70 ("cpu/hotplug: Prevent self deadlock on CPU hot-unplug")
solved the straight forward CPU hotplug deadlock vs. the scheduler
bandwidth timer. Yu discovered a more involved variant where a task which
has a bandwidth timer started on the outgoing CPU holds a lock and then
gets throttled. If the lock required by one of the CPU hotplug callbacks
the hotplug operation deadlocks because the unthrottling timer event is not
handled on the dying CPU and can only be recovered once the control CPU
reaches the hotplug state which pulls the pending hrtimers from the dead
CPU.

Solve this by pushing the hrtimers away from the dying CPU in the dying
callbacks. Nothing can queue a hrtimer on the dying CPU at that point because
all other CPUs spin in stop_machine() with interrupts disabled and once the
operation is finished the CPU is marked offline.

Reported-by: Yu Liao &lt;liaoyu15@huawei.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Liu Tie &lt;liutie4@huawei.com&gt;
Link: https://lore.kernel.org/r/87a5rphara.ffs@tglx
</content>
</entry>
<entry>
<title>Merge tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic</title>
<updated>2023-11-02T01:28:33Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-11-02T01:28:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1e0c505e13162a2abe7c984309cfe2ae976b428d'/>
<id>urn:sha1:1e0c505e13162a2abe7c984309cfe2ae976b428d</id>
<content type='text'>
Pull ia64 removal and asm-generic updates from Arnd Bergmann:

 - The ia64 architecture gets its well-earned retirement as planned,
   now that there is one last (mostly) working release that will be
   maintained as an LTS kernel.

 - The architecture specific system call tables are updated for the
   added map_shadow_stack() syscall and to remove references to the
   long-gone sys_lookup_dcookie() syscall.

* tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  hexagon: Remove unusable symbols from the ptrace.h uapi
  asm-generic: Fix spelling of architecture
  arch: Reserve map_shadow_stack() syscall number for all architectures
  syscalls: Cleanup references to sys_lookup_dcookie()
  Documentation: Drop or replace remaining mentions of IA64
  lib/raid6: Drop IA64 support
  Documentation: Drop IA64 from feature descriptions
  kernel: Drop IA64 support from sig_fault handlers
  arch: Remove Itanium (IA-64) architecture
</content>
</entry>
<entry>
<title>Merge tag 'rcu-next-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks</title>
<updated>2023-10-31T04:01:41Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-10-31T04:01:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2656821f1f202d58224551b71eff41aafd1edf8b'/>
<id>urn:sha1:2656821f1f202d58224551b71eff41aafd1edf8b</id>
<content type='text'>
Pull RCU updates from Frederic Weisbecker:

 - RCU torture, locktorture and generic torture infrastructure updates
   that include various fixes, cleanups and consolidations.

   Among the user visible things, ftrace dumps can now be found into
   their own file, and module parameters get better documented and
   reported on dumps.

 - Generic and misc fixes all over the place. Some highlights:

     * Hotplug handling has seen some light cleanups and comments

     * An RCU barrier can now be triggered through sysfs to serialize
       memory stress testing and avoid OOM

     * Object information is now dumped in case of invalid callback
       invocation

     * Also various SRCU issues, too hard to trigger to deserve urgent
       pull requests, have been fixed

 - RCU documentation updates

 - RCU reference scalability test minor fixes and doc improvements.

 - RCU tasks minor fixes

 - Stall detection updates. Introduce RCU CPU Stall notifiers that
   allows a subsystem to provide informations to help debugging. Also
   cure some false positive stalls.

* tag 'rcu-next-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks: (56 commits)
  srcu: Only accelerate on enqueue time
  locktorture: Check the correct variable for allocation failure
  srcu: Fix callbacks acceleration mishandling
  rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP
  rcu: Standardize explicit CPU-hotplug calls
  rcu: Conditionally build CPU-hotplug teardown callbacks
  rcu: Remove references to rcu_migrate_callbacks() from diagrams
  rcu: Assume rcu_report_dead() is always called locally
  rcu: Assume IRQS disabled from rcu_report_dead()
  rcu: Use rcu_segcblist_segempty() instead of open coding it
  rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects
  srcu: Fix srcu_struct node grpmask overflow on 64-bit systems
  torture: Convert parse-console.sh to mktemp
  rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle()
  rcutorture: Replace schedule_timeout*() 1-jiffy waits with HZ/20
  torture: Add kvm.sh --debug-info argument
  locktorture: Rename readers_bind/writers_bind to bind_readers/bind_writers
  doc: Catch-up update for locktorture module parameters
  locktorture: Add call_rcu_chains module parameter
  locktorture: Add new module parameters to lock_torture_print_module_parms()
  ...
</content>
</entry>
<entry>
<title>Merge tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2023-10-31T03:37:47Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-10-31T03:37:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=eb55307e6716b1a02f7db05e27d60e8ca2289c03'/>
<id>urn:sha1:eb55307e6716b1a02f7db05e27d60e8ca2289c03</id>
<content type='text'>
Pull x86 core updates from Thomas Gleixner:

 - Limit the hardcoded topology quirk for Hygon CPUs to those which have
   a model ID less than 4.

   The newer models have the topology CPUID leaf 0xB correctly
   implemented and are not affected.

 - Make SMT control more robust against enumeration failures

   SMT control was added to allow controlling SMT at boottime or
   runtime. The primary purpose was to provide a simple mechanism to
   disable SMT in the light of speculation attack vectors.

   It turned out that the code is sensible to enumeration failures and
   worked only by chance for XEN/PV. XEN/PV has no real APIC enumeration
   which means the primary thread mask is not set up correctly. By
   chance a XEN/PV boot ends up with smp_num_siblings == 2, which makes
   the hotplug control stay at its default value "enabled". So the mask
   is never evaluated.

   The ongoing rework of the topology evaluation caused XEN/PV to end up
   with smp_num_siblings == 1, which sets the SMT control to "not
   supported" and the empty primary thread mask causes the hotplug core
   to deny the bringup of the APS.

   Make the decision logic more robust and take 'not supported' and 'not
   implemented' into account for the decision whether a CPU should be
   booted or not.

 - Fake primary thread mask for XEN/PV

   Pretend that all XEN/PV vCPUs are primary threads, which makes the
   usage of the primary thread mask valid on XEN/PV. That is consistent
   with because all of the topology information on XEN/PV is fake or
   even non-existent.

 - Encapsulate topology information in cpuinfo_x86

   Move the randomly scattered topology data into a separate data
   structure for readability and as a preparatory step for the topology
   evaluation overhaul.

 - Consolidate APIC ID data type to u32

   It's fixed width hardware data and not randomly u16, int, unsigned
   long or whatever developers decided to use.

 - Cure the abuse of cpuinfo for persisting logical IDs.

   Per CPU cpuinfo is used to persist the logical package and die IDs.
   That's really not the right place simply because cpuinfo is subject
   to be reinitialized when a CPU goes through an offline/online cycle.

   Use separate per CPU data for the persisting to enable the further
   topology management rework. It will be removed once the new topology
   management is in place.

 - Provide a debug interface for inspecting topology information

   Useful in general and extremly helpful for validating the topology
   management rework in terms of correctness or "bug" compatibility.

* tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/apic, x86/hyperv: Use u32 in hv_snp_boot_ap() too
  x86/cpu: Provide debug interface
  x86/cpu/topology: Cure the abuse of cpuinfo for persisting logical ids
  x86/apic: Use u32 for wakeup_secondary_cpu[_64]()
  x86/apic: Use u32 for [gs]et_apic_id()
  x86/apic: Use u32 for phys_pkg_id()
  x86/apic: Use u32 for cpu_present_to_apicid()
  x86/apic: Use u32 for check_apicid_used()
  x86/apic: Use u32 for APIC IDs in global data
  x86/apic: Use BAD_APICID consistently
  x86/cpu: Move cpu_l[l2]c_id into topology info
  x86/cpu: Move logical package and die IDs into topology info
  x86/cpu: Remove pointless evaluation of x86_coreid_bits
  x86/cpu: Move cu_id into topology info
  x86/cpu: Move cpu_core_id into topology info
  hwmon: (fam15h_power) Use topology_core_id()
  scsi: lpfc: Use topology_core_id()
  x86/cpu: Move cpu_die_id into topology info
  x86/cpu: Move phys_proc_id into topology info
  x86/cpu: Encapsulate topology information in cpuinfo_x86
  ...
</content>
</entry>
<entry>
<title>cpu/hotplug: Don't offline the last non-isolated CPU</title>
<updated>2023-10-17T19:41:33Z</updated>
<author>
<name>Ran Xiaokai</name>
<email>ran.xiaokai@zte.com.cn</email>
</author>
<published>2023-10-17T09:09:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=38685e2a0476127db766f81b1c06019ddc4c9ffa'/>
<id>urn:sha1:38685e2a0476127db766f81b1c06019ddc4c9ffa</id>
<content type='text'>
If a system has isolated CPUs via the "isolcpus=" command line parameter,
then an attempt to offline the last housekeeping CPU will result in a
WARN_ON() when rebuilding the scheduler domains and a subsequent panic due
to and unhandled empty CPU mas in partition_sched_domains_locked().

cpuset_hotplug_workfn()
  rebuild_sched_domains_locked()
    ndoms = generate_sched_domains(&amp;doms, &amp;attr);
      cpumask_and(doms[0], top_cpuset.effective_cpus, housekeeping_cpumask(HK_FLAG_DOMAIN));

Thus results in an empty CPU mask which triggers the warning and then the
subsequent crash:

WARNING: CPU: 4 PID: 80 at kernel/sched/topology.c:2366 build_sched_domains+0x120c/0x1408
Call trace:
 build_sched_domains+0x120c/0x1408
 partition_sched_domains_locked+0x234/0x880
 rebuild_sched_domains_locked+0x37c/0x798
 rebuild_sched_domains+0x30/0x58
 cpuset_hotplug_workfn+0x2a8/0x930

Unable to handle kernel paging request at virtual address fffe80027ab37080
 partition_sched_domains_locked+0x318/0x880
 rebuild_sched_domains_locked+0x37c/0x798

Aside of the resulting crash, it does not make any sense to offline the last
last housekeeping CPU.

Prevent this by masking out the non-housekeeping CPUs when selecting a
target CPU for initiating the CPU unplug operation via the work queue.

Suggested-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Ran Xiaokai &lt;ran.xiaokai@zte.com.cn&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/r/202310171709530660462@zte.com.cn
</content>
</entry>
<entry>
<title>cpu/SMT: Make SMT control more robust against enumeration failures</title>
<updated>2023-10-10T12:38:17Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2023-08-14T08:18:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d91bdd96b55cc3ce98d883a60f133713821b80a6'/>
<id>urn:sha1:d91bdd96b55cc3ce98d883a60f133713821b80a6</id>
<content type='text'>
The SMT control mechanism got added as speculation attack vector
mitigation. The implemented logic relies on the primary thread mask to
be set up properly.

This turns out to be an issue with XEN/PV guests because their CPU hotplug
mechanics do not enumerate APICs and therefore the mask is never correctly
populated.

This went unnoticed so far because by chance XEN/PV ends up with
smp_num_siblings == 2. So smt_hotplug_control stays at its default value
CPU_SMT_ENABLED and the primary thread mask is never evaluated in the
context of CPU hotplug.

This stopped "working" with the upcoming overhaul of the topology
evaluation which legitimately provides a fake topology for XEN/PV. That
sets smp_num_siblings to 1, which causes the core CPU hot-plug core to
refuse to bring up the APs.

This happens because smt_hotplug_control is set to CPU_SMT_NOT_SUPPORTED
which causes cpu_smt_allowed() to evaluate the unpopulated primary thread
mask with the conclusion that all non-boot CPUs are not valid to be
plugged.

Make cpu_smt_allowed() more robust and take CPU_SMT_NOT_SUPPORTED and
CPU_SMT_NOT_IMPLEMENTED into account. Rename it to cpu_bootable() while at
it as that makes it more clear what the function is about.

The primary mask issue on x86 XEN/PV needs to be addressed separately as
there are users outside of the CPU hotplug code too.

Fixes: 05736e4ac13c ("cpu/hotplug: Provide knobs to control SMT")
Reported-by: Juergen Gross &lt;jgross@suse.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Juergen Gross &lt;jgross@suse.com&gt;
Tested-by: Sohil Mehta &lt;sohil.mehta@intel.com&gt;
Tested-by: Michael Kelley &lt;mikelley@microsoft.com&gt;
Tested-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Zhang Rui &lt;rui.zhang@intel.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20230814085112.149440843@linutronix.de

</content>
</entry>
<entry>
<title>rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP</title>
<updated>2023-10-04T20:38:35Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2023-09-08T20:36:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a28ab03b499601c14b8f502d875c2bee23209659'/>
<id>urn:sha1:a28ab03b499601c14b8f502d875c2bee23209659</id>
<content type='text'>
The callbacks migration is performed through an explicit call from
the hotplug control CPU right after the death of the target CPU and
before proceeding with the CPUHP_ teardown functions.

This is unusual but necessary and yet uncommented. Summarize the reason
as explained in the changelog of:

	a58163d8ca2c (rcu: Migrate callbacks earlier in the CPU-offline timeline)

Reviewed-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
</content>
</entry>
<entry>
<title>rcu: Standardize explicit CPU-hotplug calls</title>
<updated>2023-10-04T20:29:45Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2023-09-08T20:36:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=448e9f34d91d1a4799fdb06a93c2c24b34b6fd9d'/>
<id>urn:sha1:448e9f34d91d1a4799fdb06a93c2c24b34b6fd9d</id>
<content type='text'>
rcu_report_dead() and rcutree_migrate_callbacks() have their headers in
rcupdate.h while those are pure rcutree calls, like the other CPU-hotplug
functions.

Also rcu_cpu_starting() and rcu_report_dead() have different naming
conventions while they mirror each other's effects.

Fix the headers and propose a naming that relates both functions and
aligns with the prefix of other rcutree CPU-hotplug functions.

Reviewed-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
</content>
</entry>
<entry>
<title>rcu: Assume rcu_report_dead() is always called locally</title>
<updated>2023-10-04T15:35:56Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2023-09-08T20:35:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c964c1f5ee96e1460606d44f80a47bdacd8fe568'/>
<id>urn:sha1:c964c1f5ee96e1460606d44f80a47bdacd8fe568</id>
<content type='text'>
rcu_report_dead() has to be called locally by the CPU that is going to
exit the RCU state machine. Passing a cpu argument here is error-prone
and leaves the possibility for a racy remote call.

Use local access instead.

Reviewed-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
</content>
</entry>
</feed>
