summaryrefslogtreecommitdiff
path: root/kernel/locking
AgeCommit message (Collapse)Author
2025-03-18locking: Move MCS struct definition to public headerKumar Kartikeya Dwivedi
Move the definition of the struct mcs_spinlock from the private mcs_spinlock.h header in kernel/locking to the mcs_spinlock.h asm-generic header, since we will need to reference it from the qspinlock.h header in subsequent commits. Reviewed-by: Barret Rhoden <brho@google.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250316040541.108729-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-16percpu: use TYPEOF_UNQUAL() in variable declarationsUros Bizjak
Use TYPEOF_UNQUAL() to declare variables as a corresponding type without named address space qualifier to avoid "`__seg_gs' specified for auto variable `var'" errors. Link: https://lkml.kernel.org/r/20250127160709.80604-4-ubizjak@gmail.com Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Acked-by: Nadav Amit <nadav.amit@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-08locking/lockdep: Add kasan_check_byte() check in lock_acquire()Waiman Long
KASAN instrumentation of lockdep has been disabled, as we don't need KASAN to check the validity of lockdep internal data structures and incur unnecessary performance overhead. However, the lockdep_map pointer passed in externally may not be valid (e.g. use-after-free) and we run the risk of using garbage data resulting in false lockdep reports. Add kasan_check_byte() call in lock_acquire() for non kernel core data object to catch invalid lockdep_map and print out a KASAN report before any lockdep splat, if any. Suggested-by: Marco Elver <elver@google.com> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/20250214195242.2480920-1-longman@redhat.com Link: https://lore.kernel.org/r/20250307232717.1759087-7-boqun.feng@gmail.com
2025-03-08locking/lockdep: Disable KASAN instrumentation of lockdep.cWaiman Long
Both KASAN and LOCKDEP are commonly enabled in building a debug kernel. Each of them can significantly slow down the speed of a debug kernel. Enabling KASAN instrumentation of the LOCKDEP code will further slow things down. Since LOCKDEP is a high overhead debugging tool, it will never get enabled in a production kernel. The LOCKDEP code is also pretty mature and is unlikely to get major changes. There is also a possibility of recursion similar to KCSAN. To evaluate the performance impact of disabling KASAN instrumentation of lockdep.c, the time to do a parallel build of the Linux defconfig kernel was used as the benchmark. Two x86-64 systems (Skylake & Zen 2) and an arm64 system were used as test beds. Two sets of non-RT and RT kernels with similar configurations except mainly CONFIG_PREEMPT_RT were used for evaluation. For the Skylake system: Kernel Run time Sys time ------ -------- -------- Non-debug kernel (baseline) 0m47.642s 4m19.811s [CONFIG_KASAN_INLINE=y] Debug kernel 2m11.108s (x2.8) 38m20.467s (x8.9) Debug kernel (patched) 1m49.602s (x2.3) 31m28.501s (x7.3) Debug kernel (patched + mitigations=off) 1m30.988s (x1.9) 26m41.993s (x6.2) RT kernel (baseline) 0m54.871s 7m15.340s [CONFIG_KASAN_INLINE=n] RT debug kernel 6m07.151s (x6.7) 135m47.428s (x18.7) RT debug kernel (patched) 3m42.434s (x4.1) 74m51.636s (x10.3) RT debug kernel (patched + mitigations=off) 2m40.383s (x2.9) 57m54.369s (x8.0) [CONFIG_KASAN_INLINE=y] RT debug kernel 3m22.155s (x3.7) 77m53.018s (x10.7) RT debug kernel (patched) 2m36.700s (x2.9) 54m31.195s (x7.5) RT debug kernel (patched + mitigations=off) 2m06.110s (x2.3) 45m49.493s (x6.3) For the Zen 2 system: Kernel Run time Sys time ------ -------- -------- Non-debug kernel (baseline) 1m42.806s 39m48.714s [CONFIG_KASAN_INLINE=y] Debug kernel 4m04.524s (x2.4) 125m35.904s (x3.2) Debug kernel (patched) 3m56.241s (x2.3) 127m22.378s (x3.2) Debug kernel (patched + mitigations=off) 2m38.157s (x1.5) 92m35.680s (x2.3) RT kernel (baseline) 1m51.500s 14m56.322s [CONFIG_KASAN_INLINE=n] RT debug kernel 16m04.962s (x8.7) 244m36.463s (x16.4) RT debug kernel (patched) 9m09.073s (x4.9) 129m28.439s (x8.7) RT debug kernel (patched + mitigations=off) 3m31.662s (x1.9) 51m01.391s (x3.4) For the arm64 system: Kernel Run time Sys time ------ -------- -------- Non-debug kernel (baseline) 1m56.844s 8m47.150s Debug kernel 3m54.774s (x2.0) 92m30.098s (x10.5) Debug kernel (patched) 3m32.429s (x1.8) 77m40.779s (x8.8) RT kernel (baseline) 4m01.641s 18m16.777s [CONFIG_KASAN_INLINE=n] RT debug kernel 19m32.977s (x4.9) 304m23.965s (x16.7) RT debug kernel (patched) 16m28.354s (x4.1) 234m18.149s (x12.8) Turning the mitigations off doesn't seems to have any noticeable impact on the performance of the arm64 system. So the mitigation=off entries aren't included. For the x86 CPUs, CPU mitigations has a much bigger impact on performance, especially the RT debug kernel with CONFIG_KASAN_INLINE=n. The SRSO mitigation in Zen 2 has an especially big impact on the debug kernel. It is also the majority of the slowdown with mitigations on. It is because the patched RET instruction slows down function returns. A lot of helper functions that are normally compiled out or inlined may become real function calls in the debug kernel. With !CONFIG_KASAN_INLINE, the KASAN instrumentation inserts a lot of __asan_loadX*() and __kasan_check_read() function calls to memory access portion of the code. The lockdep's __lock_acquire() function, for instance, has 66 __asan_loadX*() and 6 __kasan_check_read() calls added with KASAN instrumentation. Of course, the actual numbers may vary depending on the compiler used and the exact version of the lockdep code. With the Skylake test system, the parallel kernel build times reduction of the RT debug kernel with this patch are: CONFIG_KASAN_INLINE=n: -37% CONFIG_KASAN_INLINE=y: -22% The time reduction is less with CONFIG_KASAN_INLINE=y, but it is still significant. Setting CONFIG_KASAN_INLINE=y can result in a significant performance improvement. The major drawback is a significant increase in the size of kernel text. In the case of vmlinux, its text size increases from 45997948 to 67606807. That is a 47% size increase (about 21 Mbytes). The size increase of other kernel modules should be similar. With the newly added rtmutex and lockdep lock events, the relevant event counts for the test runs with the Skylake system were: Event type Debug kernel RT debug kernel ---------- ------------ --------------- lockdep_acquire 1,968,663,277 5,425,313,953 rtlock_slowlock - 401,701,156 rtmutex_slowlock - 139,672 The __lock_acquire() calls in the RT debug kernel are x2.8 times of the non-RT debug kernel with the same workload. Since the __lock_acquire() function is a big hitter in term of performance slowdown, this makes the RT debug kernel much slower than the non-RT one. The average lock nesting depth is likely to be higher in the RT debug kernel too leading to longer execution time in the __lock_acquire() function. As the small advantage of enabling KASAN instrumentation to catch potential memory access error in the lockdep debugging tool is probably not worth the drawback of further slowing down a debug kernel, disable KASAN instrumentation in the lockdep code to allow the debug kernels to regain some performance back, especially for the RT debug kernels. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250307232717.1759087-6-boqun.feng@gmail.com
2025-03-08locking/lock_events: Add locking events for lockdepWaiman Long
Add some lock events to lockdep to profile its behavior. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250307232717.1759087-5-boqun.feng@gmail.com
2025-03-08locking/lock_events: Add locking events for rtmutex slow pathsWaiman Long
Add locking events for rtlock_slowlock() and rt_mutex_slowlock() for profiling the slow path behavior of rt_spin_lock() and rt_mutex_lock(). Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250307232717.1759087-4-boqun.feng@gmail.com
2025-03-08Merge branch 'locking/urgent' into locking/core, to pick up locking fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-08locking/semaphore: Use wake_q to wake up processes outside lock critical sectionWaiman Long
A circular lock dependency splat has been seen involving down_trylock(): ====================================================== WARNING: possible circular locking dependency detected 6.12.0-41.el10.s390x+debug ------------------------------------------------------ dd/32479 is trying to acquire lock: 0015a20accd0d4f8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x26/0x90 but task is already holding lock: 000000017e461698 (&zone->lock){-.-.}-{2:2}, at: rmqueue_bulk+0xac/0x8f0 the existing dependency chain (in reverse order) is: -> #4 (&zone->lock){-.-.}-{2:2}: -> #3 (hrtimer_bases.lock){-.-.}-{2:2}: -> #2 (&rq->__lock){-.-.}-{2:2}: -> #1 (&p->pi_lock){-.-.}-{2:2}: -> #0 ((console_sem).lock){-.-.}-{2:2}: The console_sem -> pi_lock dependency is due to calling try_to_wake_up() while holding the console_sem raw_spinlock. This dependency can be broken by using wake_q to do the wakeup instead of calling try_to_wake_up() under the console_sem lock. This will also make the semaphore's raw_spinlock become a terminal lock without taking any further locks underneath it. The hrtimer_bases.lock is a raw_spinlock while zone->lock is a spinlock. The hrtimer_bases.lock -> zone->lock dependency happens via the debug_objects_fill_pool() helper function in the debugobjects code. -> #4 (&zone->lock){-.-.}-{2:2}: __lock_acquire+0xe86/0x1cc0 lock_acquire.part.0+0x258/0x630 lock_acquire+0xb8/0xe0 _raw_spin_lock_irqsave+0xb4/0x120 rmqueue_bulk+0xac/0x8f0 __rmqueue_pcplist+0x580/0x830 rmqueue_pcplist+0xfc/0x470 rmqueue.isra.0+0xdec/0x11b0 get_page_from_freelist+0x2ee/0xeb0 __alloc_pages_noprof+0x2c2/0x520 alloc_pages_mpol_noprof+0x1fc/0x4d0 alloc_pages_noprof+0x8c/0xe0 allocate_slab+0x320/0x460 ___slab_alloc+0xa58/0x12b0 __slab_alloc.isra.0+0x42/0x60 kmem_cache_alloc_noprof+0x304/0x350 fill_pool+0xf6/0x450 debug_object_activate+0xfe/0x360 enqueue_hrtimer+0x34/0x190 __run_hrtimer+0x3c8/0x4c0 __hrtimer_run_queues+0x1b2/0x260 hrtimer_interrupt+0x316/0x760 do_IRQ+0x9a/0xe0 do_irq_async+0xf6/0x160 Normally a raw_spinlock to spinlock dependency is not legitimate and will be warned if CONFIG_PROVE_RAW_LOCK_NESTING is enabled, but debug_objects_fill_pool() is an exception as it explicitly allows this dependency for non-PREEMPT_RT kernel without causing PROVE_RAW_LOCK_NESTING lockdep splat. As a result, this dependency is legitimate and not a bug. Anyway, semaphore is the only locking primitive left that is still using try_to_wake_up() to do wakeup inside critical section, all the other locking primitives had been migrated to use wake_q to do wakeup outside of the critical section. It is also possible that there are other circular locking dependencies involving printk/console_sem or other existing/new semaphores lurking somewhere which may show up in the future. Let just do the migration now to wake_q to avoid headache like this. Reported-by: yzbot+ed801a886dfdbfe7136d@syzkaller.appspotmail.com Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250307232717.1759087-3-boqun.feng@gmail.com
2025-03-08locking/rtmutex: Use the 'struct' keyword in kernel-doc commentRandy Dunlap
Add the "struct" keyword to prevent a kernel-doc warning: rtmutex_common.h:67: warning: cannot understand function prototype: 'struct rt_wake_q_head ' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/20250307232717.1759087-2-boqun.feng@gmail.com
2025-02-21locking/mutex: Add MUTEX_WARN_ON() into fast pathYunhui Cui
Scenario: In platform_device_register, the driver misuses struct device as platform_data, making kmemdup duplicate a device. Accessing the duplicate may cause list corruption due to its mutex magic or list holding old content. It recurs randomly as the first mutex - getting process skips the slow path and mutex check. Adding MUTEX_WARN_ON(lock->magic!= lock) in __mutex_trylock_fast() makes it always happen. Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250126033243.53069-1-cuiyunhui@bytedance.com
2025-01-28treewide: const qualify ctl_tables where applicableJoel Granados
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/inifiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function. Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers. Created this by running an spatch followed by a sed command: Spatch: virtual patch @ depends on !(file in "net") disable optional_qualifier @ identifier table_name != { watchdog_hardlockup_sysctl, iwcm_ctl_table, ucma_ctl_table, memory_allocation_profiling_sysctls, loadpin_sysctl_table }; @@ + const struct ctl_table table_name [] = { ... }; sed: sed --in-place \ -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \ kernel/utsname_sysctl.c Reviewed-by: Song Liu <song@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/ Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Acked-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-01-21Merge tag 'rcu.release.v6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux Pull RCU updates from Uladzislau Rezki: "Misc fixes: - check if IRQs are disabled in rcu_exp_need_qs() - instrument KCSAN exclusive-writer assertions - add extra WARN_ON_ONCE() check - set the cpu_no_qs.b.exp under lock - warn if callback enqueued on offline CPU Torture-test updates: - add rcutorture.preempt_duration kernel module parameter - make the TREE03 scenario do preemption - improve pooling timeouts for rcu_torture_writer() - improve output of "Failure/close-call rcutorture reader segments" - add some reader-state debugging checks - update doc of polled APIs - add extra diagnostics for per-reader-segment preemption - add an extra test for sched_clock() - improve testing on unresponsive systems SRCU updates: - improve doc for srcu_read_lock() in terms of return value - fix typo in comments - remove redundant GP sequence checks in the srcu_funnel_gp_start" * tag 'rcu.release.v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (31 commits) srcu: Remove redundant GP sequence checks in srcu_funnel_gp_start srcu: Fix typo s/srcu_check_read_flavor()/__srcu_check_read_flavor()/ srcu: Guarantee non-negative return value from srcu_read_lock() MAINTAINERS: Update RCU git tree rcu: Add lockdep_assert_irqs_disabled() to rcu_exp_need_qs() rcu: Add KCSAN exclusive-writer assertions for rdp->cpu_no_qs.b.exp rcu: Make preemptible rcu_exp_handler() check idempotency rcu: Replace open-coded rcu_exp_need_qs() from rcu_exp_handler() with call rcu: Move rcu_report_exp_rdp() setting of ->cpu_no_qs.b.exp under lock rcu: Make rcu_report_exp_cpu_mult() caller acquire lock rcu: Report callbacks enqueued on offline CPU blind spot rcutorture: Use symbols for SRCU reader flavors rcutorture: Add per-reader-segment preemption diagnostics rcutorture: Read CPU ID for decoration protected by both reader types rcutorture: Add preempt_count() to rcutorture_one_extend_check() diagnostics rcutorture: Add parameters to control polled/conditional wait interval rcutorture: Add documentation for recent conditional and polled APIs rcutorture: Ignore attempts to test preemption and forward progress rcutorture: Make rcutorture_one_extend() check reader state rcutorture: Pretty-print rcutorture reader segments ...
2024-12-22Merge tag 'lockdep-for-tip.20241220' of ↵Peter Zijlstra
git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux into locking/core Lockdep changes for v6.14: - Use swap() macro in the ww_mutex test. - Minor fixes and documentation for lockdep configs on internal data structure sizes. - Some "-Wunused-function" warning fixes for Clang. Rust locking changes for v6.14: - Add Rust locking files into LOCKING PRIMITIVES maintainer entry. - Add `Lock<(), ..>::from_raw()` function to support abstraction on low level locking. - Expose `Guard::new()` for public usage and add type alias for spinlock and mutex guards. - Add lockdep checking when creating a new lock `Guard`.
2024-12-20sched/wake_q: Add helper to call wake_up_q after unlock with preemption disabledJohn Stultz
A common pattern seen when wake_qs are used to defer a wakeup until after a lock is released is something like: preempt_disable(); raw_spin_unlock(lock); wake_up_q(wake_q); preempt_enable(); So create some raw_spin_unlock*_wake() helper functions to clean this up. Applies on top of the fix I submitted here: https://lore.kernel.org/lkml/20241212222138.2400498-1-jstultz@google.com/ NOTE: I recognise the unlock()/unlock_irq()/unlock_irqrestore() variants creates its own duplication, which we could use a macro to generate the similar functions, but I often dislike how those generation macros making finding the actual implementation harder, so I left the three functions as is. If folks would prefer otherwise, let me know and I'll switch it. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20241217040803.243420-1-jstultz@google.com
2024-12-19lockdep: Move lockdep_assert_locked() under #ifdef CONFIG_PROVE_LOCKINGAndy Shevchenko
When lockdep_assert_locked() is unused, it prevents kernel builds with clang, `make W=1` and CONFIG_WERROR=y, CONFIG_LOCKDEP=y and CONFIG_PROVE_LOCKING=n: kernel/locking/lockdep.c:160:20: error: unused function 'lockdep_assert_locked' [-Werror,-Wunused-function] Fix this by moving it under the respective ifdeffery. See also commit 6863f5643dd7 ("kbuild: allow Clang to find unused static inline functions for W=1 build"). [Boqun: add more config information of the error] Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/20241202193445.769567-1-andriy.shevchenko@linux.intel.com
2024-12-19lockdep: Mark chain_hlock_class_idx() with __maybe_unusedAndy Shevchenko
When chain_hlock_class_idx() is unused, it prevents kernel builds with clang, `make W=1` and CONFIG_WERROR=y, CONFIG_LOCKDEP=y and CONFIG_PROVE_LOCKING=n: kernel/locking/lockdep.c:435:28: error: unused function 'chain_hlock_class_idx' [-Werror,-Wunused-function] Fix this by marking it with __maybe_unused. See also commit 6863f5643dd7 ("kbuild: allow Clang to find unused static inline functions for W=1 build"). [Boqun: add more config information of the error] Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/20241209170810.1485183-1-andriy.shevchenko@linux.intel.com
2024-12-17locking/rtmutex: Make sure we wake anything on the wake_q when we release ↵John Stultz
the lock->wait_lock Bert reported seeing occasional boot hangs when running with PREEPT_RT and bisected it down to commit 894d1b3db41c ("locking/mutex: Remove wakeups from under mutex::wait_lock"). It looks like I missed a few spots where we drop the wait_lock and potentially call into schedule without waking up the tasks on the wake_q structure. Since the tasks being woken are ww_mutex tasks they need to be able to run to release the mutex and unblock the task that currently is planning to wake them. Thus we can deadlock. So make sure we wake the wake_q tasks when we unlock the wait_lock. Closes: https://lore.kernel.org/lkml/20241211182502.2915-1-spasswolf@web.de Fixes: 894d1b3db41c ("locking/mutex: Remove wakeups from under mutex::wait_lock") Reported-by: Bert Karwatzki <spasswolf@web.de> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20241212222138.2400498-1-jstultz@google.com
2024-12-15lockdep: Document MAX_LOCKDEP_CHAIN_HLOCKS calculationCarlos Llamas
Define a macro AVG_LOCKDEP_CHAIN_DEPTH to document the magic number '5' used in the calculation of MAX_LOCKDEP_CHAIN_HLOCKS. The number represents the estimated average depth (number of locks held) of a lock chain. The calculation of MAX_LOCKDEP_CHAIN_HLOCKS was first added in commit 443cd507ce7f ("lockdep: add lock_class information to lock_chain and output it"). Suggested-by: Waiman Long <longman@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: J. R. Okajima <hooanon05g@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Will Deacon <will@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Acked-by: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/20241024183631.643450-4-cmllamas@google.com
2024-12-15locking/ww_mutex/test: Use swap() macroThorsten Blum
Fixes the following Coccinelle/coccicheck warning reported by swap.cocci: WARNING opportunity for swap() Compile-tested only. [Boqun: Add the report tags from Jiapeng and Abaci Robot [1].] Reported-by: Abaci Robot <abaci@linux.alibaba.com> Reported-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=11531 Link: https://lore.kernel.org/r/20241025081455.55089-1-jiapeng.chong@linux.alibaba.com [1] Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/20240731135850.81018-2-thorsten.blum@toblux.com
2024-12-14torture: Add dowarn argument to torture_sched_setaffinity()Paul E. McKenney
Current use cases of torture_sched_setaffinity() are well served by its unconditional warning on error. However, an upcoming use case for a preemption kthread needs to avoid warnings that might otherwise arise when that kthread attempted to bind itself to a CPU on its way offline. This commit therefore adds a dowarn argument that, when false, suppresses the warning. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
2024-12-02locking: rtmutex: Fix wake_q logic in task_blocks_on_rt_mutexJohn Stultz
Anders had bisected a crash using PREEMPT_RT with linux-next and isolated it down to commit 894d1b3db41c ("locking/mutex: Remove wakeups from under mutex::wait_lock"), where it seemed the wake_q structure was somehow getting corrupted causing a null pointer traversal. I was able to easily repoduce this with PREEMPT_RT and managed to isolate down that through various call stacks we were actually calling wake_up_q() twice on the same wake_q. I found that in the problematic commit, I had added the wake_up_q() call in task_blocks_on_rt_mutex() around __ww_mutex_add_waiter(), following a similar pattern in __mutex_lock_common(). However, its just wrong. We haven't dropped the lock->wait_lock, so its contrary to the point of the original patch. And it didn't match the __mutex_lock_common() logic of re-initializing the wake_q after calling it midway in the stack. Looking at it now, the wake_up_q() call is incorrect and should just be removed. So drop the erronious logic I had added. Fixes: 894d1b3db41c ("locking/mutex: Remove wakeups from under mutex::wait_lock") Closes: https://lore.kernel.org/lkml/6afb936f-17c7-43fa-90e0-b9e780866097@app.fastmail.com/ Reported-by: Anders Roxell <anders.roxell@linaro.org> Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Tested-by: Anders Roxell <anders.roxell@linaro.org> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20241114190051.552665-1-jstultz@google.com
2024-11-19Merge tag 'sched-core-2024-11-18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Core facilities: - Add the "Lazy preemption" model (CONFIG_PREEMPT_LAZY=y), which optimizes fair-class preemption by delaying preemption requests to the tick boundary, while working as full preemption for RR/FIFO/DEADLINE classes. (Peter Zijlstra) - x86: Enable Lazy preemption (Peter Zijlstra) - riscv: Enable Lazy preemption (Jisheng Zhang) - Initialize idle tasks only once (Thomas Gleixner) - sched/ext: Remove sched_fork() hack (Thomas Gleixner) Fair scheduler: - Optimize the PLACE_LAG when se->vlag is zero (Huang Shijie) Idle loop: - Optimize the generic idle loop by removing unnecessary memory barrier (Zhongqiu Han) RSEQ: - Improve cache locality of RSEQ concurrency IDs for intermittent workloads (Mathieu Desnoyers) Waitqueues: - Make wake_up_{bit,var} less fragile (Neil Brown) PSI: - Pass enqueue/dequeue flags to psi callbacks directly (Johannes Weiner) Preparatory patches for proxy execution: - Add move_queued_task_locked helper (Connor O'Brien) - Consolidate pick_*_task to task_is_pushable helper (Connor O'Brien) - Split out __schedule() deactivate task logic into a helper (John Stultz) - Split scheduler and execution contexts (Peter Zijlstra) - Make mutex::wait_lock irq safe (Juri Lelli) - Expose __mutex_owner() (Juri Lelli) - Remove wakeups from under mutex::wait_lock (Peter Zijlstra) Misc fixes and cleanups: - Remove unused __HAVE_THREAD_FUNCTIONS hook support (David Disseldorp) - Update the comment for TIF_NEED_RESCHED_LAZY (Sebastian Andrzej Siewior) - Remove unused bit_wait_io_timeout (Dr. David Alan Gilbert) - remove the DOUBLE_TICK feature (Huang Shijie) - fix the comment for PREEMPT_SHORT (Huang Shijie) - Fix unnused variable warning (Christian Loehle) - No PREEMPT_RT=y for all{yes,mod}config" * tag 'sched-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) sched, x86: Update the comment for TIF_NEED_RESCHED_LAZY. sched: No PREEMPT_RT=y for all{yes,mod}config riscv: add PREEMPT_LAZY support sched, x86: Enable Lazy preemption sched: Enable PREEMPT_DYNAMIC for PREEMPT_RT sched: Add Lazy preemption model sched: Add TIF_NEED_RESCHED_LAZY infrastructure sched/ext: Remove sched_fork() hack sched: Initialize idle tasks only once sched: psi: pass enqueue/dequeue flags to psi callbacks directly sched/uclamp: Fix unnused variable warning sched: Split scheduler and execution contexts sched: Split out __schedule() deactivate task logic into a helper sched: Consolidate pick_*_task to task_is_pushable helper sched: Add move_queued_task_locked helper locking/mutex: Expose __mutex_owner() locking/mutex: Make mutex::wait_lock irq safe locking/mutex: Remove wakeups from under mutex::wait_lock sched: Improve cache locality of RSEQ concurrency IDs for intermittent workloads sched: idle: Optimize the generic idle loop by removing needless memory barrier ...
2024-10-25locking/osq_lock: Use atomic_try_cmpxchg_release() in osq_unlock()Uros Bizjak
Replace this pattern in osq_unlock(): atomic_cmpxchg(*ptr, old, new) == old ... with the simpler and faster: atomic_try_cmpxchg(*ptr, &old, new) The x86 CMPXCHG instruction returns success in the ZF flag, so this change saves a compare after the CMPXCHG. The code in the fast path of osq_unlock() improves from: 11b: 31 c9 xor %ecx,%ecx 11d: 8d 50 01 lea 0x1(%rax),%edx 120: 89 d0 mov %edx,%eax 122: f0 0f b1 0f lock cmpxchg %ecx,(%rdi) 126: 39 c2 cmp %eax,%edx 128: 75 05 jne 12f <...> to: 12b: 31 d2 xor %edx,%edx 12d: 83 c0 01 add $0x1,%eax 130: f0 0f b1 17 lock cmpxchg %edx,(%rdi) 134: 75 05 jne 13b <...> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/20241001114606.820277-1-ubizjak@gmail.com
2024-10-24locking/rtmutex: Fix misleading commentPeter Zijlstra
Going through the RCU-boost and rtmutex code, I ran into this utterly confusing comment. Fix it to avoid confusing future readers. [ tglx: Wordsmithed the comment ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/all/20241008092606.GJ33184@noisy.programming.kicks-ass.net
2024-10-24locking/rt: Annotate unlock followed by lock for sparse.Sebastian Andrzej Siewior
rt_mutex_slowlock_block() and rtlock_slowlock_locked() both unlock lock::wait_lock and then lock it later. This is unusual and sparse complains about it. Add __releases() + __acquires() annotation to mark that it is expected. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240812104200.2239232-5-bigeasy@linutronix.de
2024-10-24locking/rt: Add sparse annotation for RCU.Sebastian Andrzej Siewior
Every lock, that becomes a sleeping lock on PREEMPT_RT, starts a RCU read side critical section. There is no sparse annotation for this and sparse complains about unbalanced locking. Add __acquires/ __releases for the RCU lock. This covers all but the trylock functions. A __cond_acquires() annotation didn't work. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240812104200.2239232-4-bigeasy@linutronix.de
2024-10-17locking/pvqspinlock: Convert fields of 'enum vcpu_state' to uppercaseQiuxu Zhuo
Convert the fields of 'enum vcpu_state' to uppercase for better readability. No functional changes intended. Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/20240809014802.15320-1-qiuxu.zhuo@intel.com
2024-10-17lockdep: Use info level for lockdep initial info messagesJiri Slaby (SUSE)
All those: Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar ... MAX_LOCKDEP_SUBCLASSES: 8 ... MAX_LOCK_DEPTH: 48 ... MAX_LOCKDEP_KEYS: 8192 and so on are dumped with the KERN_WARNING level. It is due to missing KERN_* annotation. Use pr_info() instead of bare printk() to dump the info with the info level. Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/20241007065457.20128-1-jirislaby@kernel.org
2024-10-17lockdep: Add lockdep_cleanup_dead_cpu()David Woodhouse
Add a function to check that an offline CPU has left the tracing infrastructure in a sane state. Commit 9bb69ba4c177 ("ACPI: processor_idle: use raw_safe_halt() in acpi_idle_play_dead()") fixed an issue where the acpi_idle_play_dead() function called safe_halt() instead of raw_safe_halt(), which had the side-effect of setting the hardirqs_enabled flag for the offline CPU. On x86 this triggered warnings from lockdep_assert_irqs_disabled() when the CPU was brought back online again later. These warnings were too early for the exception to be handled correctly, leading to a triple-fault. Add lockdep_cleanup_dead_cpu() to check for this kind of failure mode, print the events leading up to it, and correct it so that the CPU can come online again correctly. Re-introducing the original bug now merely results in this warning instead: [ 61.556652] smpboot: CPU 1 is now offline [ 61.556769] CPU 1 left hardirqs enabled! [ 61.556915] irq event stamp: 128149 [ 61.556965] hardirqs last enabled at (128149): [<ffffffff81720a36>] acpi_idle_play_dead+0x46/0x70 [ 61.557055] hardirqs last disabled at (128148): [<ffffffff81124d50>] do_idle+0x90/0xe0 [ 61.557117] softirqs last enabled at (128078): [<ffffffff81cec74c>] __do_softirq+0x31c/0x423 [ 61.557199] softirqs last disabled at (128065): [<ffffffff810baae1>] __irq_exit_rcu+0x91/0x100 [boqun: Capitalize the title and reword the message a bit] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/f7bd2b3b999051bb3ef4be34526a9262008285f5.camel@infradead.org
2024-10-14locking/mutex: Expose __mutex_owner()Juri Lelli
Implementing proxy execution requires that scheduler code be able to identify the current owner of a mutex. Expose __mutex_owner() for this purpose (alone!). Includes a null mutex check, so that users of the function can be simplified. [Removed the EXPORT_SYMBOL] [jstultz: Reworked per Peter's suggestions] Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Metin Kaya <metin.kaya@arm.com> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Tested-by: Metin Kaya <metin.kaya@arm.com> Link: https://lore.kernel.org/r/20241009235352.1614323-4-jstultz@google.com
2024-10-14locking/mutex: Make mutex::wait_lock irq safeJuri Lelli
With the proxy-execution series, we traverse the task->mutex->task blocked_on/owner chain in the scheduler core. We do this while holding the rq::lock to keep the structures in place while taking and releasing the alternating lock types. Since the mutex::wait_lock is one of the locks we will take in this way under the rq::lock in the scheduler core, we need to make sure that its usage elsewhere is irq safe. [rebase & fix {un,}lock_wait_lock helpers in ww_mutex.h] Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Metin Kaya <metin.kaya@arm.com> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Tested-by: Metin Kaya <metin.kaya@arm.com> Link: https://lore.kernel.org/r/20241009235352.1614323-3-jstultz@google.com
2024-10-14locking/mutex: Remove wakeups from under mutex::wait_lockPeter Zijlstra
In preparation to nest mutex::wait_lock under rq::lock we need to remove wakeups from under it. Do this by utilizing wake_qs to defer the wakeup until after the lock is dropped. [Heavily changed after 55f036ca7e74 ("locking: WW mutex cleanup") and 08295b3b5bee ("locking: Implement an algorithm choice for Wound-Wait mutexes")] [jstultz: rebased to mainline, added extra wake_up_q & init to avoid hangs, similar to Connor's rework of this patch] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Metin Kaya <metin.kaya@arm.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Tested-by: Metin Kaya <metin.kaya@arm.com> Link: https://lore.kernel.org/r/20241009235352.1614323-2-jstultz@google.com
2024-10-09locking/ww_mutex: Adjust to lockdep nest_lock requirementsThomas Hellström
When using mutex_acquire_nest() with a nest_lock, lockdep refcounts the number of acquired lockdep_maps of mutexes of the same class, and also keeps a pointer to the first acquired lockdep_map of a class. That pointer is then used for various comparison-, printing- and checking purposes, but there is no mechanism to actively ensure that lockdep_map stays in memory. Instead, a warning is printed if the lockdep_map is freed and there are still held locks of the same lock class, even if the lockdep_map itself has been released. In the context of WW/WD transactions that means that if a user unlocks and frees a ww_mutex from within an ongoing ww transaction, and that mutex happens to be the first ww_mutex grabbed in the transaction, such a warning is printed and there might be a risk of a UAF. Note that this is only problem when lockdep is enabled and affects only dereferences of struct lockdep_map. Adjust to this by adding a fake lockdep_map to the acquired context and make sure it is the first acquired lockdep map of the associated ww_mutex class. Then hold it for the duration of the WW/WD transaction. This has the side effect that trying to lock a ww mutex *without* a ww_acquire_context but where a such context has been acquire, we'd see a lockdep splat. The test-ww_mutex.c selftest attempts to do that, so modify that particular test to not acquire a ww_acquire_context if it is not going to be used. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20241009092031.6356-1-thomas.hellstrom@linux.intel.com
2024-10-07locking/spinlocks: Make __raw_* lock ops staticGeert Uytterhoeven
If CONFIG_GENERIC_LOCKBREAK=y and CONFIG_DEBUG_LOCK_ALLOC=n (e.g. sh/sdk7786_defconfig): kernel/locking/spinlock.c:68:17: warning: no previous prototype for '__raw_spin_lock' [-Wmissing-prototypes] kernel/locking/spinlock.c:80:26: warning: no previous prototype for '__raw_spin_lock_irqsave' [-Wmissing-prototypes] kernel/locking/spinlock.c:98:17: warning: no previous prototype for '__raw_spin_lock_irq' [-Wmissing-prototypes] kernel/locking/spinlock.c:103:17: warning: no previous prototype for '__raw_spin_lock_bh' [-Wmissing-prototypes] kernel/locking/spinlock.c:68:17: warning: no previous prototype for '__raw_read_lock' [-Wmissing-prototypes] kernel/locking/spinlock.c:80:26: warning: no previous prototype for '__raw_read_lock_irqsave' [-Wmissing-prototypes] kernel/locking/spinlock.c:98:17: warning: no previous prototype for '__raw_read_lock_irq' [-Wmissing-prototypes] kernel/locking/spinlock.c:103:17: warning: no previous prototype for '__raw_read_lock_bh' [-Wmissing-prototypes] kernel/locking/spinlock.c:68:17: warning: no previous prototype for '__raw_write_lock' [-Wmissing-prototypes] kernel/locking/spinlock.c:80:26: warning: no previous prototype for '__raw_write_lock_irqsave' [-Wmissing-prototypes] kernel/locking/spinlock.c:98:17: warning: no previous prototype for '__raw_write_lock_irq' [-Wmissing-prototypes] kernel/locking/spinlock.c:103:17: warning: no previous prototype for '__raw_write_lock_bh' [-Wmissing-prototypes] All __raw_* lock ops are internal functions without external callers. Hence fix this by making them static. Note that if CONFIG_GENERIC_LOCKBREAK=y, no lock ops are inlined, as all of CONFIG_INLINE_*_LOCK* depend on !GENERIC_LOCKBREAK. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lkml.kernel.org/r/7201d7fb408375c6c4df541270d787b1b4a32354.1727879348.git.geert+renesas@glider.be
2024-09-29Merge tag 'locking-urgent-2024-09-29' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "lockdep: - Fix potential deadlock between lockdep and RCU (Zhiguo Niu) - Use str_plural() to address Coccinelle warning (Thorsten Blum) - Add debuggability enhancement (Luis Claudio R. Goncalves) static keys & calls: - Fix static_key_slow_dec() yet again (Peter Zijlstra) - Handle module init failure correctly in static_call_del_module() (Thomas Gleixner) - Replace pointless WARN_ON() in static_call_module_notify() (Thomas Gleixner) <linux/cleanup.h>: - Add usage and style documentation (Dan Williams) rwsems: - Move is_rwsem_reader_owned() and rwsem_owner() under CONFIG_DEBUG_RWSEMS (Waiman Long) atomic ops, x86: - Redeclare x86_32 arch_atomic64_{add,sub}() as void (Uros Bizjak) - Introduce the read64_nonatomic macro to x86_32 with cx8 (Uros Bizjak)" Signed-off-by: Ingo Molnar <mingo@kernel.org> * tag 'locking-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rwsem: Move is_rwsem_reader_owned() and rwsem_owner() under CONFIG_DEBUG_RWSEMS jump_label: Fix static_key_slow_dec() yet again static_call: Replace pointless WARN_ON() in static_call_module_notify() static_call: Handle module init failure correctly in static_call_del_module() locking/lockdep: Simplify character output in seq_line() lockdep: fix deadlock issue between lockdep and rcu lockdep: Use str_plural() to fix Coccinelle warning cleanup: Add usage and style documentation lockdep: suggest the fix for "lockdep bfs error:-1" on print_bfs_bug locking/atomic/x86: Redeclare x86_32 arch_atomic64_{add,sub}() as void locking/atomic/x86: Introduce the read64_nonatomic macro to x86_32 with cx8
2024-09-29Merge branch 'locking/core' into locking/urgent, to pick up pending commitsIngo Molnar
Merge all pending locking commits into a single branch. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-09-21Merge tag 'mm-nonmm-stable-2024-09-21-07-52' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Many singleton patches - please see the various changelogs for details. Quite a lot of nilfs2 work this time around. Notable patch series in this pull request are: - "mul_u64_u64_div_u64: new implementation" by Nicolas Pitre, with assistance from Uwe Kleine-König. Reimplement mul_u64_u64_div_u64() to provide (much) more accurate results. The current implementation was causing Uwe some issues in the PWM drivers. - "xz: Updates to license, filters, and compression options" from Lasse Collin. Miscellaneous maintenance and kinor feature work to the xz decompressor. - "Fix some GDB command error and add some GDB commands" from Kuan-Ying Lee. Fixes and enhancements to the gdb scripts. - "treewide: add missing MODULE_DESCRIPTION() macros" from Jeff Johnson. Adds lots of MODULE_DESCRIPTIONs, thus fixing lots of warnings about this. - "nilfs2: add support for some common ioctls" from Ryusuke Konishi. Adds various commonly-available ioctls to nilfs2. - "This series fixes a number of formatting issues in kernel doc comments" from Ryusuke Konishi does that. - "nilfs2: prevent unexpected ENOENT propagation" from Ryusuke Konishi. Fix issues where -ENOENT was being unintentionally and inappropriately returned to userspace. - "nilfs2: assorted cleanups" from Huang Xiaojia. - "nilfs2: fix potential issues with empty b-tree nodes" from Ryusuke Konishi fixes some issues which can occur on corrupted nilfs2 filesystems. - "scripts/decode_stacktrace.sh: improve error reporting and usability" from Luca Ceresoli does those things" * tag 'mm-nonmm-stable-2024-09-21-07-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (103 commits) list: test: increase coverage of list_test_list_replace*() list: test: fix tests for list_cut_position() proc: use __auto_type more treewide: correct the typo 'retun' ocfs2: cleanup return value and mlog in ocfs2_global_read_info() nilfs2: remove duplicate 'unlikely()' usage nilfs2: fix potential oob read in nilfs_btree_check_delete() nilfs2: determine empty node blocks as corrupted nilfs2: fix potential null-ptr-deref in nilfs_btree_insert() user_namespace: use kmemdup_array() instead of kmemdup() for multiple allocation tools/mm: rm thp_swap_allocator_test when make clean squashfs: fix percpu address space issues in decompressor_multi_percpu.c lib: glob.c: added null check for character class nilfs2: refactor nilfs_segctor_thread() nilfs2: use kthread_create and kthread_stop for the log writer thread nilfs2: remove sc_timer_task nilfs2: do not repair reserved inode bitmap in nilfs_new_inode() nilfs2: eliminate the shared counter and spinlock for i_generation nilfs2: separate inode type information from i_state field nilfs2: use the BITS_PER_LONG macro ...
2024-09-19Merge tag 'sched-core-2024-09-19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Implement the SCHED_DEADLINE server infrastructure - Daniel Bristot de Oliveira's last major contribution to the kernel: "SCHED_DEADLINE servers can help fixing starvation issues of low priority tasks (e.g., SCHED_OTHER) when higher priority tasks monopolize CPU cycles. Today we have RT Throttling; DEADLINE servers should be able to replace and improve that." (Daniel Bristot de Oliveira, Peter Zijlstra, Joel Fernandes, Youssef Esmat, Huang Shijie) - Preparatory changes for sched_ext integration: - Use set_next_task(.first) where required - Fix up set_next_task() implementations - Clean up DL server vs. core sched - Split up put_prev_task_balance() - Rework pick_next_task() - Combine the last put_prev_task() and the first set_next_task() - Rework dl_server - Add put_prev_task(.next) (Peter Zijlstra, with a fix by Tejun Heo) - Complete the EEVDF transition and refine EEVDF scheduling: - Implement delayed dequeue - Allow shorter slices to wakeup-preempt - Use sched_attr::sched_runtime to set request/slice suggestion - Document the new feature flags - Remove unused and duplicate-functionality fields - Simplify & unify pick_next_task_fair() - Misc debuggability enhancements (Peter Zijlstra, with fixes/cleanups by Dietmar Eggemann, Valentin Schneider and Chuyi Zhou) - Initialize the vruntime of a new task when it is first enqueued, resulting in significant decrease in latency of newly woken tasks (Zhang Qiao) - Introduce SM_IDLE and an idle re-entry fast-path in __schedule() (K Prateek Nayak, Peter Zijlstra) - Clean up and clarify the usage of Clean up usage of rt_task() (Qais Yousef) - Preempt SCHED_IDLE entities in strict cgroup hierarchies (Tianchen Ding) - Clarify the documentation of time units for deadline scheduler parameters (Christian Loehle) - Remove the HZ_BW chicken-bit feature flag introduced a year ago, the original change seems to be working fine (Phil Auld) - Misc fixes and cleanups (Chen Yu, Dan Carpenter, Huang Shijie, Peilin He, Qais Yousefm and Vincent Guittot) * tag 'sched-core-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) sched/cpufreq: Use NSEC_PER_MSEC for deadline task cpufreq/cppc: Use NSEC_PER_MSEC for deadline task sched/deadline: Clarify nanoseconds in uapi sched/deadline: Convert schedtool example to chrt sched/debug: Fix the runnable tasks output sched: Fix sched_delayed vs sched_core kernel/sched: Fix util_est accounting for DELAY_DEQUEUE kthread: Fix task state in kthread worker if being frozen sched/pelt: Use rq_clock_task() for hw_pressure sched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.c sched/core: Introduce SM_IDLE and an idle re-entry fast-path in __schedule() sched: Add put_prev_task(.next) sched: Rework dl_server sched: Combine the last put_prev_task() and the first set_next_task() sched: Rework pick_next_task() sched: Split up put_prev_task_balance() sched: Clean up DL server vs core sched sched: Fixup set_next_task() implementations sched: Use set_next_task(.first) where required sched/fair: Properly deactivate sched_delayed task upon class change ...
2024-09-17Merge tag 'printk-for-6.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: "This is the "last" part of the support for the new nbcon consoles. Where "nbcon" stays for "No Big console lock CONsoles" aka not under the console_lock. New callbacks are added to struct console: - write_thread() for flushing nbcon consoles in task context. - write_atomic() for flushing nbcon consoles in atomic context, including NMI. - con->device_lock() and device_unlock() for taking the driver specific lock, for example, port->lock. New printk-specific kthreads are created: - per-console kthreads which get responsible for flushing normal priority messages on nbcon consoles. - thread which gets responsible for flushing normal priority messages on all consoles when CONFIG_RT enabled. The new callbacks are called under a special per-console lock which has already been added back in v6.7. It allows to distinguish three severities: normal, emergency, and panic. A context with a higher priority could take over the ownership when it is safe even in the middle of handling a record. The panic context could do it even when it is not safe. But it is allowed only for the final desperate flush before entering the infinite loop. The new lock helps to flush the messages directly in emergency and panic contexts. But it is not enough in all situations: - console_lock() is still need for synchronization against boot consoles. - con->device_lock() is need for synchronization against other operations on the same HW, e.g. serial port speed setting, non-printk related read/write. The dependency on con->device_lock() is mutual. Any code taking the driver specific lock has to acquire the related nbcon console context as well. For example, see the new uart_port_lock() API. It provides the necessary synchronization against emergency and panic contexts where the messages are flushed only under the new per-console lock. Maybe surprisingly, a quite tricky part is the decision how to flush the consoles in various situations. It has to take into account: - message priority: normal, emergency, panic - scheduling context: task, atomic, deferred_legacy - registered consoles: boot, legacy, nbcon - threads are running: early boot, suspend, shutdown, panic - caller: printk(), pr_flush(), printk_flush_in_panic(), console_unlock(), console_start(), ... The primary decision is made in printk_get_console_flush_type(). It creates a hint what the caller should do: - flush nbcon consoles directly or via the kthread - call the legacy loop (console_unlock()) directly or via irq_work The existing behavior is preserved for the legacy consoles. The only exception is that they are not longer flushed directly from printk() in panic() before CPUs are stopped. But this blocking happens only when at least one nbcon console is registered. The motivation is to increase a chance to produce the crash dump. They legacy consoles might create a deadlock in compare with nbcon consoles. The nbcon console should allow to see the messages even when the crash dump fails. There are three possible ways how nbcon consoles are flushed: - The per-nbcon-console kthread is responsible for flushing messages added with the normal priority. This is the default mode. - The legacy loop, aka console_unlock(), is used when there is still a boot console registered. There is no easy way how to match an early console driver with a nbcon console driver. And the console_lock() provides the only reliable serialization at the moment. The legacy loop uses either con->write_atomic() or con->write_thread() callbacks depending on whether it is allowed to schedule. The atomic variant has to be used from printk(). - In other situations, the messages are flushed directly using write_atomic() which can be called in any context, including NMI. It is primary needed during early boot or shutdown, in emergency situations, and panic. The emergency priority is used by a code called within nbcon_cpu_emergency_enter()/exit(). At the moment, it is used in four situations: WARN(), Oops, lockdep, and RCU stall reports. Finally, there is no nbcon console at the moment. It means that the changes should _not_ modify the existing behavior. The only exception is CONFIG_RT which would force offloading the legacy loop, for normal priority context, into the dedicated kthread" * tag 'printk-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (54 commits) printk: Avoid false positive lockdep report for legacy printing printk: nbcon: Assign nice -20 for printing threads printk: Implement legacy printer kthread for PREEMPT_RT tty: sysfs: Add nbcon support for 'active' proc: Add nbcon support for /proc/consoles proc: consoles: Add notation to c_start/c_stop printk: nbcon: Show replay message on takeover printk: Provide helper for message prepending printk: nbcon: Rely on kthreads for normal operation printk: nbcon: Use thread callback if in task context for legacy printk: nbcon: Relocate nbcon_atomic_emit_one() printk: nbcon: Introduce printer kthreads printk: nbcon: Init @nbcon_seq to highest possible printk: nbcon: Add context to usable() and emit() printk: Flush console on unregister_console() printk: Fail pr_flush() if before SYSTEM_SCHEDULING printk: nbcon: Add function for printers to reacquire ownership printk: nbcon: Use raw_cpu_ptr() instead of open coding printk: Use the BITS_PER_LONG macro lockdep: Mark emergency sections in lockdep splats ...
2024-09-10locking/rwsem: Move is_rwsem_reader_owned() and rwsem_owner() under ↵Waiman Long
CONFIG_DEBUG_RWSEMS Both is_rwsem_reader_owned() and rwsem_owner() are currently only used when CONFIG_DEBUG_RWSEMS is defined. This causes a compilation error with clang when `make W=1` and CONFIG_WERROR=y: kernel/locking/rwsem.c:187:20: error: unused function 'is_rwsem_reader_owned' [-Werror,-Wunused-function] 187 | static inline bool is_rwsem_reader_owned(struct rw_semaphore *sem) | ^~~~~~~~~~~~~~~~~~~~~ kernel/locking/rwsem.c:271:35: error: unused function 'rwsem_owner' [-Werror,-Wunused-function] 271 | static inline struct task_struct *rwsem_owner(struct rw_semaphore *sem) | ^~~~~~~~~~~ Fix this by moving these two functions under the CONFIG_DEBUG_RWSEMS define. Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20240909182905.161156-1-longman@redhat.com
2024-09-01locking/ww_mutex/test: add MODULE_DESCRIPTION()Jeff Johnson
Fix the 'make W=1' warning: WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/locking/test-ww_mutex.o Link: https://lkml.kernel.org/r/20240730-module_description_orphans-v1-5-7094088076c8@quicinc.com Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Acked-by: Waiman Long <longman@redhat.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Alistar Popple <alistair@popple.id.au> Cc: Andrew Jeffery <andrew@codeconstruct.com.au> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eddie James <eajames@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeremy Kerr <jk@ozlabs.org> Cc: Joel Stanley <joel@jms.id.au> Cc: Karol Herbst <karolherbst@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Nouveau <nouveau@lists.freedesktop.org> Cc: Pekka Paalanen <ppaalanen@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-08-21lockdep: Mark emergency sections in lockdep splatsJohn Ogness
Mark emergency sections wherever multiple lines of lock debugging output are generated. In an emergency section, every printk() call will attempt to directly flush to the consoles using the EMERGENCY priority. Note that debug_show_all_locks() and lockdep_print_held_locks() rely on their callers to enter the emergency section. This is because these functions can also be called in non-emergency situations (such as sysrq). Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-36-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-15rtmutex: Drop rt_mutex::wait_lock before schedulingRoland Xu
rt_mutex_handle_deadlock() is called with rt_mutex::wait_lock held. In the good case it returns with the lock held and in the deadlock case it emits a warning and goes into an endless scheduling loop with the lock held, which triggers the 'scheduling in atomic' warning. Unlock rt_mutex::wait_lock in the dead lock case before issuing the warning and dropping into the schedule for ever loop. [ tglx: Moved unlock before the WARN(), removed the pointless comment, massaged changelog, added Fixes tag ] Fixes: 3d5c9340d194 ("rtmutex: Handle deadlock detection smarter") Signed-off-by: Roland Xu <mu001999@outlook.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/ME0P300MB063599BEF0743B8FA339C2CECC802@ME0P300MB0635.AUSP300.PROD.OUTLOOK.COM
2024-08-08Merge tag 'bcachefs-2024-08-08' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Assorted little stuff: - lockdep fixup for lockdep_set_notrack_class() - we can now remove a device when using erasure coding without deadlocking, though we still hit other issues - the 'allocator stuck' timeout is now configurable, and messages are ratelimited. The default timeout has been increased from 10 seconds to 30" * tag 'bcachefs-2024-08-08' of git://evilpiepirate.org/bcachefs: bcachefs: Use bch2_wait_on_allocator() in btree node alloc path bcachefs: Make allocator stuck timeout configurable, ratelimit messages bcachefs: Add missing path_traverse() to btree_iter_next_node() bcachefs: ec should not allocate from ro devs bcachefs: Improved allocator debugging for ec bcachefs: Add missing bch2_trans_begin() call bcachefs: Add a comment for bucket helper types bcachefs: Don't rely on implicit unsigned -> signed integer conversion lockdep: Fix lockdep_set_notrack_class() for CONFIG_LOCK_STAT bcachefs: Fix double free of ca->buckets_nouse
2024-08-07sched/rt: Rename realtime_{prio, task}() to rt_or_dl_{prio, task}()Qais Yousef
Some find the name realtime overloaded. Use rt_or_dl() as an alternative, hopefully better, name. Suggested-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Qais Yousef <qyousef@layalina.io> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20240610192018.1567075-4-qyousef@layalina.io
2024-08-07sched/rt: Clean up usage of rt_task()Qais Yousef
rt_task() checks if a task has RT priority. But depends on your dictionary, this could mean it belongs to RT class, or is a 'realtime' task, which includes RT and DL classes. Since this has caused some confusion already on discussion [1], it seemed a clean up is due. I define the usage of rt_task() to be tasks that belong to RT class. Make sure that it returns true only for RT class and audit the users and replace the ones required the old behavior with the new realtime_task() which returns true for RT and DL classes. Introduce similar realtime_prio() to create similar distinction to rt_prio() and update the users that required the old behavior to use the new function. Move MAX_DL_PRIO to prio.h so it can be used in the new definitions. Document the functions to make it more obvious what is the difference between them. PI-boosted tasks is a factor that must be taken into account when choosing which function to use. Rename task_is_realtime() to realtime_task_policy() as the old name is confusing against the new realtime_task(). No functional changes were intended. [1] https://lore.kernel.org/lkml/20240506100509.GL40213@noisy.programming.kicks-ass.net/ Signed-off-by: Qais Yousef <qyousef@layalina.io> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Phil Auld <pauld@redhat.com> Reviewed-by: "Steven Rostedt (Google)" <rostedt@goodmis.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20240610192018.1567075-2-qyousef@layalina.io
2024-08-07lockdep: Fix lockdep_set_notrack_class() for CONFIG_LOCK_STATKent Overstreet
We won't find a contended lock if it's not being tracked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-06locking/lockdep: Simplify character output in seq_line()Markus Elfring
Single characters should be put into a sequence. Thus use the corresponding function “seq_putc” for one selected call. This issue was transformed by using the Coccinelle software. Suggested-by: Christophe Jaillet <christophe.jaillet@wanadoo.fr> Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/e346d688-7b01-462f-867c-ba52b7790d19@web.de
2024-08-06lockdep: fix deadlock issue between lockdep and rcuZhiguo Niu
There is a deadlock scenario between lockdep and rcu when rcu nocb feature is enabled, just as following call stack: rcuop/x -000|queued_spin_lock_slowpath(lock = 0xFFFFFF817F2A8A80, val = ?) -001|queued_spin_lock(inline) // try to hold nocb_gp_lock -001|do_raw_spin_lock(lock = 0xFFFFFF817F2A8A80) -002|__raw_spin_lock_irqsave(inline) -002|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F2A8A80) -003|wake_nocb_gp_defer(inline) -003|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F30B680) -004|__call_rcu_common(inline) -004|call_rcu(head = 0xFFFFFFC082EECC28, func = ?) -005|call_rcu_zapped(inline) -005|free_zapped_rcu(ch = ?)// hold graph lock -006|rcu_do_batch(rdp = 0xFFFFFF817F245680) -007|nocb_cb_wait(inline) -007|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F245680) -008|kthread(_create = 0xFFFFFF80803122C0) -009|ret_from_fork(asm) rcuop/y -000|queued_spin_lock_slowpath(lock = 0xFFFFFFC08291BBC8, val = 0) -001|queued_spin_lock() -001|lockdep_lock() -001|graph_lock() // try to hold graph lock -002|lookup_chain_cache_add() -002|validate_chain() -003|lock_acquire -004|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F211D80) -005|lock_timer_base(inline) -006|mod_timer(inline) -006|wake_nocb_gp_defer(inline)// hold nocb_gp_lock -006|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F2A8680) -007|__call_rcu_common(inline) -007|call_rcu(head = 0xFFFFFFC0822E0B58, func = ?) -008|call_rcu_hurry(inline) -008|rcu_sync_call(inline) -008|rcu_sync_func(rhp = 0xFFFFFFC0822E0B58) -009|rcu_do_batch(rdp = 0xFFFFFF817F266680) -010|nocb_cb_wait(inline) -010|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F266680) -011|kthread(_create = 0xFFFFFF8080363740) -012|ret_from_fork(asm) rcuop/x and rcuop/y are rcu nocb threads with the same nocb gp thread. This patch release the graph lock before lockdep call_rcu. Fixes: a0b0fd53e1e6 ("locking/lockdep: Free lock classes that are no longer in use") Cc: stable@vger.kernel.org Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Waiman Long <longman@redhat.com> Cc: Carlos Llamas <cmllamas@google.com> Cc: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com> Reviewed-by: Waiman Long <longman@redhat.com> Reviewed-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Carlos Llamas <cmllamas@google.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/20240620225436.3127927-1-cmllamas@google.com
2024-08-06lockdep: Use str_plural() to fix Coccinelle warningThorsten Blum
Fixes the following Coccinelle/coccicheck warning reported by string_choices.cocci: opportunity for str_plural(depth) Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/20240528120008.403511-2-thorsten.blum@toblux.com