summaryrefslogtreecommitdiff
path: root/kernel/locking
AgeCommit message (Collapse)Author
2023-03-10locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpathWaiman Long
commit b613c7f31476c44316bfac1af7cac714b7d6bef9 upstream. A non-first waiter can potentially spin in the for loop of rwsem_down_write_slowpath() without sleeping but fail to acquire the lock even if the rwsem is free if the following sequence happens: Non-first RT waiter First waiter Lock holder ------------------- ------------ ----------- Acquire wait_lock rwsem_try_write_lock(): Set handoff bit if RT or wait too long Set waiter->handoff_set Release wait_lock Acquire wait_lock Inherit waiter->handoff_set Release wait_lock Clear owner Release lock if (waiter.handoff_set) { rwsem_spin_on_owner((); if (OWNER_NULL) goto trylock_again; } trylock_again: Acquire wait_lock rwsem_try_write_lock(): if (first->handoff_set && (waiter != first)) return false; Release wait_lock A non-first waiter cannot really acquire the rwsem even if it mistakenly believes that it can spin on OWNER_NULL value. If that waiter happens to be an RT task running on the same CPU as the first waiter, it can block the first waiter from acquiring the rwsem leading to live lock. Fix this problem by making sure that a non-first waiter cannot spin in the slowpath loop without sleeping. Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Mukesh Ojha <quic_mojha@quicinc.com> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230126003628.365092-2-longman@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-10locking/rwsem: Disable preemption in all down_read*() and up_read() code pathsWaiman Long
[ Upstream commit 3f5245538a1964ae186ab7e1636020a41aa63143 ] Commit: 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically spin on owner") ... assumes that when the owner field is changed to NULL, the lock will become free soon. But commit: 48dfb5d2560d ("locking/rwsem: Disable preemption while trying for rwsem lock") ... disabled preemption when acquiring rwsem for write. However, preemption has not yet been disabled when acquiring a read lock on a rwsem. So a reader can add a RWSEM_READER_BIAS to count without setting owner to signal a reader, got preempted out by a RT task which then spins in the writer slowpath as owner remains NULL leading to live lock. One easy way to fix this problem is to disable preemption at all the down_read*() and up_read() code paths as implemented in this patch. Fixes: 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically spin on owner") Reported-by: Mukesh Ojha <quic_mojha@quicinc.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230126003628.365092-3-longman@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10locking/rwsem: Optimize down_read_trylock() under highly contended caseMuchun Song
[ Upstream commit 14c24048841151548a3f4d9e218510c844c1b737 ] We found that a process with 10 thousnads threads has been encountered a regression problem from Linux-v4.14 to Linux-v5.4. It is a kind of workload which will concurrently allocate lots of memory in different threads sometimes. In this case, we will see the down_read_trylock() with a high hotspot. Therefore, we suppose that rwsem has a regression at least since Linux-v5.4. In order to easily debug this problem, we write a simply benchmark to create the similar situation lile the following. ```c++ #include <sys/mman.h> #include <sys/time.h> #include <sys/resource.h> #include <sched.h> #include <cstdio> #include <cassert> #include <thread> #include <vector> #include <chrono> volatile int mutex; void trigger(int cpu, char* ptr, std::size_t sz) { cpu_set_t set; CPU_ZERO(&set); CPU_SET(cpu, &set); assert(pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0); while (mutex); for (std::size_t i = 0; i < sz; i += 4096) { *ptr = '\0'; ptr += 4096; } } int main(int argc, char* argv[]) { std::size_t sz = 100; if (argc > 1) sz = atoi(argv[1]); auto nproc = std::thread::hardware_concurrency(); std::vector<std::thread> thr; sz <<= 30; auto* ptr = mmap(nullptr, sz, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0); assert(ptr != MAP_FAILED); char* cptr = static_cast<char*>(ptr); auto run = sz / nproc; run = (run >> 12) << 12; mutex = 1; for (auto i = 0U; i < nproc; ++i) { thr.emplace_back(std::thread([i, cptr, run]() { trigger(i, cptr, run); })); cptr += run; } rusage usage_start; getrusage(RUSAGE_SELF, &usage_start); auto start = std::chrono::system_clock::now(); mutex = 0; for (auto& t : thr) t.join(); rusage usage_end; getrusage(RUSAGE_SELF, &usage_end); auto end = std::chrono::system_clock::now(); timeval utime; timeval stime; timersub(&usage_end.ru_utime, &usage_start.ru_utime, &utime); timersub(&usage_end.ru_stime, &usage_start.ru_stime, &stime); printf("usr: %ld.%06ld\n", utime.tv_sec, utime.tv_usec); printf("sys: %ld.%06ld\n", stime.tv_sec, stime.tv_usec); printf("real: %lu\n", std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()); return 0; } ``` The functionality of above program is simply which creates `nproc` threads and each of them are trying to touch memory (trigger page fault) on different CPU. Then we will see the similar profile by `perf top`. 25.55% [kernel] [k] down_read_trylock 14.78% [kernel] [k] handle_mm_fault 13.45% [kernel] [k] up_read 8.61% [kernel] [k] clear_page_erms 3.89% [kernel] [k] __do_page_fault The highest hot instruction, which accounts for about 92%, in down_read_trylock() is cmpxchg like the following. 91.89 │ lock cmpxchg %rdx,(%rdi) Sice the problem is found by migrating from Linux-v4.14 to Linux-v5.4, so we easily found that the commit ddb20d1d3aed ("locking/rwsem: Optimize down_read_trylock()") caused the regression. The reason is that the commit assumes the rwsem is not contended at all. But it is not always true for mmap lock which could be contended with thousands threads. So most threads almost need to run at least 2 times of "cmpxchg" to acquire the lock. The overhead of atomic operation is higher than non-atomic instructions, which caused the regression. By using the above benchmark, the real executing time on a x86-64 system before and after the patch were: Before Patch After Patch # of Threads real real reduced by ------------ ------ ------ ---------- 1 65,373 65,206 ~0.0% 4 15,467 15,378 ~0.5% 40 6,214 5,528 ~11.0% For the uncontended case, the new down_read_trylock() is the same as before. For the contended cases, the new down_read_trylock() is faster than before. The more contended, the more fast. Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/20211118094455.9068-1-songmuchun@bytedance.com Stable-dep-of: 3f5245538a19 ("locking/rwsem: Disable preemption in all down_read*() and up_read() code paths") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-14rtmutex: Ensure that the top waiter is always woken upWander Lairson Costa
commit db370a8b9f67ae5f17e3d5482493294467784504 upstream. Let L1 and L2 be two spinlocks. Let T1 be a task holding L1 and blocked on L2. T1, currently, is the top waiter of L2. Let T2 be the task holding L2. Let T3 be a task trying to acquire L1. The following events will lead to a state in which the wait queue of L2 isn't empty, but no task actually holds the lock. T1 T2 T3 == == == spin_lock(L1) | raw_spin_lock(L1->wait_lock) | rtlock_slowlock_locked(L1) | | task_blocks_on_rt_mutex(L1, T3) | | | orig_waiter->lock = L1 | | | orig_waiter->task = T3 | | | raw_spin_unlock(L1->wait_lock) | | | rt_mutex_adjust_prio_chain(T1, L1, L2, orig_waiter, T3) spin_unlock(L2) | | | | | rt_mutex_slowunlock(L2) | | | | | | raw_spin_lock(L2->wait_lock) | | | | | | wakeup(T1) | | | | | | raw_spin_unlock(L2->wait_lock) | | | | | | | | waiter = T1->pi_blocked_on | | | | waiter == rt_mutex_top_waiter(L2) | | | | waiter->task == T1 | | | | raw_spin_lock(L2->wait_lock) | | | | dequeue(L2, waiter) | | | | update_prio(waiter, T1) | | | | enqueue(L2, waiter) | | | | waiter != rt_mutex_top_waiter(L2) | | | | L2->owner == NULL | | | | wakeup(T1) | | | | raw_spin_unlock(L2->wait_lock) T1 wakes up T1 != top_waiter(L2) schedule_rtlock() If the deadline of T1 is updated before the call to update_prio(), and the new deadline is greater than the deadline of the second top waiter, then after the requeue, T1 is no longer the top waiter, and the wrong task is woken up which will then go back to sleep because it is not the top waiter. This can be reproduced in PREEMPT_RT with stress-ng: while true; do stress-ng --sched deadline --sched-period 1000000000 \ --sched-runtime 800000000 --sched-deadline \ 1000000000 --mmapfork 23 -t 20 done A similar issue was pointed out by Thomas versus the cases where the top waiter drops out early due to a signal or timeout, which is a general issue for all regular rtmutex use cases, e.g. futex. The problematic code is in rt_mutex_adjust_prio_chain(): // Save the top waiter before dequeue/enqueue prerequeue_top_waiter = rt_mutex_top_waiter(lock); rt_mutex_dequeue(lock, waiter); waiter_update_prio(waiter, task); rt_mutex_enqueue(lock, waiter); // Lock has no owner? if (!rt_mutex_owner(lock)) { // Top waiter changed ----> if (prerequeue_top_waiter != rt_mutex_top_waiter(lock)) ----> wake_up_state(waiter->task, waiter->wake_state); This only takes the case into account where @waiter is the new top waiter due to the requeue operation. But it fails to handle the case where @waiter is not longer the top waiter due to the requeue operation. Ensure that the new top waiter is woken up so in all cases so it can take over the ownerless lock. [ tglx: Amend changelog, add Fixes tag ] Fixes: c014ef69b3ac ("locking/rtmutex: Add wake_state to rt_mutex_waiter") Signed-off-by: Wander Lairson Costa <wander@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230117172649.52465-1-wander@redhat.com Link: https://lore.kernel.org/r/20230202123020.14844-1-wander@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-20lockdep: Fix -Wunused-parameter for _THIS_IP_Nick Desaulniers
[ Upstream commit 8b023accc8df70e72f7704d29fead7ca914d6837 ] While looking into a bug related to the compiler's handling of addresses of labels, I noticed some uses of _THIS_IP_ seemed unused in lockdep. Drive by cleanup. -Wunused-parameter: kernel/locking/lockdep.c:1383:22: warning: unused parameter 'ip' kernel/locking/lockdep.c:4246:48: warning: unused parameter 'ip' kernel/locking/lockdep.c:4844:19: warning: unused parameter 'ip' Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/20220314221909.2027027-1-ndesaulniers@google.com Stable-dep-of: 54c3931957f6 ("tracing: hold caller_addr to hardirq_{enable,disable}_ip") Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17locking/lockdep: Fix lockdep_init_map_*() confusionPeter Zijlstra
[ Upstream commit eae6d58d67d9739be5f7ae2dbead1d0ef6528243 ] Commit dfd5e3f5fe27 ("locking/lockdep: Mark local_lock_t") added yet another lockdep_init_map_*() variant, but forgot to update all the existing users of the most complicated version. This could lead to a loss of lock_type and hence an incorrect report. Given the relative rarity of both local_lock and these annotations, this is unlikely to happen in practise, still, best fix things. Fixes: dfd5e3f5fe27 ("locking/lockdep: Mark local_lock_t") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/YqyEDtoan20K0CVD@worktop.programming.kicks-ass.net Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by ↵Waiman Long
first waiter commit 6eebd5fb20838f5971ba17df9f55cc4f84a31053 upstream. With commit d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent"), the writer that sets the handoff bit can be interrupted out without clearing the bit if the wait queue isn't empty. This disables reader and writer optimistic lock spinning and stealing. Now if a non-first writer in the queue is somehow woken up or a new waiter enters the slowpath, it can't acquire the lock. This is not the case before commit d257cc8cb8d5 as the writer that set the handoff bit will clear it when exiting out via the out_nolock path. This is less efficient as the busy rwsem stays in an unlock state for a longer time. In some cases, this new behavior may cause lockups as shown in [1] and [2]. This patch allows a non-first writer to ignore the handoff bit if it is not originally set or initiated by the first waiter. This patch is shown to be effective in fixing the lockup problem reported in [1]. [1] https://lore.kernel.org/lkml/20220617134325.GC30825@techsingularity.net/ [2] https://lore.kernel.org/lkml/3f02975c-1a9d-be20-32cf-f1d8e3dfafcc@oracle.com/ Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: John Donnelly <john.p.donnelly@oracle.com> Tested-by: Mel Gorman <mgorman@techsingularity.net> Link: https://lore.kernel.org/r/20220622200419.778799-1-longman@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08locking/lockdep: Iterate lock_classes directly when reading lockdep filesWaiman Long
[ Upstream commit fb7275acd6fb988313dddd8d3d19efa70d9015ad ] When dumping lock_classes information via /proc/lockdep, we can't take the lockdep lock as the lock hold time is indeterminate. Iterating over all_lock_classes without holding lock can be dangerous as there is a slight chance that it may branch off to other lists leading to infinite loop or even access invalid memory if changes are made to all_lock_classes list in parallel. To avoid this problem, iteration of lock classes is now done directly on the lock_classes array itself. The lock_classes_in_use bitmap is checked to see if the lock class is being used. To avoid iterating the full array all the times, a new max_lock_class_idx value is added to track the maximum lock_class index that is currently being used. We can theoretically take the lockdep lock for iterating all_lock_classes when other lockdep files (lockdep_stats and lock_stat) are accessed as the lock hold time will be shorter for them. For consistency, they are also modified to iterate the lock_classes array directly. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220211035526.1329503-2-longman@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08locking/lockdep: Avoid potential access of invalid memory in lock_classWaiman Long
commit 61cc4534b6550997c97a03759ab46b29d44c0017 upstream. It was found that reading /proc/lockdep after a lockdep splat may potentially cause an access to freed memory if lockdep_unregister_key() is called after the splat but before access to /proc/lockdep [1]. This is due to the fact that graph_lock() call in lockdep_unregister_key() fails after the clearing of debug_locks by the splat process. After lockdep_unregister_key() is called, the lock_name may be freed but the corresponding lock_class structure still have a reference to it. That invalid memory pointer will then be accessed when /proc/lockdep is read by a user and a use-after-free (UAF) error will be reported if KASAN is enabled. To fix this problem, lockdep_unregister_key() is now modified to always search for a matching key irrespective of the debug_locks state and zap the corresponding lock class if a matching one is found. [1] https://lore.kernel.org/lkml/77f05c15-81b6-bddd-9650-80d5f23fe330@i-love.sakura.ne.jp/ Fixes: 8b39adbee805 ("locking/lockdep: Make lockdep_unregister_key() honor 'debug_locks' again") Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Cc: Cheng-Jui Wang <cheng-jui.wang@mediatek.com> Link: https://lkml.kernel.org/r/20220103023558.1377055-1-longman@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23lockdep: Correct lock_classes index mappingCheng Jui Wang
commit 28df029d53a2fd80c1b8674d47895648ad26dcfb upstream. A kernel exception was hit when trying to dump /proc/lockdep_chains after lockdep report "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!": Unable to handle kernel paging request at virtual address 00054005450e05c3 ... 00054005450e05c3] address between user and kernel address ranges ... pc : [0xffffffece769b3a8] string+0x50/0x10c lr : [0xffffffece769ac88] vsnprintf+0x468/0x69c ... Call trace: string+0x50/0x10c vsnprintf+0x468/0x69c seq_printf+0x8c/0xd8 print_name+0x64/0xf4 lc_show+0xb8/0x128 seq_read_iter+0x3cc/0x5fc proc_reg_read_iter+0xdc/0x1d4 The cause of the problem is the function lock_chain_get_class() will shift lock_classes index by 1, but the index don't need to be shifted anymore since commit 01bb6f0af992 ("locking/lockdep: Change the range of class_idx in held_lock struct") already change the index to start from 0. The lock_classes[-1] located at chain_hlocks array. When printing lock_classes[-1] after the chain_hlocks entries are modified, the exception happened. The output of lockdep_chains are incorrect due to this problem too. Fixes: f611e8cf98ec ("lockdep: Take read/write status in consideration when generate chainkey") Signed-off-by: Cheng Jui Wang <cheng-jui.wang@mediatek.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/20220210105011.21712-1-cheng-jui.wang@mediatek.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-22locking/rtmutex: Fix incorrect condition in rtmutex_spin_on_owner()Zqiang
commit 8f556a326c93213927e683fc32bbf5be1b62540a upstream. Optimistic spinning needs to be terminated when the spinning waiter is not longer the top waiter on the lock, but the condition is negated. It terminates if the waiter is the top waiter, which is defeating the whole purpose. Fixes: c3123c431447 ("locking/rtmutex: Dont dereference waiter lockless") Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211217074207.77425-1-qiang1.zhang@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01locking/rwsem: Make handoff bit handling more consistentWaiman Long
[ Upstream commit d257cc8cb8d5355ffc43a96bab94db7b5a324803 ] There are some inconsistency in the way that the handoff bit is being handled in readers and writers that lead to a race condition. Firstly, when a queue head writer set the handoff bit, it will clear it when the writer is being killed or interrupted on its way out without acquiring the lock. That is not the case for a queue head reader. The handoff bit will simply be inherited by the next waiter. Secondly, in the out_nolock path of rwsem_down_read_slowpath(), both the waiter and handoff bits are cleared if the wait queue becomes empty. For rwsem_down_write_slowpath(), however, the handoff bit is not checked and cleared if the wait queue is empty. This can potentially make the handoff bit set with empty wait queue. Worse, the situation in rwsem_down_write_slowpath() relies on wstate, a variable set outside of the critical section containing the ->count manipulation, this leads to race condition where RWSEM_FLAG_HANDOFF can be double subtracted, corrupting ->count. To make the handoff bit handling more consistent and robust, extract out handoff bit clearing code into the new rwsem_del_waiter() helper function. Also, completely eradicate wstate; always evaluate everything inside the same critical section. The common function will only use atomic_long_andnot() to clear bits when the wait queue is empty to avoid possible race condition. If the first waiter with handoff bit set is killed or interrupted to exit the slowpath without acquiring the lock, the next waiter will inherit the handoff bit. While at it, simplify the trylock for loop in rwsem_down_write_slowpath() to make it easier to read. Fixes: 4f23dbc1e657 ("locking/rwsem: Implement lock handoff to prevent lock starvation") Reported-by: Zhenhua Ma <mazhenhua@xiaomi.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20211116012912.723980-1-longman@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18lockdep: Let lock_is_held_type() detect recursive read as readSebastian Andrzej Siewior
[ Upstream commit 2507003a1d10917c9158077bf6030719d02c941e ] lock_is_held_type(, 1) detects acquired read locks. It only recognized locks acquired with lock_acquire_shared(). Read locks acquired with lock_acquire_shared_recursive() are not recognized because a `2' is stored as the read value. Rework the check to additionally recognise lock's read value one and two as a read held lock. Fixes: e918188611f07 ("locking: More accurate annotations for read_lock()") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Boqun Feng <boqun.feng@gmail.com> Acked-by: Waiman Long <longman@redhat.com> Link: https://lkml.kernel.org/r/20210903084001.lblecrvz4esl4mrr@linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18locking/rwsem: Disable preemption for spinning regionYanfei Xu
[ Upstream commit 7cdacc5f52d68a9370f182c844b5b3e6cc975cc1 ] The spinning region rwsem_spin_on_owner() should not be preempted, however the rwsem_down_write_slowpath() invokes it and don't disable preemption. Fix it by adding a pair of preempt_disable/enable(). Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com> [peterz: Fix CONFIG_RWSEM_SPIN_ON_OWNER=n build] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/20211013134154.1085649-3-yanfei.xu@windriver.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18locking/lockdep: Avoid RCU-induced noinstr failPeter Zijlstra
[ Upstream commit ce0b9c805dd66d5e49fd53ec5415ae398f4c56e6 ] vmlinux.o: warning: objtool: look_up_lock_class()+0xc7: call to rcu_read_lock_any_held() leaves .noinstr.text section Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210624095148.311980536@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-15locking/rwbase: Take care of ordering guarantee for fastpath readerBoqun Feng
Readers of rwbase can lock and unlock without taking any inner lock, if that happens, we need the ordering provided by atomic operations to satisfy the ordering semantics of lock/unlock. Without that, considering the follow case: { X = 0 initially } CPU 0 CPU 1 ===== ===== rt_write_lock(); X = 1 rt_write_unlock(): atomic_add(READER_BIAS - WRITER_BIAS, ->readers); // ->readers is READER_BIAS. rt_read_lock(): if ((r = atomic_read(->readers)) < 0) // True atomic_try_cmpxchg(->readers, r, r + 1); // succeed. <acquire the read lock via fast path> r1 = X; // r1 may be 0, because nothing prevent the reordering // of "X=1" and atomic_add() on CPU 1. Therefore audit every usage of atomic operations that may happen in a fast path, and add necessary barriers. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20210909110203.953991276@infradead.org
2021-09-15locking/rwbase: Extract __rwbase_write_trylock()Peter Zijlstra
The code in rwbase_write_lock() is a little non-obvious vs the read+set 'trylock', extract the sequence into a helper function to clarify the code. This also provides a single site to fix fast-path ordering. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/YUCq3L+u44NDieEJ@hirez.programming.kicks-ass.net
2021-09-15locking/rwbase: Properly match set_and_save_state() to restore_state()Peter Zijlstra
Noticed while looking at the readers race. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/20210909110203.828203010@infradead.org
2021-09-09locking/rtmutex: Fix ww_mutex deadlock checkPeter Zijlstra
Dan reported that rt_mutex_adjust_prio_chain() can be called with .orig_waiter == NULL however commit a055fcc132d4 ("locking/rtmutex: Return success on deadlock for ww_mutex waiters") unconditionally dereferences it. Since both call-sites that have .orig_waiter == NULL don't care for the return value, simply disable the deadlock squash by adding the NULL check. Notably, both callers use the deadlock condition as a termination condition for the iteration; once detected, it is sure that (de)boosting is done. Arguably step [3] would be a more natural termination point, but it's dubious whether adding a third deadlock detection state would improve the code. Fixes: a055fcc132d4 ("locking/rtmutex: Return success on deadlock for ww_mutex waiters") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/YS9La56fHMiCCo75@hirez.programming.kicks-ass.net
2021-09-02locking/rwsem: Add missing __init_rwsem() for PREEMPT_RTMike Galbraith
730633f0b7f95 became the first direct caller of __init_rwsem() vs the usual init_rwsem(), exposing PREEMPT_RT's lack thereof. Add it. [ tglx: Move it out of line ] Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/50a936b7d8f12277d6ec7ed2ef0421a381056909.camel@gmx.de
2021-08-30Merge tag 'locking-core-2021-08-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking and atomics updates from Thomas Gleixner: "The regular pile: - A few improvements to the mutex code - Documentation updates for atomics to clarify the difference between cmpxchg() and try_cmpxchg() and to explain the forward progress expectations. - Simplification of the atomics fallback generator - The addition of arch_atomic_long*() variants and generic arch_*() bitops based on them. - Add the missing might_sleep() invocations to the down*() operations of semaphores. The PREEMPT_RT locking core: - Scheduler updates to support the state preserving mechanism for 'sleeping' spin- and rwlocks on RT. This mechanism is carefully preserving the state of the task when blocking on a 'sleeping' spin- or rwlock and takes regular wake-ups targeted at the same task into account. The preserved or updated (via a regular wakeup) state is restored when the lock has been acquired. - Restructuring of the rtmutex code so it can be utilized and extended for the RT specific lock variants. - Restructuring of the ww_mutex code to allow sharing of the ww_mutex specific functionality for rtmutex based ww_mutexes. - Header file disentangling to allow substitution of the regular lock implementations with the PREEMPT_RT variants without creating an unmaintainable #ifdef mess. - Shared base code for the PREEMPT_RT specific rw_semaphore and rwlock implementations. Contrary to the regular rw_semaphores and rwlocks the PREEMPT_RT implementation is writer unfair because it is infeasible to do priority inheritance on multiple readers. Experience over the years has shown that real-time workloads are not the typical workloads which are sensitive to writer starvation. The alternative solution would be to allow only a single reader which has been tried and discarded as it is a major bottleneck especially for mmap_sem. Aside of that many of the writer starvation critical usage sites have been converted to a writer side mutex/spinlock and RCU read side protections in the past decade so that the issue is less prominent than it used to be. - The actual rtmutex based lock substitutions for PREEMPT_RT enabled kernels which affect mutex, ww_mutex, rw_semaphore, spinlock_t and rwlock_t. The spin/rw_lock*() functions disable migration across the critical section to preserve the existing semantics vs per-CPU variables. - Rework of the futex REQUEUE_PI mechanism to handle the case of early wake-ups which interleave with a re-queue operation to prevent the situation that a task would be blocked on both the rtmutex associated to the outer futex and the rtmutex based hash bucket spinlock. While this situation cannot happen on !RT enabled kernels the changes make the underlying concurrency problems easier to understand in general. As a result the difference between !RT and RT kernels is reduced to the handling of waiting for the critical section. !RT kernels simply spin-wait as before and RT kernels utilize rcu_wait(). - The substitution of local_lock for PREEMPT_RT with a spinlock which protects the critical section while staying preemptible. The CPU locality is established by disabling migration. The underlying concepts of this code have been in use in PREEMPT_RT for way more than a decade. The code has been refactored several times over the years and this final incarnation has been optimized once again to be as non-intrusive as possible, i.e. the RT specific parts are mostly isolated. It has been extensively tested in the 5.14-rt patch series and it has been verified that !RT kernels are not affected by these changes" * tag 'locking-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (92 commits) locking/rtmutex: Return success on deadlock for ww_mutex waiters locking/rtmutex: Prevent spurious EDEADLK return caused by ww_mutexes locking/rtmutex: Dequeue waiter on ww_mutex deadlock locking/rtmutex: Dont dereference waiter lockless locking/semaphore: Add might_sleep() to down_*() family locking/ww_mutex: Initialize waiter.ww_ctx properly static_call: Update API documentation locking/local_lock: Add PREEMPT_RT support locking/spinlock/rt: Prepare for RT local_lock locking/rtmutex: Add adaptive spinwait mechanism locking/rtmutex: Implement equal priority lock stealing preempt: Adjust PREEMPT_LOCK_OFFSET for RT locking/rtmutex: Prevent lockdep false positive with PI futexes futex: Prevent requeue_pi() lock nesting issue on RT futex: Simplify handle_early_requeue_pi_wakeup() futex: Reorder sanity checks in futex_requeue() futex: Clarify comment in futex_requeue() futex: Restructure futex_requeue() futex: Correct the number of requeued waiters for PI futex: Remove bogus condition for requeue PI ...
2021-08-30Merge branch 'core-rcu.2021.08.28a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: "RCU changes for this cycle were: - Documentation updates - Miscellaneous fixes - Offloaded-callbacks updates - Updates to the nolibc library - Tasks-RCU updates - In-kernel torture-test updates - Torture-test scripting, perhaps most notably the pinning of torture-test guest OSes so as to force differences in memory latency. For example, in a two-socket system, a four-CPU guest OS will have one pair of its CPUs pinned to threads in a single core on one socket and the other pair pinned to threads in a single core on the other socket. This approach proved able to force race conditions that earlier testing missed. Some of these race conditions are still being tracked down" * 'core-rcu.2021.08.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (61 commits) torture: Replace deprecated CPU-hotplug functions. rcu: Replace deprecated CPU-hotplug functions rcu: Print human-readable message for schedule() in RCU reader rcu: Explain why rcu_all_qs() is a stub in preemptible TREE RCU rcu: Use per_cpu_ptr to get the pointer of per_cpu variable rcu: Remove useless "ret" update in rcu_gp_fqs_loop() rcu: Mark accesses in tree_stall.h rcu: Make rcu_gp_init() and rcu_gp_fqs_loop noinline to conserve stack rcu: Mark lockless ->qsmask read in rcu_check_boost_fail() srcutiny: Mark read-side data races rcu: Start timing stall repetitions after warning complete rcu: Do not disable GP stall detection in rcu_cpu_stall_reset() rcu/tree: Handle VM stoppage in stall detection rculist: Unify documentation about missing list_empty_rcu() rcu: Mark accesses to ->rcu_read_lock_nesting rcu: Weaken ->dynticks accesses and updates rcu: Remove special bit at the bottom of the ->dynticks counter rcu: Fix stall-warning deadlock due to non-release of rcu_node ->lock rcu: Fix to include first blocked task in stall warning torture: Make kvm-test-1-run-qemu.sh check for reboot loops ...
2021-08-27locking/rtmutex: Return success on deadlock for ww_mutex waitersPeter Zijlstra
ww_mutexes can legitimately cause a deadlock situation in the lock graph which is resolved afterwards by the wait/wound mechanics. The rtmutex chain walk can detect such a deadlock and returns EDEADLK which in turn skips the wait/wound mechanism and returns EDEADLK to the caller. That's wrong because both lock chains might get EDEADLK or the wrong waiter would back out. Detect that situation and return 'success' in case that the waiter which initiated the chain walk is a ww_mutex with context. This allows the wait/wound mechanics to resolve the situation according to the rules. [ tglx: Split it apart and added changelog ] Reported-by: Sebastian Siewior <bigeasy@linutronix.de> Fixes: add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/YSeWjCHoK4v5OcOt@hirez.programming.kicks-ass.net
2021-08-27locking/rtmutex: Prevent spurious EDEADLK return caused by ww_mutexesPeter Zijlstra
rtmutex based ww_mutexes can legitimately create a cycle in the lock graph which can be observed by a blocker which didn't cause the problem: P1: A, ww_A, ww_B P2: ww_B, ww_A P3: A P3 might therefore be trapped in the ww_mutex induced cycle and run into the lock depth limitation of rt_mutex_adjust_prio_chain() which returns -EDEADLK to the caller. Disable the deadlock detection walk when the chain walk observes a ww_mutex to prevent this looping. [ tglx: Split it apart and added changelog ] Reported-by: Sebastian Siewior <bigeasy@linutronix.de> Fixes: add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/YSeWjCHoK4v5OcOt@hirez.programming.kicks-ass.net
2021-08-25locking/rtmutex: Dequeue waiter on ww_mutex deadlockThomas Gleixner
The rt_mutex based ww_mutex variant queues the new waiter first in the lock's rbtree before evaluating the ww_mutex specific conditions which might decide that the waiter should back out. This check and conditional exit happens before the waiter is enqueued into the PI chain. The failure handling at the call site assumes that the waiter, if it is the top most waiter on the lock, is queued in the PI chain and then proceeds to adjust the unmodified PI chain, which results in RB tree corruption. Dequeue the waiter from the lock waiter list in the ww_mutex error exit path to prevent this. Fixes: add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex") Reported-by: Sebastian Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210825102454.042280541@linutronix.de
2021-08-25locking/rtmutex: Dont dereference waiter locklessThomas Gleixner
The new rt_mutex_spin_on_onwer() loop checks whether the spinning waiter is still the top waiter on the lock by utilizing rt_mutex_top_waiter(), which is broken because that function contains a sanity check which dereferences the top waiter pointer to check whether the waiter belongs to the lock. That's wrong in the lockless spinwait case: CPU 0 CPU 1 rt_mutex_lock(lock) rt_mutex_lock(lock); queue(waiter0) waiter0 == rt_mutex_top_waiter(lock) rt_mutex_spin_on_onwer(lock, waiter0) { queue(waiter1) waiter1 == rt_mutex_top_waiter(lock) ... top_waiter = rt_mutex_top_waiter(lock) leftmost = rb_first_cached(&lock->waiters); -> signal dequeue(waiter1) destroy(waiter1) w = rb_entry(leftmost, ....) BUG_ON(w->lock != lock) <- UAF The BUG_ON() is correct for the case where the caller holds lock->wait_lock which guarantees that the leftmost waiter entry cannot vanish. For the lockless spinwait case it's broken. Create a new helper function which avoids the pointer dereference and just compares the leftmost entry pointer with current's waiter pointer to validate that currrent is still elegible for spinning. Fixes: 992caf7f1724 ("locking/rtmutex: Add adaptive spinwait mechanism") Reported-by: Sebastian Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210825102453.981720644@linutronix.de
2021-08-20locking/semaphore: Add might_sleep() to down_*() familyXiaoming Ni
Semaphore is sleeping lock. Add might_sleep() to down*() family (with exception of down_trylock()) to detect atomic context sleep. Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210809021215.19991-1-nixiaoming@huawei.com
2021-08-20locking/ww_mutex: Initialize waiter.ww_ctx properlySebastian Andrzej Siewior
The consolidation of the debug code for mutex waiter intialization sets waiter::ww_ctx to a poison value unconditionally. For regular mutexes this is intended to catch the case where waiter_ww_ctx is dereferenced accidentally. For ww_mutex the poison value has to be overwritten either with a context pointer or NULL for ww_mutexes without context. The rework broke this as it made the store conditional on the context pointer instead of the argument which signals whether ww_mutex code should be compiled in or optiized out. As a result waiter::ww_ctx ends up with the poison pointer for contextless ww_mutexes which causes a later dereference of the poison pointer because it is != NULL. Use the build argument instead so for ww_mutex the poison value is always overwritten. Fixes: c0afb0ffc06e6 ("locking/ww_mutex: Gather mutex_waiter initialization") Reported-by: Guenter Roeck <linux@roeck-us.net> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210819193030.zpwrpvvrmy7xxxiy@linutronix.de
2021-08-17locking/spinlock/rt: Prepare for RT local_lockThomas Gleixner
Add the static and runtime initializer mechanics to support the RT variant of local_lock, which requires the lock type in the lockdep map to be set to LD_LOCK_PERCPU. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211305.967526724@linutronix.de
2021-08-17locking/rtmutex: Add adaptive spinwait mechanismSteven Rostedt
Going to sleep when locks are contended can be quite inefficient when the contention time is short and the lock owner is running on a different CPU. The MCS mechanism cannot be used because MCS is strictly FIFO ordered while for rtmutex based locks the waiter ordering is priority based. Provide a simple adaptive spinwait mechanism which currently restricts the spinning to the top priority waiter. [ tglx: Provide a contemporary changelog, extended it to all rtmutex based locks and updated it to match the other spin on owner implementations ] Originally-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211305.912050691@linutronix.de
2021-08-17locking/rtmutex: Implement equal priority lock stealingGregory Haskins
The current logic only allows lock stealing to occur if the current task is of higher priority than the pending owner. Significant throughput improvements can be gained by allowing the lock stealing to include tasks of equal priority when the contended lock is a spin_lock or a rw_lock and the tasks are not in a RT scheduling task. The assumption was that the system will make faster progress by allowing the task already on the CPU to take the lock rather than waiting for the system to wake up a different task. This does add a degree of unfairness, but in reality no negative side effects have been observed in the many years that this has been used in the RT kernel. [ tglx: Refactored and rewritten several times by Steve Rostedt, Sebastian Siewior and myself ] Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211305.857240222@linutronix.de
2021-08-17locking/rtmutex: Prevent lockdep false positive with PI futexesThomas Gleixner
On PREEMPT_RT the futex hashbucket spinlock becomes 'sleeping' and rtmutex based. That causes a lockdep false positive because some of the futex functions invoke spin_unlock(&hb->lock) with the wait_lock of the rtmutex associated to the pi_futex held. spin_unlock() in turn takes wait_lock of the rtmutex on which the spinlock is based which makes lockdep notice a lock recursion. Give the futex/rtmutex wait_lock a separate key. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211305.750701219@linutronix.de
2021-08-17locking/rtmutex: Add mutex variant for RTThomas Gleixner
Add the necessary defines, helpers and API functions for replacing struct mutex on a PREEMPT_RT enabled kernel with an rtmutex based variant. No functional change when CONFIG_PREEMPT_RT=n Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211305.081517417@linutronix.de
2021-08-17locking/ww_mutex: Implement rtmutex based ww_mutex API functionsPeter Zijlstra
Add the actual ww_mutex API functions which replace the mutex based variant on RT enabled kernels. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211305.024057938@linutronix.de
2021-08-17locking/rtmutex: Extend the rtmutex core to support ww_mutexPeter Zijlstra
Add a ww acquire context pointer to the waiter and various functions and add the ww_mutex related invocations to the proper spots in the locking code, similar to the mutex based variant. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.966139174@linutronix.de
2021-08-17locking/ww_mutex: Add rt_mutex based lock type and accessorsPeter Zijlstra
Provide the defines for RT mutex based ww_mutexes and fix up the debug logic so it's either enabled by DEBUG_MUTEXES or DEBUG_RT_MUTEXES on RT kernels. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.908012566@linutronix.de
2021-08-17locking/ww_mutex: Add RT priority to W/W orderPeter Zijlstra
RT mutex based ww_mutexes cannot order based on timestamps. They have to order based on priority. Add the necessary decision logic. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.847536630@linutronix.de
2021-08-17locking/ww_mutex: Implement rt_mutex accessorsPeter Zijlstra
Provide the type defines and the helper inlines for rtmutex based ww_mutexes. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.790760545@linutronix.de
2021-08-17locking/ww_mutex: Abstract out internal lock accessesThomas Gleixner
Accessing the internal wait_lock of mutex and rtmutex is slightly different. Provide helper functions for that. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.734635961@linutronix.de
2021-08-17locking/ww_mutex: Abstract out mutex typesPeter Zijlstra
Some ww_mutex helper functions use pointers for the underlying mutex and mutex_waiter. The upcoming rtmutex based implementation needs to share these functions. Add and use defines for the types and replace the direct types in the affected functions. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.678720245@linutronix.de
2021-08-17locking/ww_mutex: Abstract out mutex accessorsPeter Zijlstra
Move the mutex related access from various ww_mutex functions into helper functions so they can be substituted for rtmutex based ww_mutex later. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.622477030@linutronix.de
2021-08-17locking/ww_mutex: Abstract out waiter enqueueingPeter Zijlstra
The upcoming rtmutex based ww_mutex needs a different handling for enqueueing a waiter. Split it out into a helper function. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.566318143@linutronix.de
2021-08-17locking/ww_mutex: Abstract out the waiter iterationPeter Zijlstra
Split out the waiter iteration functions so they can be substituted for a rtmutex based ww_mutex later. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.509186185@linutronix.de
2021-08-17locking/ww_mutex: Remove the __sched annotation from ww_mutex APIsPeter Zijlstra
None of these functions will be on the stack when blocking in schedule(), hence __sched is not needed. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.453235952@linutronix.de
2021-08-17locking/ww_mutex: Split out the W/W implementation logic into ↵Peter Zijlstra (Intel)
kernel/locking/ww_mutex.h Split the W/W mutex helper functions out into a separate header file, so they can be shared with a rtmutex based variant later. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.396893399@linutronix.de
2021-08-17locking/ww_mutex: Split up ww_mutex_unlock()Peter Zijlstra (Intel)
Split the ww related part out into a helper function so it can be reused for a rtmutex based ww_mutex implementation. [ mingo: Fixed bisection failure. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.340166556@linutronix.de
2021-08-17locking/ww_mutex: Gather mutex_waiter initializationPeter Zijlstra
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.281927514@linutronix.de
2021-08-17locking/ww_mutex: Simplify lockdep annotationsPeter Zijlstra
No functional change. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.222921634@linutronix.de
2021-08-17locking/mutex: Make mutex::wait_lock rawThomas Gleixner
The wait_lock of mutex is really a low level lock. Convert it to a raw_spinlock like the wait_lock of rtmutex. [ mingo: backmerged the test_lockup.c build fix by bigeasy. ] Co-developed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.166863404@linutronix.de
2021-08-17locking/mutex: Move the 'struct mutex_waiter' definition from ↵Thomas Gleixner
<linux/mutex.h> to the internal header Move the mutex waiter declaration from the public <linux/mutex.h> header to the internal kernel/locking/mutex.h header. There is no reason to expose it outside of the core code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.054325923@linutronix.de