path: root/kernel/locking
2024-11-08  lockdep: fix deadlock issue between lockdep and rcu  (Zhiguo Niu)
commit a6f88ac32c6e63e69c595bfae220d8641704c9b7 upstream.

There is a deadlock scenario between lockdep and rcu when the rcu nocb feature is enabled, as the following call stacks show:

    rcuop/x
    -000|queued_spin_lock_slowpath(lock = 0xFFFFFF817F2A8A80, val = ?)
    -001|queued_spin_lock(inline) // try to hold nocb_gp_lock
    -001|do_raw_spin_lock(lock = 0xFFFFFF817F2A8A80)
    -002|__raw_spin_lock_irqsave(inline)
    -002|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F2A8A80)
    -003|wake_nocb_gp_defer(inline)
    -003|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F30B680)
    -004|__call_rcu_common(inline)
    -004|call_rcu(head = 0xFFFFFFC082EECC28, func = ?)
    -005|call_rcu_zapped(inline)
    -005|free_zapped_rcu(ch = ?) // hold graph lock
    -006|rcu_do_batch(rdp = 0xFFFFFF817F245680)
    -007|nocb_cb_wait(inline)
    -007|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F245680)
    -008|kthread(_create = 0xFFFFFF80803122C0)
    -009|ret_from_fork(asm)

    rcuop/y
    -000|queued_spin_lock_slowpath(lock = 0xFFFFFFC08291BBC8, val = 0)
    -001|queued_spin_lock()
    -001|lockdep_lock()
    -001|graph_lock() // try to hold graph lock
    -002|lookup_chain_cache_add()
    -002|validate_chain()
    -003|lock_acquire
    -004|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F211D80)
    -005|lock_timer_base(inline)
    -006|mod_timer(inline)
    -006|wake_nocb_gp_defer(inline) // hold nocb_gp_lock
    -006|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F2A8680)
    -007|__call_rcu_common(inline)
    -007|call_rcu(head = 0xFFFFFFC0822E0B58, func = ?)
    -008|call_rcu_hurry(inline)
    -008|rcu_sync_call(inline)
    -008|rcu_sync_func(rhp = 0xFFFFFFC0822E0B58)
    -009|rcu_do_batch(rdp = 0xFFFFFF817F266680)
    -010|nocb_cb_wait(inline)
    -010|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F266680)
    -011|kthread(_create = 0xFFFFFF8080363740)
    -012|ret_from_fork(asm)

rcuop/x and rcuop/y are rcu nocb threads with the same nocb gp thread. This patch releases the graph lock before lockdep calls call_rcu().

Fixes: a0b0fd53e1e6 ("locking/lockdep: Free lock classes that are no longer in use")
Cc: stable@vger.kernel.org
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Reviewed-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20240620225436.3127927-1-cmllamas@google.com
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
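A hedged sketch of the shape of such a fix (the helper name prepare_call_rcu_zapped() and the surrounding fields are illustrative, not guaranteed to match the upstream code exactly): decide under the graph lock whether an RCU callback is needed, but only invoke call_rcu() after the lock has been dropped, so the nocb wakeup path can never run with the graph lock held:

    /* Sketch only. Called with the graph lock held: merely records
     * that a callback is needed; does not call call_rcu() itself. */
    static bool prepare_call_rcu_zapped(struct pending_free *pf)
    {
    	if (list_empty(&pf->zapped) || delayed_free.scheduled)
    		return false;
    	delayed_free.scheduled = true;
    	return true;
    }

    /* Caller side: */
    graph_lock();
    need_callback = prepare_call_rcu_zapped(pf);
    graph_unlock();
    /* Graph lock dropped: call_rcu() may now take nocb_gp_lock safely. */
    if (need_callback)
    	call_rcu(&delayed_free.rcu_head, free_zapped_rcu);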
2024-11-08  locking/lockdep: Avoid potential access of invalid memory in lock_class  (Waiman Long)
commit 61cc4534b6550997c97a03759ab46b29d44c0017 upstream.

It was found that reading /proc/lockdep after a lockdep splat may potentially cause an access to freed memory if lockdep_unregister_key() is called after the splat but before /proc/lockdep is accessed [1]. This is due to the fact that the graph_lock() call in lockdep_unregister_key() fails after the clearing of debug_locks by the splat process.

After lockdep_unregister_key() is called, the lock_name may be freed, but the corresponding lock_class structure still has a reference to it. That invalid memory pointer will then be accessed when /proc/lockdep is read by a user, and a use-after-free (UAF) error will be reported if KASAN is enabled.

To fix this problem, lockdep_unregister_key() is now modified to always search for a matching key irrespective of the debug_locks state and zap the corresponding lock class if a matching one is found.

[1] https://lore.kernel.org/lkml/77f05c15-81b6-bddd-9650-80d5f23fe330@i-love.sakura.ne.jp/

Fixes: 8b39adbee805 ("locking/lockdep: Make lockdep_unregister_key() honor 'debug_locks' again")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lkml.kernel.org/r/20220103023558.1377055-1-longman@redhat.com
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
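A simplified sketch of the fix's shape inside lockdep_unregister_key() (condensed; the real function also manages the pending-free machinery): the key is searched for unconditionally, and a found key's classes are zapped even when debug_locks is off:

    raw_local_irq_save(flags);
    lockdep_lock();

    /* Search unconditionally, no longer gated on graph_lock()/debug_locks. */
    hlist_for_each_entry_rcu(k, hash_head, hash_entry) {
    	if (k == key) {
    		hlist_del_rcu(&k->hash_entry);
    		found = true;
    		break;
    	}
    }
    WARN_ON_ONCE(!found && debug_locks);
    if (found)
    	__lockdep_free_key_range(pf, key, 1);	/* zap classes using the key */

    lockdep_unlock();
    raw_local_irq_restore(flags);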
2024-11-08  locking/lockdep: Rework lockdep_lock  (Peter Zijlstra)
commit 248efb2158f1e23750728e92ad9db3ab60c14485 upstream.

A few sites want to assert that we own the graph_lock/lockdep_lock, so provide a more conventional lock interface for it, with a number of trivial debug checks.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313102107.GX12561@hirez.programming.kicks-ass.net
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
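The resulting interface, roughly (a sketch of the upstream shape): a private arch spinlock paired with an owner field, so call sites can use lockdep_lock()/lockdep_unlock() and assert ownership:

    static arch_spinlock_t __lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
    static struct task_struct *__owner;

    static inline void lockdep_lock(void)
    {
    	DEBUG_LOCKS_WARN_ON(!irqs_disabled());
    	arch_spin_lock(&__lock);
    	__owner = current;
    }

    static inline void lockdep_unlock(void)
    {
    	DEBUG_LOCKS_WARN_ON(__owner != current);
    	__owner = NULL;
    	arch_spin_unlock(&__lock);
    }

    static inline bool lockdep_assert_locked(void)
    {
    	return DEBUG_LOCKS_WARN_ON(__owner != current);
    }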
2024-11-08  locking/lockdep: Fix bad recursion pattern  (Peter Zijlstra)
commit 10476e6304222ced7df9b3d5fb0a043b3c2a1ad8 upstream.

There were two patterns for lockdep_recursion:

    Pattern-A:
    	if (current->lockdep_recursion)
    		return

    	current->lockdep_recursion = 1;
    	/* do stuff */
    	current->lockdep_recursion = 0;

    Pattern-B:
    	current->lockdep_recursion++;
    	/* do stuff */
    	current->lockdep_recursion--;

But a third pattern has emerged:

    Pattern-C:
    	current->lockdep_recursion = 1;
    	/* do stuff */
    	current->lockdep_recursion = 0;

And while this isn't broken per se, it is highly dangerous because it doesn't nest properly.

Get rid of all Pattern-C instances and shore up Pattern-A with a warning.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313093325.GW12561@hirez.programming.kicks-ass.net
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12  rtmutex: Drop rt_mutex::wait_lock before scheduling  (Roland Xu)
commit d33d26036a0274b472299d7dcdaa5fb34329f91b upstream.

rt_mutex_handle_deadlock() is called with rt_mutex::wait_lock held. In the good case it returns with the lock held, and in the deadlock case it emits a warning and goes into an endless scheduling loop with the lock held, which triggers the 'scheduling in atomic' warning.

Unlock rt_mutex::wait_lock in the deadlock case before issuing the warning and dropping into the schedule-forever loop.

[ tglx: Moved unlock before the WARN(), removed the pointless comment, massaged changelog, added Fixes tag ]

Fixes: 3d5c9340d194 ("rtmutex: Handle deadlock detection smarter")
Signed-off-by: Roland Xu <mu001999@outlook.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/ME0P300MB063599BEF0743B8FA339C2CECC802@ME0P300MB0635.AUSP300.PROD.OUTLOOK.COM
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
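The fix, roughly (a simplified sketch; details such as the ww_mutex early return are omitted): hand the lock back before warning and parking the task:

    static void __sched rt_mutex_handle_deadlock(int res, int detect_deadlock,
    					     struct rt_mutex_base *lock,
    					     struct rt_mutex_waiter *w)
    {
    	if (res != -EDEADLOCK || detect_deadlock)
    		return;

    	raw_spin_unlock_irq(&lock->wait_lock);	/* drop wait_lock first... */

    	WARN(1, "rtmutex deadlock detected\n");

    	while (1) {
    		set_current_state(TASK_INTERRUPTIBLE);
    		schedule();			/* ...so sleeping here is legal */
    	}
    }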
2023-11-28  locking/ww_mutex/test: Fix potential workqueue corruption  (John Stultz)
[ Upstream commit bccdd808902f8c677317cec47c306e42b93b849e ]

In some cases running with the test-ww_mutex code, I was seeing odd behavior where sometimes it seemed flush_workqueue was returning before all the work threads were finished. Often this would cause strange crashes as the mutexes would be freed while they were being used.

Looking at the code, there is a lifetime problem: the controlling thread that spawns the work allocates the "struct stress" structures that are passed to the workqueue threads, and when the workqueue threads are finished, they free the stress struct that was passed to them. Unfortunately the workqueue work_struct node is in the stress struct, which means the work_struct is freed before the work thread returns and while flush_workqueue is waiting.

It seems like a better idea to have the controlling thread both allocate and free the stress structures, so that we can be sure we don't corrupt the workqueue by freeing the structure prematurely. So this patch reworks the test to do so, and with this change I no longer see the early flush_workqueue returns.

Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230922043616.19282-3-jstultz@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
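A sketch of the reworked ownership model (structure fields abbreviated; array and loop names are illustrative): the controlling thread owns both allocation and free, and only frees after flush_workqueue() guarantees every embedded work item is idle:

    struct stress {
    	struct work_struct work;	/* embedded: must outlive the worker */
    	struct ww_mutex *locks;
    	unsigned long timeout;
    	int nlocks;
    };

    stress_array = kmalloc_array(nthreads, sizeof(*stress_array), GFP_KERNEL);
    if (!stress_array)
    	return -ENOMEM;
    for (n = 0; n < nthreads; n++)
    	queue_work(wq, &stress_array[n].work);

    flush_workqueue(wq);	/* all work items done, work_structs idle... */
    kfree(stress_array);	/* ...so freeing them here is safe */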
2023-06-09  treewide: Remove uninitialized_var() usage  (Kees Cook)
commit 3f649ab728cda8038259d8f14492fe400fbab911 upstream.

Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script:

    git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
    	xargs perl -pi -e \
    		's/\buninitialized_var\(([^\)]+)\)/\1/g;
    		 s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  locking/lockdep: Avoid RCU-induced noinstr fail  (Peter Zijlstra)
[ Upstream commit ce0b9c805dd66d5e49fd53ec5415ae398f4c56e6 ]

    vmlinux.o: warning: objtool: look_up_lock_class()+0xc7: call to rcu_read_lock_any_held() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210624095148.311980536@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-15  locking/mutex: Fix HANDOFF condition  (Peter Zijlstra)
[ Upstream commit 048661a1f963e9517630f080687d48af79ed784c ]

Yanfei reported that setting HANDOFF should not depend on recomputing @first, only on @first state. Which would then give:

    if (ww_ctx || !first)
    	first = __mutex_waiter_is_first(lock, &waiter);
    if (first)
    	__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);

But because 'ww_ctx || !first' is basically 'always' and the test for first is relatively cheap, omit that first branch entirely.

Reported-by: Yanfei Xu <yanfei.xu@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Waiman Long <longman@redhat.com>
Reviewed-by: Yanfei Xu <yanfei.xu@windriver.com>
Link: https://lore.kernel.org/r/20210630154114.896786297@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
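With that branch omitted, the committed form reduces to (sketch):

    first = __mutex_waiter_is_first(lock, &waiter);
    if (first)
    	__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);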
2021-07-14  locking/lockdep: Avoid finding a wrong lock dependency path in check_irq_usage()  (Boqun Feng)
[ Upstream commit 7b1f8c6179769af6ffa055e1169610b51d71edd5 ]

In step #3 of check_irq_usage(), we search backwards to find a lock whose usage conflicts with the usage of @target_entry1 on the safe/unsafe axis. However, we should only take the irq-unsafe usage of @target_entry1 into consideration: a lock could be hardirq-unsafe but softirq-safe, and check_irq_usage() finds it because its hardirq-unsafe usage could result in a hardirq-safe-unsafe deadlock. But since we currently don't filter out the other usage bits, we may also find a lock dependency path softirq-unsafe -> softirq-safe, which in fact doesn't cause a deadlock. This can produce misleading lockdep splats.

Fix this by only keeping the LOCKF_ENABLED_IRQ_ALL bits when we do the backwards search.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210618170110.3699115-4-boqun.feng@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14  locking/lockdep: Fix the dep path printing for backwards BFS  (Boqun Feng)
[ Upstream commit 69c7a5fb2482636f525f016c8333fdb9111ecb9d ]

We use the same code to print a backwards lock dependency path as we do for the forwards lock dependency path, and this could result in incorrect printing, because for a backwards lock_list ->trace is not the call trace where the lock of ->class is acquired.

Fix this by introducing a separate function for printing the backwards dependency path. Also add a few comments about the printing while we are at it.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210618170110.3699115-2-boqun.feng@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-26  locking/mutex: clear MUTEX_FLAGS if wait_list is empty due to signal  (Zqiang)
[ Upstream commit 3a010c493271f04578b133de977e0e5dd2848cea ]

When an interruptible mutex locker is interrupted by a signal without acquiring the lock, it is removed from the wait queue. If the mutex is not contended enough to have another waiter put into the wait queue, the WAITER bit stays set, forcing every subsequent locker into the slowpath to acquire the lock. So if the wait queue is empty after the removal, the WAITER bit needs to be cleared.

Fixes: 040a0a371005 ("mutex: Add support for wound/wait style locks")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210517034005.30828-1-qiang.zhang@windriver.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
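The shape of the fix (a simplified sketch): waiter removal is centralized in a helper that clears all mutex flags when the last waiter leaves, and the signal/error exit path uses it too:

    static void __mutex_remove_waiter(struct mutex *lock,
    				  struct mutex_waiter *waiter)
    {
    	list_del(&waiter->list);
    	if (likely(list_empty(&lock->wait_list)))
    		__mutex_clear_flag(lock, MUTEX_FLAGS); /* WAITERS|HANDOFF|PICKUP */
    }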
2021-04-28  locking/qrwlock: Fix ordering in queued_write_lock_slowpath()  (Ali Saidi)
[ Upstream commit 84a24bf8c52e66b7ac89ada5e3cfbe72d65c1896 ]

While this code is executed with the wait_lock held, a reader can acquire the lock without holding wait_lock. The writer side loops checking the value with the atomic_cond_read_acquire(), but only truly acquires the lock when the compare-and-exchange is completed successfully, which isn't ordered. This exposes the window between the acquire and the cmpxchg to an A-B-A problem which allows reads following the lock acquisition to observe values speculatively before the write lock is truly acquired.

We've seen a problem in epoll where the reader does a xchg while holding the read lock, but the writer can see a value change out from under it:

    Writer                                   | Reader
    -----------------------------------------------------------------------------
    ep_scan_ready_list()                     |
    |- write_lock_irq()                      |
       |- queued_write_lock_slowpath()       |
          |- atomic_cond_read_acquire()      |
                                             | read_lock_irqsave(&ep->lock, flags);
      --> (observes value before unlock)     | chain_epi_lockless()
          |                                  |   epi->next = xchg(&ep->ovflist, epi);
          |                                  | read_unlock_irqrestore(&ep->lock, flags);
          |                                  |
          |- atomic_cmpxchg_relaxed()        |
             |-- READ_ONCE(ep->ovflist);     |

A core can order the read of the ovflist ahead of the atomic_cmpxchg_relaxed(). Switching the cmpxchg to use acquire semantics addresses this issue, at which point the atomic_cond_read can be switched to use relaxed semantics.

Fixes: b519b56e378ee ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwlock")
Signed-off-by: Ali Saidi <alisaidi@amazon.com>
[peterz: use try_cmpxchg()]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Tested-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
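The resulting wait/acquire sequence in queued_write_lock_slowpath() looks roughly like this (sketch): the ACQUIRE moves from the spin-wait to the cmpxchg that actually takes the lock:

    /* When no more readers or writers, set the locked flag. */
    do {
    	cnts = atomic_cond_read_relaxed(&lock->cnts, VAL == _QW_WAITING);
    } while (!atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED));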
2021-04-21  lockdep: Add a missing initialization hint to the "INFO: Trying to register non-static key" message  (Tetsuo Handa)
[ Upstream commit 3a85969e9d912d5dd85362ee37b5f81266e00e77 ]

Since this message is printed when dynamically allocated spinlocks (e.g. kzalloc()) are used without initialization (e.g. spin_lock_init()), suggest that developers check whether the initialization functions for the objects were called, before leaving them to wonder what annotation is missing.

[ mingo: Minor tweaks to the message. ]

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210321064913.4619-1-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
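A hypothetical example of the bug class the new hint targets (struct and field names are illustrative): a dynamically allocated spinlock used without its init call:

    struct foo {
    	spinlock_t lock;
    };

    struct foo *f = kzalloc(sizeof(*f), GFP_KERNEL);
    spin_lock(&f->lock);	/* "INFO: Trying to register non-static key" */

    /* The initialization the hint now suggests checking for: */
    spin_lock_init(&f->lock);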
2021-04-07  locking/ww_mutex: Simplify use_ww_ctx & ww_ctx handling  (Waiman Long)
[ Upstream commit 5de2055d31ea88fd9ae9709ac95c372a505a60fa ]

The use_ww_ctx flag is passed to mutex_optimistic_spin(), but the function doesn't use it. The frequent use of the (use_ww_ctx && ww_ctx) combination is repetitive. In fact, ww_ctx should not be used at all if !use_ww_ctx.

Simplify the ww_mutex code by dropping use_ww_ctx from mutex_optimistic_spin() and clearing ww_ctx if !use_ww_ctx. In this way, we can replace (use_ww_ctx && ww_ctx) by just (ww_ctx).

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Link: https://lore.kernel.org/r/20210316153119.13802-2-longman@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-30  rtmutex: Remove unused argument from rt_mutex_proxy_unlock()  (Thomas Gleixner)
commit 2156ac1934166d6deb6cd0f6ffc4c1076ec63697 upstream

Nothing uses the argument. Remove it as preparation to use pi_state_update_owner().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-09  exec: Transform exec_update_mutex into a rw_semaphore  (Eric W. Biederman)
[ Upstream commit f7cfd871ae0c5008d94b6f66834e7845caa93c15 ]

Recently syzbot reported[0] that there is a deadlock amongst the users of exec_update_mutex. The problematic lock ordering found by lockdep was:

    perf_event_open  (exec_update_mutex -> ovl_i_mutex)
    chown            (ovl_i_mutex       -> sb_writes)
    sendfile         (sb_writes         -> p->lock)
        by reading from a proc file and writing to overlayfs
    proc_pid_syscall (p->lock           -> exec_update_mutex)

While looking at possible solutions it occurred to me that all of the users and possible users involved only wanted the state of the given process to remain the same. They are all readers. The only writer is exec.

There is no reason for readers to block on each other. So fix this deadlock by transforming exec_update_mutex into a rw_semaphore named exec_update_lock that only exec takes for writing.

Cc: Jann Horn <jannh@google.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Bernd Edlinger <bernd.edlinger@hotmail.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christopher Yeoh <cyeoh@au1.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Fixes: eea9673250db ("exec: Add exec_update_mutex to replace cred_guard_mutex")
[0] https://lkml.kernel.org/r/00000000000063640c05ade8e3de@google.com
Reported-by: syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/87ft4mbqen.fsf@x220.int.ebiederm.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-09  rwsem: Implement down_read_interruptible  (Eric W. Biederman)
[ Upstream commit 31784cff7ee073b34d6eddabb95e3be2880a425c ]

In preparation for converting exec_update_mutex to a rwsem so that multiple readers can execute in parallel and not deadlock, add down_read_interruptible. This is needed for perf_event_open to be converted (with no semantic changes) from working on a mutex to working on a rwsem.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/87k0tybqfy.fsf@x220.int.ebiederm.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
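The new primitive, roughly as added (a sketch of the upstream shape, minus kernel-doc):

    int __sched down_read_interruptible(struct rw_semaphore *sem)
    {
    	might_sleep();
    	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);

    	if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock,
    				  __down_read_interruptible)) {
    		rwsem_release(&sem->dep_map, _RET_IP_);
    		return -EINTR;
    	}

    	return 0;
    }
    EXPORT_SYMBOL(down_read_interruptible);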
2021-01-09  rwsem: Implement down_read_killable_nested  (Eric W. Biederman)
[ Upstream commit 0f9368b5bf6db0c04afc5454b1be79022a681615 ]

In preparation for converting exec_update_mutex to a rwsem so that multiple readers can execute in parallel and not deadlock, add down_read_killable_nested. This is needed so that kcmp_lock can be converted from working on mutexes to working on rw_semaphores.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/87o8jabqh3.fsf@x220.int.ebiederm.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01  locking/lockdep: Decrement IRQ context counters when removing lock chain  (Waiman Long)
[ Upstream commit b3b9c187dc2544923a601733a85352b9ddaba9b3 ]

There are currently three counters to track the IRQ context of a lock chain - nr_hardirq_chains, nr_softirq_chains and nr_process_chains. They are incremented when a new lock chain is added, but they are not decremented when a lock chain is removed. That causes some of the statistic counts reported by /proc/lockdep_stats to be incorrect.

Fix that by decrementing the right counter when a lock chain is removed. Since inc_chains() no longer accesses hardirq_context and softirq_context directly, it is moved out from the CONFIG_TRACE_IRQFLAGS conditional compilation block.

Fixes: a0b0fd53e1e6 ("locking/lockdep: Free lock classes that are no longer in use")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200206152408.24165-2-longman@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03  locking/lockdep: Fix overflow in presentation of average lock-time  (Chris Wilson)
[ Upstream commit a7ef9b28aa8d72a1656fa6f0a01bbd1493886317 ]

Though the number of lock-acquisitions is tracked as an unsigned long, it is passed as the divisor to div_s64(), which interprets it as a s32, giving nonsense values with more than 2 billion acquisitions. E.g.:

    acquisitions   holdtime-min   holdtime-max   holdtime-total   holdtime-avg
    -------------------------------------------------------------------------
    2350439395     0.07           353.38         649647067.36     0.-32

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200725185110.11588-1-chris@chris-wilson.co.uk
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-23  locktorture: Print ratio of acquisitions, not failures  (Paul E. McKenney)
commit 80c503e0e68fbe271680ab48f0fe29bc034b01b7 upstream.

The __torture_print_stats() function in locktorture.c carefully initializes local variable "min" to statp[0].n_lock_acquired, but then compares it to statp[i].n_lock_fail. Given that the .n_lock_fail field should normally be zero, and given the initialization, it seems reasonable to display the maximum and minimum number of acquisitions instead of miscomputing the maximum and minimum number of failures. This commit therefore switches from failures to acquisitions.

And this turns out to be not only a day-zero bug, but entirely my own fault. I hate it when that happens!

Fixes: 0af3fe1efa53 ("locktorture: Add a lock-torture kernel module")
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17  locking/lockdep: Avoid recursion in lockdep_count_{for,back}ward_deps()  (Boqun Feng)
[ Upstream commit 25016bd7f4caf5fc983bbab7403d08e64cba3004 ]

Qian Cai reported a bug when PROVE_RCU_LIST=y: a read of /proc/lockdep triggered a warning:

    [ ] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
    ...
    [ ] Call Trace:
    [ ]  lock_is_held_type+0x5d/0x150
    [ ]  ? rcu_lockdep_current_cpu_online+0x64/0x80
    [ ]  rcu_read_lock_any_held+0xac/0x100
    [ ]  ? rcu_read_lock_held+0xc0/0xc0
    [ ]  ? __slab_free+0x421/0x540
    [ ]  ? kasan_kmalloc+0x9/0x10
    [ ]  ? __kmalloc_node+0x1d7/0x320
    [ ]  ? kvmalloc_node+0x6f/0x80
    [ ]  __bfs+0x28a/0x3c0
    [ ]  ? class_equal+0x30/0x30
    [ ]  lockdep_count_forward_deps+0x11a/0x1a0

The warning got triggered because lockdep_count_forward_deps() calls __bfs() without current->lockdep_recursion being set; as a result a lockdep internal function (__bfs()) is checked by lockdep itself, which is unexpected, and the inconsistency between the irq-off state and the state traced by lockdep caused the warning.

Apart from this warning, lockdep internal functions like __bfs() should always be protected by current->lockdep_recursion to avoid potential deadlocks and data inconsistency. Therefore, add the current->lockdep_recursion on-and-off section to protect __bfs() in both lockdep_count_forward_deps() and lockdep_count_backward_deps().

Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200312151258.128036-1-boqun.feng@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-05  locking/lockdep: Fix lockdep_stats indentation problem  (Waiman Long)
commit a030f9767da1a6bbcec840fc54770eb11c2414b6 upstream.

It was found that two lines in the output of /proc/lockdep_stats have an indentation problem:

    # cat /proc/lockdep_stats
    :
     in-process chains:                   25057
     stack-trace entries:                137827 [max: 524288]
     number of stack traces:        7973
     number of stack hash chains:   6355
     combined max dependencies:      1356414598
     hardirq-safe locks:                     57
     hardirq-unsafe locks:                 1286
    :

All the numbers displayed in /proc/lockdep_stats except the two stack trace numbers are formatted with a field width of 11. To properly align all the numbers, a field width of 11 is now added to the two stack trace numbers.

Fixes: 8c779229d0f4 ("locking/lockdep: Report more stack trace statistics")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lkml.kernel.org/r/20191211213139.29934-1-longman@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23  locking/lockdep: Fix buffer overrun problem in stack_trace[]  (Waiman Long)
commit d91f3057263ceb691ef527e71b41a56b17f6c869 upstream.

If the lockdep code is really running out of the stack_trace entries, it is likely that a buffer overrun can happen and the data immediately after stack_trace[] will be corrupted.

If there are fewer than LOCK_TRACE_SIZE_IN_LONGS entries left before the call to save_trace(), the max_entries computation will leave it with a very large positive number because of its unsigned nature. The subsequent call to stack_trace_save() will then corrupt the data after stack_trace[]. Fix that by changing max_entries to a signed integer and checking for a negative value before calling stack_trace_save().

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 12593b7467f9 ("locking/lockdep: Reduce space occupied by stack traces")
Link: https://lkml.kernel.org/r/20191220135128.14876-1-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
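Roughly the shape of the fix inside save_trace() (sketch):

    int max_entries;	/* was unsigned int: a shortfall wrapped to a huge value */

    max_entries = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries -
    	      LOCK_TRACE_SIZE_IN_LONGS;

    if (max_entries <= 0) {		/* was "== 0", which missed the wrap */
    	if (!debug_locks_off_graph_unlock())
    		return NULL;
    	print_lockdep_off("BUG: MAX_STACK_TRACE_ENTRIES too low!");
    	dump_stack();
    	return NULL;
    }
    trace->nr_entries = stack_trace_save(trace->entries, max_entries, 3);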
2020-01-23  locking/rwsem: Fix kernel crash when spinning on RWSEM_OWNER_UNKNOWN  (Waiman Long)
commit 39e7234f00bc93613c086ae42d852d5f4147120a upstream.

Commit 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically spin on owner") allows a recently woken up waiting writer to spin on the owner. Unfortunately, if the owner happens to be RWSEM_OWNER_UNKNOWN, the code will incorrectly spin on it, leading to a kernel crash. This is fixed by passing the proper non-spinnable bits to rwsem_spin_on_owner() so that RWSEM_OWNER_UNKNOWN will be treated as a non-spinnable target.

Fixes: 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically spin on owner")
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200115154336.8679-1-longman@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-12  locking/spinlock/debug: Fix various data races  (Marco Elver)
[ Upstream commit 1a365e822372ba24c9da0822bc583894f6f3d821 ]

This fixes various data races in spinlock_debug. By testing with KCSAN, it is observable that the console gets spammed with data race reports, suggesting these are extremely frequent.

Example data race report:

    read to 0xffff8ab24f403c48 of 4 bytes by task 221 on cpu 2:
     debug_spin_lock_before kernel/locking/spinlock_debug.c:85 [inline]
     do_raw_spin_lock+0x9b/0x210 kernel/locking/spinlock_debug.c:112
     __raw_spin_lock include/linux/spinlock_api_smp.h:143 [inline]
     _raw_spin_lock+0x39/0x40 kernel/locking/spinlock.c:151
     spin_lock include/linux/spinlock.h:338 [inline]
     get_partial_node.isra.0.part.0+0x32/0x2f0 mm/slub.c:1873
     get_partial_node mm/slub.c:1870 [inline]
    <snip>

    write to 0xffff8ab24f403c48 of 4 bytes by task 167 on cpu 3:
     debug_spin_unlock kernel/locking/spinlock_debug.c:103 [inline]
     do_raw_spin_unlock+0xc9/0x1a0 kernel/locking/spinlock_debug.c:138
     __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:159 [inline]
     _raw_spin_unlock_irqrestore+0x2d/0x50 kernel/locking/spinlock.c:191
     spin_unlock_irqrestore include/linux/spinlock.h:393 [inline]
     free_debug_processing+0x1b3/0x210 mm/slub.c:1214
     __slab_free+0x292/0x400 mm/slub.c:2864
    <snip>

As a side-effect, with KCSAN, this eventually locks up the console, most likely due to deadlock, e.g.:

    .. -> printk lock -> spinlock_debug -> KCSAN detects data race -> kcsan_print_report() -> printk lock -> deadlock

This fix will 1) avoid the data races, and 2) allow using lock debugging together with KCSAN.

Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Marco Elver <elver@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/20191120155715.28089-1-elver@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-25  Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"  (Wanpeng Li)
This patch reverts commit 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is preempted). That commit caused a large performance regression in over-subscription scenarios. The test was run on a Xeon Skylake box, 2 sockets, 40 cores, 80 threads, with three VMs of 80 vCPUs each. The score of ebizzy -M was reduced from 13000-14000 records/s to 1700-1800 records/s:

    Host                           Guest       score
    vanilla w/o kvm optimizations  upstream    1700-1800 records/s
    vanilla w/o kvm optimizations  revert      13000-14000 records/s
    vanilla w/  kvm optimizations  upstream    4500-5000 records/s
    vanilla w/  kvm optimizations  revert      14000-15500 records/s

Exiting the aggressive wait-early mechanism can result in premature yield and extra scheduling latency. Actually, only 6% of wait_early events are caused by vcpu_is_preempted() being true. However, when one vCPU voluntarily releases its vCPU, all the subsequent waiters in the queue will do the same, and the cascading effect leads to bad performance.

kvm optimizations:
    [1] commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts)
    [2] commit 266e85a5ec9 (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption)

Tested-by: loobinliu@tencent.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: loobinliu@tencent.com
Cc: stable@vger.kernel.org
Fixes: 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is preempted)
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-16  Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull scheduler updates from Ingo Molnar:

 - MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann, Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers. As perf and the scheduler is getting bigger and more complex, document the status quo of current responsibilities and interests, and spread the review pain^H^H^H^H fun via an increase in the Cc: linecount generated by scripts/get_maintainer.pl. :-)

 - Add another series of patches that brings the -rt (PREEMPT_RT) tree closer to mainline: split the monolithic CONFIG_PREEMPT dependencies into a new CONFIG_PREEMPTION category that will allow the eventual introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches to go though.

 - Extend the CPU cgroup controller with uclamp.min and uclamp.max to allow the finer shaping of CPU bandwidth usage.

 - Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS).

 - Improve the behavior of high CPU count, high thread count applications running under cpu.cfs_quota_us constraints.

 - Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present.

 - Improve CPU isolation housekeeping CPU allocation NUMA locality.

 - Fix deadline scheduler bandwidth calculations and logic when cpusets rebuilds the topology, or when it gets deadline-throttled while it's being offlined.

 - Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from setscheduler() system calls without creating global serialization. Add new synchronization between cpuset topology-changing events and the deadline acceptance tests in setscheduler(), which were broken before.

 - Rework the active_mm state machine to be less confusing and more optimal.

 - Rework (simplify) the pick_next_task() slowpath.

 - Improve load-balancing on AMD EPYC systems.

 - ... and misc cleanups, smaller fixes and improvements - please see the Git log for more details.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  sched/psi: Correct overly pessimistic size calculation
  sched/fair: Speed-up energy-aware wake-ups
  sched/uclamp: Always use 'enum uclamp_id' for clamp_id values
  sched/uclamp: Update CPU's refcount on TG's clamp changes
  sched/uclamp: Use TG's clamps to restrict TASK's clamps
  sched/uclamp: Propagate system defaults to the root group
  sched/uclamp: Propagate parent clamps
  sched/uclamp: Extend CPU's cgroup controller
  sched/topology: Improve load balancing on AMD EPYC systems
  arch, ia64: Make NUMA select SMP
  sched, perf: MAINTAINERS update, add submaintainers and reviewers
  sched/fair: Use rq_lock/unlock in online_fair_sched_group
  cpufreq: schedutil: fix equation in comment
  sched: Rework pick_next_task() slow-path
  sched: Allow put_prev_task() to drop rq->lock
  sched/fair: Expose newidle_balance()
  sched: Add task_struct pointer to sched_class::set_curr_task
  sched: Rework CPU hotplug task selection
  sched/{rt,deadline}: Fix set_next_task vs pick_next_task
  sched: Fix kerneldoc comment for ia64_set_curr_task
  ...
2019-09-16  Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull locking updates from Ingo Molnar:

 - improve rwsem scalability

 - add uninitialized rwsem debugging check

 - reduce lockdep's stacktrace memory usage and add diagnostics

 - misc cleanups, code consolidation and constification

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  mutex: Fix up mutex_waiter usage
  locking/mutex: Use mutex flags macro instead of hard code
  locking/mutex: Make __mutex_owner static to mutex.c
  locking/qspinlock,x86: Clarify virt_spin_lock_key
  locking/rwsem: Check for operations on an uninitialized rwsem
  locking/rwsem: Make handoff writer optimistically spin on owner
  locking/lockdep: Report more stack trace statistics
  locking/lockdep: Reduce space occupied by stack traces
  stacktrace: Constify 'entries' arguments
  locking/lockdep: Make it clear that what lock_class::key points at is not modified
2019-09-16  Merge branch 'sched/rt' into sched/core, to pick up -rt changes  (Ingo Molnar)
Pick up the first couple of patches working towards PREEMPT_RT.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-08  mutex: Fix up mutex_waiter usage  (Peter Zijlstra)
The patch moving bits into mutex.c went a little too far; by also moving struct mutex_waiter, a few less common CONFIGs would no longer build.

Fixes: 5f35d5a66b3e ("locking/mutex: Make __mutex_owner static to mutex.c")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2019-08-06  locking/mutex: Use mutex flags macro instead of hard code  (Mukesh Ojha)
Use the mutex flags macro instead of a hard-coded value inside __mutex_owner().

Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: mingo@redhat.com
Cc: will@kernel.org
Link: https://lkml.kernel.org/r/1564585504-3543-2-git-send-email-mojha@codeaurora.org
2019-08-06  locking/mutex: Make __mutex_owner static to mutex.c  (Mukesh Ojha)
__mutex_owner() should only be used by the mutex APIs, so enforce this restriction by moving the __mutex_owner() function definition from linux/mutex.h to mutex.c.

Some functions, such as mutex_is_locked() and mutex_trylock_recursive(), use __mutex_owner(); to keep the legacy behavior intact, move them as well and export them. Also move the mutex_waiter structure to keep it private to the file.

Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: mingo@redhat.com
Cc: will@kernel.org
Link: https://lkml.kernel.org/r/1564585504-3543-1-git-send-email-mojha@codeaurora.org
2019-08-06  locking/rwsem: Check for operations on an uninitialized rwsem  (Davidlohr Bueso)
Currently rwsems are the only locking primitive that lacks this debug feature. Add it under CONFIG_DEBUG_RWSEMS and do the magic checking in the locking fastpath (trylock) operation, such that we cover all cases. The unlocking part is pretty straightforward.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: mingo@kernel.org
Cc: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lkml.kernel.org/r/20190729044735.9632-1-dave@stgolabs.net
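The gist of the check (a sketch): each rwsem gains a self-pointing ->magic field, set at init time and validated in the lock fastpaths and on unlock:

    /* In init_rwsem()/__init_rwsem(): */
    #ifdef CONFIG_DEBUG_RWSEMS
    	sem->magic = sem;
    #endif

    /* In the down_read()/down_write() trylock fastpaths and unlock paths:
     * an uninitialized or zeroed rwsem fails this test. */
    DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem);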
2019-08-06  locking/rwsem: Make handoff writer optimistically spin on owner  (Waiman Long)
When the handoff bit is set by a writer, no task other than the writer that set it is allowed to acquire the lock. If the to-be-handoff'ed writer goes to sleep, there will be a wakeup latency period where the lock is free, but no one can acquire it. That is less than ideal.

To reduce that latency, the handoff writer will now optimistically spin on the owner if it happens to be an on-CPU writer. It will spin until the owner releases the lock, at which point the to-be-handoff'ed writer can acquire the lock immediately without any delay. Of course, if the owner is not an on-CPU writer, the to-be-handoff'ed writer will have to sleep anyway.

The optimistic spinning code is also modified to not stop spinning when the handoff bit is set. This will prevent an occasional setting of the handoff bit from causing a bunch of optimistic spinners to enter the wait queue, which would significantly reduce throughput.

On a 1-socket 22-core 44-thread Skylake system, the AIM7 shared_memory workload was run with 7000 users. The throughput (jobs/min) of the following kernels was as follows:

    1) 5.2-rc6                                    - 8,092,486
    2) 5.2-rc6 + tip's rwsem patches              - 7,567,568
    3) 5.2-rc6 + tip's rwsem patches + this patch - 7,954,545

Using perf-record(1), the %cpu time used by rwsem_down_write_slowpath(), rwsem_down_write_failed() and their callees for the 3 kernels was 1.70%, 5.46% and 2.08% respectively.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: x86@kernel.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20190625143913.24154-1-longman@redhat.com
2019-08-01  lockdep: Make print_lock() address visible  (Paul E. McKenney)
Security is a wonderful thing, but so is the ability to debug based on lockdep warnings. This commit therefore makes lockdep lock addresses visible in the clear.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-07-25  sched/core: Convert get_task_struct() to return the task  (Matthew Wilcox (Oracle))
Returning the pointer that was passed in allows us to write slightly more idiomatic code. Convert a few users.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190704221323.24290-1-willy@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25  locking/lockdep: Report more stack trace statistics  (Bart Van Assche)
Report the number of stack traces and the number of stack trace hash chains. These two numbers are useful because they allow estimating the number of stack trace hash collisions.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/20190722182443.216015-5-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25  locking/lockdep: Reduce space occupied by stack traces  (Bart Van Assche)
Although commit 669de8bda87b ("kernel/workqueue: Use dynamic lockdep keys for workqueues") unregisters dynamic lockdep keys when a workqueue is destroyed, a side effect of that commit is that all stack traces associated with the lockdep key are leaked when a workqueue is destroyed. Fix this by storing each unique stack trace once.

Other changes in this patch are:

 - Use NULL instead of { .nr_entries = 0 } to represent 'no trace'.
 - Store a pointer to a stack trace in struct lock_class and struct lock_list instead of storing 'nr_entries' and 'offset'.

This patch prevents the following program from triggering the "BUG: MAX_STACK_TRACE_ENTRIES too low!" complaint:

    #include <fcntl.h>
    #include <unistd.h>

    int main()
    {
    	for (;;) {
    		int fd = open("/dev/infiniband/rdma_cm", O_RDWR);
    		close(fd);
    	}
    }

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yuyang Du <duyuyang@gmail.com>
Link: https://lkml.kernel.org/r/20190722182443.216015-4-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25  locking/lockdep: Make it clear that what lock_class::key points at is not modified  (Bart Van Assche)
This patch does not change the behavior of the lockdep code.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/20190722182443.216015-2-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25  locking/mutex: Test for initialized mutex  (Sebastian Andrzej Siewior)
An uninitialized/zeroed mutex will go unnoticed because there is no check for it. There is a magic check in the unlock slowpath which might go unnoticed if the unlock happens in the fastpath.

Add a ->magic check early in the mutex_lock() and mutex_trylock() paths.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190703092125.lsdf4gpsh2plhavb@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
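The added check is essentially this (sketch):

    /* Early in mutex_lock() / mutex_trylock(): a mutex initialized by
     * mutex_init() has lock->magic == lock; a zeroed one does not. */
    #ifdef CONFIG_DEBUG_MUTEXES
    	DEBUG_LOCKS_WARN_ON(lock->magic != lock);
    #endif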
2019-07-25  locking/lockdep: Clean up #ifdef checks  (Arnd Bergmann)
As Will Deacon points out, CONFIG_PROVE_LOCKING implies TRACE_IRQFLAGS, so the conditions I added in the previous patch, and some others in the same file, can be simplified by only checking for the former.

No functional change.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Yuyang Du <duyuyang@gmail.com>
Fixes: 886532aee3cd ("locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING")
Link: https://lkml.kernel.org/r/20190628102919.2345242-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25  locking/lockdep: Hide unused 'class' variable  (Arnd Bergmann)
The usage is now hidden in an #ifdef, so we need to move the variable itself in there as well to avoid this warning:

    kernel/locking/lockdep_proc.c:203:21: error: unused variable 'class' [-Werror,-Wunused-variable]

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yuyang Du <duyuyang@gmail.com>
Cc: frederic@kernel.org
Fixes: 68d41d8c94a3 ("locking/lockdep: Fix lock used or unused stats error")
Link: https://lkml.kernel.org/r/20190715092809.736834-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25  locking/rwsem: Add ACQUIRE comments  (Peter Zijlstra)
Since we just reviewed read_slowpath for ACQUIRE correctness, add a few comments to retain our findings.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25  locking/rwsem: Add missing ACQUIRE to read_slowpath sleep loop  (Peter Zijlstra)
While reviewing another read_slowpath patch, both Will and I noticed another missing ACQUIRE, namely:

    X = 0;

    CPU0				CPU1

    rwsem_down_read()
      for (;;) {
        set_current_state(TASK_UNINTERRUPTIBLE);

    				X = 1;
    				rwsem_up_write();
    				  rwsem_mark_wake()
    				    atomic_long_add(adjustment, &sem->count);
    				    smp_store_release(&waiter->task, NULL);

        if (!waiter.task)
          break;
        ...
      }

    r = X;

Allows 'r == 0'.

Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
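The fix pairs the waker's smp_store_release() with an ACQUIRE load in the sleep loop, roughly like this (sketch; signal handling omitted):

    for (;;) {
    	set_current_state(state);
    	/* Matches rwsem_mark_wake()'s smp_store_release(&waiter->task, NULL). */
    	if (!smp_load_acquire(&waiter.task))
    		break;
    	schedule();
    }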
2019-07-25  locking/rwsem: Add missing ACQUIRE to read_slowpath exit when queue is empty  (Jan Stancek)
LTP mtest06 has been observed to occasionally hit "still mapped when deleted" and a following BUG_ON on arm64. The extra mapcount originated from the pagefault handler, which handled a pagefault for a vma that had already been detached. The vma is detached under mmap_sem write lock by detach_vmas_to_be_unmapped(), which also invalidates the vmacache. When the pagefault handler (under mmap_sem read lock) calls find_vma(), vmacache_valid() wrongly reports the vmacache as valid.

After rwsem down_read() returns via the 'queue empty' path (as of v5.2), it does so without an ACQUIRE on sem->count:

    down_read()
      __down_read()
        rwsem_down_read_failed()
          __rwsem_down_read_failed_common()
            raw_spin_lock_irq(&sem->wait_lock);
            if (list_empty(&sem->wait_list)) {
              if (atomic_long_read(&sem->count) >= 0) {
                raw_spin_unlock_irq(&sem->wait_lock);
                return sem;

The problem can be reproduced by running LTP mtest06 in a loop and building the kernel (-j $NCPUS) in parallel. It reproduces since v4.20 on arm64 HPE Apollo 70 (224 CPUs, 256GB RAM, 2 nodes). It triggers reliably in about an hour. The patched kernel ran fine for 10+ hours.

Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dbueso@suse.de
Fixes: 4b486b535c33 ("locking/rwsem: Exit read lock slowpath if queue empty & no writer")
Link: https://lkml.kernel.org/r/50b8914e20d1d62bb2dee42d342836c2c16ebee7.1563438048.git.jstancek@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
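The fix adds an ACQUIRE on that path, roughly (a sketch against the quoted v5.2-era shape): upgrading the control dependency on the sem->count read before returning with the lock held:

    if (list_empty(&sem->wait_list)) {
    	if (atomic_long_read(&sem->count) >= 0) {
    		/* Provide lock ACQUIRE: pairs with the writer's release. */
    		smp_acquire__after_ctrl_dep();
    		raw_spin_unlock_irq(&sem->wait_lock);
    		return sem;
    	}
    }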
2019-07-25  locking/rwsem: Don't call owner_on_cpu() on read-owner  (Waiman Long)
For a writer, the owner value is cleared on unlock. For a reader, it is left intact on unlock to provide a better debugging aid on crash dumps, and the unlock of one reader may not mean the lock is free.

As a result, owner_on_cpu() shouldn't be used on a read-owner, as the task pointer value may not be valid and the task might have been freed. That check is done in rwsem_spin_on_owner(), but not in rwsem_can_spin_on_owner(). This can lead to a use-after-free error from KASAN. For example:

    BUG: KASAN: use-after-free in rwsem_down_write_slowpath
    (/home/miguel/kernel/linux/kernel/locking/rwsem.c:669
     /home/miguel/kernel/linux/kernel/locking/rwsem.c:1125)

Fix this by checking for the RWSEM_READER_OWNED flag before calling owner_on_cpu().

Reported-by: Luis Henriques <lhenriques@suse.com>
Tested-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Fixes: 94a9717b3c40e ("locking/rwsem: Make rwsem->owner an atomic_long_t")
Link: https://lkml.kernel.org/r/81e82d5b-5074-77e8-7204-28479bbe0df0@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
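The shape of the fix in rwsem_can_spin_on_owner() (a simplified sketch): the owner task pointer is only dereferenced when the owner is known to be a writer:

    owner = rwsem_owner_flags(sem, &flags);
    /* Don't check a read-owner: its task pointer may be stale or freed. */
    if ((flags & nonspinnable) ||
        (owner && !(flags & RWSEM_READER_OWNED) && !owner_on_cpu(owner)))
    	ret = false;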
2019-07-15  docs: locking: convert docs to ReST and rename to *.rst  (Mauro Carvalho Chehab)
Convert the locking documents to ReST and add them to the kernel development book where they belong. Most of the stuff here is just to make Sphinx parse the text files properly; they are already in good shape, not requiring massive changes in order to be parsed.

The conversion is actually:

 - add blank lines and indentation in order to identify paragraphs;
 - fix table markups;
 - add some list markups;
 - mark literal blocks;
 - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Federico Vaga <federico.vaga@vaga.pv.it>
2019-07-13  locking/lockdep: Fix lock used or unused stats error  (Yuyang Du)
The stats variable nr_unused_locks is incremented every time a new lock class is registered and decremented when the lock is first used in __lock_acquire(). It is shown and checked in lockdep_stats.

However, under configurations where either CONFIG_TRACE_IRQFLAGS or CONFIG_PROVE_LOCKING is not defined:

The commit:

    091806515124b20 ("locking/lockdep: Consolidate lock usage bit initialization")

missed marking the LOCK_USED flag at IRQ usage initialization, because mark_usage() is not called. And the commit:

    886532aee3cd42d ("locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING")

further made mark_lock() not defined, such that the LOCK_USED flag cannot be marked at all when the lock is first acquired.

Fix this by not showing or checking these stats under such configurations in lockdep_stats.

Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Yuyang Du <duyuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: arnd@arndb.de
Cc: frederic@kernel.org
Link: https://lkml.kernel.org/r/20190709101522.9117-1-duyuyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>